This is something I’ve needed to get off my chest: I really and truly don’t understand the logic behind Docker’s container image naming scheme. The first time I tried to grasp it, I got mentally stuck, because it feels to me like it blatantly ignores every convention. At the moment I simply accept that this is how it works, but my neurotic side keeps protesting every time I tag an image. Let me explain the beef I have, and then maybe someone can enlighten me.
(N.B. I write “docker” in this post, but I am actually using “podman”. Their command-line interfaces are nearly identical, and because Docker is better known, I’m typing “docker” throughout.)
When you create a new image using the “docker build” command, the newly created image is uniquely identified by a hash, something along the lines of 21df6af2a82e0135661ce18a621112450992914e1fcfc0edb285c332318e3e23. This ID is practically impossible to convey to other people, so Docker lets you add tags to images. A typical build command looks like this:
$ docker build --tag my-tag .
This will use the Dockerfile (Containerfile for podman) to build an image and add it to local image storage. You can see this by executing a list command:
$ docker images
REPOSITORY        TAG     IMAGE ID      CREATED         SIZE
localhost/my-tag  latest  99bd5445f871  22 minutes ago  280 MB
Wait! What? What is the localhost doing there? And latest?
This output is not what I expected. The first time I used Docker, it actually took me a serious amount of time to grasp Docker’s naming concepts, because they do not adhere to the common usage of established concepts. Let’s start with tags: Wikipedia defines a tag as “a keyword or term assigned to a piece of information”. A keyword or term. So if I had a container for a time-registration application with an Angular frontend and a Java REST backend running on Ubuntu, I’d expect tags like angular, java, rest, ubuntu, time-registration. Tags are simple, and to prevent confusion between java and Java they are often either case-insensitive or forced into one case (Docker, for example, requires lowercase).
The tag information listed by Docker is not “a keyword”. Even worse, the part listed under TAG is not even the tag I specified. Based on the output, tags apparently have something to do with versioning, because that is what latest suggests. And the tag I specified ended up under REPOSITORY, so apparently the tag also says something about where the image is stored. Tags, repositories and versioning are well-known concepts, so why cram all of them into a tag instead of formalizing each one?
$ notdocker build --repository localhost --name my-tag --version latest
This usage of the concept of a tag is comparable to bad method naming: if you have a method named check, you don’t expect it to create or find anything. Likewise, a tag has a commonly accepted meaning, and that meaning does not involve repositories and versions.
Another issue I have is that a tag usually applies to many objects. For example, the java tag could apply to many images. The information we see here, however, resembles a unique identifier. This becomes even more apparent when the image is deployed to a remote repository: the hostname and port number of the repository are prefixed to the tag.
$ docker build --tag myrepo.tbee.org:5000/my-tag .
$ docker images
REPOSITORY                   TAG     IMAGE ID      CREATED         SIZE
localhost/my-tag             latest  99bd5445f871  22 minutes ago  280 MB
myrepo.tbee.org:5000/my-tag  latest  99bd5445f871  22 minutes ago  280 MB
This new tag may give the impression that these are two different images, but they are not: it is the same image (as the image ID shows) deployed to two different repositories.
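To make the “many names, one image” point concrete, here is a toy Python model of a local image store. It assumes, as Docker and podman do, that the image ID is the SHA-256 digest of the image’s configuration and that the IMAGE ID column shows only its first 12 hex digits. The `LocalImageStore` class and the config bytes are hypothetical stand-ins; this is a sketch of the idea, not how either tool is implemented.

```python
import hashlib


class LocalImageStore:
    """Toy model: many human-readable names can point at one image ID."""

    def __init__(self) -> None:
        self.names: dict[str, str] = {}  # full name -> full image ID

    def add_image(self, config: bytes, *names: str) -> str:
        # The ID is a SHA-256 digest of the (hypothetical) config bytes,
        # so identical config bytes always hash to the same ID.
        image_id = hashlib.sha256(config).hexdigest()
        for name in names:
            self.names[name] = image_id
        return image_id

    def images(self) -> list[tuple[str, str]]:
        # Like the IMAGE ID column, show only the first 12 hex digits.
        return [(name, image_id[:12]) for name, image_id in sorted(self.names.items())]


store = LocalImageStore()
config = b'{"hypothetical": "image config"}'  # stand-in for a real config blob
store.add_image(config, "localhost/my-tag:latest")
store.add_image(config, "myrepo.tbee.org:5000/my-tag:latest")
for name, short_id in store.images():
    print(f"{name:40} {short_id}")
```

Both names print the same 12-character ID: two entries in a name table, one image.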
But that is not correct either: the second image does not yet exist in my remote repository. I first have to push it, and until that happens, the information in that tag is false. We are somehow halfway through a process, our database is not in a consistent state, but the transaction has been committed anyway. Or maybe we hope to be eventually consistent, but if the push is never executed, we never will be.
$ docker login -u <user> -p <password> https://myrepo.tbee.org:5000
$ docker push myrepo.tbee.org:5000/my-tag
The confusion gets even worse if you add a prefix to the tag, which is allowed. A prefix is needed because others may be using the same name (sorry, tag) for their image, and we do not want conflicts.
$ docker build --tag org.tbee/my-tag .
But in the statement above, “org.tbee” is treated as a repository: because it contains a dot, it is taken to be a hostname, and the hostname is implicitly part of the tag. So the correct statement would be:
$ docker build --tag localhost/org.tbee/my-tag .
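Why does “org.tbee” end up as a repository host? Below is a simplified Python sketch of the rule as I understand Docker’s reference parsing: the first component of a name is treated as a registry host only if it contains a dot or a colon, or is exactly localhost. The `split_reference` function, the `Reference` tuple and the `localhost` default (imitating how podman names locally built images; Docker itself defaults to docker.io) are my own assumptions, and digests, validation and other edge cases are left out.

```python
from typing import NamedTuple


class Reference(NamedTuple):
    registry: str
    path: str
    tag: str


def split_reference(name: str, default_registry: str = "localhost") -> Reference:
    """Simplified sketch of how an image name is split into its parts.

    Assumption: imitates the host-detection heuristic of Docker's reference
    parsing, without digests or validation. "localhost" as the default
    registry mimics podman; Docker would use docker.io instead.
    """
    # The tag is whatever follows the last ':' after the last '/', so the
    # ':5000' port of a registry host is not mistaken for a tag.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        name, tag = name[:colon], name[colon + 1:]
    else:
        tag = "latest"

    # The first component is a registry host only if it looks like one.
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return Reference(first, rest, tag)
    return Reference(default_registry, name, tag)


for n in ["my-tag", "myrepo.tbee.org:5000/my-tag", "org.tbee/my-tag",
          "localhost/org.tbee/my-tag", "org/tbee/my-tag"]:
    print(f"{n:30} -> {split_reference(n)}")
```

Under these assumptions, org.tbee/my-tag yields registry “org.tbee”, while localhost/org.tbee/my-tag and org/tbee/my-tag both keep the group inside the path.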
It feels like a big conceptual mess.
This all started with the hash value being too abstract to convey and the need for a more meaningful identifier. That is called an alias (or pseudonym, or alternate name, as Wikipedia describes it), not a tag. So let’s call it that, and use commonly known concepts. Besides the alias, there are more concepts at play:
- Version: in the computer world we have the concept of versions of the same logical thing.
- Group: other people may choose the same alias, so it is important to have a context, a grouping.
- Repositories: the thing we created can be uploaded into repositories, so we need a way to identify them.
This has been done before. Maven uses a coordinate system of the form groupId:artifactId:version, and HTTPS URLs for the repository. That would work. We could also take a REST-inspired approach, as that is more technology-agnostic than Maven coordinates.
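The Maven split between “what” and “where” can be sketched in a few lines of Python. The coordinates say what the artifact is; the repository is a separate base URL, and an actual location only exists once you combine the two using Maven’s standard repository layout (dots in the groupId become directories). The `MavenCoordinates` class, the example coordinates and the repository URL are hypothetical, and only the three-part groupId:artifactId:version form is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MavenCoordinates:
    group_id: str
    artifact_id: str
    version: str

    @classmethod
    def parse(cls, text: str) -> "MavenCoordinates":
        # Only the three-part form groupId:artifactId:version is supported.
        group_id, artifact_id, version = text.split(":")
        return cls(group_id, artifact_id, version)

    def url_in(self, repository: str, packaging: str = "jar") -> str:
        # Standard repository layout: group dirs / artifactId / version /
        # artifactId-version.packaging
        group_path = self.group_id.replace(".", "/")
        file_name = f"{self.artifact_id}-{self.version}.{packaging}"
        return (f"{repository.rstrip('/')}/{group_path}/"
                f"{self.artifact_id}/{self.version}/{file_name}")


coords = MavenCoordinates.parse("org.tbee:my-tag:1.0")
print(coords.url_in("https://myrepo.tbee.org/maven2"))
# https://myrepo.tbee.org/maven2/org/tbee/my-tag/1.0/my-tag-1.0.jar
```

Nothing in the coordinates themselves refers to a host, so the same artifact can live in any number of repositories without changing its name.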
Such a REST-inspired notation would look something like this:
$ notdocker build my-time-registration
$ notdocker build my-time-registration:latest
$ notdocker build org/tbee/my-time-registration
$ notdocker build org/tbee/my-time-registration:latest
$ notdocker images
ALIAS                          VERSION  IMAGE ID      CREATED         SIZE
my-time-registration           latest   99bd5445f871  22 minutes ago  280 MB
org/tbee/my-time-registration  latest   99bd5445f871  22 minutes ago  280 MB
$ notdocker push org/tbee/my-time-registration:latest docker://myrepo.tbee.org:5000
$ notdocker pull docker://myrepo.tbee.org:5000/org/tbee/my-time-registration:latest
First of all, the build requires at least one alias, but that is just a unique ID; there is no notion of a repository. It may specify a (multilevel) group, to make sure it does not conflict with other similarly named images. It may specify a version; if it does not, latest is assumed. And you can give multiple aliases to the same image (I don’t know why you would want that, because it’s the same thing). Next, you push an alias to a repository, where it can then be addressed using a URL-like address. Oh yes, and uppercase is allowed: the alias is unique, so there is no reason to prevent case differences. 😀
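The rules of this imaginary `notdocker` notation are simple enough to sketch in Python: an optional multilevel group, a name, an optional version defaulting to latest, no lowercasing, and a repository that only enters the picture when an alias is combined with a URL. The `Alias` class and its `address_in` method are hypothetical illustrations of the proposal, not anything Docker or podman implement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alias:
    group: tuple[str, ...]  # multilevel group, possibly empty
    name: str
    version: str

    @classmethod
    def parse(cls, text: str) -> "Alias":
        # Version follows the last ':'; 'latest' is assumed when it is absent.
        path, sep, version = text.rpartition(":")
        if not sep:
            path, version = text, "latest"
        *group, name = path.split("/")
        return cls(tuple(group), name, version)  # case is preserved

    def __str__(self) -> str:
        return "/".join((*self.group, self.name)) + ":" + self.version

    def address_in(self, repository: str) -> str:
        # Only when pushing or pulling is an alias combined with a repository.
        return f"{repository.rstrip('/')}/{self}"


alias = Alias.parse("org/tbee/my-time-registration")
print(alias)  # org/tbee/my-time-registration:latest
print(alias.address_in("docker://myrepo.tbee.org:5000"))
```

The second line printed matches the pull address in the example above, while the alias itself never mentions a host.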
Maybe someone can explain why I’m seeing this all wrong and Docker gets it right.
Anyhow, this is not how Docker works. So, to prevent confusion, I’ll be using dot notation for groups (org.tbee.my-tag:latest).