For everybody complaining about having to pay actual money for goods and services: if you're not okay with this you can run a self hosted registry.
The out of the box registry does very little and has a very poor user experience.
But nowadays there are Harbor from VMware and Quay from RedHat that are open source and easily self-hostable.
We run our own Harbor instance at work and I can tell you... Docker images are NOT light. You think they are, but they are not. It's easy for images to proliferate and burn a lot of disk space. Under some conditions, when layers are shared among too many images (can't recall the exact details here), deleting an image may also delete a lot of other images (which is not the correct/expected/wanted behaviour), and that means that under some circumstances you have to retain a lot more images or layers than you think you should.
The thing is, I can only wonder how much bandwidth and disk space (and disk space must be replicated for fault tolerance) it must cost to run a public registry for everybody.
It hurts the open source ecosystem a bit, I understand... Maybe some middle ground will be reached, dunno.
Edit: I also run Harbor at home, it's not hard to set up and operate, you should really check it out.
I thought they didn't make private registries available because it would "fragment the ecosystem"
Here's the conversation I saw:
https://stackoverflow.com/questions/33054369/how-to-change-t...
pointing to this:
https://github.com/moby/moby/issues/7203
and also there was this comment:
"It turns out this is actually possible, but not using the genuine Docker CE or EE version.
You can either use Red Hat's fork of docker with the '--add-registry' flag or you can build docker from source yourself with registry/config.go modified to use your own hard-coded default registry namespace/index."
No, you have misunderstood the issue. You can use any registry, just write out the domain for it; this has always worked and is very widely used. Red Hat changed the default if you don't specify a FQDN, before they decided not to ship Docker at all.
> The out of the box registry does very little and has a very poor user experience.
I think this is indirectly what people are complaining about. Having a free registry mitigates that. So they aren't far off track.
It's true we shouldn't be bitter about Docker. They did a lot to improve the development ecosystem. We should try to avoid picking technologies in the future that aren't scalable in both directions though.
For example, PostgreSQL works well for a 1GB VPS containing it and the web server for dozens of users, and it also works well for big sites. With MongoDB the VPS doesn't work so well.
Yep, the docker registry is absolute garbage and lacks garbage collection.
FYI, GitLab (free version) has a built-in registry as well and it lets you define retention rules.
I know about that, and we used to use that but had to move away from it because it created a lot of scalability problems (mind you, most of them due to disk usage).
With Harbor we can save docker image layers to Swift, the OpenStack flavor of object storage (their equivalent of S3). That solves a lot of scalability problems.
AFAIK GitLab ships the docker registry underneath, so the problems mostly stay. I think that Harbor does the same: I skimmed the Harbor source and it seems that it forwards HTTP requests to the docker registry if you hit the registry API endpoints.
Haven't looked at Quay, but as far as I know, wherever there's the docker registry you'll have garbage collection problems.
One note on the side: I think that Quay missed its chance to become the go-to docker registry. Red Hat basically open sourced it after Harbor had been incubated in the CNCF (unsurprisingly, Harbor development has skyrocketed after that event).
It always blows my mind when people complain about free services and products no longer being free. When I built Amezmo, I learned that free-tier customers are the worst to support. Just like with the MailGun pricing changes, we'll have people complaining about how it's no longer free.
Which IaaS do you use to self-host the one at work? How much does the network transfer cost you? Or are the docker pulls on an internal network?
We run on openstack, managed by a local yet fairly large openstack provider (Irideos, their devops/consulting team is top notch and has helped us adopt many cloud-native technologies while still staying fairly vendor-agnostic).
I can't see the bills, but we never worry about bandwidth usage and I am fairly sure that bandwidth is free, basically.
Keep in mind that since we run our own Harbor instance, most of the image pulls happen within our openstack network, so that does not (and would not) count against bandwidth usage (but image storage does). In terms of bandwidth, then, we can happily set "always" as imagePullPolicy in our kubernetes clusters.
Edit: openstack works remarkably well. The horizon web interface is slow as molasses but thanks to terraform we rarely have to use it.
Harbor is excellent, especially now that you can set up automatic image pruning rules that make sense.
Some thoughts / scenarios:
"Fine we will just pay" - I have a personal account then 4 orgs, that's ~ 500 USD / year to keep older OSS online for users of openfaas/inlets/etc.
"We'll just ping the image every 6 mos" - you have to iterate and discover every image and tag in the accounts, then pull them, and retry if it fails. Oh and bandwidth isn't free.
"Doesn't affect me" - doesn't it? If you run a Kubernetes cluster, you'll do 100 pulls in no time from free / OSS components. The Hub will rate-limit you at 100 per 6 hours (resets every 24?). That means you need to add an image pull secret and a paid unmanned user to every Kubernetes cluster you run to prevent an outage.
"You should rebuild images every 6 mo anyway!" - have you ever worked with an enterprise company? They do not upgrade like we do.
"It's fair, about time they charged" - I agree with this, the costs must have been insane, but why is there no provision for OSS projects? We'll see images disappear because people can't afford to pay or to justify the costs.
A thread with community responses - https://twitter.com/alexellisuk/status/1293937111956099073?s...
> Oh and bandwidth isn't free.
But neither is storage
> have you ever worked with an enterprise company? They do not upgrade like we do.
I'm sure someone somewhere is going to shed a tear for the enterprise organisations with shady development practices using the free tier who may be slightly inconvenienced.
Everyone should actually read the docker FAQ instead of assuming. This only applies to inactive images.
https://www.docker.com/pricing/retentionfaq
What is an “inactive” image? An inactive image is a container image that has not been either pushed or pulled from the image repository in 6 or more months.
That may well be true, but now I have to pull images every 6 months, some I very much doubt I’ll ever upgrade but will pull anytime I format the relevant host.
It sucks that this isn't for new images only. Now I have to go and retrospectively move my old images to a self-hosted registry, update all my scripts to the new URIs, debug any changes, etc.
>"You should rebuild images every 6 mo anyway!" - have you ever worked with an enterprise company? They do not upgrade like we do.
No, but they've got cash and are not price sensitive. Wringing money out of them helps keep it cheap and/or free for everyone else.
Enterprise customers might as well fork over cash to Docker rather than (shudder) Oracle.
Companies might base their image on another image in the docker registry. That image might be good now, might be good in two years, but what if I want to pull, say, a .NET Core 1.1 docker image in four years?
Now, .NET Core 1.1 might not be the best example, but I'm sure you can think of some example.
Enterprises upgrade on a slower schedule, yes, but they still patch as quickly as everybody else.
Can you patch a docker image? Sort of, but it's easier to rebuild. And that's what they do.
I feel like the main response should be "OK, we'll just host our own Docker Registry."
This has been available as a docker image since the very beginning, which might not be good enough for everyone, but I think it will work for me and mine.
Agreed that self-hosting a registry should be way more common than it is today, and maybe even standard practice.
It's crazy easy to do; just start the registry container with a mapped volume and you're done.
Securing it, adding authentication/authorization and properly configuring your registry for exposure to the public internet, though... the configuration for that is very poorly documented IMO.
EDIT: Glancing through the docs, they do seem to have improved on making this more approachable relatively recently. https://docs.docker.com/registry/deploying/
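For reference, the "crazy easy" version mentioned above is roughly this; a minimal sketch, assuming the upstream registry:2 image, with the host path, port and image name as placeholders (TLS and auth deliberately left out, which is exactly the hard part noted above):

  # run the open-source registry, persisting image data to a host directory
  docker run -d --name registry \
    -p 5000:5000 \
    -v /srv/registry-data:/var/lib/registry \
    registry:2

  # retag a local image and push it to the self-hosted registry
  docker tag myapp:latest localhost:5000/myapp:latest
  docker push localhost:5000/myapp:latest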
Note that for OSS images, that's a non-trivial thing to do: you have to have somewhere to run the image and somewhere to store your images (e.g. S3), both of which are non-free, and you also end up with more documentation to write and less discoverability than Docker Hub offers.
A friend of mine offers Docker (and more) repository hosting for $5/mo. He is competent and friendly and I would recommend his product: https://imagecart.cloud/
> "You should rebuild images every 6 mo anyway!" - have you ever worked with an enterprise company? They do not upgrade like we do.
Good opportunity to sell a support contract. The point still stands - a six month old image is most likely stale.
Docker is doing the ecosystem a favor.
Apologies for the hand-waving, but is there a well-known community sponsored public peer-to-peer registry service, based on https://github.com/uber/kraken perhaps?
> Oh and bandwidth isn't free.
Neither is it for Docker...
It looks like if anyone pulls an image within 6 months, then the counter is reset. It seems like it's not too onerous to me—for any of the OSS images I've maintained, they are typically pulled hundreds if not thousands of times a day.
Sometimes I don't push a new image version (if it's not critical to keep up with upstream security releases) for many months to a year or longer, but those images are still pulled frequently (certainly more than once every 6 months).
I didn't see any notes about rate limiting in that FAQ, did I miss something?
The FAQ is a bit incomplete or trying to hide it. Section 2.5 of the TOS also introduced a pull rate provision. You can see it on the pricing page, https://www.docker.com/pricing
> "Doesn't affect me" - doesn't it? If you run a Kubernetes cluster, you'll do 100 pulls in no time from free / OSS components. The Hub will rate-limit you at 100 per 6 hours (resets every 24?). That means you need to add an image pull secret and a paid unmanned user to every Kubernetes cluster you run to prevent an outage.
I can't find this. It's not in the original link, is it?
That information suddenly appeared(?) on their pricing page in the comparison table, near the bottom: https://www.docker.com/pricing
> "You should rebuild images every 6 mo anyway!" - have you ever worked with an enterprise company? They do not upgrade like we do.
A bunch of enterprises are going to get burned when say ubuntu:trusty-20150630 disappears.
It's not that they even have to rebuild their images... they might be pulling from one that will go stale.
> "We'll just ping the image very 6 mos" - you have to iterate and discover every image and tag in the accounts then pull them, retry if it fails. Oh and bandwidth isn't free.
Set up CircleCI or similar to pull all your images once a month :)
> Oh and bandwidth isn't free.
I'm not sure what protocol is used for pulling Docker images, but perhaps it could be enough to just initiate the connection, get Docker Hub to start sending data, and immediately terminate the connection. This should save bandwidth on both ends.
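The protocol is a plain HTTP API (the Docker Registry v2 / OCI distribution API), so a cheaper variant is to fetch only the manifest rather than the layers. A rough sketch against Docker Hub, with library/alpine as a stand-in image; whether a manifest-only request actually counts as a "pull" for the retention policy is an assumption, not something Docker has confirmed:

  # get a short-lived pull token for the repository
  TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/alpine:pull" | jq -r .token)

  # request only the manifest (a few KB), not the layer blobs
  curl -s -o /dev/null \
    -H "Authorization: Bearer $TOKEN" \
    -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    https://registry-1.docker.io/v2/library/alpine/manifests/latest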
This will be quite bad for reproducible science. Publishing bioinformatics tools as containers was becoming quite popular. Many of these tools have a tiny niche audience and when a scientist wants to try to reproduce some results from a paper published years ago with a specific version of a tool they might be out of luck.
Simplest answer is to release the code with a Dockerfile. Anyone can then inspect build steps, build the resulting image and run the experiments for themselves.
Two major issues I can see are old dependencies (pin your versions!) and out of support/no longer available binaries etc.
In which case, welcome to the world of long term support. It's a PITA.
You can also save the image to a file:
https://docs.docker.com/engine/reference/commandline/image_s...
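For anyone who hasn't used it, the save/load round trip looks roughly like this (image names are just examples):

  # archive an image (all layers plus metadata) into a single tarball
  docker pull alpine:3.12
  docker save alpine:3.12 -o alpine-3.12.tar
  gzip alpine-3.12.tar          # optional, for cheaper archival

  # later, on any machine, with no registry involved
  docker load -i alpine-3.12.tar.gz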
That doesn't help against expiring base images though.. :/
Tern is designed to help with this sort of thing: https://github.com/tern-tools/tern#dockerfile-lock
It can take a Dockerfile and generate a 'locked' version, with dependencies frozen, so you at least get some reproducibility.
Disclaimer: I work for VMware, but on a different team.
The Dockerfile should always be published, but it does not enable reproducible builds unless the author is very careful, and even then there's no built-in support for it. It would be cool if you could embed hashes into each layer of the Dockerfile, but in practice it's very hard to achieve.
My field is doing something similar.
Reproducible science is definitely a good goal, but reproducible doesn't mean maintainable. Really scientists should be getting in the habit of versioning their code and datasets. Of course a docker container is better than nothing, but I would much rather have a tagged repository and a pointer to an operating system where it compiles.
It's true that many scientists tend to build their results on an ill-defined dumpster fire of a software stack, but the fact that docker lets us preserve these workflows doesn't solve the underlying problem.
FYI, and for anyone else still learning how to version and cite code: Zenodo + GitHub is the most feature rich and user-friendly combination I've found.
https://guides.github.com/activities/citable-code/
It seems you simply have to pull it every 5.99 months to not get it removed. So add all your images into a bash script and pull them every couple weeks using crontab and you're fine.
On the other hand, I see the need to make money, and storage/services cannot be free (someone always pays for it somewhere), but 6 months is not that much for some usages.
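A minimal sketch of that cron approach; the image list and schedule are placeholders, and it assumes a host with the docker CLI and enough disk for the pulls:

  #!/bin/sh
  # keep-alive.sh: pull each image so the Hub's inactivity timer resets
  for img in myuser/tool-a:latest myuser/tool-b:1.2.3; do
    docker pull "$img" || echo "failed to pull $img" >&2
  done

  # crontab entry: run at 03:00 on the 1st of every month
  # 0 3 1 * * /usr/local/bin/keep-alive.sh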
"Pulling docker images every 5 months as a service"
I'm sure you've cited research older than 5.99 months right?
I wish they would grandfather images uploaded before this new ToS so they don't get wiped; future images could then go to more stable and accepting platforms, while pre-ToS research images on Docker Hub stay intact.
Why? It'll force a shift to a more elegant and general model of specifying software environments. We shouldn't be relying on specific images but specific reproducible instructions for building images. Relying on a specific Docker image for reproducible science is like relying on a hunk of platinum and iridium to know how big a kilogram is: artifact-based science is totally obsolete.
Hmmm, what if the instructions say to get a binary that was deprecated 5 years ago?
What if it uses a patched version of a weird library?
Software preservation is a huge topic and it is not done based on instructions.
I couldn't agree more. The defense of images over instructions to build them has often been "scientists don't work this way", but to me that's either overly cynical or an indication that something is rotting in academic incentive structures.
These reproducible instructions you speak of are already present in Dockerfiles.
It seems like you're arguing against using docker images, when docker builds solve the very issue you speak of.
Correct me if I'm wrong...?
Maybe they should switch to Github. https://github.com/features/packages
Or store the containers in the Internet Archive alongside the paper. They’re just tarballs. Lots of options as long as you're comfortable with object storage.
Publishing containers to GitHub might be free but you have to login to GitHub to download the containers from free accounts, significantly hampering end-user usability compared to Docker Hub, particularly if 2FA authentication is enabled on a GitHub account. As mentioned elsewhere Quay.io might be another alternative.
GitHub storage for docker images is very expensive relative to free: I don’t think it’s a viable solution in this case.
They should be using Nix or similar then. The typical Dockerfile is not reproducible.
As long as the Dockerfile is released alongside, this should not be an issue.
I don't see any valid reason why anyone would upload and share a public docker image but not its Dockerfile, and therefore I don't pull anything from Docker Hub that doesn't also have the Dockerfile on the Docker Hub page.
Dockerfiles are not guaranteed to be reproducible. They can run arbitrary logic which can have arbitrary side-effects. A classic is `wget https://example.com/some-dependency/download/latest.tgz`.
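For comparison, a more reproducible Dockerfile looks something like the sketch below; the digest and checksum are placeholders you would fill in with real values, and even this only pins inputs, it doesn't guarantee bit-for-bit identical output:

  # pin the base image by digest, not by tag
  FROM alpine:3.12@sha256:<base-image-digest>

  # fetch a specific release and verify it, instead of "latest"
  RUN wget -q https://example.com/some-dependency/download/v1.2.3.tgz \
   && echo "<expected-sha256>  v1.2.3.tgz" | sha256sum -c - \
   && tar -xzf v1.2.3.tgz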
What about when the image that it is based on goes out of date and is pruned too?
Couldn't journals host the images? Or some university affiliated service, let us call it "dockXiv"?
Having the images on dockerhub is more convenient, but as long as the paper says where to find the image this does not seem that bad.
Makes sense. I don't get how Docker could offer so much free hosting in the first place. I know storage is cheap, but not this cheap. Eventually they're going to need to make these rules more stringent.
I also don't get why people put all their stuff on free services like this and expect it to work until eternity.
Come on, if you stop just half a second and think about it, you know it is a stupid idea and you know that one day you will have a problem. You really don't have to be a genius for that. Same goes for all these other kinds of "services" that are bundled together with things that used to be a one-time purchase, like cars, etc.
Oh, I now have a TV that can play Netflix and YouTube, but is otherwise not extensible. But what happens in ten years? The TV still works fine, but Netflix has gone bust and this new video service won't work. Too bad, gonna buy a new TV then. I can get really mad about this stupid short-sightedness everybody has these days.
Spoiler alert: one day Github will be gone too.
Enough billion dollar companies put their weight behind Docker that you'd think someone big is already running Docker hub pushing people to use their paid offerings, but that's not the case. Google created Kubernetes which is almost always used along with Docker, but they don't directly invest in Docker, Inc (at least, based on Crunchbase) and run their own container registry at gcr.io. The same goes for Amazon and Azure where their customers are increasingly moving to Docker instead of VMs, yet none of them directly back the company.
Because docker wired in their registry as the basis for everything by design. I think people had to fork docker to get one where you could create your own registry.
I think it's a good idea to NOT be pulling from someone else's image on the internet.
Do you save a copy of every web page you think might be useful later? I have a small archive of things I consider to be "at risk", but there are many things I enjoy that exist only on other people's servers now. I can't keep it all on my own machines forever, so the difficulty is guessing what will disappear and what won't.
The FAQ[0] says pulling an image once every 6 months will prevent it from being purged by resetting the timer.
It doesn't seem like a big deal really. It just means old public images from years ago that haven't been pulled or pushed to will get removed.
[0]: https://www.docker.com/pricing/retentionfaq
This may not be a big deal for small-time projects. But does this mean e.g., the official Node images for older runtime versions could disappear? I recently needed to revive an app that specifically required node:8.9.4-wheezy, pushed 2 years ago. An image that specific and old will quite possibly hit 0 downloads per 6 months in short order, if it hasn't already.
That is a really good point. I wonder if official images will be treated differently.
looks like some bots that pull images on a periodic basis will be cropping up
Why not just pay the $60/year? I mean if it's something important then it's worth paying for. If not, there is cheaper storage available where one can archive their containers.
Yep, everybody will have a small scheduled Github Action pulling their image once per month or similar ¯\_(ツ)_/¯
Docker is partly to blame for its own predicament by conflating URIs with URNs. When you give an image reference as `foo/bar`, the implicit actual name is `index.docker.io/foo/bar`.
That means that "which image" is mixed with "where the image is". You can't deal with them separately. Because everyone uses the shorthand, Docker gets absolutely pummeled. Meanwhile in Java-land, private Maven repos are as ordinary as dirt and a well-smoothed path.
It's time for a v3 of the Registry API, to break this accidental nexus and allow purely content-addressed references.
> the implicit actual name is `index.docker.io/foo/bar`
`index.docker.io/foo/bar:latest` to be more exact, which is a URL, but not really a URI if we're being pedantic.
Docker doesn't really provide an interface to address images by URI (which would be more like the SHA), though in practice, tags other than latest should function closer to a URI
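For what it's worth, the closest thing today is referencing by digest, which is content-addressed and verified by the registry; a quick sketch (ubuntu:20.04 is just an example, and <digest> is a placeholder for a real sha256):

  # resolve a tag you have locally to its content-addressed digest
  docker pull ubuntu:20.04
  docker inspect --format '{{index .RepoDigests 0}}' ubuntu:20.04
  # prints something like ubuntu@sha256:<digest>

  # from then on, reference the image by digest instead of by tag
  docker pull ubuntu@sha256:<digest>

The location problem remains, though: the digest still gets resolved against a specific registry host.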
The second issue is that they purposefully do not allow you to change the default domain to point to. The only thing you can do is use a pull-through proxy.
They did this on purpose to promote their docker hub
That's fantastic, my main issue with Docker Hub is that there's a ton of unmaintained and out of date images.
Some just pollute my search results; I don't care that "yes, technically there's an image that does this thing I want, but it's Ubuntu 14.04 and 4 years old".
Even better, it prevents people from using these unmaintained images as a base for new projects, which they will do, because many developers don't look at the Dockerfile or actually review the images they use in a shipping product.
As a bonus, perhaps this will mean that some of the many images of extremely low quality will go away.
I think it's fair, now you can either pay or maintain your images.
You really shouldn't be pulling someone's random images off of dockerhub. If I made a POC 4 years ago on some random kubernetes configuration/tutorial that I was testing and decided to use dockerhub to host its images (as one typically does, and it used to not have private repos), I'm not posting that for you to come consume 4 years later out of the blue in production because you found it randomly via the search.
You also tend to have no idea what's in those images and what context people are creating them under. Sure, a lot of us know to check the dockerfile, github repo, etc., but I have images with 10k+ downloads from OSS contributions, and as you've said, a whole lot of developers just grab whatever looks fitting on there. My biggest dockerhub pull has no dockerfile, no github repo, and is a core network configuration component I put up randomly just for my own testing because no docker image for it existed years ago.
You’re right, but people tend to see Docker Hub as some master registry for quality and official images, even if it never claimed to be such a thing. Reading and understanding the Dockerfile is vital before deciding to use it in any sort of production environment. The new policy will help clean up Docker Hub.
If I'm reading this correctly, a single pull every <6 months would avoid this. This seems like NBD to me.
Still, I keep my images mirrored on quay.io and I would recommend that to others (disclaimer: I work for Red Hat which acquired quay.io)
If people really thought this was a problem, they'd contribute a non-abusive solution. Writing cron jobs to pull periodically in order to artificially reset the timer is abusive.
Non-abusive solutions include:
- extending docker to introduce reproducible image builds
- extending docker push and pull to allow discovery from different sources that use different protocols like IPFS, TahoeLAFS, or filesharing hosts
I'm sure you can come up with more solutions that don't abuse the goodwill of people.
If their business model isn't working for them, it's their job to fix it in a way that does. I don't see how you can put the responsibility on users. If you tell your users their data will be deleted if they don't pull it at least once per N months, well, that's exactly what they will do, and they are perfectly within their rights to automate that process.
>- extending docker to introduce reproducible image builds
It's already reproducible... sort of. All you need is to eliminate any outside variables that can affect the build process, which mainly means network access (e.g. running npm install).
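One concrete knob for that is cutting the build off from the network entirely; a sketch, assuming the build context already vendors its dependencies (anything that still needs the network will simply fail, which is the point):

  # no network during the build, so "npm install" & co. can't silently
  # fetch different bits on different days
  docker build --network=none -t myapp:pinned .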
docker pull and push integrated with IPFS is a great idea!
IPFS is only a partial solution: if you are the only one to have a copy and you pull the plug, the content is gone. You would need a bot that takes care of maintaining at least 3 or 5 copies always available across the whole filesystem.
This seems like a non-issue, if you need to keep a rarely-used image alive for some reason just write a cron job to pull it once every six months. If the goal is long term archival it should be entrusted to something like Internet Archive.
This is fine and completely fair; I bet it's not cheap paying for storage for docker images no one cares about.
Does anyone know what is docker's business model these days?
After they sold Docker Enterprise to Mirantis, I don't know anymore.
Probably hold on long enough to get acquired?
Why would anyone want to acquire them after they sold off the majority of their client base?
Did they retain a significant amount of talent?
Continuously throw stuff at the wall to see what sticks?
“Enterprise”
https://quay.io/plans/
> Can I use Quay for free? Yes! We offer unlimited storage and serving of public repositories. We strongly believe in the open source community and will do what we can to help!
That includes public images? That'll hurt OSS. That's a bummer.
It wouldn't surprise me if people move to Github's registry for open source projects. https://github.com/features/packages
GitLab also has container registry (on GitLab.com it's a part of your 10GB repository budget)
https://docs.gitlab.com/ee/user/packages/container_registry/
Only if they never got pulled for 6 months
If it's OSS there should be a Dockerfile available to build it yourself?
Probably not at 50 cents per GB of data transfer outside of GitHub Actions. Unfortunately the only place you can viably use the GitHub registry right now is inside GitHub Actions.
That pricing is for private repos. It's free for public repos.
>It wouldn't surprise me if people move to Github's registry for open source projects. https://github.com/features/packages
The egress pricing is going to be a dealbreaker. The free plan only includes 1GB out.
I would delete my own images to clear up room on dockerhub, but they don't have an API to remove images; the only way is to manually click the X in the UI. So, in a lot of ways, docker forced us to "abuse" their service and store thousands of images on a free/open source account. I get this change, and it was inevitable. But it's still ironic that you can't delete your own images. The best way to delete your image is to just stop using it and let docker delete it for you in 6 months.
I agree, I have a little tool called `php-version-audit` that literally becomes useless after a few weeks without an update (you can't audit your php version without the knowledge of the latest CVEs). I have manually cleaned up old images like you say by clicking through them all, but having a way to define retention limits is a feature to me.
Been wondering when image space would start to be a concern.
Actually just set up my own private registry and pull-through registry.
Pretty easy stuff although no real GUI to browse as of yet.
This is all sitting on my NAS running in Rancher OS
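If anyone wants to replicate the pull-through cache part, the upstream registry image supports it via its proxy config; a minimal sketch using environment-variable configuration, with the port, path and NAS address as placeholders:

  # run the registry as a pull-through cache (mirror) of Docker Hub
  docker run -d --name hub-mirror \
    -p 5000:5000 \
    -v /srv/registry-cache:/var/lib/registry \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    registry:2

  # then point the Docker daemon at it, e.g. in /etc/docker/daemon.json:
  # { "registry-mirrors": ["http://<nas-ip>:5000"] }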
Not sure if it is production ready, but some of Cisco's container team have been working on an OCI-compliant image repository server.
https://github.com/anuvu/zot
Take a look at Portus, a project maintained by SUSE, which has a pretty nice GUI for a private docker registry.
https://github.com/SUSE/Portus
Actually spent some time looking at this today. It’s a bit more complex than I was hoping. Right now I’m the only user, and the only way to connect to my registry atm is through WireGuard.
It is cool seeing openSUSE. Same with Rancher.
This looks fantastic, thanks for posting it!
Seems like companies are relearning what they should have in the 2001 dotcom bust.
Keep free stuff free and add paid stuff. If your free stuff isn't sustainable, you really should have thought that through early on.
This limit seems reasonable, because storage is expensive. But it should have been implemented from day one so people have reasonable expectations about retention. Others have mentioned open source projects and artifacts for scientific publication as two niche use cases where people still might want this data years later, but where it'd be rare for it to be pulled every six months.
I only have a few things on docker hub, but I'll probably move them to a self-hosted repo pretty soon. At least if it's self hosted, I know it will stay up until I die and my credit cards stop working.
I think in Docker's case, in their original plan this free unlimited hosting was probably sustainable in a freemium model where businesses paid for Docker Enterprise and Docker.com was about marketing and user acquisition, similar to open source on GitHub.com being marketing and user acquisition for paid accounts/Github Enterprise.
It's not an unreasonable strategy to provide generous free hosting if you derive some other business benefit from it (YouTube being another example).
But Docker Inc. found their moat was not that deep and other projects from the big cloud providers killed the market they saw for Docker Enterprise and they sold it off.
So now they just have docker.com and Docker CE, which itself has alternatives now that other runtimes exist. So they need to make docker.com a profitable business on its own, or find something else to do, which changes the equation significantly.
If you've never used Singularity containers[0], I highly recommend checking them out. They make a very different set of tradeoffs compared to Docker, which can be a nice fit for some uses. Here's a few of my favorites:
* Images are just files. You can copy them around (or archive them) like any other file. Docker's layer system is cool but brings a lot of complexity with it.
* You can build them from Docker images (it'll even pull them directly from Dockerhub).
* Containers are immutable by default.
* No daemon. The runtime is just an executable.
* No elevated permissions needed for running.
* Easy to pipe stdin/stdout through a container like any other executable.
[0]: https://github.com/hpcng/singularity
> * Images are just files. You can copy them around (or archive them) like any other file
Never heard of Singularity before, and it does look interesting. Wanted to point out though that you can create tarballs of Docker images, copy them around, and load them into a Docker instance. This is really common for air-gapped deployments.
I've never seen this mentioned in the official Docker docs. Is it a well-supported workflow?
If they are doing this, they should add stats on when the last time an image was pulled, so you can see what is at risk of being removed. Would be curious about a graph, like NPM has a weekly downloads one so you can see how active something is.
(I work for Docker). We will be updating the UI to show status of each image (active or inactive). We will be updating the FAQ shortly to clarify this.
Docker is the new Heroku. Cronjobs will pull images to simulate image activity
In case anyone decides to self-host their docker registry, Pluralsight has a nice course [0] on that subject.
[0] https://www.pluralsight.com/courses/implementing-self-hosted...
> What is an “inactive” image?
> An inactive image is a container image that has not been either pushed or pulled from the image repository in 6 or more months.
>> How can I view the status of my images
> All images in your Docker Hub repository have a “Last pushed” date and can easily be accessed in the Repositories view when logged into your account. A new dashboard will also be available in Docker Hub that offers the ability to view the status of all of your container images.
That still does not tell the whole story, does it? I still don't know if my image has been pulled in the last six months, only when I pushed it.
So this means that open source projects need to pay to keep older images alive?
They say image not tag, so it wouldn't appear this should impact active projects
If no one pulled an image in 6 months it's probably not such a big deal for a project? And if it's open source you could still push a container yourself if you want.
I just got an email notification and while I can understand that they're doing this (all those GB must add up to a significant cost), the relatively short notice seems unnecessary.
6 month notice doesn't seem terrible for a free service imo.
3 month notice - they will start on Nov 1st. Still not bad.
Hmm, the very nature of layered images presumably means big storage savings; I wonder if block-level deduplication at the repository backend would be feasible too?
Registries already do this
Do you mean at the filesystem level, or higher up? Have you got any sources for this?
Remember that time you were looking for an answer to some obscure question, you found the perfect google result - description, page title and URL all indicated it was going to answer your question - so you clicked it, and... nothing... the page cannot be found.
You now have that, with docker.
A 2019 paper says there's 47TB of Docker images on the Hub. Get scraping.
I wonder what kind of account the Home Assistant images are using. This could break a whole lot of stuff - and I've seen projects that don't publish a Dockerfile anywhere.
Relying on goodwill works until that goodwill stops. Store your images locally at least as a backup, but it has other advantages.
Is there a way to see when an image was last pulled? I can see the last push date, but not pull.
This will begin November 1, 2020
given their track record with developers over the years, i wouldn't be surprised if microsoft scrambles to build a competitor to the docker repo service and integrates it with github.
The complaints, as expected, are the usual ones from the free generation; apparently the Mozilla example is not enough.
Let's mirror dockerhub on a distributed fault tolerant file system. And IPFS sucks.
tl;dr: images hosted on free accounts without downloads for 6 months will be scheduled for removal.
Time for someone to create a new service that will pull your images into /dev/null once a month.
This could also be solved by one person running a service that would crawl all public docker images and automatically pull those that are close to expiration every 6 months. At this moment I'm just curious how many resources would be needed for that.
If Docker cared enough to implement this policy, then why wouldn't they just modify it enough to delete images being protected in such a fashion?
If you just need to send the request, not read the content in full, that can be done by one free-tier cloud VM.
No need for a new service, a simple Github Action with a cron trigger can do it.
A community of bots.
I'm selling a SAAS service that will pull each of your images once every 6 months...thank you Docker.
I'll do it for free
Add one more coin to the "always self host" bucket. Just another example of a service that starts free, then they pull the rug from under you and hold you hostage for their ransom.