
Comment by znpy

6 years ago

For everybody complaining about having to pay actual money for goods and services: if you're not okay with this, you can run a self-hosted registry.

The out-of-the-box registry does very little and has a very poor user experience.

But nowadays there are Harbor from VMware and Quay from Red Hat, which are open source and easily self-hostable.

We run our own Harbor instance at work and I can tell you... Docker images are NOT light. You think they are, but they are not. It's easy for images to proliferate and burn a lot of disk space. Under some conditions, when layers are shared among too many images (I can't recall the exact details here), deleting an image may also delete a lot more images, which is not the correct/expected/wanted behaviour. That means that under some circumstances you have to retain a lot more images or layers than you think you should.
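To make the shared-layer point concrete, here is a minimal sketch under a simplified model: each image is a list of layers, and each distinct layer is stored once. The image names, layer names, and sizes are all made up, and this is not how Harbor or the Docker registry actually track storage.

```python
from collections import Counter

# Hypothetical data: layer sizes in MB and the layers each image uses.
LAYER_SIZES_MB = {"base": 120, "runtime": 300, "app-v1": 40, "app-v2": 45}
IMAGES = {
    "app:1": ["base", "runtime", "app-v1"],
    "app:2": ["base", "runtime", "app-v2"],
}


def disk_usage_mb(images, sizes):
    """Total disk used when every distinct layer is stored exactly once."""
    unique_layers = {layer for layers in images.values() for layer in layers}
    return sum(sizes[layer] for layer in unique_layers)


def freed_by_deleting(name, images, sizes):
    """Space reclaimed by deleting one image: only layers no other image uses."""
    refs = Counter(layer for layers in images.values() for layer in set(layers))
    return sum(sizes[layer] for layer in set(images[name]) if refs[layer] == 1)


print(disk_usage_mb(IMAGES, LAYER_SIZES_MB))
print(freed_by_deleting("app:1", IMAGES, LAYER_SIZES_MB))
```

In this toy model, deleting one image reclaims only the layers nothing else references, so the disk usage drops far less than the image's nominal size would suggest.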

The thing is, I can only wonder how much the bandwidth and disk space (oh, and disk space must be replicated for fault tolerance) must cost when running a public registry for everybody.

It hurts the open source ecosystem a bit, I understand... Maybe some middle ground will be reached, dunno.

Edit: I also run Harbor at home. It's not hard to set up and operate, you should really check it out.

I thought they didn't make private registries available because it would "fragment the ecosystem"

Here's the conversation I saw:

https://stackoverflow.com/questions/33054369/how-to-change-t...

pointing to this:

https://github.com/moby/moby/issues/7203

and also there was this comment:

"It turns out this is actually possible, but not using the genuine Docker CE or EE version.

You can either use Red Hat's fork of docker with the '--add-registry' flag or you can build docker from source yourself with registry/config.go modified to use your own hard-coded default registry namespace/index."

  • No, you have misunderstood the issue. You can use any registry: just write out the domain for it. This has always worked and is very widely used. Red Hat changed the default that applies when you don't specify an FQDN, before they decided not to ship Docker at all.

    • I understand that point, but it makes it harder to not "accidentally" pull from a public registry with intertwined Docker images (which most people use).
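The exchange above turns on how an image name without a domain gets resolved. A simplified sketch of that rule, as I understand it: the part before the first slash is treated as a registry host only if it contains a dot or a colon, or is `localhost`; otherwise the default registry is used. The function name and the example hosts are made up for illustration, and the real parser handles more cases (such as the implicit `library/` prefix).

```python
DEFAULT_REGISTRY = "docker.io"


def split_registry(image_ref):
    """Return (registry, remainder) for an image reference.

    Simplified: the first path component counts as a registry host only
    if it looks like one (has '.' or ':' in it, or is 'localhost').
    """
    first, sep, rest = image_ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    # No explicit host, so the pull silently goes to the default registry.
    return DEFAULT_REGISTRY, image_ref


for ref in ("nginx", "user/app:1.0", "harbor.example.com/team/app:1.0"):
    print(ref, "->", split_registry(ref))
```

This is also why the "accidental" pull is easy: any reference without an explicit host falls through to the default.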

> The out of the box registry does very little and has a very poor user experience.

I think this is indirectly what people are complaining about. Having a free registry mitigates that. So they aren't far off track.

It's true we shouldn't be bitter about Docker. They did a lot to improve the development ecosystem. We should try to avoid picking technologies in the future that aren't scalable in both directions though.

For example, PostgreSQL works well for a 1GB VPS containing it and the web server for dozens of users, and it also works well for big sites. With MongoDB the VPS doesn't work so well.

Yep, the Docker registry is absolute garbage and lacks garbage collection.

FYI, GitLab (free version) has a built-in registry as well, and it lets you define retention rules.

  • I know about that, and we used to use that but had to move away from it because it created a lot of scalability problems (mind you, most of them due to disk usage).

    With Harbor we can save Docker image layers to Swift, the OpenStack flavor of object storage (similar to S3). That solves a lot of scalability problems.

    AFAIK GitLab ships the Docker registry underneath, so the problems mostly stay. I think that Harbor does the same: I skimmed the Harbor source and it seems to forward HTTP requests to the Docker registry if you hit the registry API endpoints.

    Haven't looked at Quay but as far as I know wherever there's the docker registry you'll have garbage collection problems.

    One note on the side: I think that Quay missed their chance to become the go-to Docker registry. Red Hat basically open sourced it after Harbor had been incubated in the CNCF (unsurprisingly, Harbor development has skyrocketed after that event).

    • (Co-founder of Quay here)

      Scalability and garbage collection are actually two of the main areas of focus Quay has had since its inception. As you mentioned, most modern Docker registries such as Quay and Harbor will automatically redirect to blob storage for downloading of layers to help with scale; Quay actually goes one step further and (for blobs recently pulled) will skip the database entirely if the information has been cached. Further, being itself a horizontally scalable containerized app, Quay can easily be scaled out to handle thousands of requests per second (which is a very rough estimate of the scale of Quay.io).

      On the garbage collection side, Quay has had fully asynchronous background collection of unreferenced image data since one of its early versions. That, plus the ability to label tags with future expiration, means you can (reasonably) control the growth problem around images. Going forward, there are plans to add additional capabilities around image retention to help mitigate further.

      In reference to your note: We are always looking for contributors to Project Quay, and we are starting a bug bash with t-shirts as prizes for those who contribute! [1]

      [1] https://github.com/quay/quay#quay-bug-bash

      Edit: I saw the edit below and realized the bug bash is listed as ending tomorrow; we're extending it another month as we speak!
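The combination described in this reply (collecting unreferenced image data, plus tags that expire at a future time) can be sketched as a small mark-and-sweep pass. This is only an illustration of the general idea, not Quay's implementation; the data layout, function name, and timestamps are all assumptions.

```python
def collect_garbage(tags, manifests, blobs, now):
    """One mark-and-sweep pass over a toy registry.

    tags:      {tag_name: (manifest_digest, expires_at or None)}
    manifests: {manifest_digest: [blob_digest, ...]}
    blobs:     set of blob digests currently in storage
    Returns (live_tags, blobs_to_delete).
    """
    # Drop tags whose expiration time has passed.
    live_tags = {
        name: (digest, expires_at)
        for name, (digest, expires_at) in tags.items()
        if expires_at is None or expires_at > now
    }
    # Mark: every blob reachable from a manifest that a live tag points to.
    live_manifests = {digest for digest, _ in live_tags.values()}
    referenced = {
        blob for digest in live_manifests for blob in manifests.get(digest, [])
    }
    # Sweep: whatever is stored but no longer referenced.
    return live_tags, blobs - referenced


tags = {"app:1": ("m1", 100), "app:2": ("m2", None)}
manifests = {"m1": ["base", "a1"], "m2": ["base", "a2"]}
live, doomed = collect_garbage(tags, manifests, {"base", "a1", "a2"}, now=200)
print(sorted(live), sorted(doomed))
```

The shared `base` blob survives because a live tag still references it; only the blob unique to the expired tag is swept.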


It always blows my mind when people complain that free services and products are no longer going to be free. When I built Amezmo, I learned that free-tier customers are the worst to support. Just like with the Mailgun pricing changes, we'll have people complaining about how it's no longer free.

Which IaaS do you use to self-host the one at work? How much does the network transfer cost you? Or do the Docker pulls stay on the internal network?

  • We run on openstack, managed by a local yet fairly large openstack provider (Irideos, their devops/consulting team is top notch and has helped us adopt many cloud-native technologies while still staying fairly vendor-agnostic).

    I can't see the bills, but we never worry about bandwidth usage and I am fairly sure that bandwidth is free, basically.

    Keep in mind that since we run our own Harbor instance, most of the image pulls happen within our OpenStack network, so they do not count against bandwidth usage (but image storage does). So, in terms of bandwidth, we can happily set "Always" as imagePullPolicy in our Kubernetes clusters.

    Edit: openstack works remarkably well. The horizon web interface is slow as molasses but thanks to terraform we rarely have to use it.
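For anyone unfamiliar with the setting mentioned above, here is a minimal sketch of a pod manifest with `imagePullPolicy` set to `Always`, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts as well as YAML). The pod name and the registry host `harbor.example.internal` are made up for illustration.

```python
import json


def pod_manifest(name, image):
    """Build a minimal Pod manifest that re-pulls its image on every start."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Cheap only because the registry sits on the same network.
                    "imagePullPolicy": "Always",
                }
            ]
        },
    }


# Hypothetical image hosted on an internal Harbor instance.
manifest = pod_manifest("demo", "harbor.example.internal/team/app:1.0")
print(json.dumps(manifest, indent=2))
```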

Harbor is excellent, especially now that you can set up automatic image pruning rules that make sense.