Comment by sshine

8 days ago

> headscale in production at work

  - How much effort do you put into key management compared to plain WireGuard?
  - How automated is the onboarding process; do you generate and hand over keys?
  - How do you cope without the commercial Tailscale dashboard?
  - Do you run some kind of dashboard or metrics system?
  - How long did it take to set up?
  - Were there any gotchas?

> - How much effort do you put into key management compared to plain WireGuard?

Less effort than plain WireGuard; the only key management I do is for non-human clients.

> - How automated is the onboarding process; do you generate and hand over keys?

Fully automated. Auth is done via OIDC to my company's SSO provider, so users can enroll their own machines without IT involvement.
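One sanity check that can help when wiring this up: headscale's `oidc.issuer` setting points at your SSO provider, and OIDC clients generally find the provider's endpoints through its discovery document. The Python sketch below (my own illustration, not part of headscale) checks that the document is reachable and consistent before you flip the switch. The issuer URL is a placeholder, and the check only covers the basic fields from the OpenID Connect Discovery spec.

```python
import json
import urllib.request

# Basic fields an OIDC discovery document is expected to advertise.
REQUIRED_FIELDS = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")


def discovery_url(issuer: str) -> str:
    """Build the well-known discovery URL for an OIDC issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def check_discovery(doc: dict, expected_issuer: str) -> list[str]:
    """Return a list of problems in a discovery document (empty list if OK)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in doc]
    # The advertised issuer must match the configured issuer exactly.
    if doc.get("issuer") != expected_issuer:
        problems.append(
            f"issuer mismatch: {doc.get('issuer')!r} != {expected_issuer!r}"
        )
    return problems


def fetch_and_check(issuer: str, timeout: float = 10.0) -> list[str]:
    """Fetch the discovery document over the network and validate it."""
    with urllib.request.urlopen(discovery_url(issuer), timeout=timeout) as resp:
        doc = json.load(resp)
    return check_discovery(doc, issuer)
```

An empty list from `fetch_and_check("https://sso.example.com")` only means discovery looks sane. It says nothing about client IDs, secrets, or redirect URIs, which still have to match what you register with the SSO provider.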

> - How do you cope without the commercial Tailscale dashboard?

I don't really miss it. The headscale CLI tool is pretty good, and for quick access to a few features I use one of the headscale web UI projects, headscale-ui (https://github.com/gurucomputing/headscale-ui). There are several to choose from: https://headscale.net/stable/ref/integration/web-ui/?h=web

> - Do you run some kind of dashboard or metrics system?

Yes, I scrape headscale's Prometheus metrics endpoint and have put together a simple Grafana dashboard. The metrics it emits are somewhat limited, but enough to keep an eye on its health.
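If you want a quick command-line check alongside the Grafana dashboard, here's a rough Python sketch that scrapes the same endpoint and parses the Prometheus text format. It assumes headscale's `metrics_listen_addr` is at its default of `127.0.0.1:9090` with metrics served under `/metrics`. The parser is deliberately minimal and keeps each series' label set verbatim as the dict key.

```python
import urllib.request

# Assumption: headscale's default metrics_listen_addr and the /metrics path.
METRICS_URL = "http://127.0.0.1:9090/metrics"


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}.

    Comment, HELP and TYPE lines are skipped. A series key keeps its labels
    verbatim, e.g. 'http_requests_total{code="200"}'. Timestamps are ignored.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Labels end at the last '}'; the value (and optional timestamp) follow.
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:]
        else:
            key, rest = line.split(None, 1)
        samples[key] = float(rest.split()[0])
    return samples


def scrape(url: str = METRICS_URL, timeout: float = 5.0) -> dict[str, float]:
    """Fetch the metrics endpoint and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `scrape().get("go_goroutines")` should work if the standard Go runtime metrics from the Prometheus Go client are exposed. That series is generic, not headscale-specific, so list `scrape().keys()` to see what your instance actually emits.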

> - How long did it take to set up?

I had a prototype up and running on Kubernetes with OIDC integration and a web UI after about a day of hacking. Going into full production took a few months, but most of that time went into planning the migration of all the existing users from OpenVPN.

Come to think of it, maybe I should share my Terraform modules for deploying it.

> - Were there any gotchas?

A few, yeah:

- Setting up mobile clients is a bit fiddly, because they hide the "connect to a non-default control plane URL" option under a debug menu. The macOS and Windows apps are similar: it's too easy for users to accidentally try to connect to tailscale.com instead of your headscale instance. If you can deploy MDM profiles (macOS) or Windows registry tweaks, this is easy to fix, and the headscale server will even generate the configs for you.

- The headscale control plane doesn't support any kind of HA or replication. This doesn't disqualify it, since tailscale can handle brief control plane outages without breaking the network, but it's likely to be a concern for serious enterprise users. It's possible to use an external Postgres database, so you can at least replicate data that way, but only one headscale server replica can be active at a time because they don't share runtime state.

- The tailscale API is not fully implemented, so you can't use things like the tailscale Kubernetes operator.

- Some features are missing: tailscale funnel, tailscale serve, app connectors, `autogroup:self` ACLs, SCIM provisioning, SSO group membership sync, and I forget what else. These may or may not be important to you.
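On the Windows side of the first gotcha, the fix boils down to setting the Tailscale client's `LoginURL` policy value under `HKLM\SOFTWARE\Tailscale IPN`. Here's a rough sketch that builds the corresponding `reg add` command and runs it on Windows. The server URL is a placeholder, and the instructions your headscale server generates should take precedence over this.

```python
import subprocess
import sys

# Registry key the Tailscale Windows client reads policy values from;
# double-check it against what your headscale server generates.
TAILSCALE_KEY = r"HKLM\SOFTWARE\Tailscale IPN"


def login_url_command(server_url: str) -> list[str]:
    """Build a `reg add` command pointing Tailscale at a custom control server."""
    return [
        "reg", "add", TAILSCALE_KEY,
        "/v", "LoginURL",   # value name
        "/t", "REG_SZ",     # string value
        "/d", server_url,   # data: your headscale URL
        "/f",               # overwrite an existing value without prompting
    ]


if __name__ == "__main__":
    # Placeholder URL; substitute your own headscale instance.
    cmd = login_url_command("https://headscale.example.com")
    if sys.platform == "win32":
        # Writing under HKLM needs an elevated (administrator) shell.
        subprocess.run(cmd, check=True)
    else:
        # Elsewhere, just print the command line (Windows-style quoting).
        print(subprocess.list2cmdline(cmd))
```

In practice you'd push the same value through Group Policy or your MDM tooling rather than running a script per machine.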

For app connectors, I wrote an app to emulate the core functionality: https://github.com/singlestore-labs/tailscale-manager (it's in Haskell, but deployers don't need to care about that)
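The core idea behind app connectors is that a router node resolves a set of domain names and advertises the resulting addresses as routes. Here's a rough Python sketch of just that idea. It illustrates the concept and is not how tailscale-manager is actually implemented. The hostnames are placeholders, and advertised routes still need to be approved on the headscale side before clients use them.

```python
import ipaddress
import socket
from typing import Callable, Iterable

# A resolver maps a hostname to IP address strings; injectable for testing.
Resolver = Callable[[str], Iterable[str]]


def system_resolver(hostname: str) -> list[str]:
    """Resolve a hostname to all its IPv4/IPv6 addresses via the OS resolver."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def routes_for(hostnames: Iterable[str], resolve: Resolver = system_resolver) -> list[str]:
    """Turn hostnames into de-duplicated host routes (/32 for IPv4, /128 for IPv6)."""
    nets = set()
    for name in hostnames:
        for addr in resolve(name):
            ip = ipaddress.ip_address(addr)
            nets.add(ipaddress.ip_network(f"{ip}/{ip.max_prefixlen}"))
    # IPv4 routes first, then IPv6, each in numeric order.
    return [str(n) for n in sorted(nets, key=lambda n: (n.version, n))]


def advertise_command(routes: list[str]) -> list[str]:
    """Build the `tailscale set` command a router node would run."""
    return ["tailscale", "set", "--advertise-routes=" + ",".join(routes)]
```

Usage would be something like `advertise_command(routes_for(["app.example.com"]))`. Since DNS answers change over time, a real implementation has to re-resolve on a schedule and re-advertise.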

It's possible to implement group sync with some custom scripting: a Python app that scrapes your LDAP (or whatever) and generates tailscale ACLs isn't hard to write. But you do have to write it.
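A minimal sketch of that group-sync idea: turn directory group membership into the `groups` and `acls` sections of a Tailscale-style policy file. The LDAP query is stubbed out with a hardcoded dict (in practice you'd use something like `ldap3`). The group names and destinations are hypothetical, and the login names must match how your headscale version identifies users.

```python
import json


def build_policy(groups: dict[str, list[str]], access: dict[str, list[str]]) -> dict:
    """Build a Tailscale-style ACL policy from directory groups.

    groups: directory group name -> member login names (as headscale knows them)
    access: directory group name -> "host:port" destinations that group may reach
    """
    policy_groups = {
        f"group:{name}": sorted(set(members))
        for name, members in sorted(groups.items())
    }
    acls = [
        {"action": "accept", "src": [f"group:{name}"], "dst": dsts}
        for name, dsts in sorted(access.items())
        # Skip access entries for groups that don't exist in the directory.
        if f"group:{name}" in policy_groups
    ]
    return {"groups": policy_groups, "acls": acls}


if __name__ == "__main__":
    # Hypothetical stand-in for an LDAP query (e.g. members of cn=eng,ou=groups).
    directory = {
        "eng": ["alice@example.com", "bob@example.com"],
        "ops": ["carol@example.com"],
    }
    # Hypothetical access rules: eng reaches HTTPS on one subnet, ops reaches everything.
    access = {"eng": ["10.20.0.0/16:443"], "ops": ["*:*"]}
    print(json.dumps(build_policy(directory, access), indent=2))
```

Run it on a schedule, write the output wherever your headscale policy mode expects it, and only reload when the generated policy actually changes.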

`autogroup:self` might be a big deal - you would need this if you want to stop users from seeing or connecting directly to each other's devices. I think there is an implementation of this coming in the next release of headscale.
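Until that lands, one workaround is to generate a rule per user that only allows access to that user's own devices. Because these policies are default-deny, this keeps users from connecting to each other's machines, as long as no other rule grants that access. This is a sketch that assumes your headscale version accepts a user login name as an ACL destination (e.g. `alice@example.com:*`). Unlike the real autogroup, it has to be regenerated whenever users are added, so it fits naturally into the same scheduled script as the group sync.

```python
import json


def self_only_rules(users: list[str]) -> list[dict]:
    """Emulate `autogroup:self`: each user may reach only their own devices.

    One rule per user: source is the user, destination is any port on that
    same user's devices. Duplicate user names are collapsed.
    """
    return [
        {"action": "accept", "src": [user], "dst": [f"{user}:*"]}
        for user in sorted(set(users))
    ]


if __name__ == "__main__":
    # Hypothetical user list; in practice this would come from your directory.
    print(json.dumps({"acls": self_only_rules(["alice@example.com", "bob@example.com"])}, indent=2))
```

The rule count grows linearly with the number of users, which is fine for a small team but is another reason the native autogroup is preferable once it's available.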

Summary: headscale is great if you have relatively simple needs and can't afford to pay for Tailscale. You will probably outgrow it if you're running a serious business and need to comply with fancy audit requirements.