Comment by cyberax

1 month ago

I solved it by adding a simple Tailscale action to handle failure. It creates an ephemeral instance and waits for connections for 3 minutes. Then it loops while there's an active SSH session present.

It's that simple: https://gist.github.com/Cyberax/9edbde51380bf7e1b298245464a2... and it saved me _hours_ of debug time.

I've since moved all my CI/CD to Taskfiles running inside a Docker container, so my local environment can replicate the CI/CD environment up to the GITHUB_TOKEN. Still, being able to poke around GitHub builders is great.

4 comments

lioeters 1 month ago

That looks like a useful trick, using an ephemeral instance to SSH into a failed CI action context. I see in the script how it waits and checks for a root user login, but I don't see what keeps it alive, as described in this part:

> Then it loops while there's an active SSH session present.

From what I can see, the loop stops when a user is logged in. Is this handled elsewhere?

> use Taskfiles inside a Docker container since then, so my local environment can replicate the CI/CD environment

Oh this is what I've been wanting, a vendor-neutral way to run the same CI actions locally. I'd seen go-task before, will try it, thanks for the info!

cyberax 1 month ago

> That looks like a useful trick, using an ephemeral instance to SSH into a failed CI action context.

Yup. And Tailscale even manages the SSH key provisioning.

> From what I can see, the loop stops when a user is logged in. Is this handled elsewhere?

The script does handle it. `pgrep` succeeds (returns a zero exit code) if there's a "login" process for user 'root' present, which is created when there's an active SSH session. If `pgrep` fails, then `break` runs and exits the loop.

GitHub then terminates the workflow and releases the runner.
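
For readers who want the control flow spelled out, here is a minimal Python sketch of the wait-then-poll logic described above. It is not the script from the gist. The function names, the `pgrep -u root login` invocation, and the 10-second poll interval are all assumptions made for illustration. Only the 3-minute wait and the "loop while `pgrep` succeeds" behavior come from the description.

```python
import subprocess
import time


def root_session_active() -> bool:
    """Return True if pgrep finds a 'login' process owned by root.

    Assumption: the check is roughly `pgrep -u root login`. pgrep exits
    with 0 when at least one process matches and non-zero otherwise.
    """
    result = subprocess.run(
        ["pgrep", "-u", "root", "login"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def hold_runner(
    grace_seconds=180,          # the 3-minute window for someone to connect
    poll_seconds=10,            # hypothetical poll interval
    session_active=root_session_active,
    sleep=time.sleep,
) -> int:
    """Keep the job alive while an SSH session exists.

    Returns the number of polls during which a session was still active.
    """
    # Give the user time to connect before the first check.
    sleep(grace_seconds)

    polls = 0
    # Loop while the session check succeeds; the first failure ends the
    # loop, which lets the workflow step finish and the runner be released.
    while session_active():
        polls += 1
        sleep(poll_seconds)
    return polls
```

Calling `hold_runner()` as the last step of a failed job would block for the grace period and then for as long as the session check keeps succeeding. The check and the sleep function are passed in as parameters so the loop can be exercised without a real SSH session.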
- lioeters 1 month ago
  
  Ah I see what you mean, the loop keeps it alive until login is detected, and after that the machine is kept alive by the SSH session itself. Appreciated.

rurban 1 month ago

You also got the Tesla keys, nice!