Comment by cyberax

1 month ago

I solved it by adding a simple Tailscale action to handle failure. It creates an ephemeral instance and waits for connections for 3 minutes. Then it loops while there's an active SSH session present.

It's that simple: https://gist.github.com/Cyberax/9edbde51380bf7e1b298245464a2... and it saved me _hours_ of debug time.

I've moved all my CI/CD to use Taskfiles inside a Docker container since then, so my local environment can replicate the CI/CD environment up to the GITHUB_TOKEN. Still, being able to poke around GitHub builders is great.

That looks like a useful trick: using an ephemeral instance to SSH into a failed CI action context. I see in the script how it waits and checks for a root user login, but I don't see what keeps it alive after that, which is this part:

> Then it loops while there's an active SSH session present.

From what I can see, the loop stops when a user is logged in. Is this handled elsewhere?

> use Taskfiles inside a Docker container since then, so my local environment can replicate the CI/CD environment

Oh this is what I've been wanting, a vendor-neutral way to run the same CI actions locally. I'd seen go-task before, will try it, thanks for the info!

  • > That looks like a useful trick, using an ephemeral instance to SSH into a failed CI action context.

    Yup. And Tailscale even manages the SSH key provisioning.

    > From what I can see, the loop stops when a user is logged in. Is this handled elsewhere?

    The script does handle it. The `pgrep` succeeds (returns a zero exit code) if there's a "login" process for user 'root' present, which is created when there's an active SSH session. If `pgrep` fails, then `break` runs and exits the loop.

    GitHub then terminates the workflow and releases the runner.

    • Ah I see what you mean, the loop keeps it alive until login is detected, and after that the machine is kept alive by the SSH session itself. Appreciated.
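
For readers following the exchange, the wait-then-hold logic described above can be sketched roughly as follows. This is not the code from the linked gist: the function names, the poll counts, and the exact `pgrep` invocation are assumptions made for illustration, based only on the description in the thread (wait a few minutes for a connection, then loop while a "login" process for root exists, and `break` when `pgrep` fails).

```shell
#!/bin/sh
# Sketch only: NOT the code from the linked gist. All names are hypothetical.

# root_login_present: assumed session check. pgrep exits 0 when at least one
# process owned by root is named exactly "login", and non-zero otherwise.
root_login_present() {
    pgrep -u root -x login >/dev/null 2>&1
}

# wait_until MAX_POLLS INTERVAL CMD...: poll CMD until it succeeds, giving up
# after MAX_POLLS attempts. Returns 0 if CMD succeeded, 1 on timeout.
# Note: plain sh has no local variables, so attempts/polls are globals.
wait_until() {
    max_polls=$1; interval=$2; shift 2
    attempts=0
    while [ "$attempts" -lt "$max_polls" ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts + 1))
        sleep "$interval"
    done
    return 1
}

# wait_while INTERVAL CMD...: keep polling for as long as CMD succeeds.
wait_while() {
    interval=$1; shift
    polls=0
    while true; do
        if ! "$@"; then
            break   # check failed: no session left, so stop holding the runner
        fi
        polls=$((polls + 1))
        sleep "$interval"
    done
}
```

Under these assumptions, a failure-handling step could end with `wait_until 18 10 root_login_present && wait_while 10 root_login_present`: 18 polls at 10-second intervals gives the 3-minute grace period, and the second call returns only once the session check stops succeeding, at which point the step finishes and the runner is released.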