Comment by magundu
10 days ago
Our use case is to execute test scripts in a sandbox mode. It is a multi-host, multi-region setup, and we might run millions of test scripts per day.
One of our engineers found https://testcontainers.com. It looks interesting, but it seems it doesn't keep containers alive; instead, it starts and removes a container for each test. We might need to implement a locking mechanism to cap the number of containers running at a time. I don't know whether it fits highly scalable test cases.
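For what it's worth, the cap you describe can be a plain semaphore around container launches. Below is a minimal sketch using the Docker SDK for Python; the image name, `MAX_CONTAINERS`, the mount path, and `run_test` are all placeholders for illustration, not anything Testcontainers itself provides:

```python
# Minimal sketch: cap concurrent throwaway test containers with a semaphore.
# Assumes the Docker SDK for Python ("docker" package) and a hypothetical
# sandbox image; tune MAX_CONTAINERS to the host's capacity.
import threading
from concurrent.futures import ThreadPoolExecutor

import docker

MAX_CONTAINERS = 50  # per-host cap (assumption)
client = docker.from_env()
slots = threading.BoundedSemaphore(MAX_CONTAINERS)

def run_test(script_name: str) -> bytes:
    # Block until a slot frees up, then run one container per test,
    # removing it afterwards, much like Testcontainers does.
    with slots:
        return client.containers.run(
            "python:3.12-slim",  # sandbox image (assumption)
            ["python", script_name],
            volumes={"/srv/tests": {"bind": "/tests", "mode": "ro"}},
            working_dir="/tests",
            remove=True,
        )

# Usage: submit more tests than slots; the semaphore enforces the cap
# even if other parts of the system launch containers through run_test.
with ThreadPoolExecutor(max_workers=MAX_CONTAINERS) as executor:
    futures = [executor.submit(run_test, f"test_{i}.py") for i in range(200)]
```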
Running millions of test scripts across a multi-host, multi-region setup is no small feat. You're right about Testcontainers: it's elegant for one-off, isolated runs (CI pipelines, for example), but at high throughput the overhead of spinning up and tearing down a container for every single test starts to hurt.

Most scalable setups I've seen shift toward a pre-warmed pool of sandbox containers: keep a fleet of "hot" containers alive and route test scripts into them via docker exec. You lose some isolation granularity but gain a lot in throughput. You could layer in a custom scheduler (Redis- or NATS-backed, maybe) that tracks container load and availability across hosts. Pair that with a TTL plus a health checker, and you can recycle containers efficiently without zombie buildup.

Also, have you explored lighter-weight VMs (like Firecracker or Kata Containers) instead of full Docker containers? They can offer tighter isolation with decent spin-up times, and could be a better fit for multi-tenant test runs at this scale.

Would love to nerd out more on this. Are you planning to open source anything from your infra, or even just blog about the journey? I feel like this would resonate with a lot of folks in the testing/devops space.
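To make the warm-pool idea concrete, here is a hedged single-host sketch: N long-lived containers, scripts routed in with exec, and each container recycled after a TTL. It uses the Docker SDK for Python; `IMAGE`, `POOL_SIZE`, `TTL_SECONDS`, and `spawn` are assumptions, and a real multi-host version would back the queue with Redis or NATS rather than an in-process `queue.Queue`:

```python
# Hedged sketch of a pre-warmed sandbox pool with TTL-based recycling.
# Assumes the Docker SDK for Python; all names and constants are
# illustrative, not a definitive implementation.
import queue
import time

import docker

POOL_SIZE = 10
TTL_SECONDS = 300           # recycle a container after 5 minutes (assumption)
IMAGE = "python:3.12-slim"  # sandbox image (assumption)

client = docker.from_env()
pool: "queue.Queue[tuple]" = queue.Queue()

def spawn() -> None:
    # "sleep infinity" keeps the container hot, so exec is near-instant.
    container = client.containers.run(IMAGE, ["sleep", "infinity"], detach=True)
    pool.put((container, time.monotonic()))

for _ in range(POOL_SIZE):
    spawn()

def run_test(command: list) -> tuple:
    # Borrow a hot container, exec the test inside it, then either
    # return it to the pool or retire it if its TTL expired.
    container, born = pool.get()
    try:
        exit_code, output = container.exec_run(command)
        return exit_code, output
    finally:
        if time.monotonic() - born > TTL_SECONDS:
            # TTL expired: kill and replace instead of reusing, so state
            # from old tests can't accumulate (no zombie buildup).
            container.remove(force=True)
            spawn()
        else:
            pool.put((container, born))

print(run_test(["python", "-c", "print('hello from the sandbox')"]))
```

The trade-off is exactly the one mentioned above: exec into a shared container is far cheaper than a cold start, but tests sharing a container can see each other's filesystem state between recycles, which is why the TTL (and, in a real system, a health check before reuse) matters.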