Comment by hedora
1 day ago
People screw up the bcrypt thing all the time. Pick a single-threaded server stack (and run on one core, because Kubernetes), then configure bcrypt so brute-forcing 8-character passwords is slow on an A100. Configure Kubernetes to run on a mid-range CPU because you have no load. Finally, leave your cloud provider's HTTP proxy timeout set to its default.
The result is that 100% of auth requests time out once the login queue depth gets above a hundred or so. At that point, users retry their login attempts, so you need to scale out fast. If you haven't tested scale-out, then it's time to implement a bcrypt thread pool, or reimplement your application.
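To put rough numbers on that failure mode, here is a back-of-the-envelope sketch in Python. The 0.5 s per hash and the 60 s proxy timeout are illustrative assumptions, not figures from the setup described above, and the function names are made up for this example.

```python
import math

def max_queue_depth(hash_seconds: float, proxy_timeout_seconds: float, workers: int = 1) -> int:
    """Deepest queue position that still finishes hashing before the proxy gives up."""
    # Each worker can complete floor(timeout / hash time) hashes within the timeout.
    return workers * math.floor(proxy_timeout_seconds / hash_seconds)

def wait_seconds(position: int, hash_seconds: float, workers: int = 1) -> float:
    """Time until the request at a 1-based queue position has been hashed."""
    # Requests are served in batches of `workers`; each batch costs one hash time.
    return math.ceil(position / workers) * hash_seconds

if __name__ == "__main__":
    HASH_SECONDS = 0.5    # assumed cost of one bcrypt check on a slow core
    PROXY_TIMEOUT = 60.0  # assumed proxy default; check your provider's actual value
    print(max_queue_depth(HASH_SECONDS, PROXY_TIMEOUT))             # one hashing thread
    print(max_queue_depth(HASH_SECONDS, PROXY_TIMEOUT, workers=4))  # four hashing threads
    print(wait_seconds(121, HASH_SECONDS))                          # already past the timeout
```

Under those assumptions a single worker can only serve the first 120 queued logins in time. Everything behind that position times out, and each retry joins the back of the same queue, so the queue never drains on its own.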
But at least the architecture I described "scales".
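For anyone wondering what a "bcrypt thread pool" might look like, here is a minimal sketch for an asyncio-style single-threaded server. It uses `hashlib.pbkdf2_hmac` from the standard library as a stand-in for bcrypt so it runs without extra packages; `HASH_WORKERS`, `ITERATIONS`, `slow_hash`, and `verify_password` are hypothetical names. Threads only help here if the hash call releases the GIL while it works; if yours does not, a process pool would be the closer analogue.

```python
import asyncio
import hashlib
import hmac
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical settings; size the pool to the cores you actually have.
HASH_WORKERS = 4
ITERATIONS = 200_000

# One shared pool, so slow hashes never run on the event loop thread.
_pool = ThreadPoolExecutor(max_workers=HASH_WORKERS, thread_name_prefix="pwhash")

def slow_hash(password: bytes, salt: bytes) -> bytes:
    # Stand-in for bcrypt. With the bcrypt package, the equivalent blocking
    # call would be something like bcrypt.checkpw(password, stored_hash).
    return hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)

async def verify_password(password: bytes, salt: bytes, expected: bytes) -> bool:
    loop = asyncio.get_running_loop()
    # Hand the CPU-heavy work to the pool; the event loop keeps serving
    # other requests while this coroutine waits for the result.
    digest = await loop.run_in_executor(_pool, slow_hash, password, salt)
    return hmac.compare_digest(digest, expected)

async def main() -> list:
    salt = os.urandom(16)
    stored = slow_hash(b"hunter2", salt)
    attempts = [b"hunter2", b"wrong", b"hunter2"]
    # Three logins checked concurrently instead of back to back.
    return await asyncio.gather(*(verify_password(p, salt, stored) for p in attempts))

if __name__ == "__main__":
    print(asyncio.run(main()))
```

The pool size also caps how many hashes run at once, so a login spike cannot starve every other request of CPU. It does not shorten the queue by itself: you still need enough cores behind the pool.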
Fond memories of a job circa 2013 on a very large Rails app: CI got about 10x faster when someone realized bcrypt was misconfigured in the test environment and was slowing things down every time a user was created through a factory.
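The usual fix for that kind of thing is to drop the bcrypt cost to its minimum in the test environment only. The app in question was Rails, but here is the idea sketched in Python to keep it short. The `APP_ENV` variable, the production cost of 12, and the function names are assumptions for illustration.

```python
import os
from typing import Optional

PRODUCTION_COST = 12  # assumed production work factor
TEST_COST = 4         # lowest cost bcrypt accepts

def bcrypt_cost(env: Optional[str] = None) -> int:
    """Pick the bcrypt work factor for the current environment."""
    if env is None:
        # APP_ENV is a hypothetical variable name. Default to production so
        # a missing setting makes things slow rather than weak.
        env = os.environ.get("APP_ENV", "production")
    return TEST_COST if env == "test" else PRODUCTION_COST

def relative_work(cost_a: int, cost_b: int) -> int:
    """bcrypt's work doubles with each cost step, so the ratio is 2**(a - b)."""
    return 2 ** (cost_a - cost_b)

if __name__ == "__main__":
    # With the bcrypt package, the cost would be passed as
    # bcrypt.gensalt(rounds=bcrypt_cost()) when hashing a new password.
    print(bcrypt_cost("test"), bcrypt_cost("production"))
    print(relative_work(PRODUCTION_COST, TEST_COST))
```

With those assumed values, every factory-created user costs 256 times less hashing work in tests, which adds up fast in a big suite.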
"because Kubernetes"? Is this assuming that you're running your server inside of a Kubernetes instance (and if so, is Kubernetes going to have problems with more than one thread?), or is there some other reason why it comes into this?