If your web crawler is using a hundred thousand filehandles, you've got a problem. You shouldn't need that many: ten thousand open web requests, sure, but you don't need ten filehandles for each, and beyond that it's just a few hundred connections to the intermediate processors and databases where you store the scraped data.
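A minimal sketch of that shape, assuming aiohttp; the connection-limit and pool-size numbers are illustrative, not prescriptive:

    import asyncio
    import aiohttp

    # One shared connector caps the total sockets the crawler can hold open,
    # no matter how many fetch tasks are created.
    CONNECTION_LIMIT = 10_000   # illustrative: concurrent web requests
    DB_POOL_SIZE = 200          # illustrative: handles to downstream stores

    async def fetch(session: aiohttp.ClientSession, url: str) -> bytes:
        async with session.get(url) as resp:
            return await resp.read()

    async def crawl(urls: list[str]) -> list[bytes]:
        connector = aiohttp.TCPConnector(limit=CONNECTION_LIMIT)
        async with aiohttp.ClientSession(connector=connector) as session:
            # All requests share one session and its connection pool, so
            # filehandles scale with in-flight requests, not with URLs queued.
            return await asyncio.gather(*(fetch(session, u) for u in urls))

    if __name__ == "__main__":
        pages = asyncio.run(crawl(["https://example.com"]))
        print(len(pages), "pages fetched")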
High-performance template rendering uses about as many filehandles as it has open requests - maybe 10,000. If it's actually high performance, the templates underneath aren't files anymore by the time you're serving requests; they're held in memory.
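A stdlib-only sketch of the "templates live in memory" point; the directory and template names here are made up for illustration:

    from pathlib import Path
    from string import Template

    # Load every template once at startup; after this, rendering touches no
    # files, so serving 10,000 concurrent requests costs zero template handles.
    TEMPLATE_DIR = Path("templates")   # hypothetical directory
    _CACHE: dict[str, Template] = {
        p.stem: Template(p.read_text()) for p in TEMPLATE_DIR.glob("*.tmpl")
    }

    def render(name: str, **context: str) -> str:
        # Pure in-memory substitution; the filehandle cost was paid at startup.
        return _CACHE[name].substitute(**context)

    # e.g. render("greeting", user="Ada") if templates/greeting.tmpl exists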
Databases are almost an exception, but you shouldn't be running a "large DB" on a shared host on K8s; you should taint and dedicate those machines. K8s is still useful as a common management plane, but I'm roughly on the fence between "just run those machines as a special tier" and "run them on K8s with dedicated taints", because both have advantages. Smaller databases run just fine, with Postgres using ~10k filehandles.
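For that ~10k figure, a rough way to see what a running Postgres (or any process) actually holds open on a Linux box is just to count its /proc fd entries; passing the pid on the command line is a simplification, and you need permission to inspect the process:

    import os
    import sys

    def open_fd_count(pid: int) -> int:
        # Each entry in /proc/<pid>/fd is one open descriptor:
        # sockets, pipes, data files, WAL segments, all of it.
        return len(os.listdir(f"/proc/{pid}/fd"))

    if __name__ == "__main__":
        pid = int(sys.argv[1])  # e.g. the postmaster pid
        print(f"pid {pid} has {open_fd_count(pid)} open filehandles")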
There was a time when scheduling specifically for "filehandleful" jobs made sense. It's long gone. Modern Linux systems set the filehandle limit to something obscene, because it's no longer a limiting factor, and it hasn't been one for these workloads in five years.
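To see how "obscene" the limit is on a given box, the soft and hard RLIMIT_NOFILE values are queryable straight from the Python stdlib:

    import resource

    # The soft limit is what a process gets by default; the hard limit is the
    # ceiling it could raise itself to. On modern distros the hard limit is
    # often in the hundreds of thousands or more.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft NOFILE limit: {soft}")
    print(f"hard NOFILE limit: {hard}")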