Comment by ilaksh
6 days ago
Is anyone providing hosted inference for 9B? I'm just trying to save the operational effort of renting a GPU, since this is a business use case that doesn't have real GPUs available right now. I don't see the small ones on OpenRouter. Maybe there will be a RunPod serverless or normal pod template or something.
Also, does 9B, or 9B at 8-bit or 6-bit, run with very low latency on a 4090?
By "anyone", do you mean a well-established business, or any entity willing to serve you?