Comment by ilaksh

6 days ago

Is anyone providing hosted inference for the 9B model? I'm trying to avoid the operational effort of renting a GPU — this is a business use case and we don't have real GPUs available right now. I don't see the small models on OpenRouter. Maybe there will be a RunPod serverless or standard pod template at some point.

Also, does the 9B model (or an 8-bit or 6-bit quant of it) run with very low latency on a 4090?
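For what it's worth, a rough weights-only VRAM estimate suggests all of those quant levels should fit in a 4090's 24 GB (this is a back-of-envelope sketch I put together, not a benchmark — KV cache and activations add overhead on top, and actual latency depends on the runtime):

```python
# Back-of-envelope VRAM estimate for a 9B-parameter model.
# Weights only: KV cache, activations, and runtime overhead add more.
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

for bits in (16, 8, 6):
    print(f"{bits}-bit: ~{weight_vram_gb(9, bits):.1f} GB of weights")
# 16-bit: ~18.0 GB of weights  (tight on a 24 GB 4090 once KV cache is added)
#  8-bit: ~9.0 GB of weights
#  6-bit: ~6.8 GB of weights
```

So the 8-bit and 6-bit quants leave plenty of headroom; fp16 is the only one that gets close to the 24 GB limit.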