Comment by siquick
1 day ago
Rent an H100 on Modal, which scales down to zero when not in use - you can set the timeout period.
Cold boot times are around 5 minutes, but if your usage periods are predictable it can work out OK. It works out at about $2 an hour.
Still far more expensive than a ChatGPT sub.
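A quick back-of-the-envelope on those numbers. The $2/hour rate and $30 monthly credits are from above; the $20/month subscription price is my assumption (ChatGPT Plus tier):

```python
# Rough cost comparison: on-demand H100 on Modal vs. a flat-rate sub.
H100_RATE_PER_HOUR = 2.00       # from the comment above
FREE_CREDITS_PER_MONTH = 30.00  # Modal's monthly free credits
SUB_PER_MONTH = 20.00           # assumption: ChatGPT Plus pricing

# Hours of H100 time the free credits cover each month
free_hours = FREE_CREDITS_PER_MONTH / H100_RATE_PER_HOUR

# GPU-hours per month at which renting costs the same as the sub
break_even_hours = SUB_PER_MONTH / H100_RATE_PER_HOUR

print(f"Free credits cover {free_hours:.0f} h/month of H100 time")   # 15
print(f"Renting matches the sub price at {break_even_hours:.0f} h/month")  # 10
```

So a light, bursty workload can stay inside the free credits, but anything beyond ~10 GPU-hours a month costs more than the sub.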
Do you have some reference on what setup you're talking about? I'd like to integrate it into my IDE (Cursor/VS Code) - are there docs on such a setup?
Start here
https://modal.com/docs/examples/vllm_inference
or give this a go
https://modal.com/docs/examples/opencode_server
You get $30 of free credits each month on Modal, which is enough to play around with (I have no affiliation, I just think they run a great service).