Comment by dedpool
7 days ago
This one feels refreshing. It’s written in Go, and the TUI is pretty slick. I’ve been running Qwen Coder 3 on a GPU cluster with 2 B200s at $2 per hour, getting 320k context windows and burning through millions of tokens without paying closed labs for API calls.
Are you using a service for the GPU cluster?
I'd like to try this out. Are you renting on one of the open rental platforms?
How many tokens/sec are you getting on that setup, especially when you have 100k+ tokens in context?