Comment by dedpool

7 days ago

This one feels refreshing. It’s written in Go, and the TUI is pretty slick. I’ve been running Qwen3-Coder on a GPU cluster with 2 B200s at $2 per hour, getting 320k context windows and burning through millions of tokens without paying closed labs for API calls.

How many tokens/sec are you getting on that setup, especially once the context is past 100k tokens?