Comment by dedpool

6 months ago

This one feels refreshing. It’s written in Go, and the TUI is pretty slick. I’ve been running Qwen3 Coder on a GPU cluster with two B200s at $2 per hour, getting 320k-token context windows and burning through millions of tokens without paying closed labs for API calls.

3 comments


yahoozoo 6 months ago

Are you using a service for the GPU cluster?

beacon294 6 months ago

I'd like to try this out. Are you renting on one of the open rental platforms?

segmondy 6 months ago

How many tokens/sec are you getting on that setup, especially once the context is past 100k tokens?