Comment by arjie

1 day ago

That’s the cost of using a new hardware provider. A single RTX Pro 6000 Blackwell Max-Q will do better than that and be much more usable. I have two running DS4 Flash at 160 tok/s with max num seqs set to 4.

Very interesting though, these Tenstorrent chips. Might get one to experiment with.

Yeah, that’s definitely the smarter buy if you just want to have models running quickly. But two P150s and a 4090 cost me under $5000.

The main issues are the immature software and the somewhat baroque way of writing kernels. Please, buy one and join us.

  • Were you able to connect the two P150s using the QSFP-DD cable? They only sell 4x and 8x topologies, so I’m curious whether that worked for you. Are you able to run them tensor-parallel?