Comment by simianwords
3 hours ago
It’s interesting that they kept the price the same while doing inference on Cerebras is much more expensive.
I don't think this is Cerebras. Running on Cerebras would change model behavior a bit; it could potentially get a ~10x speedup, but it'd be more expensive. So most likely this is them writing new, more optimized kernels for the Blackwell series?
Fair point, but one question remains: why is this speedup available only in the API and not in ChatGPT?
This is almost certainly not being done on Cerebras.