Comment by pxc

16 hours ago

This feels way less annoying to use than ChatGPT. But I wonder how much the effect is lost when the tool does many of the things that make models like o3 useful (repeated web searches, running code in a sandbox, etc.).

For code generation, this does seem pretty useful with something like Qwen3-Coder-480B, if that generates good enough code for your purposes.

But for chat, I wonder: does this kind of speed call for models that behave pretty differently from current ones? With virtually instant responses, I sometimes find myself wanting much shorter answers. Maybe a model designed and trained for concision, with a context spanning lots and lots of turns, would be a uniquely useful option on this kind of hardware.

But I guess the hardware is really for training, right? And the inference-as-a-service stuff is basically a powerful form of marketing?