Comment by onlyrealcuzzo

1 day ago

It's not unreasonable to think that, with improvements on the software side, a Saturn-like model based on diffusion could be this powerful within a decade, with 1s responses.

I highly doubt that in 10 years people will still be waiting 30m for answers of this quality, whether the speedup comes from the software side, the hardware side, and/or scaling.

It's possible that in 10 years the cost you pay is still comparable, but I doubt the time will be 30m.

It's also possible that there are still top-tier models like this that use absurd amounts of resources (by today's standards) and take 30m - but they'd likely be at a much higher quality than today's.

The pressure in the other direction is tool use. The more a model wants to call out to a series of tools, the longer the delay will be, just because that serial process isn't part of the model, so making the model itself faster doesn't shrink it.
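
To put rough numbers on the tool-use point, here's a back-of-the-envelope sketch in Python. The function name and every figure in it are made up for illustration, and it assumes the simplest case: one model step before each tool call, one final step to write the answer, and tools that run strictly one after another.

```python
def total_latency(model_step_s, tool_latencies_s):
    """Wall-clock seconds for a response that makes serial tool calls.

    Assumption: one model step before each tool call, plus a final
    model step to produce the answer. All inputs are hypothetical.
    """
    model_time = model_step_s * (len(tool_latencies_s) + 1)
    tool_time = sum(tool_latencies_s)  # serial calls: the delays add up
    return model_time + tool_time


tools = [2.0, 5.0, 3.0]             # made-up tool round-trip times (seconds)
slow = total_latency(60.0, tools)   # 60 s per model step
fast = total_latency(1.0, tools)    # 1 s per model step
print(slow, fast)
```

Under these assumed numbers the model gets 60x faster, but the 10 s spent waiting on tools doesn't move, so it goes from a small slice of the total to most of it.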