Comment by adrian_b
1 day ago
The high costs are necessary for high speed.
When a low speed on the order of one token per second is acceptable, any open-weights LLM can be run on an ordinary PC (with the weights read from SSDs) and the cost becomes negligible.
Such a low speed would be annoying for a chat, but I do not believe that it is "barely useful" for a coding assistant. There are plenty of tasks for which it is fine to get results some hours later or even overnight, and several tasks batched together can complete in about the same time as a single task.
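A rough back-of-the-envelope sketch of the numbers behind that claim, in Python. Everything here is an assumption for illustration: the helper names, the 20,000-token task size, and especially the idealized batching model, where each pass over the weights (the slow part when streaming from SSD) yields one token for every task in the batch. Real batching is limited by compute and by memory for each task's context, so treat this as an upper bound rather than a measurement.

```python
import math


def tokens_generated(tokens_per_second: float, hours: float, batch_size: int = 1) -> int:
    """Total tokens produced across all batched tasks in the given time.

    Assumes (idealized) that one pass over the weights produces one token
    for each task in the batch.
    """
    return int(tokens_per_second * hours * 3600 * batch_size)


def hours_to_finish(tokens_per_task: int, num_tasks: int,
                    tokens_per_second: float, batch_size: int) -> float:
    """Wall-clock hours to finish num_tasks, running batch_size tasks at a time."""
    rounds = math.ceil(num_tasks / batch_size)  # batches run one after another
    return rounds * tokens_per_task / tokens_per_second / 3600


if __name__ == "__main__":
    # An 8-hour overnight run at 1 token/s, one task at a time.
    print(tokens_generated(1.0, 8))
    # One hypothetical 20,000-token task vs. eight of them batched together.
    print(round(hours_to_finish(20_000, 1, 1.0, 1), 2))
    print(round(hours_to_finish(20_000, 8, 1.0, 8), 2))
    # The same eight tasks run one after another, without batching.
    print(round(hours_to_finish(20_000, 8, 1.0, 1), 2))
```

Under these assumptions an overnight run yields a few tens of thousands of tokens per task, and the batched case takes the same wall-clock time as a single task, while running the tasks serially multiplies it by the number of tasks.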
I don't know. Even the frontier models do dumb things sometimes. Being able to iterate (and iterate quickly) is really important. If you get 1 try a day, you're probably back to it being better to just code by hand. Also, you're going to get absolutely outpaced by anyone who uses AI that goes faster.
So maybe for a hobby project this is fine, but for something you have to take to market and compete with... I think it'd be a really rough sell.
EDIT: also, just to be clear: if there were a practical path to using local AI, I'd take it in a heartbeat. I hope it gets to the point where running locally is better than paying someone $200/mo. But right now, that $200/mo is the clear best option. I get making compromises for ideology, but the compromises are too big for me right now.