Comment by levocardia

4 months ago

There's a Stephen Boyd quote that's something like "if your optimization problem is too computationally expensive, just go on vacation to Greece for a few weeks and by the time you get back, computers might be fast enough to solve it." With LLMs there's sort of an equivalent situation with cost: how mindblowing would it be able to train this kind of LLM at all even just 4 years ago? And today you can get a kindergartener level chat model for about $100. Not hard to imagine the same model costing $10 of compute in a few years.

There's also a reasonable way to "leapfrog" the training cost with a pre-trained model. So if you were doing nanochat as a learning exercise and had no money, the idea would be to code it up, run one or two very slow gradient descent iterations on your slow machine to make sure it is working, then download a pre-trained version from someone who could spare the compute.

10 comments

levocardia

piokoch 4 months ago

But in this case the reason is simple: the core algorithm is O(n^2), this not going to be improved over a few weeks.

dingnuts 4 months ago

> today you can get a kindergartener level chat model for about $100. Not hard to imagine the same model costing $10 of compute in a few years.

No, it's extremely hard to imagine since I used one of Karpathy's own models to have a basic chat bot like six years ago. Yes, it spoke nonsense; so did my GPT-2 fine tune four years ago and so does this.

And so does ChatGPT

Improvement is linear at best. I still think it's actually a log curve and GPT3 was the peak of the "fun" part of the curve. The only evidence I've seen otherwise is bullshit benchmarks, "agents" that increase performance 2x by increasing token usage 100x, and excited salesmen proclaiming the imminence of AGI

simonw 4 months ago
Apparently 800 million weekly users are finding ChatGPT useful in its present state.
- infinitezest 4 months ago
  
  1. According to who? Open AI? 2. Its current state is "basically free and containing no ads". I don't think this will remain true given that, as far as I know, the product is very much not making money.
  
  5 replies →
wordpad 4 months ago

Even with linear progression of model capability, the curve for model usefulness could be exponential, especially if we consider model cost which will come down.
For every little bit a model a smarter and more accurate there are exponentially more real world tasks it could be used for.