Comment by libraryofbabel
19 hours ago
The basic concept plus a lot of money spent on compute and training data gets you pretraining. After that, getting a really good model takes many more fine-tuning / RL steps that companies are pretty secretive about. That is where the “smart decisions” and the knowledge gained from training previous generations of state-of-the-art models come in.
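To make the two stages concrete, here’s a toy sketch in plain Python (no ML libraries): stage 1 fits token logits to a tiny “corpus” by maximum likelihood, a stand-in for pretraining, and stage 2 nudges the same logits with a REINFORCE-style update against a made-up reward function, a stand-in for RL post-training. The vocabulary, corpus, reward, and learning rate are all illustrative assumptions, not any lab’s actual recipe.

```python
# Toy two-stage pipeline: (1) maximum-likelihood "pretraining",
# (2) REINFORCE-style "RL post-training". Purely illustrative.
import math
import random

random.seed(0)

VOCAB = ["good", "bad"]  # hypothetical two-token vocabulary

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Stage 1: "pretraining" — SGD on cross-entropy pushes the model's
# token distribution toward the empirical frequencies in the corpus.
corpus = ["good", "bad", "good", "good"]  # stand-in for web-scale data
logits = [0.0, 0.0]
lr = 0.5
for _ in range(200):
    for tok in corpus:
        probs = softmax(logits)
        target = VOCAB.index(tok)
        for i in range(len(VOCAB)):
            # gradient of log-likelihood w.r.t. logits: one-hot(target) - probs
            logits[i] += lr * ((1.0 if i == target else 0.0) - probs[i])

print("after pretraining:", dict(zip(VOCAB, softmax(logits))))

# Stage 2: "RL post-training" — sample a token, score it with a
# (hypothetical) reward model, and take a REINFORCE step. The secret
# sauce at frontier labs — reward design, data mixes, iterated tricks —
# is exactly what this toy does not capture.
def reward(tok):
    return 1.0 if tok == "good" else -1.0

for _ in range(200):
    probs = softmax(logits)
    idx = random.choices(range(len(VOCAB)), weights=probs)[0]
    r = reward(VOCAB[idx])
    for i in range(len(VOCAB)):
        # REINFORCE: reward-weighted gradient of log pi(action)
        logits[i] += lr * r * ((1.0 if i == idx else 0.0) - probs[i])

print("after RL post-training:", dict(zip(VOCAB, softmax(logits))))
```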
We’d probably see more companies training their own models if it were cheaper, for sure. Maybe some of them would do very well. But even having a lot of money to throw at this doesn’t guarantee success; e.g., Meta’s Llama 4 was a big disappointment.
That said, it’s not impossible to get close to state-of-the-art, as DeepSeek showed.