Comment by BoorishBears

5 days ago

No, I've deployed a lot of open weight models and the gap between closed source is there even at larger sizes.

I'm running a 400B parameter model at FP8 and it still took a lot of post-training to get an even somewhat comparable performance

I think a lot of people implicitly bake in some grace because the models are open weights, and that's not unreasonable because of the flexibility... but in terms of raw performance it's not even close.

GPT-3.5 has better world knowledge than some 70B models, and a few even larger.

10 comments

BoorishBears

laardaninst 4 days ago

The big "frontier" models are expert systems built on top of the LLM. That's the reason for the massive payouts to scientists. It's not about some ML secret sauce, it's about all the symbolic logic they bring to the table.

Without constantly refreshing the underlying LLM and the expert system layer, these models would be outdated in months. Language and underlying reality would shift from under their representations and they would rot quick.

That's my reasoning for considering this a bubble. There has been zero indication that the R&D can be frozen. They are stuck burning increasing amouts of cash for as long as they want these models to be relevant and useful.

daft_pink 5 days ago

you're killing my dream of blowing $50-100k on a desktop supercomputer next year and being able to do everything locally ;)

"the hacker news dream" - a house, 2 kids, and a desktop supercomputer that can run a 700B model.

meaydinli 5 days ago
Take a look at: https://www.nvidia.com/en-us/products/workstations/dgx-spark... . IIRC, it was about ~$4K.
- PeterStuer 4 days ago
  
  Given that for a non quantized 700B monolithic model with let's say a 1M token context, you would need around 20TB of memory, I doubt your spark or M4 will get very far.
  I'm not saying those machines can't be usefull or fun, but it's not in the range of the 'fantasy' thing you're responding to.
  
  2 replies →
- phonon 5 days ago
  
  An M4 Max twice the memory bandwidth (which is typically the limiting factor)
  
  3 replies →