Comment by konaraddi

5 hours ago

> applying this compression algorithm at scale may significantly relax the memory bottleneck issue.

I don’t think they’re going to downsize, though. I think the big players are just going to use the freed-up memory for more workflows or larger models, because they want to scale up. It’s a cat-and-mouse race for the best models.

4 comments

miohtama 3 hours ago

It will also help with local inference, making AI without big players possible.

otabdeveloper4 1 hour ago

It's already possible. Post-training is vastly more important than model size. (There are big-time diminishing returns with increasing model size.)
- plagiarist 15 minutes ago

  Is there a size cutoff where you would say diminishing returns really kick in?
  My experience doesn't disagree, at least. I've been using Qwen for coding locally a bit. It is much better than I thought it would be, but it still falls short in some obvious ways compared to the frontier models.

Verdex 4 hours ago

Known in the business as 'pulling a Jevons'.