Comment by gxs

16 hours ago

This is gross

It feels like we’ve been in the golden age and the window is coming to a close

Let the enshittification begin, I guess

How do you expect the spend & COGS for free LLM inference to be funded? For users who don't want to pay, or maybe can't pay?

  • Perhaps it’s a glib and easy thing to say, but after a teaser period, I would simply not offer free LLM inference. Agreeing to serve ads just completely re-aligns your interests away from providing the best possible user experience to something else entirely.

In the past month, local models have been ramping up in a major way, while the name-brand providers have raised prices, gone offline randomly, and done slimier and slimier things.

I really think the future is local compute. Or at least self hosted models.

  • The hosted ones still have the advantage of being able to search the internet for live info rather than being limited to a knowledge cutoff date.

  • What's the rough equivalent of a local model? Are we talking GPT-4?

    • Qwen 3.6, which was released this month, is large but still on the smaller end. Supposedly it's at about Sonnet level when configured correctly. It can be run on commodity hardware without purchasing a data center. https://www.reddit.com/r/LocalLLaMA/comments/1so1533/qwen36_...

      Then there are mid-size ones, which require multiple GPUs and are comparable to GPT's latest flagships.

      Then there is Kimi 2.6, a monster that is beating Opus in some benchmarks. https://www.reddit.com/r/LocalLLaMA/comments/1sr8p49/kimi_k2...

      It's basically whatever you can afford. Any trash-heap laptop can run code-autocomplete models locally, no problem. The rest require some level of investment: an idle gaming PC at the low end, or serious hardware beyond that.
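      If you want to see what "running locally" actually looks like, here's a minimal sketch using llama-cpp-python, assuming you've already downloaded a GGUF quant; the model filename below is made up:

          # pip install llama-cpp-python
          from llama_cpp import Llama

          # Load a quantized GGUF file. n_gpu_layers=-1 offloads every layer
          # to the GPU if you have one; 0 keeps inference on the CPU.
          llm = Llama(model_path="./qwen3.6-q4_k_m.gguf", n_ctx=8192, n_gpu_layers=-1)

          out = llm("Explain what a knowledge cutoff is, in one sentence.", max_tokens=64)
          print(out["choices"][0]["text"])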

    • Depends on your VRAM or "unified" memory for how smart it is, and CPU/GPU for how quick it is.

      128GB of RAM? Sure, the early-to-mid GPT-4-era releases, except maybe 4o. And on an M5 Max, at about the same speed.

      I wouldn't really bother under 64GB (meaning 32GB or less) except for entertainment value (chats, summaries, tasky read-only agent things).
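      The back-of-the-envelope math behind those tiers, as a rough sketch (the 20% overhead factor is an assumption, not a measurement):

          def approx_memory_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
              """Rough footprint: weights at the quantized bit width,
              plus ~20% for KV cache and runtime buffers."""
              weight_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
              return weight_gb * overhead

          print(approx_memory_gb(70, 4))  # a 70B model at 4-bit: ~42 GB, fits in 64GB
          print(approx_memory_gb(70, 8))  # the same model at 8-bit: ~84 GB, wants 128GB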

    • GLM 5.1 and DeepSeek 4 are acceptable, but the hardware and energy costs are high enough that, depending on your use case, you may as well just purchase tokens. They get useless and stupid rapidly if you quantize them down enough to run on a single 16-24GB GPU.
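      To make the "may as well purchase tokens" tradeoff concrete, here's a toy break-even calculation; every number in it is a placeholder assumption, not a real price:

          # Toy break-even: local hardware + electricity vs. hosted API tokens.
          hardware_cost = 2500.0          # one-time GPU/workstation cost (assumed)
          watts, kwh_price = 400, 0.15    # draw under load, electricity price (assumed)
          local_tok_per_s = 30            # local generation throughput (assumed)
          api_price_per_mtok = 10.0       # hosted price per million tokens (assumed)

          hours_per_mtok = 1e6 / local_tok_per_s / 3600
          energy_per_mtok = hours_per_mtok * watts / 1000 * kwh_price
          breakeven_mtok = hardware_cost / (api_price_per_mtok - energy_per_mtok)
          print(f"~${energy_per_mtok:.2f}/Mtok in electricity; hardware pays off after ~{breakeven_mtok:.0f} Mtok")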

The arc of the technological universe is short, but it bends toward enshittification.