Comment by 2001zhaozhao
8 days ago
It really is cursed to be spending hundreds of watts of power in a datacenter somewhere to make a laptop run slightly faster.
oh absolutely. burning a coal plant to decide if i should close discord is peak 2025 energy. strictly speaking, using the local model (Ollama) is 'free' in terms of watts since my laptop is on anyway, but yeah, if the inefficiency is the art, I'm the artist.
Running inference with Ollama uses energy that wouldn't have been drawn if you weren't running it. There's no free lunch here.
An interesting thought experiment: a fully local, off-grid, off-network LLM device. Solar or wind or what have you. I suppose the Mac Studio route is a good option here; I think Apple makes the most energy-efficient high-memory options. Back-of-the-napkin math says it's possible, just a high up-front cost. Interesting to imagine a somewhat catastrophe-resilient LLM device…
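For anyone who wants to check that napkin, here's a minimal sizing sketch. Every number in it (idle draw, load draw, duty cycle, sun hours) is an assumption rather than a measurement, so swap in your own:

    # Back-of-the-napkin solar sizing for an off-grid LLM box.
    # All numbers below are assumptions; substitute real measurements.
    IDLE_W = 30        # assumed idle draw of a Mac Studio
    LOAD_W = 150       # assumed draw during inference
    DUTY = 0.25        # assumed fraction of the day spent inferencing
    SUN_HOURS = 4.0    # assumed usable full-sun hours per day

    avg_w = LOAD_W * DUTY + IDLE_W * (1 - DUTY)
    wh_per_day = avg_w * 24

    panel_w = wh_per_day / SUN_HOURS   # panel wattage to break even daily
    battery_wh = wh_per_day * 1.5      # ~1.5 days of autonomy for cloudy days

    print(f"average draw: {avg_w:.0f} W")
    print(f"daily energy: {wh_per_day:.0f} Wh")
    print(f"panels: ~{panel_w:.0f} W, battery: ~{battery_wh:.0f} Wh")

With those assumptions it pencils out to roughly a 60W average draw, a ~360W panel array, and a ~2kWh battery, which is very buildable.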
Macs would be the most power-efficient, with faster memory, but a system based on the AI Max 395+ would probably be the most cost-efficient right now. A Framework Desktop with 128GB of shared RAM only pulls 400W (and could be underclocked) and is cheaper by enough that you could buy it plus 400W of solar panels and a decently large battery for less than a Mac Studio with 128GB of RAM. Unfortunately the power-efficiency win costs more than just buying extra generation and storage capacity.
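Running the same arithmetic on cost, the sketch below redoes the comparison. All the prices are illustrative assumptions, not quotes, so plug in current retail numbers:

    # Rough cost comparison between the two off-grid builds.
    # Every price here is an assumption; check current retail pricing.
    FRAMEWORK_128GB = 2500    # assumed price of a Framework Desktop, 128GB
    MAC_STUDIO_128GB = 4800   # assumed price of a Mac Studio, 128GB
    SOLAR_PER_W = 1.00        # assumed $/W of panel capacity
    BATTERY_PER_WH = 0.40     # assumed $/Wh of storage

    framework_build = (FRAMEWORK_128GB
                       + 400 * SOLAR_PER_W        # the extra panels it needs
                       + 2000 * BATTERY_PER_WH)   # a decently large battery

    print(f"Framework + panels + battery: ${framework_build:,.0f}")
    print(f"Mac Studio alone:             ${MAC_STUDIO_128GB:,.0f}")

Under those assumptions the Framework build plus its whole power plant still comes in around a grand cheaper than the Mac by itself.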
That is the endgame.
I think we are moving toward a bilayered compute model:

- The Cloud: for massive reasoning.
- The Local Edge: a small, resilient model that lives on-device and handles the OS loop, privacy, and immediate context.
BrainKernel is my attempt to prototype that Local Edge layer. It's messy right now, but I think the OS of 2030 will definitely have a local LLM baked into the kernel.
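To make the split concrete, here's a minimal sketch of the routing idea: a small model behind Ollama's local HTTP API handles the fast OS-loop decisions, and anything heavier escalates to a hosted model. This is just an illustration of the layering, not how BrainKernel actually works; the model name and the escalation policy are placeholder assumptions:

    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

    def local_llm(prompt, model="llama3.2:3b"):  # placeholder small model
        # The Local Edge: fast, private, on-device.
        r = requests.post(OLLAMA_URL, json={
            "model": model, "prompt": prompt, "stream": False,
        })
        return r.json()["response"]

    def cloud_llm(prompt):
        # The Cloud: wire up whichever hosted model you escalate to.
        raise NotImplementedError("plug in your provider here")

    def route(prompt, needs_deep_reasoning=False):
        # Crude policy: immediate context stays local, big jobs go up.
        return cloud_llm(prompt) if needs_deep_reasoning else local_llm(prompt)

    print(route("Discord has been idle for 2 hours. Close it? yes/no"))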
> using the local model (Ollama) is 'free' in terms of watts since my laptop is on anyway
Now that’s a cursed take on power efficiency
efficiency is just a mindset. if i save 3 seconds of my own attention by burning 300 watts of gpu, the math works out in my favor!
An entire datacenter, on the other hand, might be appealing for spotting things you wouldn't otherwise see in a sea of logs and graphs.