Comment by throwa356262
1 day ago
I ran Gemma on a 2015 thinkpad to do something similar. Fortunately, I could upgrade the memory otherwise it would have been a painful exercise.
Not gonna lie, llama.cpp had the fans spinning at max speed. But it worked and I got the job done.
> the fans spinning at max speed
This always confuses me - don't people want their computations to run as fast as possible and thus inevitably produce more heat that needs to be vented?
I suppose sometimes it is just an analogy for "its utilizing 100% of my resources" (which I'm guessing it is here), but I've definitely had people say it as an actual complaint in different contexts
> I've definitely had people say it as an actual complaint in different contexts
I think fan loudness is an outgrowth of conspicuous consumption because a certain OEM decided to make it a marketing bullet-point.
I was equally disappointed by by people - especially device reviewers - banging on the drum that phones made of plastic "didn't feel premium", and we got phones with glass backs that have to be shoved into plastic cases (because plastic is the near-perfect material to protect fragile phones screens and innards)
What people complain is when they visit a blog with two images and the fans are spinning at max speed because the blog has 100 trackers.
Fans shouldn't be running at max speed if the model fits in RAM with room to spare for context. Usually fans max out when the model doesn't fit and the CPU is chugging to make up the difference (or the user didn't tune LLM settings)
I didn't realize GPUs operate with zero heat output.