Comment by rurban
1 month ago
Local models are not comparable to the FOTA models at all. I know what I'm talking about because I have four local H100s in my server and can run the very best local models. It's night and day. They are unusable and stupid.
What do you use the four local H100s for, then?
For training our AI model of course. Inference is for the cheaper machines.
Not all tasks require a frontier model.
What is "FOTA"?
That's my autocorrect, because I do too much embedded work (firmware-over-the-air updates). It should have said "frontier".
I get perfectly acceptable results from a Strix Halo PC the size of a shoebox, man. It's an APU that draws ~150 W, has no discrete GPU, and costs $0/month to run. What's more, it doesn't go down every week, limit usage, or change its terms on a whim.
I only burn/discard 'frontier' tokens at work because they're mandated and the company foots the bill. I'd rather resell them: meet the asinine requirement from $EMPLOYER, provide cover for outsourcing to my own equipment, and get a return for the hassle.
TL;DR: perhaps you're holding it wrong or haven't tried the latest models, as we so often hear. That's a lot of GPU for not much utility.
Well, my Python and TypeScript folks are also happy with the simpler local models. But I work on more advanced stuff: real-time embedded C/C++, vision AI, and compilers.
Fair point. I treat LLMs like the forgetful junior we often hear about. The things I don't care to do, they (both local and hosted/'frontier') can: boilerplate, very well-described edits, some research/reports, etc. A lot is riding on 'acceptable'.
It's easier to spawn another terminal pane/browser tab than to hire a contractor; I just don't find the 'frontier' services/terms compelling.