Comment by seanmcdirmid
1 day ago
Running an LLM locally means you never have to worry about how many tokens you've used, and it also enables a lot of low-latency interaction with smaller models that can run quickly.
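For concreteness, here's a minimal sketch of what "local, no token metering" looks like in practice, assuming you have an Ollama server running on its default port with a model already pulled (the model name and timing fields below are just examples from a typical setup, not anything universal):

    # Low-latency local completion against an Ollama server on localhost.
    # No API key, no usage meter -- just a loopback HTTP call.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2",   # any model you've pulled locally
            "prompt": "Summarize why local inference avoids token billing.",
            "stream": False,       # return one JSON blob instead of a stream
        },
        timeout=120,
    )
    data = resp.json()

    print(data["response"])

    # Ollama reports timing, so you can see tokens/sec on your own box:
    tokens = data.get("eval_count", 0)
    nanos = data.get("eval_duration", 1)
    secs = nanos / 1e9
    print(f"{tokens} tokens in {secs:.2f}s ({tokens / secs:.1f} tok/s)")

The only "quota" is your own hardware, which is the point: small models on local silicon can answer fast enough for interactive use.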
I don't see why consumer hardware won't evolve to run more LLMs locally. It's a nice goal to strive for, and one that consumer hardware makers have been missing for a decade now. It is definitely achievable, especially if you only care about inference.
Isn't this what all these NPUs were created for?
I haven't seen an NPU that can compete with a GPU yet. Maybe for really small models; I'm still not sure where they're going with those.