Comment by seanmcdirmid
1 day ago
Running an LLM locally means you never have to worry about how many tokens you've used, and it also enables a lot of low-latency interaction with smaller models that can run quickly.
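For concreteness, here's a minimal sketch of what "local, no token metering" looks like in practice, assuming you have an Ollama server running on its default port with a model already pulled (the model name and timing fields below are just examples from a typical setup, not anything universal):

    # Low-latency local completion against an Ollama server on localhost.
    # No API key, no usage meter -- just a loopback HTTP call.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2",   # any model you've pulled locally
            "prompt": "Summarize why local inference avoids token billing.",
            "stream": False,       # return one JSON blob instead of a stream
        },
        timeout=120,
    )
    data = resp.json()

    print(data["response"])

    # Ollama reports timing, so you can see tokens/sec on your own box:
    tokens = data.get("eval_count", 0)
    nanos = data.get("eval_duration", 1)
    secs = nanos / 1e9
    print(f"{tokens} tokens in {secs:.2f}s ({tokens / secs:.1f} tok/s)")

The only "quota" is your own hardware, which is the point: small models on local silicon can answer fast enough for interactive use.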
I don't see why consumer hardware won't evolve to run more LLMs locally. It's a nice goal to strive for, and one that consumer hardware makers have been missing for a decade now. It is definitely achievable, especially if you only care about inference.
Isn't this what all these NPUs were created for?
I haven't seen an NPU that can compete with a GPU yet. Maybe for really small models; I'm still not sure where they're going with those.