Comment by wolvoleo
1 day ago
I have a hybrid model here. For many, many tasks a local 12B or similar works totally fine. For the rest I use the cloud; those things tend to be less privacy-sensitive anyway.
Like when someone sends me a message, I made something that categorises it for urgency. If I used the cloud, it would mean they get a copy of all those messages. But locally there's no issue, and complexity-wise it's pretty low for an LLM.
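Roughly, the shape of it is something like this sketch (assuming Ollama's HTTP API on localhost:11434; the model tag, prompt, and label set here are just placeholders, not my actual setup):

```python
# Minimal sketch of a local urgency classifier against an Ollama server.
# Assumptions: Ollama listening on localhost:11434, a model tag like "mistral:7b".
import requests

URGENCY_LABELS = ["low", "normal", "high", "urgent"]

def classify_urgency(message: str, model: str = "mistral:7b") -> str:
    prompt = (
        "Classify the urgency of the following message. "
        f"Answer with exactly one word from {URGENCY_LABELS}.\n\n"
        f"Message:\n{message}"
    )
    # Ollama's generate endpoint; stream=False returns a single JSON object.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    answer = resp.json()["response"].strip().lower()
    # Fall back to "normal" if the model wanders off the label set.
    return answer if answer in URGENCY_LABELS else "normal"

if __name__ == "__main__":
    print(classify_urgency("The server room is flooding, call me NOW"))
```

The nice part is the messages never leave the box; the only moving part is a small local model and one HTTP call.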
Things like research jobs I do run in the cloud, but they don't really contain any personal content; they just research using sources they already have access to anyway. Same with programming: there's nothing really sensitive in there.
Nice. You're nailing exactly what I'm working towards already. I'm programming with Gemini for now and have no problem there, but the home use case I found for local Ollama was "taking a billion old bookmarks and tagging them." I'm looking forward to pointing Ollama at more personal stuff.
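For the bookmark job, this is roughly the shape of it (a sketch only, assuming Ollama's local HTTP API and a placeholder model tag; the prompt and tag count are just what I'd start with):

```python
# Rough sketch of bulk bookmark tagging against a local Ollama instance.
# Assumptions: Ollama on localhost:11434, a model tag like "llama3:8b",
# and bookmarks as simple (title, url) pairs.
import requests

def tag_bookmark(title: str, url: str, model: str = "llama3:8b") -> list[str]:
    prompt = (
        "Suggest 3-5 short lowercase tags for this bookmark, "
        "as a comma-separated list and nothing else.\n"
        f"Title: {title}\nURL: {url}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    raw = resp.json()["response"]
    return [t.strip() for t in raw.split(",") if t.strip()]

if __name__ == "__main__":
    bookmarks = [
        ("Ollama", "https://ollama.com"),
        ("Hacker News", "https://news.ycombinator.com"),
    ]
    for title, url in bookmarks:
        print(title, "->", tag_bookmark(title, url))
```

Looping this over a few thousand bookmarks is slow but free, and nothing ever leaves the machine.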
Yeah, I have two servers now: one with a big AMD GPU for decent LLM performance, and one with a smaller Nvidia card that mostly runs Whisper and some small models for side tasks.