Comment by jrm4

1 day ago

Aha. This, ideally, is a job for local only. Ollama et al.

Now, of course, whether my little graphics card can reasonably compare to a bigger cloud model is an open question (and for me, presently, a very real one), but local really should be the gold standard here.

I have a hybrid setup here. For many, many tasks a local 12B model or similar works totally fine. For the rest I use the cloud; those tasks tend to be less privacy-sensitive anyway.

For example, when someone sends me a message, I made something that categorises it by urgency. If I used the cloud for that, the provider would get a copy of all those messages. But locally there's no issue, and complexity-wise it's a pretty easy task for an LLM.
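
Something in that direction is about all it takes. Here's a minimal sketch against Ollama's local /api/generate endpoint, assuming the model is already pulled; the model name, the urgency labels, and the classify_urgency helper are illustrative placeholders, not my actual setup:

    # Minimal sketch: ask a local Ollama model to label a message's urgency.
    # Nothing leaves the machine; the HTTP call only goes to localhost.
    import json
    import urllib.request

    OLLAMA_URL = "http://localhost:11434/api/generate"

    def classify_urgency(message: str, model: str = "gemma2:9b") -> str:
        """Return 'low', 'normal', or 'urgent' for a message (placeholder labels)."""
        prompt = (
            "Classify the urgency of the following message as exactly one of: "
            "low, normal, urgent. Reply with the single word only.\n\n"
            f"Message: {message}"
        )
        payload = json.dumps(
            {"model": model, "prompt": prompt, "stream": False}
        ).encode()
        req = urllib.request.Request(
            OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        return body["response"].strip().lower()

    if __name__ == "__main__":
        print(classify_urgency("The server is down and customers are noticing."))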

Things like research jobs I do run in the cloud, but they don't really contain any personal content; they just research using sources the provider already has access to anyway. Same with programming: there's nothing really sensitive in there.

  • Nice. You're nailing exactly what I'm already working towards. I'm programming with Gemini for now and have no problem there, but the home use case I found for local Ollama was "taking a billion old bookmarks and tagging them" (a rough sketch of how that can look is below, after this thread). I'm looking forward to pointing Ollama at more personal stuff.

    • Yeah, I have two servers now: one with a big AMD card for decent LLM performance, and one with a smaller Nvidia card that mostly runs Whisper and some small models for side tasks.
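
(Re the bookmark tagging mentioned above: a rough sketch of how that can look against a local Ollama instance, assuming the ollama Python client (pip install ollama) and a bookmarks.json export; the model name, file layout, and suggest_tags helper are illustrative placeholders.)

    # Rough sketch: tag old bookmarks with a local model via the ollama
    # Python client; model name and input layout are assumptions.
    import json

    import ollama

    def suggest_tags(title: str, url: str, model: str = "llama3.1:8b") -> list[str]:
        """Ask the local model for a few comma-separated topic tags."""
        prompt = (
            "Suggest 3 to 5 short lowercase topic tags for this bookmark, "
            "as a comma-separated list and nothing else.\n"
            f"Title: {title}\nURL: {url}"
        )
        reply = ollama.generate(model=model, prompt=prompt)["response"]
        return [t.strip() for t in reply.split(",") if t.strip()]

    if __name__ == "__main__":
        # Assumed input: a JSON list of {"title": ..., "url": ...} objects.
        with open("bookmarks.json") as f:
            for bm in json.load(f):
                print(bm["url"], "->", suggest_tags(bm["title"], bm["url"]))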