Comment by jrm4

1 day ago

Aha. This, ideally, is a job for local only. Ollama et al.

Now, of course, whether my little graphics card can reasonably compare to a bigger cloud model is an open question (and for me, presently, a very real one), but local really should be the gold standard here.

I have a hybrid setup here. For many, many tasks a local 12B model or similar works totally fine. For the rest I use the cloud; those tasks tend to be less privacy-sensitive anyway.

For example, when someone sends me a message, I made something that categorises it by urgency. If I used the cloud for that, the provider would get a copy of all those messages. But locally there's no issue, and complexity-wise it's a pretty easy task for an LLM.
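
Something in that direction is about all it takes. Here's a minimal sketch against Ollama's local /api/generate endpoint, assuming the model is already pulled; the model name, the urgency labels, and the classify_urgency helper are illustrative placeholders, not my actual setup:

    # Minimal sketch: ask a local Ollama model to label a message's urgency.
    # Nothing leaves the machine; the HTTP call only goes to localhost.
    import json
    import urllib.request

    OLLAMA_URL = "http://localhost:11434/api/generate"

    def classify_urgency(message: str, model: str = "gemma2:9b") -> str:
        """Return 'low', 'normal', or 'urgent' for a message (placeholder labels)."""
        prompt = (
            "Classify the urgency of the following message as exactly one of: "
            "low, normal, urgent. Reply with the single word only.\n\n"
            f"Message: {message}"
        )
        payload = json.dumps(
            {"model": model, "prompt": prompt, "stream": False}
        ).encode()
        req = urllib.request.Request(
            OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        return body["response"].strip().lower()

    if __name__ == "__main__":
        print(classify_urgency("The server is down and customers are noticing."))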

Things like research jobs I do run in the cloud, but they don't really contain any personal content; they just research using sources the provider already has access to anyway. Same with programming: there's nothing really sensitive in there.

  • Nice. You're nailing exactly what I'm already working towards. I'm programming with Gemini for now and have no problem there, but the home use case I found for local Ollama was "taking a billion old bookmarks and tagging them" (a rough sketch of how that can look is below, after this thread). I'm looking forward to pointing Ollama at more personal stuff.

    • Yeah, I have two servers now: one with a big AMD card for decent LLM performance, and one with a smaller Nvidia card that mostly runs Whisper and some small models for side tasks.
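
(Re the bookmark tagging mentioned above: a rough sketch of how that can look against a local Ollama instance, assuming the ollama Python client (pip install ollama) and a bookmarks.json export; the model name, file layout, and suggest_tags helper are illustrative placeholders.)

    # Rough sketch: tag old bookmarks with a local model via the ollama
    # Python client; model name and input layout are assumptions.
    import json

    import ollama

    def suggest_tags(title: str, url: str, model: str = "llama3.1:8b") -> list[str]:
        """Ask the local model for a few comma-separated topic tags."""
        prompt = (
            "Suggest 3 to 5 short lowercase topic tags for this bookmark, "
            "as a comma-separated list and nothing else.\n"
            f"Title: {title}\nURL: {url}"
        )
        reply = ollama.generate(model=model, prompt=prompt)["response"]
        return [t.strip() for t in reply.split(",") if t.strip()]

    if __name__ == "__main__":
        # Assumed input: a JSON list of {"title": ..., "url": ...} objects.
        with open("bookmarks.json") as f:
            for bm in json.load(f):
                print(bm["url"], "->", suggest_tags(bm["title"], bm["url"]))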