← Back to context

Comment by mkagenius

3 days ago

Sooner or later I believe, there will be models which can be deployed locally on your mac and are as good as say Sonnet 4.5. People should shift to completely local at that point. And use sandbox for executing code generated by llm.

Edit: "completely local" meant not doing any network calls unless specifically approved. When llm calls are completely local you just need to monitor a few explicit network calls to be sure. Unlike gemini then you don't have to rely on certain list of whitelisted domains.

If you read the article you'd notice that running an LLM locally would not fix this vulnerability.

I've been repeating something like 'keep thinking about how we would run this in the DC' at work. The cycles of pushing your compute outside the company and then bringing it back in once the next VP/Director/CTO starts because they need to be seen as doing something, and the thing that was supposed to make our lives easier is now very expensive...

I've worked on multiple large migrations between DCs and cloud providers for this company and the best thing we've ever done is abstract our compute and service use to the lowest common denominator across the cloud providers we use...

Can't find 4.5, but 3.5 Sonnet is apparently about 175 billion parameters. At 8-bit quantization that would fit on a box with 192 gigs of unified RAM.

The most RAM you can currently get in a MacBook is 128 gigs, I think, and that's a pricey machine, but it could run such a model at 4-bit or 5-bit quantization.

As time goes on it only gets cheaper, so yes this is possible.

The question is whether bigger and bigger models will keep getting better. What I'm seeing suggests we will see a plateau, so probably not forever. Eventually affordable endpoint hardware will catch up.

That's not easy to accomplish. Even a "read the docs at URL" is going to download a ton of stuff. You can bury anything into those GETs and POSTs. I don't think that most developers are going to do what I do with my Firefox and uMatrix, that is whitelisting calls. And anyway, how can we trust the whitelisted endpoint of a POST?

> Edit: "completely local" meant not doing any network calls unless specifically approved. When llm calls are completely local you just need to monitor a few explicit network calls to be sure.

The problem is that people want the agent to be able to do "research" on the fly.

At the time that there's something as good as sonnet 4.5 available locally, the frontier models in datacenters may be far better.

People are always going to want the best models.

Why is the being downvoted?

  • Because the article shows it isn't Gemini that is the issue, it is the tool calling. When Gemini can't get to a file (because it is blocked by .gitignore), it then uses cat to read the contents.

    I've watched this with GPT-OSS as well. If the tool blocks something, it will try other ways until it gets it.

    The LLM "hacks" you.

  • Because it misses the point. The problem is not the model being in a cloud. The problem is that as soon as "untrusted inputs" (i.e. web content) touch your LLM context, you are vulnerable to data exfil. Running the model locally has nothing to do with avoiding this. Nor does "running code in a sandbox", as long as that sandbox can hit http / dns / whatever.

    The main problem is that LLMs share both "control" and "data" channels, and you can't (so far) disambiguate between the two. There are mitigations, but nothing is 100% safe.

    • Sorry, I didn't elaborate. But "completely local" meant not doing any network calls unless specifically approved. When llm calls are completely local you just need to monitor a few explicit network calls to be sure.

      3 replies →