Comment by pornel

5 days ago

The Nano model is 3.2B parameters with 4-bit quantization. That's quite small compared to what you get from hosted chatbots, and even compared to open-weights models runnable on desktops.

It's cool to have something like this available locally anyway, but don't expect it to have reasoning capabilities. At this size it's going to be naive and prone to hallucinations. It's going to be more like a natural language regex and a word association game.

The big win for these small local models, to me, isn't knowledge (I'll leave that to the large hosted models) but a natural-language interface that can dispatch tool calls and summarize the results. That's where they have the opportunity to shine. You're totally right that they're going to be awful for knowledge.

The point of these models isn't to have all the knowledge in the world available.

It's to understand enough of language to figure out which tools to call.

"What's my agenda for today" -> get more context

cal = getCalendar()
getWeather(user.location())
getTraffic(user.location(), cal[0].location)

etc.

Then grab the return values from those and output:

"You've got a 9am meeting in Foobar, the traffic is normal and it looks like it's going to rain after the meeting."

Not rocket science and not something you'd want to feed to a VC-powered energy-hogging LLM when you can literally run it in your pocket.
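The dispatch-then-summarize loop above can be sketched in a few lines. This is a minimal illustration, not anyone's actual API: the tool functions (get_calendar, get_weather, get_traffic) are hypothetical stubs standing in for real device services, and in a real assistant the model itself would choose the tools and fill in their arguments.

```python
# Hypothetical tool stubs; a real assistant would query the
# calendar provider, a weather service, and a routing service.
def get_calendar():
    return [{"time": "9:00", "title": "meeting", "location": "Foobar"}]

def get_weather(location):
    return {"location": location, "forecast": "rain after 10:00"}

def get_traffic(origin, destination):
    return {"status": "normal"}

def answer_agenda_query(user_location):
    # 1. The model maps "What's my agenda for today?" to these tool calls.
    cal = get_calendar()
    weather = get_weather(user_location)
    traffic = get_traffic(user_location, cal[0]["location"])
    # 2. The model then phrases the structured results as a sentence.
    first = cal[0]
    return (
        f"You've got a {first['time']} {first['title']} in {first['location']}, "
        f"traffic is {traffic['status']} and the forecast says {weather['forecast']}."
    )

print(answer_agenda_query("home"))
```

The language model only does the two translation steps (query to calls, results to sentence); everything in between is plain deterministic code, which is why a 3B-parameter model can plausibly handle it.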

  • Isn't this what Apple tried with Siri? I don't see anyone using it, and adding an LLM to the mix is going to make it less accurate.

    • They wrote a whole ass paper about SLMs that do exactly this: small language models acting as narrow experts.

      And then went for a massive (but private and secure) datacenter instead.

Speculation: I guess the idea is that they build an enormous inventory of tool-use capabilities, and this model mostly serves to translate between natural language and Android's internal equivalent of MCP.

I've had Gemma 3n in Edge Gallery on my phone for months. It's neat that it works at all, but it's not very useful.