Comment by mark_l_watson
1 day ago
Fine, I guess. The only commercial API I use to any great extent is gemini-3-flash-preview: cheap, fast, great for tool use and with agentic libraries. The 3.1-pro-preview is great, I suppose, for people who need it.
Off topic, but I like to run small models on my own hardware, and some small models are now very good for tool use and with agentic libraries - it just takes a little more work to get good results.
Seconded. Gemini used to be trash and I used Claude and Codex a lot, but gemini-3-flash-preview punches above its weight; it's decent and I rarely, if ever, run into token limits either.
Thirded, I've been using gemini-3-flash to great effect. Anytime I have something more complicated, I give it to both pro and flash to see what happens. It's a coin flip whether flash turns out nearly equivalent (too many moving variables to be analytical at this point).
What models are you running locally? Just curious.
I am mostly restricted to 7-9B. I still like ancient early llama because it's pretty unrestricted without having to use an abliteration.
I experimented with many models on my 16G and 32G Macs. For less memory, qwen3:4b is good; for the 32G Mac, gpt-oss:20b is good. I like the smaller Mistral models like mistral:v0.3, and rnj-1:latest is a pretty good small reasoning model.
I like to ask Claude how to prompt smaller models for the given task. With one prompt, it was able to get a heavily quantized model to call multiple functions via JSON.
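For anyone curious what that looks like in practice, here's a rough sketch of the pattern: list the tools in the prompt, ask for a JSON array of calls, and parse it leniently. The tool schema, system prompt wording, and example reply below are all my own illustrative assumptions, not the exact prompt Claude produced.

```python
import json

# Hypothetical tool schema advertised to the model (illustrative only).
TOOLS = [
    {"name": "get_weather", "parameters": {"city": "string"}},
    {"name": "get_time", "parameters": {"timezone": "string"}},
]

# Assumed system prompt: show the tools, demand a bare JSON array of calls.
SYSTEM_PROMPT = (
    "You can call these tools:\n"
    + json.dumps(TOOLS, indent=2)
    + "\nRespond ONLY with a JSON array of calls, e.g. "
    + '[{"name": "get_weather", "arguments": {"city": "Oslo"}}]'
)

def parse_tool_calls(reply: str) -> list[dict]:
    """Leniently pull the JSON array of tool calls out of a model reply.

    Small quantized models often wrap the JSON in chatter, so we grab
    everything between the first '[' and the last ']' before parsing.
    """
    start, end = reply.find("["), reply.rfind("]") + 1
    if start == -1 or end == 0:
        return []
    try:
        calls = json.loads(reply[start:end])
    except json.JSONDecodeError:
        return []
    return [c for c in calls if isinstance(c, dict) and "name" in c]

# A quantized model might answer with something like this:
reply = (
    'Sure! [{"name": "get_weather", "arguments": {"city": "Oslo"}}, '
    '{"name": "get_time", "arguments": {"timezone": "CET"}}]'
)
print([c["name"] for c in parse_tool_calls(reply)])  # ['get_weather', 'get_time']
```

The lenient bracket-slicing matters more with small models than with the big APIs, since a 4B model will happily prepend "Sure!" no matter how firmly the prompt says not to.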