Comment by mark_l_watson
1 day ago
Fine, I guess. The only commercial API I use to any great extent is gemini-3-flash-preview: cheap, fast, great for tool use and with agentic libraries. The 3.1-pro-preview is great, I suppose, for people who need it.
Off topic, but I like to run small models on my own hardware, and some small models are now very good for tool use and with agentic libraries - it just takes a little more work to get good results.
Seconded. Gemini used to be trash and I used Claude and Codex a lot, but gemini-3-flash-preview punches above its weight; it's decent and I rarely, if ever, run into token limits either.
Thirded, I've been using gemini-3-flash to great effect. Anytime I have something more complicated, I give it to both pro and flash to see what happens. It's a coin flip whether flash turns out nearly equivalent (too many moving variables to be analytical at this point).
What models are you running locally? Just curious.
I am mostly restricted to 7-9B. I still like ancient early llama because it's pretty unrestricted without having to use an abliteration.
I experimented with many models on my 16G and 32G Macs. For less memory, qwen3:4b is good; for the 32G Mac, gpt-oss:20b is good. I like the smaller Mistral models like mistral:v0.3, and rnj-1:latest is a pretty good small reasoning model.
I like to ask Claude how to prompt smaller models for the given task. With one prompt, it was able to get a heavily quantized model to call multiple functions via JSON.
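For anyone curious what that looks like in practice, here's a rough sketch of the pattern: list the tools in the prompt, ask for a JSON array of calls, and parse it leniently. The tool schema, system prompt wording, and example reply below are all my own illustrative assumptions, not the exact prompt Claude produced.

```python
import json

# Hypothetical tool schema advertised to the model (illustrative only).
TOOLS = [
    {"name": "get_weather", "parameters": {"city": "string"}},
    {"name": "get_time", "parameters": {"timezone": "string"}},
]

# Assumed system prompt: show the tools, demand a bare JSON array of calls.
SYSTEM_PROMPT = (
    "You can call these tools:\n"
    + json.dumps(TOOLS, indent=2)
    + "\nRespond ONLY with a JSON array of calls, e.g. "
    + '[{"name": "get_weather", "arguments": {"city": "Oslo"}}]'
)

def parse_tool_calls(reply: str) -> list[dict]:
    """Leniently pull the JSON array of tool calls out of a model reply.

    Small quantized models often wrap the JSON in chatter, so we grab
    everything between the first '[' and the last ']' before parsing.
    """
    start, end = reply.find("["), reply.rfind("]") + 1
    if start == -1 or end == 0:
        return []
    try:
        calls = json.loads(reply[start:end])
    except json.JSONDecodeError:
        return []
    return [c for c in calls if isinstance(c, dict) and "name" in c]

# A quantized model might answer with something like this:
reply = (
    'Sure! [{"name": "get_weather", "arguments": {"city": "Oslo"}}, '
    '{"name": "get_time", "arguments": {"timezone": "CET"}}]'
)
print([c["name"] for c in parse_tool_calls(reply)])  # ['get_weather', 'get_time']
```

The lenient bracket-slicing matters more with small models than with the big APIs, since a 4B model will happily prepend "Sure!" no matter how firmly the prompt says not to.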