Comment by mattnewton

4 days ago

Models will always struggle with this specific task without tool use, because of the way they tokenize text. I think a bit of prompt engineering solves this: asking the model to spell out each word, or giving it the ability to run a “contains e” Python function over the animal names it generates or searches for.

I think lots of local AI use cases are similarly solvable once local models get good at tool use and have the proper harness.

The problem with tool use is that I usually find I only need it for one component of a pipeline. So in this case mentally I would be tooling it as

cat /usr/share/dict/words | print_if_mammal | grep -v 'e'

but I don't know of a good way to incorporate an LLM into a pipeline like that (I know there's a Python API). What I'm actually interested in is "is this the name of a mammal?", but I don't know of the equivalent of a quiet "batch mode", at least for ollama (and of course there's the question of performance).

I guess ultimately I would want to say "write a shell utility that accepts a line from standard input and prints it to standard output if that is the name of a mammal", and then use that utility in that pipeline. Or really to have an llmfilter utility that lets you do something like

cat /usr/share/dict/words | llmfilter "is this a mammal?" | grep -v "e"

and now that I've said that I think I'll try to make one.
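
A minimal sketch of what such an llmfilter utility could look like, assuming ollama's local HTTP generate endpoint (http://localhost:11434/api/generate) and a placeholder model name ("llama3" here is just an example):

```python
#!/usr/bin/env python3
"""Hypothetical `llmfilter`: print each stdin line for which a local LLM
answers "yes" to the question given on the command line."""
import json
import sys
import urllib.request

def ask_llm(question, item, url="http://localhost:11434/api/generate"):
    # One request per line via ollama's generate API (assumes a local server
    # is running; model name is a placeholder).
    payload = json.dumps({
        "model": "llama3",
        "prompt": f"{question}\nItem: {item}\nAnswer only yes or no.",
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

def llmfilter(lines, question, ask=ask_llm):
    # Yield only the lines the model answers "yes" for; `ask` is injectable
    # so the filtering logic can be tested without a running model.
    for line in lines:
        item = line.strip()
        if item and ask(question, item).strip().lower().startswith("yes"):
            yield line

if __name__ == "__main__" and len(sys.argv) > 1:
    for line in llmfilter(sys.stdin, sys.argv[1]):
        sys.stdout.write(line)
```

With that in place the pipeline above becomes literal: cat /usr/share/dict/words | ./llmfilter.py "is this a mammal?" | grep -v "e". Per-line requests will be slow, which is exactly the batch-mode concern; a real version would want to batch prompts or keep the model warm.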

  • This exists with Claude Code / the Cursor agent: just agent -p or claude -p.

    But I think the more powerful thing is “I want a storybook of mammals, one for each letter” -> a local LLM that plans: use search to get a list of animals, filter them by starting letter and pick one for each, and maybe call a diffusion model for pictures or fetch Wikipedia to get context to write a blurb about each.

    The key unlock imo is the local LLM recognizing the limits of its own ability and completing tool-use calls, rather than trying to one-shot it with next-word completion and its limited parameter count.
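
    The deterministic part of that plan (one animal per letter) doesn't even need the model. A rough sketch, assuming the search step already produced a list of names:

```python
def one_per_letter(animals):
    # Keep the first animal seen for each starting letter; the LLM's job
    # is only to produce/search the candidate list, not to do this filtering.
    chosen = {}
    for name in animals:
        if name:
            chosen.setdefault(name[0].lower(), name)
    return chosen

one_per_letter(["Aardvark", "Antelope", "Bat", "Capybara"])
# {'a': 'Aardvark', 'b': 'Bat', 'c': 'Capybara'}
```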