Comment by meander_water
7 hours ago
I'm so excited for this, nice work!
Gemma4 edge models were promised to be great for agentic use, but have been really disappointing in all my tests. They fail at the most basic tool-use scenarios.
Have you run any tool-use benchmarks for Needle, or do you plan to? Would be great if you could add results to the repo if so.