Comment by buryat

18 hours ago

I have a Mac Studio with 512GB RAM and ran models of different sizes to test how good local agents are. I agree that local models aren't there yet, but that depends on how much knowledge you need to answer your question, and I think it should be possible to either distill or train a smaller model that works on a subset of knowledge tailored toward local execution. My main interest is in reducing latency. It feels like local agents that run at high speed should be an answer to this, but it's not something anyone seems to be trying to solve yet. If I could get a smaller model that runs at incredible speed locally, that could unlock some interesting autoresearching.

Also running gemma-4 on Apple M5 Max. As fast or faster than Opus 4.6 extended, but of course not the same competence. However, great tunability with llama.cpp and no issues related to IP leakage.

I've been running Gemma4; my initial experiments put it around gemini-3-flash levels (vibe evals).