I think this is currently the practical limit on using AI for everything (it's also what the coding agents themselves optimize for to some degree, or at least the slider they can play with for price vs quality; the "thinking" models are the exact same models, just burning more tokens).
I'm waiting for the next Mac Studio to come out so I can experiment with the "AI for everything" approach. Most likely the open-source distilled models will be lower quality, so that's another "price vs quality" tradeoff. Still, it will be fun to code like I'm at a foundation lab.
This seems like a perfect use case for a local model. But I've found in practice that the system requirements for agents are much higher than for models that can handle simple refactoring tasks. Once tool use context is factored in, there is very little room for models that perform decently.
What I hope to do with refactoring is to distill namespace and common patterns into a DSL. I am very curious about what tradeoffs you found.
Every agent I tried included thousands of tokens of tool-use instructions. That would use up most of the available context unless running very low-spec models. I've concluded it's best to use the big 3 for most tasks and Qwen on RunPod for more private data.
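To make the context squeeze concrete, here is a rough back-of-the-envelope sketch. The window sizes and the 6,000-token tool prompt are made-up illustration numbers, not measurements from any particular agent or model, and the helper name is hypothetical.

```python
def remaining_context(context_window: int,
                      tool_prompt_tokens: int,
                      reserved_output_tokens: int = 0) -> int:
    """Tokens left for code and conversation after fixed overhead.

    All inputs are assumptions supplied by the caller; nothing here
    queries a real model or agent.
    """
    left = context_window - tool_prompt_tokens - reserved_output_tokens
    return max(left, 0)  # never report a negative budget


# Hypothetical numbers: the same tool prompt against a small and a large window.
TOOL_PROMPT = 6_000
small = remaining_context(8_192, TOOL_PROMPT)
large = remaining_context(131_072, TOOL_PROMPT)

print(f"small window: {small} tokens left ({small / 8_192:.0%} of the window)")
print(f"large window: {large} tokens left ({large / 131_072:.0%} of the window)")
```

Under these assumed numbers the same tool prompt eats most of a small window and is nearly free in a large one, which is the tradeoff I kept running into.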