Comment by cpursley

6 months ago

I can’t even convince Gemini CLI, while planning things, not to go off and make a bunch of random changes on its own. Even after I’ve been very clear not to do so, and have interrupted to tell it to stop, it just continues on fucking everything up.

3 comments

WhitneyLand 6 months ago

Agents muddy the waters.

Claude Code gets the most out of Anthropic’s models; that’s why people love it.

Conversely, Gemini CLI makes Gemini 2.5 Pro less capable than the model itself actually is.

It’s such a stark difference that I’ve given up using Gemini CLI even with it being free, but I still use the model itself on a regular basis for situations amenable to a prompt interface. It’s a very strong model.

panarky 6 months ago

That's my experience too, when I give Gemini CLI a big, general task and just let it run.

But if I give it structure so it can write its own context, it is truly astonishing.

I'll describe my big, general task and tell it to first read the codebase and then write a detailed requirements document, and not to change any code.

Then I'll tell it to read the codebase and the detailed requirements document it just wrote, and then write a detailed technical spec with API endpoints, params, pseudocode for tricky logic, etc.

Then I'll tell it to read the codebase, the requirements document it just wrote, and the tech spec it just wrote, and decompose the whole development effort into weekly, daily and hourly tasks to assign to developers, and save that in a dev plan document.

Only then is it ready to write code.

And I tell it to read the codebase, requirements, tech spec and dev plan, all of which it authored, and implement Phase 1 of the dev plan.

It's not all mechanical and deterministic, or I could just script the whole process. Just like with a team of junior devs, I still need to review each document it writes, tweak things I don't like, or give it a better prompt to reflect my priorities that I forgot to tell it the first time, and have it redo a document from scratch.

But it produces 90% or more of its own context. It ingests all that context that it mostly authored, and then just chugs along for a long time, rarely going off the rails anymore.
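
For anyone who wants to try it, here is a rough Python sketch of the staged workflow described above. It only builds the prompts and keeps a human review step in the loop; it does not call Gemini CLI itself. Everything named here is an assumption for illustration: the file names (REQUIREMENTS.md, TECH_SPEC.md, DEV_PLAN.md), the prompt wording, repeating "Do not change any code" on every planning stage, and the `run_agent` and `review` callables, which you would supply yourself.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Stage:
    name: str         # label for the stage
    output: str       # hypothetical file the agent should write ("" for none)
    instruction: str  # what the agent is asked to do
    read_only: bool   # planning stages must not touch code


# Stage names, file names and wording are illustrative assumptions.
STAGES = [
    Stage("requirements", "REQUIREMENTS.md",
          "write a detailed requirements document.", True),
    Stage("tech spec", "TECH_SPEC.md",
          "write a detailed technical spec with API endpoints, params, "
          "and pseudocode for tricky logic.", True),
    Stage("dev plan", "DEV_PLAN.md",
          "decompose the development effort into weekly, daily and hourly "
          "tasks and save them in a dev plan document.", True),
    Stage("implementation", "", "implement Phase 1 of the dev plan.", False),
]


def build_prompt(task: str, stage: Stage, prior_docs: List[str]) -> str:
    """Compose one prompt: the task, what to read first, then the instruction."""
    reading = ", ".join(["the codebase"] + prior_docs)
    prompt = f"Task: {task}\nFirst read {reading}.\nThen {stage.instruction}"
    if stage.output:
        prompt += f" Save it as {stage.output}."
    if stage.read_only:
        prompt += " Do not change any code."
    return prompt


def run_pipeline(
    task: str,
    run_agent: Callable[[str], str],
    review: Callable[[Stage, str], Optional[str]],
) -> List[str]:
    """Run the stages in order, redoing a stage until the reviewer accepts it.

    `review` returns None to accept, or feedback text to request a redo.
    Returns the list of documents the later stages were told to read.
    """
    docs: List[str] = []
    for stage in STAGES:
        prompt = build_prompt(task, stage, docs)
        feedback = review(stage, run_agent(prompt))
        while feedback is not None:
            prompt = (f"{prompt}\nReviewer notes: {feedback}\n"
                      "Redo this from scratch.")
            feedback = review(stage, run_agent(prompt))
        if stage.output:
            docs.append(stage.output)
    return docs
```

The `review` callable stands in for the manual part: reading each document, tweaking it, or sending it back with a better prompt. That part is deliberately not automated here.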