
Comment by stouset

6 days ago

I saw this article and thought, now's the time to try again!

Using Claude Sonnet 4, I attempted to add some better configuration to my golang project. An hour later, I was unable to get it to produce a usable configuration, apparently due to a recent v1-to-v2 config format migration. It took less time to hand-edit one based on reading the docs.

I keep getting told that this time agents are ready. Every time I decide to use them they fall flat on their face. Guess I'll try again in six months.

If you share your conversation (with the share link in Claude) I'd be happy to see if there are any tweaks I can suggest to how you prompted it.

Yes.

I made the mistake of procrastinating on one part of a project thinking "Oh, that is easily LLMable". By God, was I proven wrong. Was quite the rush before the deadline.

On the flip side, I'm happy I don't have to write the code for a matplotlib scatterplot for the 10,000th time; it mostly picks up the variables in the current scope that I intended to plot. But I've really not had that much success on larger tasks.
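For the curious, the boilerplate in question looks roughly like this; the data and variable names are made up, the point is that the model fills in the `scatter`/label/`show` dance from whatever happens to be in scope:

```python
# Hypothetical example data; in practice x and y are whatever is already in scope.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)
y = 2 * x + rng.normal(scale=0.1, size=100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, s=12, alpha=0.7)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("y vs. x")
fig.tight_layout()
plt.show()
```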

The "information retrieval" part of the tech is beautiful, though. In my experience, hallucinations are avoided only if you provide an information bank in the context; if it has to use the search tool itself, the results aren't as good.
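To make "information bank in the context" concrete, here's a minimal sketch using the Anthropic Python SDK: read a local copy of the docs and paste them into the prompt, so the model never has to search for (or guess at) them. The file path, model ID, and prompt are placeholders, not anything from this thread:

```python
# Sketch of "provide an information bank in the context": paste the relevant
# docs into the prompt instead of relying on the model's memory or search.
from pathlib import Path

import anthropic

docs = Path("docs/golangci-lint-v2.md").read_text()  # hypothetical local copy of the docs

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=2048,
    system="Answer only from the documentation provided by the user.",
    messages=[
        {
            "role": "user",
            "content": f"Here is the documentation:\n\n{docs}\n\n"
                       "Write a golangci-lint v2 config that enables govet and staticcheck.",
        }
    ],
)
print(response.content[0].text)
```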

Personally, I haven't seen any improvement from the "RL'd on math problems" models onward (I don't care for benchmarks). However, I agree that deepseek-r1-zero was a cool result: pure RL (plain R1 used a few examples) automatically leading to longer responses.

A lot of the improvements suggested in this thread are in the infrastructure around LLMs, such as tool use. That side is much better organised these days with MCP and whatnot, letting you provide the aforementioned information bank easily. But all of it is built on top of the same fragile next-token generator we know and love.

> It took less time to hand-edit one based on reading the docs.

You can give it the docs as an "artifact" in a project - this feature has been available for almost one year now.

Or better yet, use the desktop version + a filesystem MCP server pointing to a folder containing your docs. Tell it to look at the docs and refactor as necessary. It is extremely effective at this. It might also work if you just give it a link to the docs.
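The stock filesystem server needs no code at all, but to show how little is involved, here's a rough sketch of an equivalent docs-only server using the official MCP Python SDK; the server name, folder path, and tool names are placeholders for illustration:

```python
# Rough sketch of a docs-serving MCP server using the official Python SDK
# (FastMCP). Folder path, server name, and tool names are illustrative; the
# off-the-shelf filesystem server does this job with zero code.
from pathlib import Path

from mcp.server.fastmcp import FastMCP

DOCS_DIR = Path("./docs")  # hypothetical folder holding the docs you want it to read

mcp = FastMCP("project-docs")


@mcp.tool()
def list_docs() -> list[str]:
    """List the documentation files available to the model."""
    return [p.name for p in sorted(DOCS_DIR.glob("*.md"))]


@mcp.tool()
def read_doc(name: str) -> str:
    """Return the full text of one documentation file."""
    return (DOCS_DIR / name).read_text()


if __name__ == "__main__":
    mcp.run()  # stdio transport, which is what Claude Desktop talks to
```

Register it in `claude_desktop_config.json` like any other stdio server and the "look at the docs and refactor as necessary" instruction has something concrete to pull from.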

  • The agent reached out to the internet and pulled the golangci-lint docs. Repeatedly. After generating a v1-compatible config I pointed it to the v2 docs. It tried to do the automatic migration but still wound up with incorrect syntax. I asked it to throw away what it had and build a fresh v2-compatible config. It again consulted the docs. Repeat ad nauseam.

    I threw in the towel and had a working config in ten minutes.

You can give LLM agents links to the docs instead of letting them work blindfolded with hardcoded assumptions.

  • Claude Code will even request access to documentation on its own sometimes. I caught it asking to run a `pydoc` command the other day. I'm not sure if it has access to web search, but it should.

  • It reached out to the Internet and pulled the docs. Repeatedly. I even linked to the docs directly to be helpful.