
Comment by hammyhavoc

18 hours ago

What commands/progs on your local Linux box? Would love to be able to quantify how inaccurate the LLMs are compared to what people already use for their boilerplate stuff.

I've found the agents incredibly hit and miss. Mostly miss. The likes of Claude Code occasionally do something surprising and it actually works (usually there's a public example it's copied wholesale when you research the code it gave you, especially for niche stuff), but the rest of the time you spend hours wrestling it into submission over something you could do in minutes, all whilst it haemorrhages context sporadically. I even tried adding an additional vector database to the likes of Claude Code to try and get around this, but it's honestly a waste of time in my experience.

Is it "useless"? For me, yes, probably. I can't find any valid use for an LLM so far in terms of creating new things. What's already been done before? Sure. But why an LLM in that case?

The strangest thing I've seen so far is Claude Code wanting a plugin to copy values from a metadata column in WordPress to then read, which is triggered by a watcher every five minutes—instead of just reading the value when relevant. It could not be wrangled into behaving over this and I gave up.
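
To make the difference concrete, here's a rough sketch of the two designs in Python rather than PHP. Everything in it (`MetaStore`, `PolledCopy`, `read_direct`, the post ID and key) is a made-up stand-in for illustration, not WordPress's actual API and not the code Claude Code produced. It only shows why a copy refreshed on a timer can be stale while a direct read can't.

```python
class MetaStore:
    """Hypothetical stand-in for a post-metadata table."""

    def __init__(self):
        self._meta = {}

    def set(self, post_id, key, value):
        self._meta[(post_id, key)] = value

    def get(self, post_id, key, default=None):
        return self._meta.get((post_id, key), default)


class PolledCopy:
    """The roundabout design: a watcher copies the value on a timer."""

    def __init__(self, store, post_id, key):
        self.store, self.post_id, self.key = store, post_id, key
        self.copy = None

    def tick(self):
        # Imagine a scheduler calling this every five minutes.
        self.copy = self.store.get(self.post_id, self.key)

    def read(self):
        # Returns whatever the last tick saw, which may be out of date.
        return self.copy


def read_direct(store, post_id, key):
    """The simple design: read the value at the moment it is needed."""
    return store.get(post_id, key)


store = MetaStore()
store.set(42, "price", "10")

watcher = PolledCopy(store, 42, "price")
watcher.tick()                   # watcher copies "10"
store.set(42, "price", "12")     # value changes before the next tick

stale = watcher.read()           # still the old copy
fresh = read_direct(store, 42, "price")
print(stale, fresh)
```

The polled version adds a second place for the value to live, a scheduler to keep alive, and a window of up to five minutes where the copy is wrong. The direct read has none of that.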

Took me 2 minutes to do the whole thing by hand, and it worked first try (of course—it's PHP—not complicated compared to Verilog and DSP, at which it is spectacularly bad in its output).

It does very odd things in terms of secrets and Cloudflare Workers too.

The solutions it gives are frequently nonsensical or incomplete, mix syntax from various languages (which it sometimes catches itself on before giving you the artifact), and are almost always wildly inefficient, padding a simple task with pointless steps.

Giving Claude Code tutorials, docs, and repos of code is usually a shitshow too. I asked their customer support for a refund weeks ago and have heard nothing. All hype and no substance.

I can see how someone without much dev experience might be impressed by its output, especially if they're only asking it to do incredibly simplistic stuff, for which there are plenty of examples and public discourse on troubleshooting bad code, but once you get into wanting to do new things, I just don't see how anyone could think this is ever going to be viable.

I mucked around with autonomous infrastructure via Claude Code too, and just found that it did absolutely bizarre things that made no sense in terms of managing containers relative to logs, suggesting configurations, etc. You're better off with dumb scripts with your env vars, secrets, etc.