Comment by namaria

4 days ago

I really don't get it. I've tested some agents and they can generate boilerplate. It looks quite impressive if you look at the logs, actually seems like an autonomous intelligent agent.

But I can run commands on my local linux box that generate boilerplate in seconds. Why do I need to subscribe to access gpu farms for that? Then the agent gets stuck at some simple bug and goes back and forth saying "yes, I figured out and solved it now" and it keeps changing between two broken states.

The rabid prose, the Fly.io post deriding detractors... To me it seems same hype as usual. Lots of words about it, the first few steps look super impressive, then it gets stuck banging against a wall. If almost all that is said is prognostication and preaching, and we haven't seen teams and organizations racing ahead on top of this new engine of growth... maybe it can't actually carry loads outside of the demo track?

It can be useful. Does it merit 100 billion dollar outlays and datacenter-cum-nuclear-powerplant projects? I hardly think so.

1 comment

namaria

hammyhavoc 14 hours ago

What commands/progs on your local Linux box? Would love to be able to quantify how inaccurate the LLMs are compared to what people already use for their boilerplate stuff.

I've found the agents incredibly hit and miss. Mostly miss. The likes of Claude Code occasionally does something surprising and it actually works (usually there's a public example it's copied wholly when you research the code it gave you, especially for niche stuff), but then the rest of the time you spend hours wrestling it into submission over something you could do in minutes, all whilst it haemorrhages context sporadically. Even tried adding an additional vector database to the likes of Claude Code to try and get around this, but it's honestly a waste of time in my experiences.

Is it "useless"? For me, yes, probably. I can't find any valid use for an LLM so far in terms of creating new things. What's already been done before? Sure. But why an LLM in that case?

The strangest thing I've seen so far is Claude Code wanting a plugin to copy values from a metadata column in WordPress to then read, which is triggered by a watcher every five minutes—instead of just reading the value when relevant. It could not be wrangled into behaving over this and I gave up.

Took me 2 minutes to do the whole thing by hand, and it worked first try (of course—it's PHP—not complicated compared to Verilog and DSP, at which it is spectacularly bad in its output).

It does very odd things in terms of secrets and Cloudflare Workers too.

The solutions it gives are frequently nonsensical, incomplete, mixes syntax from various languages (which sometimes it catches itself on before giving you the artifact), and almost always wholly in how inefficient the pointless steps to a simple task are.

Giving Claude Code tutorials, docs, and repos of code is usually a shitshow too. I asked their customer support for a refund weeks ago and have heard nothing. All hype and no substance.

I can see how someone without much dev experience might be impressed by its output, especially if they're only asking it to do incredibly simplistic stuff, for which there's plenty of examples and public discourse on troubleshooting bad code, but once you get into wanting to do new things, I just don't see how anyone could think this is ever going to be viable.

I mucked around with autonomous infrastructure via Claude Code too, and just found that it did absolutely bizarre things that made no sense in terms of managing containers relative to logs, suggesting configurations et al. Better off with dumb scripts with your env vars, secrets et al.