
Comment by zackify

10 hours ago

This definitely is the case. I was talking to someone complaining about how LLMs don't work well.

They said it couldn't fix an issue it made.

I asked if they gave it any way to validate what it did.

They did not, some people really are saying "fix this" instead of saying "x fn is doing y when someone makes a request to it. Please attempt to fix x and validate it by accessing the endpoint after and writing tests"

It's shocking some people don't give it any real instruction or way to check itself.

In addition, I get great results using voice-to-text with very specific workflows: asking it to add a new feature where I describe which functions I want changed, then reviewing as I go instead of waiting until the end.
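The "fix x and validate it by accessing the endpoint after and writing tests" pattern above can be sketched as a tiny validation script the agent runs after each change. Everything concrete here is a placeholder: the URL, the expected JSON key, and the `pytest -q` test command are assumptions, not anything from the thread.

```python
# Hedged sketch of a post-fix validation loop. The endpoint URL,
# expected key, and test command are all placeholders.
import json
import subprocess
import urllib.request

def body_ok(status: int, body: bytes, expected_key: str) -> bool:
    """Pure check: a 200 response whose JSON body contains the key we fixed."""
    return status == 200 and expected_key in json.loads(body)

def check_endpoint(url: str, expected_key: str) -> bool:
    """Hit the endpoint the fix touched and verify the response shape."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return body_ok(resp.status, resp.read(), expected_key)

def run_tests() -> bool:
    """Run the project's test suite; 'pytest -q' is an assumption."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

if __name__ == "__main__":
    ok = check_endpoint("http://localhost:8000/api/x", "result") and run_tests()
    print("validated" if ok else "NOT validated")
```

Giving the model something this concrete to run is exactly the "way to check itself" the comment is asking for.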

> It's shocking some people don't give it any real instruction or way to check itself.

It's not shocking. The tech world is telling them that "Claude will write all of their app easily" with zero instructions/guidelines, so of course they're going to send prompts like that.

  • I think how far you can get with limited-to-no instructions varies a lot depending on what you're doing... CRUD APIs, sure... especially if you have a well-defined DB schema and API surface/approach. Anything that might get complex, less so.

    Two areas I've really appreciated LLMs so far... one is being able to make web components that do one thing well, in encapsulation... I can bring one into my project and just use it... AI can scaffold a test/demo app that exercises the component with ease, and testing becomes pretty straightforward.

    The other for me has been in bridging rust to wasm and even FFI interfaces so I can use underlying systems from Deno/Bun/Node with relative ease... it's been pretty nice all around to say the least.

    That said, this all takes work... lots of design work up front for how things should function... whether it's a UI component or an API backend library. From there, you have to add in testing, and some iteration to discover and ensure there aren't behavioral bugs in place. Actually reviewing code, and especially the written test logic. LLMs tend to over-test in ways that are excessive or redundant a lot of the time. Especially when a longer test function effectively also tests underlying functionalities that each had their own tests... cut them out.

    There's nothing "free" and it's not all that "easy" either, assuming you actually care about the final product. It's definitely work, but it's more about the outcome and creation than the grunt work. As a developer, you'll be expected to think a lot more, plan and oversee what's getting done as opposed to being able to just bang out your own simple boilerplate for weeks at a time.

  • It's surprising they don't learn better after their first hour or two of use. Or maybe they do know better but don't like the thing, so they deliberately give it rope to hang itself with, then blame overzealous marketing.

There are subtler versions of this too. I've been working on a TUI app for a couple of weeks, and having great success getting it to interactively test by sending tmux commands, but every once in a while it would just deliver code that didn't work. I finally realized it was because the capture tools I gave it didn't capture the cursor location, so it would, understandably, get confused about where it was and what was selected.

I promptly went and fixed this before doing any more work, because I know if I was put in that situation I would refuse to do any more work until I could actually use the app properly. In general, if you wouldn't be able to solve a problem with the tools you give an LLM, it will probably do a bad job too.
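One way to close the gap described above is to capture the cursor position alongside the pane text; tmux exposes it through the `#{cursor_x}`/`#{cursor_y}` format variables. This is a sketch of the idea, not the commenter's actual tooling — the pane target and the marker-overlay approach are my own assumptions:

```python
# Hedged sketch: capture tmux pane text plus the cursor location so the
# model can see where it is. Pane target "0" is a placeholder.
import subprocess

def mark_cursor(lines: list[str], x: int, y: int, marker: str = "|") -> list[str]:
    """Pure helper: overlay a cursor marker onto captured pane lines."""
    out = list(lines)
    if 0 <= y < len(out):
        line = out[y].ljust(x + 1)
        out[y] = line[:x] + marker + line[x + 1:]
    return out

def capture_with_cursor(pane: str = "0") -> str:
    """Capture pane contents and stamp the cursor position into the text."""
    text = subprocess.run(["tmux", "capture-pane", "-p", "-t", pane],
                          capture_output=True, text=True, check=True).stdout
    pos = subprocess.run(["tmux", "display-message", "-p", "-t", pane,
                          "#{cursor_x} #{cursor_y}"],
                         capture_output=True, text=True, check=True).stdout
    x, y = (int(n) for n in pos.split())
    return "\n".join(mark_cursor(text.splitlines(), x, y))
```

With the cursor baked into the capture, the model no longer has to guess what's selected.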

If you tell a human junior developer just "fix this" then they will spend a week on a wild-goose chase with nothing to show for it.

At least the LLM will only take 5 minutes to tell you they don't know what to do.

  • Do they? I’ve never got a response that something was impossible, or stupid. LLMs are happy to verify that a noop does nothing if they don’t know how to fix something. They’d rather make something useless than really tackle a problem, if they can make tests green that way, or claim that something “works”.

    And I’ve never asked Claude Code for something which is really impossible, or even really difficult.

    • Claude Code will happily tell me my ideas are stupid, but I think that's because I nest my ideas in between other alternative ideas and ask for an evaluation of all of them. This effectively combats the sycophantic tendencies.

      Still, sometimes Claude will tell me off even when I don't give it alternatives. Last night I told it to use luasocket from an mpv userscript to connect to a zeromq Unix socket (and also implement zmq in pure lua) connected to an ffmpeg zmq filter, to change filter parameters on the fly. Claude Code all but called me stupid and told me to just reload the filter graph through normal mpv means when I make a change. Which was a good call, but I told it to do the thing anyway and it ended up working well, so what does it really know... Anyway, I like that it pushes back, but agrees to commit when I insist.


  • To be fair, that happening feels more like poor management and mentorship than "juniors are scatterbrained".

    Over time, you build up the right reflexes that avoid a one-week goose chase with them. Heck, since we're working with people, you don't just say "fix this", you earmark time to make sure everyone is aligned on what needs done and what the plan is.

  • > At least the LLM will only take 5 minutes to tell you they don't know what to do.

    In my experience, the LLM will happily try the wrong thing over and over for hours. It rarely will say it doesn’t know.

    • Don’t ask it to make changes off the bat, then - ask it to make a plan. Then inspect the plan, change it if necessary, and go from there.

Yeah, the more time I spend in planning and working through design/API documentation for how I want something to work, the better it does... Similarly for testing against your specifications, not the code... once you have a defined API surface and functional/unit tests for what you're trying to do, it's that much harder for the AI to actually mess things up. Even more interesting, IMO, is how much better the agents work with Rust vs other languages the more well-defined your specifications are.
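"Testing against your specifications, not the code" can be as simple as pinning the documented contract in a test that never mentions internals. Everything in this sketch is invented for illustration — the `slugify` function and its one-line spec are not from the thread:

```python
# Invented example of a spec-level test: it pins the documented contract
# ("slugify lowercases and joins words with '-'"), not the implementation.
def slugify(title: str) -> str:
    """Placeholder implementation; the tests below are what the spec fixes."""
    return "-".join(title.lower().split())

def test_slugify_matches_spec():
    # These assertions restate the spec, so a rewrite of slugify's internals
    # can't silently change the behavior users were promised.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  A  B ") == "a-b"
```

Because the test encodes the spec rather than the current code, an agent can refactor freely without being able to "make tests green" by changing what the tests mean.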

> some people really are saying "fix this" instead of saying "x fn is doing y when someone makes a request to it. Please attempt to fix x and validate it by accessing the endpoint after and writing tests"

This works about 85% of the time IME, in Claude Code. My normal workflow on most bugs is to just say “fix this” and paste the logs. The key is that I do it in plan mode, then thoroughly inspect and refine the plan before allowing it to proceed.

Untested hypothesis: LLM instruction is usually an intelligence-plus-communication skill. I find in my non-authoritative experience that users who give short-form instructions are generally ill-prepared for technical motivation (whether they're motivating LLMs or humans).

lol that is still “how you’re talking to them affects the results”, just more specific

Feeding the LLM a "copy as cURL" for its feedback loop instead of letting it manage the dev server was an unlock for me.
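Concretely, that means pasting the browser's "Copy as cURL" command into the loop so the agent can replay the exact failing request after each change. A minimal sketch — the curl invocation and URL below are placeholders, not a real endpoint:

```python
# Hedged sketch: replay a pasted "Copy as cURL" request and report whether
# the fix took. The curl command below is a placeholder.
import subprocess

# Pasted from the browser's "Copy as cURL"; -w '%{http_code}' prints the status.
CURL = ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}",
        "http://localhost:8000/api/items"]

def is_fixed(status: int) -> bool:
    """Pure check: any 2xx status counts as the bug being gone."""
    return 200 <= status < 300

def replay() -> bool:
    """Run the copied curl command and check the status code it prints."""
    out = subprocess.run(CURL, capture_output=True, text=True).stdout
    return is_fixed(int(out))
```

The agent runs `replay()` (or the raw curl line) after every edit, instead of juggling a dev server whose state it can't observe.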