Comment by tmpz22

1 day ago

Prompt engineering is just trying the task on a variety of models and prompt variations until you better understand the phrasing needed to get the desired outcome, if that outcome can be gotten at all.

Honestly you’re trying to prove AI is ineffective by telling us it didn’t work with your ineffective protocol. That is not a strong argument.

What should I have done there? Tell it to make sure that it gives me back all 10 objects I give it? Tell it to not put brackets in the wrong place? This is a real question: what would you have done?

  • You should have dropped the LLM, of course. They are not replacing us programmers anytime soon. If they can be used as an enabler / booster, cool; if not, back to business as usual. You can only win here. You can't lose.

  • How long ago was this? I'd be surprised to see Claude 3.7 Sonnet make a mistake of this nature.

    Either way, when a model starts making dumb mistakes like that these days I start a fresh conversation (to blow away all of the bad tokens in the current one), either with that model or another one.

    I often switch from Claude 3.7 Sonnet to o3 or o4-mini these days. I paste in the most recent "good" version of the thing we're working on and prompt from there.

  • In no particular order:

    * experiment with multiple models, preferably free, high-quality models like Gemini 2.5. Make sure you're using the right model, usually NOT one of the "mini" varieties even if it's marketed for coding.

    * experiment with different ways of delivering necessary context. I use repomix to compile a codebase to a text file and upload that file. I've found more integrated tooling like Cursor, Aider, or Copilot is less effective than dumping a text file into the prompt.

    * use multi-step workflows like the one described in [1] to allow the LLM to ask you questions to better understand the task

    * similarly, use a back-and-forth, one-question-at-a-time conversation to have the LLM draft the prompt for you

    * for this prompt I would focus less on specifying 10 results and more on uploading all necessary modules (like with repomix) and then verifying all 10 were completed. Sometimes the act of over-specifying results can corrupt the answer.

    [1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
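
    To make the "verify all 10 were completed" step concrete, here is a rough sketch of checking the model's reply in code instead of asking the model to police itself. It assumes the reply is a JSON array and that each object carries a unique "id" field; both are guesses about the original task, and check_llm_output is a made-up helper name, not part of any library.

```python
import json


def check_llm_output(raw_output, expected_ids, id_key="id"):
    """Return a list of problems found in the reply; an empty list means it passed.

    Assumptions (not from the thread): the reply is a JSON array of objects,
    and each object has a unique identifier stored under `id_key`.
    """
    try:
        items = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        # Misplaced or missing brackets usually surface here.
        return [f"output is not valid JSON: {exc}"]

    if not isinstance(items, list):
        return ["expected a top-level JSON array"]

    problems = []
    returned_ids = [item.get(id_key) for item in items if isinstance(item, dict)]
    if len(returned_ids) != len(items):
        problems.append("some array entries are not JSON objects")

    # Compare what came back against what was sent.
    missing = [i for i in expected_ids if i not in returned_ids]
    extra = [i for i in returned_ids if i not in expected_ids]
    if missing:
        problems.append(f"missing objects: {missing}")
    if extra:
        problems.append(f"unexpected objects: {extra}")
    if any(returned_ids.count(i) > 1 for i in returned_ids):
        problems.append("duplicate objects in output")
    return problems


if __name__ == "__main__":
    # Ten hypothetical input objects; the simulated reply drops the last one.
    expected = [f"obj-{n}" for n in range(1, 11)]
    reply = json.dumps([{"id": i, "done": True} for i in expected[:9]])
    for problem in check_llm_output(reply, expected):
        print(problem)
```

    If the list of problems comes back non-empty, that is probably the point to start a fresh conversation or try another model rather than arguing with the current one.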

    I'm a pretty vocal AI-hater, partly because I use it day to day and am more familiar with its shortfalls - and I hate the naive zealotry so many pro-AI people bring to AI discussions. BUTTT we can also be a bit more scientific in our assessments before discarding LLMs - or else we become just like those naive pro-AI-everything zealots.