
Comment by only-one1701

1 day ago

What should I have done there? Tell it to make sure that it gives me all 10 objects I give it back? Tell it to not put brackets in the wrong place? This is a real question --- what would you have done?

How long ago was this? I'd be surprised to see Claude 3.7 Sonnet make a mistake of this nature.

Either way, when a model starts making dumb mistakes like that these days I start a fresh conversation (to blow away all of the bad tokens in the current one), either with that model or another one.

I often switch from Claude 3.7 Sonnet to o3 or o4-mini these days. I paste in the most recent "good" version of the thing we're working on and prompt from there.

  • Lol, "it didn't do it... and if it did it didn't mean it... and if it meant it it surely can't mean it now." This is unserious.

    • A full two-thirds of the comment you replied to was me saying "when these things start making dumb mistakes, here are the steps I take to fix the problem".

    • This is the rhetoric you will see in reply to effectively any negative experience with LLMs in programming.

You should have dropped the LLM, of course. They are not replacing us programmers anytime soon. If they can be used as an enabler / booster, cool; if not, back to business as usual. You can only win here. You can't lose.

In no particular order:

* experiment with multiple models, preferably free high-quality models like Gemini 2.5. Make sure you're using the right model: usually NOT one of the "mini" varieties, even if it's marketed for coding.

* experiment with different ways of delivering the necessary context. I use repomix to compile a codebase into a single text file and upload that file. I've found more integrated tooling like Cursor, aider, or Copilot less effective than dumping a text file into the prompt.

* use multi-step workflows like the one described in [1] to let the LLM ask you questions and better understand the task

* similarly, use a back-and-forth, one-question-at-a-time conversation to have the LLM draft the prompt for you

* for this prompt, I would focus less on specifying 10 results and more on uploading all the necessary modules (e.g. with repomix), then verifying afterwards that all 10 were completed. Sometimes over-specifying the results can corrupt the answer.

[1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/

I'm a pretty vocal AI-hater, partly because I use it day to day and am all the more familiar with its shortfalls - and I hate the naive zealotry so many pro-AI people bring to AI discussions. But we can also be a bit more scientific in our assessments before discarding LLMs - or else we become just like those naive pro-AI-everything zealots.