Comment by tmpz22

1 year ago

Prompt engineering is just trying that task on a variety of models and prompt variations until you can better understand the syntax needed to get the desired outcome, if the desired outcome can be gotten.

Honestly you’re trying to prove AI is ineffective by telling us it didn’t work with your ineffective protocol. That is not a strong argument.

10 comments

tmpz22

only-one1701 1 year ago

What should I have done there? Tell it to make sure that it gives me all 10 objects I give it back? Tell it to not put brackets in the wrong place? This is a real question --- what would you have done?

tmpz22 1 year ago
In no particular order:
* experiment with multiple models, preferably free high quality models like Gemini 2.5. Make sure you're using the right model, usually NOT one of the "mini" varieties even if its marketed for coding.
* experiment with different ways of delivering necessary context. I use repomix to compile a codebase to a text file and upload that file. I've found more integrated tooling like cursor, aider, or copilot, are less effective then dumping a text file into the prompt
* use multi-step workflows like the one described [1] to allow the llm to ask you questions to better understand the task
* similarly use a back-and-forth one-question-at-a-time conversation to have the llm draft the prompt for you
* for this prompt I would focus less on specifying 10 results and more about uploading all necessary modules (like with repomix) and then verifying all 10 were completed. Sometimes the act of over specifying results can corrupt the answer.
[1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
I'm a pretty vocal AI-hater, partly because I use it day to day and am more familiar with its shortfalls - and I hate the naive zealotry so many pro-AI people bring to AI discussions. BUTTT we can also be a bit more scientific in our assessments before discarding LLMs - or else we become just like those naive pro-AI-everything zealots.
- bboozzoo 1 year ago
  
  With that many ways to try things out differently hoping for good results, it feels like this would become a huge time sink, wouldn't it?
simonw 1 year ago
How long ago was this? I'd be surprised to see Claude 3.7 Sonnet make a mistake of this nature.
Either way, when a model starts making dumb mistakes like that these days I start a fresh conversation (to blow away all of the bad tokens in the current one), either with that model or another one.
I often switch from Claude 3.7 Sonnet to o3 or o4-mini these days. I paste in the most recent "good" version of the thing we're working on and prompt from there.
- th0ma5 1 year ago
  
  Lol, "it didn't do it... and if it did it didn't mean it... and if it meant it it surely can't mean it now." This is unserious.
  
  4 replies →
pdimitar 1 year ago

You should have dropped the LLM, of course. They are not replacing us the programmers anytime soon. If they can be used as an enabler / booster, cool, if not, back to business as usual. You can only win here. You can't lose.