Comment by suttontom

4 hours ago

Ah yes, the magical equivalent of "you are a senior software engineer who writes bug-free code".

IME people would benefit greatly from the process, albeit tedious and time-consuming, of testing out the same prompt sequence/session with the exact same model multiple times. It becomes clear extremely quickly how capable but unreliable and inconsistent a model can be even when given the same context. If you have ever completed a long, complicated task with an agent and then lost the session and tried doing the same thing again from scratch you may have had the experience of seeing the subtle changes that come up in the model's thinking which lead it to accept or reject certain paths and ignore or incorporate prompt instructions like the one you've provided.

0 comments

suttontom

No comments yet

Contribute on Hacker News ↗