Comment by awnist

11 days ago

There is a one-shot rewrite function now, but you're right: even when you ask the LLM to avoid certain patterns, it will stubbornly repeat them. It's a bit more reliable with smaller fragments of text.

What I am saying is: keep reflecting its attempts back on itself, over and over, dozens of times if needed. We’ve seen it - any aligned model wants only to achieve its goal. But it does need to see all of its past attempts and where and why each attempt got a failing grade. That’s just a standard conversation history.

It might spit back the same thing the first round. But after the first time it received the exact same feedback for saying the same thing, the model will realize it’s in a deterministic sandbox and try something different. You need to give it all of the conversation including its past attempts as context. If it tries the exact same wording that’s okay, it’s just one more invisible round of back-and-forth. The model is going to rediscover how to work with the harness every time, but that’s not your users’ problem because you’ve hidden that wrinkly bit behind the automation - they just see “model did 10 drafts and here’s the result - would you like to view the result or page through the drafts?”
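This loop can be sketched in a few lines. Everything here is a stand-in: `call_model` and `grade` below are hypothetical stubs, not a real API - swap in your actual LLM client and your actual grading rubric. The only real point is the shape of the loop: every failed draft and its feedback stay in the conversation history that goes back to the model.

```python
def call_model(messages):
    # Stub LLM (assumption, not a real client): it repeats itself until
    # it has seen the same feedback twice, then tries something new -
    # mimicking the "deterministic sandbox" realization described above.
    feedback = [m["content"] for m in messages if m["role"] == "user"]
    if feedback.count("too repetitive") >= 2:
        return "a genuinely new phrasing"
    return "the same stubborn phrasing"

def grade(draft):
    # Stub rubric (assumption): fail repeated phrasing, pass anything new.
    # Returns (passed, feedback-for-the-model).
    ok = draft == "a genuinely new phrasing"
    return ok, ("" if ok else "too repetitive")

def refine(prompt, max_rounds=10):
    """Reflect every failed draft back at the model, with full history."""
    messages = [{"role": "user", "content": prompt}]
    drafts = []
    for _ in range(max_rounds):
        draft = call_model(messages)
        drafts.append(draft)
        passed, feedback = grade(draft)
        if passed:
            break
        # Keep the failing attempt AND its grade in the history so the
        # model sees where and why each past attempt failed.
        messages.append({"role": "assistant", "content": draft})
        messages.append({"role": "user", "content": feedback})
    return drafts
```

The user never sees the invisible rounds; `refine` just hands back the list of drafts, and the UI shows the final one (with the earlier drafts available to page through).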

What I am describing is exactly what a human would do; it is just automated, so getting to a good result becomes insanely faster.