
Comment by huytersd

2 years ago

You can include in the prompt a requirement to highlight sections the LLM was not sure about or that need to be verified.

Wouldn't that work just as well as including in the prompt a requirement for it to not make any mistakes?
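
For concreteness, the difference between the two prompt variants might look something like this - a rough sketch, where `call_llm` and the task are just hypothetical stand-ins for whatever model and job you're actually working with:

```python
# Hypothetical helper standing in for whatever LLM API you actually use (GPT-4, Llama2, etc.).
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your provider of choice")

# A made-up task, purely for illustration.
task = "Summarize this incident report and list the likely root causes."

# Variant A: ask the model to highlight the parts it isn't sure about / that need verification.
flagging_prompt = (
    task
    + "\n\nMark any statement you are not confident about with [UNSURE] "
    + "so a human knows which parts need to be verified."
)

# Variant B: simply demand that it not make mistakes.
no_mistakes_prompt = task + "\n\nDo not make any mistakes."

# output = call_llm(flagging_prompt)  # vs. call_llm(no_mistakes_prompt)
# The question above is whether Variant A actually buys you anything over Variant B.
```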

  • With some LLMs, emphasizing the possibility and appropriateness of saying "I don't know" has reduced the frequency of hallucinations.

  • If you assume the LLM can gauge its “confidence” in the last n tokens it generated, which seems within the realm of reason (from a layman’s perspective), then I would think this idea would work better than a “no mistake” requirement the significant majority of the time. It provides an additional dimension of context about the output (which we’re assuming is sound, or at least not entirely nonsensical), and that alone seems like enough justification to do it. It’s unclear (to me, at least) exactly what effect adding a “no mistake” requirement to the prompt would have on the LLM’s output. I could see it skipping ranges of tokens it’s unsure about, which seems less preferable than having it provide a best guess and make clear that it’s only a guess; but I could also see it operating exactly as it would have without the “no mistake” instruction, giving the same dubious output to a user who may now have an unwarranted increase in confidence in it.

    I’ve spent a decent amount of free time doing what feels like coercing, tricking, or otherwise manipulating GPT-4 and Llama2 into doing my bidding - with my bidding being mostly toy ideas for little tools to make random small tasks easier, plus one or two more interesting ideas that are fun to mess around with but would probably require some medical-grade antianxiety meds to even consider using in a real production setting (i.e. a universal ORM). Even though I’m not developing (or I guess we now call it prompt engineering) in a rigorous or serious way, I’ve found that making the LLM _actively_ reconsider and validate its output works very well, with the effectiveness seeming to be a rough function of “how actively” you trick it into doing so.

    Giving a list of “be sure to consider these things” at the end of your prompt often works, but also very often doesn’t. Adding another step to the process you’re asking it to perform - made up of subtasks that map to the list of gotchas, but reframed as actions you’re requiring it to perform - is often the remedy for cases where the simple suggestion list isn’t enough, and is basically a more active variant of the same idea. Dialing it up a bit more, requiring it to report back after each subtask to confirm it was actually performed and to summarize what it found makes the retrospective assessment even more actively involved, and has been a pretty damn reliable trick for ironing out kinks and known failure modes in my prompts. (A rough sketch of what this pattern ends up looking like follows below.)

    All that being said, I think the simple fact that you’re now actively requiring it to reflect on its confidence in its output - and therefore on the correctness of that output - may improve the quality of the results as an unintended side effect, which alone would make this idea worth doing.
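
Below is a rough sketch of the “reframe the gotcha list as required subtasks, with a confirmation after each one” pattern described in the comment above. The task, the gotcha list, and the `call_llm` helper are all made up for illustration; the point is only the shape of the prompt, not the specifics:

```python
# Hypothetical helper standing in for whatever LLM API is actually being called.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to GPT-4, Llama2, or whatever you're using")

# Made-up task and gotcha list, purely for illustration.
task = "Write a SQL migration that adds a nullable archived_at column to the orders table."
gotchas = [
    "the migration is reversible",
    "existing rows are not rewritten",
    "the new column defaults to NULL",
]

# Passive version: a trailing "be sure to consider" list. Often works, often doesn't.
passive_prompt = task + "\n\nBe sure to consider the following:\n" + "\n".join(
    f"- {g}" for g in gotchas
)

# Active version: the same gotchas reframed as verification steps the model is required
# to perform, each with an explicit confirmation and a short summary of what it found.
verification_steps = "\n".join(
    f"{i}. Check that {g}. Then state 'Step {i}: done' and summarize what you checked."
    for i, g in enumerate(gotchas, start=1)
)
active_prompt = (
    task
    + "\n\nAfter writing the migration, perform each of these verification steps in order:\n"
    + verification_steps
    + "\n\nIf any check fails, revise the migration and repeat the checks before giving "
    + "your final answer."
)

# answer = call_llm(active_prompt)  # uncomment once call_llm is wired up
```

The passive_prompt here corresponds to the “suggestion list” that often isn’t enough; active_prompt is the “confirm each subtask and summarize what you found” variant described as the more reliable trick.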