Comment by Karrot_Kream
2 days ago
My experience has been that generating structured output with zero-, one-, and few-shot prompts works quite well. We've used it at $WORK for zero-shot tasks and it's been good enough, and I've done few-shot prompting for some personal projects with solid results. JSON Schema-based enforcement of responses with temperature set to 0 works quite well. LLMs do sometimes hallucinate their responses, but keeping the output format tightly constrained (e.g. structured dicts of booleans) reduces hallucinations, and even when they do occur, at temperature 0 they seem to stay under 0.1% of responses even with zero-shot prompting. (At least with the datasets and prompts I've considered.)
(Though yes, keep in mind that 0.1% hallucination = 99.9% correctness, which really isn't that high when we're talking about high-reliability applications. With zero-shot, though, that far exceeded my expectations.)
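
For reference, here's a minimal sketch of the kind of setup described above, assuming the OpenAI Python SDK's JSON Schema structured-output support. The commenter doesn't name a specific stack, so the model choice, field names, and prompt here are all illustrative:

    # Minimal sketch: JSON-Schema-constrained output at temperature 0.
    # Assumes the OpenAI Python SDK; model and schema fields are illustrative.
    import json
    from openai import OpenAI

    client = OpenAI()

    # Tightly constrained output: a structured dict of booleans,
    # which (per the comment above) tends to reduce hallucinations.
    schema = {
        "type": "object",
        "properties": {
            "is_spam": {"type": "boolean"},
            "contains_pii": {"type": "boolean"},
        },
        "required": ["is_spam", "contains_pii"],
        "additionalProperties": False,
    }

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        temperature=0,        # near-deterministic sampling, as in the comment
        messages=[
            {"role": "system", "content": "Classify the message. Respond only with JSON."},
            {"role": "user", "content": "WIN A FREE CRUISE!!! Reply with your SSN."},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "moderation_flags", "schema": schema, "strict": True},
        },
    )

    flags = json.loads(resp.choices[0].message.content)
    print(flags)  # e.g. {"is_spam": true, "contains_pii": true}

With strict schema enforcement the model can't emit malformed JSON or extra keys, so the remaining failure mode is a wrong boolean value, which is the residual hallucination rate the comment is describing.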