Comment by kfajdsl
6 days ago
All the LLM logprob outputs I've seen aren't very well calibrated, at least for transcription tasks - I'm guessing it's similar for OCR type tasks.
6 days ago
All the LLM logprob outputs I've seen aren't very well calibrated, at least for transcription tasks - I'm guessing it's similar for OCR type tasks.
"I already decided in my private reasoning trace to resolve this ambiguity by emitting the string '27' instead of '22' right here, thus '27' has 100% probability"