Comment by CamperBob2

13 hours ago

The technique of pitting one model against another is usually pretty effective in my experience. If Gemini 2.0 Advanced and o1-pro agree on something, you can usually take it to the bank. If they don't, that's when human intervention is necessary, given the lack of additional first-rank models to query. (Edit: 1682 versus 1692 is a great example of something that a tiebreaker model could handle.)
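The cross-checking protocol described above could be sketched roughly like this. This is only an illustration of the idea, not anyone's actual pipeline; the `adjudicate` function and its transcription inputs are hypothetical, and in practice the two readings would come from independent model API calls.

```python
def adjudicate(primary, secondary, tiebreaker=None):
    """Cross-check two model transcriptions; escalate on disagreement.

    primary, secondary: transcriptions of the same span from two
        independent models (e.g. two different vendors' OCR/VLM output).
    tiebreaker: optional zero-argument callable that queries a third
        model and returns its transcription.
    Returns (result, how) where result is the accepted transcription
    (or None if no resolution) and how records the decision path.
    """
    # Agreement between independent models: take it to the bank.
    if primary == secondary:
        return primary, "consensus"
    # Disagreement: let a third model break the tie if one is available.
    if tiebreaker is not None:
        third = tiebreaker()
        if third == primary:
            return primary, "tiebreaker-agrees-primary"
        if third == secondary:
            return secondary, "tiebreaker-agrees-secondary"
    # No tiebreaker, or three-way disagreement: a human has to look.
    return None, "needs-human-review"
```

Using the 1682-versus-1692 case from the comment: `adjudicate("1682", "1692", lambda: "1682")` resolves to `"1682"` via the tiebreaker, while `adjudicate("1682", "1692")` with no third model falls through to human review.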

It seems likely that a mixture-of-models approach like this will be worth formalizing at some level. Using appropriately-trained models to begin with seems even more important, though, and I can't agree that this type of content is relevant when discussing straightforward OCR tasks on modern languages.

> I can't agree that this type of content is relevant when discussing straightforward OCR tasks on modern languages.

1682 is a number, though, and therefore language-independent, and you yourself noted it as extremely obvious to a human, even one who can't read the language at all. So I do think the tools are useful, but a human probably still needs to be in the loop for now, until better models stop getting even the especially obvious parts wrong.