Comment by energy123
1 year ago
o1-mini does better than any other model on zebra puzzles. Maybe you got unlucky on one question?
https://www.reddit.com/r/LocalLLaMA/comments/1ffjb4q/prelimi...
1 year ago
Entirely possible. I did not test systematically or quantitatively, but zebra puzzles have been a recurring easy "demo" case I've tried on each release since 3.5-turbo.
The very verbose chain of reasoning that o1 produces also seems well suited to logic puzzles, so I expected it to do reasonably well. As with many other LLM topics, though, the framing of the evaluation (or the templating of the prompt) can change the results enormously.
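To make the templating point concrete, here is a rough sketch of how the same puzzle can be framed two ways and checked against a brute-force answer. Everything in it is made up for illustration: the 3-house puzzle, the clue wording, and the helper names `render_terse`, `render_structured`, and `blue_house_owner` are assumptions, not anything I or the linked post actually ran, and no model call is included.

```python
from itertools import permutations

# Hypothetical 3-house zebra puzzle, invented for this sketch.
CLUES = [
    "The Dane lives in house 1.",
    "The Brit lives in the red house.",
    "The green house is immediately to the left of the red house.",
    "The Swede does not live in the green house.",
]
QUESTION = "Who lives in the blue house?"


def render_terse(clues, question):
    """One run-on paragraph: minimal framing, no stated answer format."""
    return " ".join(clues) + " " + question


def render_structured(clues, question):
    """Numbered clues plus explicit setup and answer format."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(clues, 1))
    return (
        "There are 3 houses in a row, numbered 1 to 3 from left to right.\n"
        f"Clues:\n{numbered}\n"
        f"Question: {question}\n"
        "Answer with a single nationality."
    )


def solve():
    """Brute-force every assignment and keep those satisfying all clues."""
    solutions = []
    for colors in permutations(["red", "green", "blue"]):
        for nats in permutations(["Brit", "Swede", "Dane"]):
            if nats[0] != "Dane":
                continue
            if colors[nats.index("Brit")] != "red":
                continue
            if colors.index("green") + 1 != colors.index("red"):
                continue
            if colors[nats.index("Swede")] == "green":
                continue
            solutions.append((colors, nats))
    return solutions


def blue_house_owner():
    """Ground-truth answer to QUESTION, used to score any model reply."""
    solutions = solve()
    assert len(solutions) == 1, "puzzle should have a unique solution"
    colors, nats = solutions[0]
    return nats[colors.index("blue")]


if __name__ == "__main__":
    print(render_terse(CLUES, QUESTION))
    print("---")
    print(render_structured(CLUES, QUESTION))
    print("---")
    print("expected answer:", blue_house_owner())
```

Sending both renderings to the same model over many puzzles and comparing accuracy against the brute-force answer would turn "maybe you got unlucky on one question" into something measurable.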