
Comment by energy123

1 day ago

We need a hallucination benchmark.

In my experience, o1 is very good at avoiding hallucinations and I trust it more, but o1-mini and 4o are awful.

Well, given the price ($15.00 / 1M input tokens and $60.00 / 1M output tokens), I would hope so. At that price, I think it is fair to say it is doing a lot of checks in the background.
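
As a rough illustration of what that pricing means per request (the token counts below are hypothetical, and note that o1 bills its hidden reasoning tokens as output tokens):

    # Rough per-request cost at the listed o1 prices.
    # Token counts are made up for illustration; o1 counts hidden
    # reasoning tokens as output tokens.
    INPUT_PRICE = 15.00 / 1_000_000   # dollars per input token
    OUTPUT_PRICE = 60.00 / 1_000_000  # dollars per output token

    def request_cost(input_tokens, output_tokens):
        return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

    # e.g. a 2,000-token prompt producing 10,000 output tokens
    # (mostly hidden reasoning) costs about $0.63
    print(round(request_cost(2_000, 10_000), 2))  # 0.63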

  • It is expensive. But if I'm correct about o1, it means user mistrust of LLMs is going to be a short-lived thing as costs come down and more people use o1 (or better) models as their daily driver.

    • > mistrust of LLMs is going to be a short-lived thing as costs come down and more people use o1

      I think the biggest question is whether o1 is scalable. I think o1 does well because it is going back and forth hundreds if not thousands of times. Somebody in a thread I was participating in mentioned that they let o1 crunch on things for 10 minutes. It sounded like it saved them a lot of work, so it was well worth it.

      Whether or not o1 is practical for the general public is something we will have to wait and see.
