
Comment by energy123

1 day ago

We need a hallucination benchmark.

In my experience, o1 is very good at avoiding hallucinations and I trust it more, but o1-mini and 4o are awful.

Well, given the price ($15.00 / 1M input tokens and $60.00 / 1M output tokens), I would hope so. At that price, I think it is fair to say it is doing a lot of checks in the background.
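
As a rough illustration of what that pricing means per request (the token counts below are hypothetical, and note that o1 bills its hidden reasoning tokens as output tokens):

    # Rough per-request cost at the listed o1 prices.
    # Token counts are made up for illustration; o1 counts hidden
    # reasoning tokens as output tokens.
    INPUT_PRICE = 15.00 / 1_000_000   # dollars per input token
    OUTPUT_PRICE = 60.00 / 1_000_000  # dollars per output token

    def request_cost(input_tokens, output_tokens):
        return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

    # e.g. a 2,000-token prompt producing 10,000 output tokens
    # (mostly hidden reasoning) costs about $0.63
    print(round(request_cost(2_000, 10_000), 2))  # 0.63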

  • It is expensive. But if I'm correct about o1, it means user mistrust of LLMs is going to be a short-lived thing as costs come down and more people use o1 (or better) models as their daily driver.

    • > mistrust of LLMs is going to be a short-lived thing as costs come down and more people use o1

      I think the biggest question is whether o1 is scalable. I think o1 does well because it is going back and forth hundreds if not thousands of times. Somebody in a thread I was participating in mentioned that they let o1 crunch on things for 10 minutes. It sounded like it saved them a lot of work, so it was well worth it.

      Whether or not o1 is practical for the general public is something we will have to wait and see.
