Comment by nsagent

10 months ago

If this does indeed beat all the closed source models, then I'm flabbergasted. Given the amount of time and resources Google, OpenAI, and Anthropic have put into improving their models, being beaten in a couple of weeks by two people (who as far as I know do not have PhDs and years of research experience) would be a pretty crazy feat.

That said, I'm withholding judgment on how likely the claims are. A friend who developed NoCha [1] is running the model on that benchmark, which will really stress-test its ability to reason over full novels.

[1]: https://novelchallenge.github.io/

PhDs aren't relevant. A PhD is more a certificate that you can learn to learn and stay committed to hard, challenging things. It does earn bonus points with VCs, because it seems to be easier to market to other VCs; the same applies to hedge funds.

And with fine-tuning, there's zero math needed. It's a bit of common sense and a lot of data optimization.

  • I wouldn't say that PhDs aren't relevant. Remember that a lot of this subsequent "bumps, steps and leaps" advancement has come _after_ the initial work by the OpenAIs etc. "Standing on the shoulders of giants" is a thing.

Anyone have or know of a list of LLM challenges like this? Targeted use cases with unpublished test data?

One question about the Novels challenge: as there are two true/false questions per item, randomly picking both answers will give a 25% success rate, right? How do some models manage to score below 25%?
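
For what it's worth, here is a rough sketch of the arithmetic behind that question. It assumes each item is a pair with exactly one true and one false claim and that both must be labeled correctly to score; that is my assumption about the benchmark's design, not something stated in this thread. The function `pair_accuracy` and the parameter `p_true` are made-up names for illustration. Under those assumptions, a guesser that says "true" with probability p, independently per claim, gets a pair right with probability p * (1 - p), which peaks at 25% for a fair coin and drops for any bias.

```python
from fractions import Fraction


def pair_accuracy(p_true: Fraction) -> Fraction:
    """Chance of labeling BOTH claims in a pair correctly.

    Assumption (hypothetical): every pair holds one true claim and one
    false claim, and the guesser answers "true" with probability p_true,
    independently for each claim.
    """
    right_on_true_claim = p_true        # must answer "true" here
    right_on_false_claim = 1 - p_true   # must answer "false" here
    return right_on_true_claim * right_on_false_claim


# Fair coin, a guesser biased toward "true", and one that always says "true".
for p in (Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    acc = pair_accuracy(p)
    print(f"P(answer 'true') = {float(p):.2f} -> pair accuracy = {float(acc):.4f}")
```

So a fair coin lands at exactly 25%, a guesser that says "true" three times out of four lands at 18.75%, and one that always says "true" scores 0%. If the pairs really are built that way, a model with a strong lean toward one answer could plausibly end up below the coin-flip baseline.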