Comment by jmalicki

7 hours ago

> How do they know [person] is an expert in [some field]? How do they find that person?

They have a PhD from a top school, they are a licensed attorney, they are a licensed physician, a board certified cardiologist, etc.

They are constantly recruiting from these populations with well-paying side gigs.

> 4) And judge the result

That's what they pay the experts for, and they also have experts review the other experts' work through peer review.

> You can find a lot of people who disagree on many topics, and those turtles go all the way down.

Which is why everything has to be well calibrated and not just a hot take: it should be a well-reasoned opinion any expert would find fair.

No one really cares about hallucinations on point facts these days, though; it is much more about complex reasoning tasks. Can they move the bar on the complexity of software LLMs can write on their own? Can they get to a point where LLMs can begin to replace physicians? Financial advisors? Actuaries? etc.

13 comments

macleginn 6 hours ago

> No one really cares about hallucinations on point facts these days, though; it is much more about complex reasoning tasks.

The boundary is pretty thin there, though. E.g., Gemini recently told me that a certain paper claims that two frameworks are mathematically equivalent, while the paper shows the opposite, and yesterday Google's AI overview told me that no World Cup matches were scheduled for that day despite there being several of them. The model probably used complex reasoning to arrive at both (incorrect) answers, but superficially they look like basic errors of fact.

jmalicki 6 hours ago
That is a great example of the kind of thing they're paying people to create as training data.
You write the prompt, and then write rubrics to judge the responses, and you found something the model failed at. Congratulations, you just earned $500, now do it again.
- macleginn 3 hours ago
  
  Not the worst way to make money, but if internet-scale data were not enough to reduce errors to a somewhat tolerable margin, how much data do they hope to collect in this manner?
  

maxnevermind 4 hours ago

That is informative. I suspected that this is how models improve their performance on some convoluted "non-googleable" benchmarks like SimpleBench: they get a taste of those questions from publicly available samples and then hire people to generate similar questions and provide answers for them.

I wonder if extracting those static reasoning chains makes sense given Rich Sutton's "The Bitter Lesson" and Geoffrey Hinton's "People should stop training radiologists now." I guess as long as participants make money they won't stop, though I'm not sure they do; so far it seems to be more about the expectation of profitability, as I understand it.

jmalicki 3 hours ago
At one level, these training data give examples of specific static reasoning chains.
Given exposure to enough reasoning chains, with training data that is designed around adversarial reasoning and teaching models to reason, these types of training data might be key to teaching models to reason beyond what they could gather from static data.
- maxnevermind 3 hours ago
  
  > these types of training data might be key to teaching models to reason beyond what they could gather from static data.
  I was under the impression that whenever LLMs try to be truly novel and need to assume things in an area where they didn't have enough data points to train on, the results are not good. Has that changed?
  

giardini 5 hours ago

Ahhhh! The ever-present, omniscient "they" of paranoia!

But be careful: they are watching you and they don't want you giving away their secrets!