Comment by aithrowawaycomm
6 months ago
What's even more suspicious is that these tweets from Elliot Glazer indicate that they are still "developing" the hold-out set, even though elsewhere Epoch AI strongly implied this already existed: https://xcancel.com/ElliotGlazer/status/1880809468616950187
It seems to me that o3's 25% benchmark score is 100% data contamination.
> I just saw Sam Altman speak at YCNYC and I was impressed. I have never actually met him or heard him speak before Monday, but one of his stories really stuck out and went something like this:
> "We were trying to get a big client for weeks, and they said no and went with a competitor. The competitor already had a term sheet from the company we were trying to sign up. It was real serious.
> We were devastated, but we decided to fly down and sit in their lobby until they would meet with us. So they finally let us talk to them after most of the day.
> We then had a few more meetings, and the company wanted to come visit our offices so they could make sure we were a 'real' company. At that time, we were only 5 guys. So we hired a bunch of our college friends to 'work' for us for the day so we could look larger than we actually were. It worked, and we got the contract."
> I think the reason why PG respects Sam so much is he is charismatic, resourceful, and just overall seems like a genuine person.
https://news.ycombinator.com/item?id=3048944
Man, the real ugliness is the comments hooting and hollering for this amoral cynicism:

> This sort of "adjusting the truth" is widespread in business. It's not OK, but people should not be shocked by this.
> Also, if marks want to be so gullible, it's on them. It's your money and YOUR due diligence.

Gross. Nothing says genuine like lying to get a contract.
This was my assumption all along.
> What's even more suspicious is that these tweets from Elliot Glazer indicate that they are still "developing" the hold-out set,
There is nothing suspicious about this and the wording seems to be incorrect.
A hold-out set is a portion of the overall data that is set aside to test a model; the model is simply not trained on it. Model developers normally have full access to it.
There is nothing inherently wrong with later training on a full or partial hold-out set. It just means you make a different split and train again.
The confusion I see here is that people are equating a hold-out set with a blind set: a set of test data that the model developers (and the model) cannot see.
Even so, blind sets can also go stale after a few runs, and there is nothing wrong with ingesting a stale blind set, as long as you have a fresh blind set to run against.
Trying to game blind-set tests is nothing new, and it gets found out very quickly.
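The distinction above can be sketched in a few lines of Python. This is a minimal illustration of the terminology only; the function and names are hypothetical, not from any Epoch AI or benchmark code:

```python
import random

def split_dataset(data, holdout_frac=0.2, seed=42):
    """Shuffle and split data into a training set and a hold-out set.

    The hold-out set is withheld from training, but the developers can
    see it -- and may later re-split the pool and train on it, which is
    the "nothing inherently wrong" case described above.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_frac))
    return shuffled[:cut], shuffled[cut:]

examples = list(range(100))
train, holdout = split_dataset(examples)
print(len(train), len(holdout))  # 80 20

# A blind set, by contrast, never enters this pipeline at all: it is
# held by a third party, and the developers only ever see aggregate
# scores, never the items themselves.
```

The point of the sketch is that a hold-out set is defined by *when* it is used (not during training), while a blind set is defined by *who* can see it (nobody on the model side).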
What I took from the original article is that the blind set is likely unbalanced, and the model answered more of the easier questions than the hard ones.
> The confusion I see here is that people are equating a hold-out set with a blind set: a set of test data that the model developers (and the model) cannot see.
What on earth? This is from Tamay Besiroglu at Epoch:
So this "confusion" is because Epoch AI specifically told people it was a blind set! Despite the condescending tone, your comment is just plain wrong.
Your quote literally says hold-out set.