
Comment by godelski

7 months ago

  > The thing is, AI researchers have continually underestimated the pace of AI progress

What's your argument?

That because experts aren't good at making predictions, non-experts must be BETTER at making them?

Let me ask you this: who do you think is going to make a less accurate prediction?

Assuming no one is accurate here, everybody is wrong. So the question is who is more or less accurate. Because there is such a thing as "more accurate", right?

  >> In 2022, they thought AI wouldn’t be able to write simple Python code until around 2027.

Go look at the referenced paper[0]. It is on page 3, last item in Figure 1, labeled "Simple Python code given spec and examples". That line starts just after 2023 and runs to just after 2028. There's a dot representing the median opinion that sits left of the vertical line halfway between 2023 and 2028. Last I checked, 2028 - 2023 = 5, so that halfway mark is around 2025.5, and 2025 < 2027.

And just look at the line that follows

  > In 2023, they reduced that to 2025, but AI could maybe already meet that condition in 2023

Something doesn't add up here... My guess, as someone who literally took that survey, is that what's being referred to as "a simple program" has a different threshold.

Here's the actual question from the survey:

  Write concise, efficient, human-readable Python code to implement simple algorithms like quicksort. That is, the system should write code that sorts a list, rather than just being able to sort lists.
  
  Suppose the system is given only:
    A specification of what counts as a sorted list
    Several examples of lists undergoing sorting by quicksort

Is the answer to this question clear? Place your bets now!
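
To make the bar concrete, here's roughly the kind of answer I'd count as a clear pass. This is my own sketch, not the survey's reference answer, and the question only says "simple algorithms like quicksort", so other sorts could count too:

  def quicksort(lst):
      # Base case: an empty or single-element list is already sorted.
      if len(lst) <= 1:
          return lst
      pivot, rest = lst[0], lst[1:]
      # Partition around the pivot, then recurse on each half.
      smaller = [x for x in rest if x <= pivot]
      larger = [x for x in rest if x > pivot]
      return quicksort(smaller) + [pivot] + quicksort(larger)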

Here, I asked ChatGPT the question[1], and it got it wrong. Yeah, I know it isn't very wrong, but it is still wrong. Here's an example of a correct solution[2], which shows the (at least) two missing lines. Can we get there with another iteration? Sure! But that's not what the question was asking.

I'm sure some people will say that GPT gave the right solution. So what if it ignored the case of a single-element array and assumed all inputs are arrays? I didn't give it an example of a single-element array or of non-array inputs, yet it just assumed. I mean, leetcode questions pull out way more edge cases than the one I'm griping about here.
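
For what it's worth, the checks I have in mind are nothing exotic. Something like this (the inputs are my own picks, not cases from the survey, and this assumes the quicksort sketch above is in scope):

  # Hypothetical sanity check against Python's built-in sort.
  cases = [[], [7], [3, 1, 2], [5, 5, 1, 5], list(range(10, 0, -1))]
  for case in cases:
      assert quicksort(case) == sorted(case), case
  print("all cases match sorted()")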

So maybe you're just cherry-picking. Maybe the author is just cherry-picking. Because their assertion that "AI could maybe already meet that condition in 2023" is not objectively true. It's not clear that this is true even in 2025!

[0] https://arxiv.org/abs/2401.02843

[1] https://chatgpt.com/share/688ea18e-d51c-8013-afb5-fbc85db0da...

[2] https://www.geeksforgeeks.org/python/python-program-for-inse...

>Go look at the referenced paper[0]. It is on page 3, last item in Figure 1, labeled "Simple Python code given spec and examples". That line starts just after 2023 and runs to just after 2028. There's a dot representing the median opinion that sits left of the vertical line halfway between 2023 and 2028. Last I checked, 2028 - 2023 = 5, so that halfway mark is around 2025.5, and 2025 < 2027.

The graph you're looking at is of the 2023 survey, not the 2022 one.

As for your question, I don't see what it proves. You described the desired conditions for a sorting algorithm and ChatGPT implemented a sorting algorithm. In the case of an array with one element, it bypasses the for loop automatically and just returns the array. It is reasonable for it to assume all inputs are arrays, because your question told it that its requirement was to create a program that "turn any list of numbers into a foobar."
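
To be concrete about the one-element case: I don't have your exact transcript in front of me, but for a standard insertion-sort-style loop (my own reconstruction, not a quote of the shared chat), a single-element list never enters the loop body and comes back unchanged:

  def insertion_sort(lst):
      # With len(lst) == 1 the range below is empty, so the loop body
      # never runs and the (already sorted) list is returned as-is.
      for i in range(1, len(lst)):
          key = lst[i]
          j = i - 1
          # Shift larger elements right, then drop the key into place.
          while j >= 0 and lst[j] > key:
              lst[j + 1] = lst[j]
              j -= 1
          lst[j + 1] = key
      return lst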

Of course I'm not any one of the researchers asked for their predictions in the survey, but if you told the 2022 or 2023 respondents "a SOTA AI in 2025 produced working, human-readable code based on a list of specifications, and was only incorrect by a broad characterization of what counts as an edge case that would trip up a reasonable human coder on the first try", I'm sure they would say it meets the threshold they had in mind.

  •   > As for your question, I don't see what it proves.
    

    The author made a claim

    I showed the claim was false

    The author bases his argument on this and similar claims. Showing that claim is false means his argument doesn't hold.

      > and is only incorrect by a broad characterization
    

    I don't know if I'd really call a single item an "edge case" so much as generalization.

    But I do know I'd answer that question differently given your reframing.