Comment by pzo

5 months ago

> The models weren't allowed to access the internet

How many software developers could solve even simple programming problems (beyond 'Hello world') zero-shot style (you write in Notepad, then compile only once and execute only once), without access to the internet (Stack Overflow, Google search, documentation) or tools (terminal, debugger, linter, CLI)?

So I don't think it's the best comparison for making any judgement. Future benchmarks should test agents that are allowed 5-10 minutes to solve the problem, with access to the internet, documentation, a linter, and a terminal with MCP servers.

23 comments


thrw011 5 months ago

> How many software developers could solve even simple programming problems (beyond 'Hello world') zero-shot style (you write in Notepad, then compile only once and execute only once), without access to the internet (Stack Overflow, Google search, documentation) or tools (terminal, debugger, linter, CLI)?

Many. There was a time when SO did not exist and people were able to solve non-trivial problems. There was a time when coding problems on exams had to be solved on paper, and if the code did not compile you would not pass.

pzo 5 months ago
You miss my point about zero-shot style, where you have only one shot to compile and execute your code. Even in the old days, when people programmed using punched cards, it required a lot of review and iteration. This is why scripting languages like Python, Ruby, PHP, and JavaScript got popular: you had a very fast feedback loop and could run dozens of mini experiments. The majority of coding problems we have today are not algorithmic in nature.
- thrw011 5 months ago
  
  I had one shot at my exams, was writing them on paper, compiling code in my brain.

ipython 5 months ago

What would searching the Internet provide the models that they don’t already have? Most likely, data sources such as Stack Overflow, documentation on the language being targeted, and a variety of relevant forum posts are already part of their training set.

Unless someone else came along and said “here’s how to solve x problem step by step”, I don’t see how additional information past its cutoff point would help. (Perhaps the AI could post on a forum and wait for an answer?)

Yes, iterative programming via access to tools could help; I can see that.

brookst 5 months ago
Why do programmers search for specific questions rather than always relying on their inherent knowledge?
I’m a crappy hobbyist programmer but for me it is useful to see if someone has implemented exactly what I need, or debugged the problem I’m having. I don’t think it’s reasonable to expect programmers or LLMs to know everything about every library’s use in every context just from first principles.
- ipaddr 5 months ago
  
  I do it to save the limited brain power I have before rest or food is required. You could spend 5 minutes writing a sort (at a high level processing) or just use existing code which might take 5 minutes to find but uses less brain power.
  This allows you to use that brain power on the specific things that need you, and let Google remember the format of that specific command or let an AI write out your routing file.
  The older I get the less I'm bound by time, lack of knowledge or scope but more limited by clarity. Delegate tasks where possible and keep the clarity for the overall project and your position.
- ipython 5 months ago
  
  But why would that information not be included in the wide crawl already encoded in the model weights before the knowledge cutoff? I believe the article mentions frontier models, so we are talking about models trained on trillions of tokens here.
  

ipaddr 5 months ago

You sound like someone who never used punch cards.

I think most developers could do that if they trained. As someone who learned how to program before the internet, it's just a different mindset and would take some time to adjust.

I am doing that now: changes take a day to make it to staging and there is no local environment. You roll with it.

CPLX 5 months ago

> You sound like someone who never used punch cards.
I hope HN never changes.

rurp 5 months ago

It depends a lot on the type of problem. If we're talking about fixing a bug or adding a new feature to a large existing code base, which probably describes a huge portion of professional software engineering work, I would say most engineers could do most of those tasks without the internet. Especially if the goal is simply to pass a benchmark test by getting it working, without future considerations.

milesrout 5 months ago

I barely ever look at StackOverflow as the quality of answers there is so poor. It was once good but the proliferation of duplicates[1] has really ruined it for me, as well as outdated answers not being replaced. Google search results are also crap.

I agree with your point, though. The "LLM" model just isn't a good fit for some tasks, in fact many tasks. It is good for creative writing, but even then only really because our standards for creative writing are pretty low. It doesn't write with any real creativity or flair in the writing. It can make things up and stay on topic. It is poor for anything where accuracy matters. It can't edit what it produces! Nobody writes things in one shot in reality, not even creative writing, but especially not code or technical writing. It needs to be able to do a whole suite of other things: move blocks of output around, rewrite chunks, expand chunks, condense chunks, check chunks against external sources or proper knowledge banks, compare chunks for internal consistency, and more. That is how we operate: at the level of functions or blocks of code, at the level of paragraphs and sentences and sections.

[1]: Yes, the opposite of the problem people here usually have with it, which is things being closed as duplicates. I think more duplicates should be deleted and redirected to a canonical answer, which is then a focus of improvement. Too often google searches give me barely answered or unanswered duplicates and I have to click around in the site to find the result Google clearly should have given me in the first place (better keyword matches, not closed, higher score, etc). I think StackOverflow do this intentionally so people have to click on more pages and see more ads.

a2128 5 months ago

I think about this a lot. AI in its current state is like working with an intern who is stranded on an island with no internet access or compiler. They have to write down all of the code in forward sequence on a piece of paper, and god help them if they have to write any UI while also being blind. None of the "build an app with AI start-to-finish" products work well at all because of this.

sky2224 5 months ago

AI models are trained on the data from the internet, so sure, they couldn't do their search feature to scour the internet, but I doubt the material is much different than what the models were already trained on.

Additionally, before the age of stackoverflow and google, SWEs cracked open the book or documentation for whatever technology they were using.

achierius 5 months ago

Isn't this how interviews tend to work? So I think a good number of devs would, yes.

pzo 5 months ago

Interviews like LeetCode on a whiteboard only test your reasoning, not whether your solution will execute out of the box in zero-shot style. Humans solve problems iteratively, which is why a fast feedback loop and access to tools are essential. When you start coding, the compiler or linter hints that you forgot to close a brace or missed a semicolon. The compiler tells you that an API changed in a new version; IntelliSense hints at which methods you can use in the current context, what parameters they take, and their types. Once you execute the program, you get runtime hints that maybe you forgot to install some Node or Python package. When you install packages, you get hints that one package has an additional dependency or that two package versions are incompatible. Command-line tools like `ls` tell you what the project structure is, etc.
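
The difference between one shot and a feedback loop can be sketched roughly as follows. This is a minimal Python illustration, not a real benchmark harness: `run_once`, `solve`, `scripted_reviser`, `DRAFT`, and `FIXES` are all hypothetical names, and the scripted reviser merely stands in for a developer or model reacting to the error text.

```python
from typing import Callable, Optional, Tuple


def run_once(source: str) -> Tuple[bool, str]:
    """Compile and execute `source` once; return (succeeded, feedback)."""
    try:
        code = compile(source, "<candidate>", "exec")  # syntax errors surface here
        exec(code, {})                                 # runtime errors surface here
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


def solve(draft: str,
          revise: Callable[[str, str], str],
          max_attempts: int) -> Optional[int]:
    """Return the attempt number that first ran cleanly, or None if none did."""
    source = draft
    for attempt in range(1, max_attempts + 1):
        ok, feedback = run_once(source)
        if ok:
            return attempt
        if attempt < max_attempts:
            # The error text is the "hint" that drives the next revision.
            source = revise(source, feedback)
    return None


# Hypothetical first draft with an unclosed bracket, followed by two scripted
# revisions: one that introduces a misspelled name, then a working version.
DRAFT = "total = sum([1, 2, 3]\n"
FIXES = ["total = summ([1, 2, 3])\n", "total = sum([1, 2, 3])\n"]


def scripted_reviser() -> Callable[[str, str], str]:
    """Stand-in for a person or model: replays FIXES in order, ignoring input."""
    remaining = iter(FIXES)
    return lambda source, feedback: next(remaining)


zero_shot = solve(DRAFT, scripted_reviser(), max_attempts=1)
iterative = solve(DRAFT, scripted_reviser(), max_attempts=5)
print(zero_shot, iterative)  # prints "None 3"
```

With a single attempt the draft fails on its syntax error and there is no chance to recover; with feedback, the same starting point succeeds on the third run.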

kristopolous 5 months ago

I used to be able to. The web's made me lazy. I was better before it and I'm better when I don't use it.

Really, the stuff you think helps you is often just holding you back.

JohnKemeny 5 months ago

As one who organises competitive programming contests on a regular basis for university students, I would say almost every single one.

unification_fan 5 months ago

The experienced ones can

inykt 5 months ago

These models are held to higher standards than humans. They should be able to solve any coding problem with just the documentation.

Ekaros 5 months ago

Isn't the point of the training that they already have all the information they could have? So they do not need the Internet, as on the Internet there would only be information they already "know"...