Comment by eterevsky

19 days ago

The plan is to improve AI agents from their current ~intern level to the level of a good engineer.

They are not intern level.

Even if it could perform at a similar level to an intern on a programming task, it lacks a great deal of the other attributes a human brings to the table, including the ability to integrate into a team of other agents (human or otherwise). I won't bother listing them, as we are all humans.

I think the hype is missing the forest for the trees, and I think this multi-agent dynamic in particular might be where the trees start to fall down in front of us. That, and the so-far insurmountable issues of context and coherence over long time horizons.

  • My impression is that Copilot acts a lot like one of my former coworkers, who struggled with:

    - Being a parent to a small child and the associated sleep deprivation.

    - A reluctance to read documentation.

    - A language barrier between him and the project owners. Emphasis on this last one, as the LLM acts like someone speaking through a particularly good translation service but otherwise not understanding the language being spoken.

  • The real missing the forest for the trees is thinking that software, and the way users use computers, are going to remain static.

    Software today is written to accommodate every possible need of every possible user, plus a bunch of unneeded selling-point features on top of that. The result is massive, sprawling code bases built to deliver one-size-fits-all utility.

    I don't need the 3-million-LOC Excel 365 to keep track of who is working on the floor on what day this week. Gemini 2.5 can write an applet that does that perfectly in 10 minutes.
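
    To be concrete, here is the sort of throwaway applet I mean (a rough sketch in Python; the shifts.csv file name and the date,name,task layout are just assumptions for illustration, not output from any particular model):

        # shifts.py -- minimal floor-roster tracker (hypothetical sketch).
        # Stores one shift per line in shifts.csv as: date,name,task
        import csv
        import sys
        from datetime import date

        FILE = "shifts.csv"  # assumed storage file

        def add(day, name, task):
            # Append a single shift record.
            with open(FILE, "a", newline="") as f:
                csv.writer(f).writerow([day, name, task])

        def show(week=None):
            # Print recorded shifts, optionally filtered to one ISO week number.
            try:
                with open(FILE, newline="") as f:
                    rows = list(csv.reader(f))
            except FileNotFoundError:
                rows = []
            for day, name, task in rows:
                if week is None or date.fromisoformat(day).isocalendar()[1] == week:
                    print(f"{day}: {name} -> {task}")

        if __name__ == "__main__":
            if sys.argv[1:2] == ["add"]:
                # e.g. python shifts.py add 2025-06-02 Dana register
                add(sys.argv[2], sys.argv[3], sys.argv[4])
            else:
                # default: print this week's roster
                show(date.today().isocalendar()[1])

    Swap the CSV for a spreadsheet or a tiny web page if you like; the point is the scope, not the stack.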

    • I don't know. I guess it depends on what you classify as change. I don't really view software as having changed all that much since around the mid-70s, when high-level languages began to become more popular. What programmers do today and what they did back then would be easily recognizable to both groups if we had time machines.

      I don't see how AI really changes things all that much. It has the same scalability issues that low-code/no-code solutions have always had, and those go way back. The main difference is that you can use natural language, but I don't see that as being inherently better than, say, drawing a picture with the flowcharting tools in a low-code platform. You just reintroduce the problem natural languages have always had, and the reason we didn't choose them in the first place: they are not strict enough and need lots of context. Giving an AI very specific sentences to define my project in natural language, and making sure it has lots of context, begins to look an awful lot like pseudocode to me. So as you learn to use AI in a way that produces what you want, you naturally get closer and closer to just specifying the code.

      What HAS indisputably changed is the cost of hardware, which has driven accessibility and led to far more consumer-facing software being made.

    • I don't believe it will remain static; in fact, it has done nothing but change every year for my entire career.

      I do like the idea of smaller programs fitting smaller needs and being easy for everyone to access, and in my post history you'll see me advocating for bringing software wages down so that even small businesses can have software capabilities in house. Software has so much to give to society outside of big VC flips and tech monoliths. Maybe AI is how we get there in the end.

      But I think that supplanting humans with an AI workforce in the very near future might be stretching the projection of its capabilities too far. LLMs will be augmenting how businesses operate from now on, but I am seeing clear roadblocks that make an autonomous AI agent unviable, and they look like fundamental limitations of LLMs, e.g. continuity and context. Recent advances seem to come from supplemental systems that try to patch those limitations, which suggests the limits themselves are hard to overcome. Until a new approach shows up, that is what drives my lack of faith in an AI agent revolution.

      But it is clear to me that I could be wrong, and it could be a spectacular miscalculation. Maybe the robots will make me eat my hat.

Seems like that is taking a very long time, on top of some very grandiose promises being made today.

  • I look back over the past 2-3 years and am pretty amazed at how quickly change and progress have been made. The promises are indeed large but the speed of progress has been fast. I'm not defending the promises, but “taking a very long time” does not seem like an accurate characterization.

    • I feel like we've made barely any progress. It's still good at the things ChatGPT was originally good at, and bad at the things it was bad at. There's been some small incremental refinement, but it doesn't represent a qualitative jump the way the original ChatGPT did. I don't see AI replacing actual humans without another jump like that.

    • > The promises are indeed large but the speed of progress has been fast

      And at the same time, absurdly slow? ChatGPT is almost 3 years old, and AI still has pretty much no positive economic impact.

    • > I look back over the past 2-3 years and am pretty amazed at how quickly change and progress have been made.

      Now look at the past year specifically, and only at the models themselves, and you'll quickly realize that there's been very little real progress recently. Claude 3.5 Sonnet was released 11 months ago, and the current SOTA models are only marginally better in terms of pure performance on real-world tasks.

      The tooling around them has clearly improved a lot, and neat tricks such as reasoning have been introduced to help models tackle more complex problems, but the underlying transformer architecture is already being pushed to its limits and it shows.

      Unless some new revolutionary architecture shows up out of nowhere and sets a new standard, I firmly believe that we'll be stuck at the current junior level for a while, regardless of how much Altman & co. insist that AGI is just two more weeks away.

You are really underselling interns. They learn from a single correction, sometimes even without a correction, all by themselves. Their ability to integrate previous experience in the context of new problems is far, far above anything I've ever seen from LLMs.

Yes, but they were supposed to be PhD-level 5 years ago, if you listen to sama et al.

  • Especially ironic considering he's neither a developer nor a PhD. He's the smooth-talking "MBA idea guy looking for a technical cofounder" type that's frequently decried on HN.

Without handholding (aka being used as a tool by a competent programmer instead of as an independent “agent”), they’re currently significantly worse than an intern.

This looks much worse than an intern. This feels like a good engineer who has brain damage.

When you look at it from afar, it looks potentially good, but as you start looking into it for real, you realize none of it makes any sense. Then you make simple suggestions, and it does something that looks like what you asked for, yet completely misses the point.

An intern, no matter how bad they are, could only waste so much time and energy.

This makes wasting time and introducing mind-bogglingly stupid bugs infinitely scalable.

The plan went from the AI being a force multiplier to a resource-hungry beast that has to be fed in the hope that it's good enough to justify its hunger.

I mean, I think this is a _lot_ worse than an intern. An intern isn't constantly going to make PRs with failing CI, for a start.