Comment by DaveMcMartin
21 hours ago
This only reinforces my bias against AI agents. At this point, they are mostly just hype. I believe that for AI to replace a junior, we would need to achieve at least near-AGI, and we are far from that.
If by hype you mean that there isn't extreme real-world value right here and right now, then I very much disagree.
It's closing in on 20 years since I left school, and for me AI is absolutely useful, right here and right now. It really is a bicycle for the mind:
It lets me get where I'm going much faster. (And as with bicycles, you'll have a few crashes early on, and possibly later as well, depending on how fast you move and how careful you are.)
I might be in some sweet spot where I am both old enough to know what is going on without using an AI and young enough to pick up the use of AI relatively effortlessly.
If, however, by hype you mean that people still have overhyped expectations about the near future, then yes, I agree more and more.
I feel AI can also do simple, monotonous coding tasks, but I don't think programming is something it's currently very good at. Samples, yes; trivial programs, sure; but for anything non-trivial it's rarely useful.
Where it really shines today is getting humans up to speed with new technologies, things that are well understood in general but maybe not well understood by you.
Want to, say, build a window manager in X11, despite never having worked with X11 before? Sure, Claude will point you in the right direction and give you a simple template to work with in 30 seconds. An enormous time saver compared to figuring out how to do that from scratch.
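For the curious, the kind of template I mean really is tiny. Here's a sketch of a do-nothing window manager in Python using python-xlib (my own illustration, not actual Claude output): it claims the window-manager role on the root window and rubber-stamps client requests. Run it against a nested X server like Xephyr, not your live session:

    from Xlib import X, display  # assumes python-xlib is installed

    dpy = display.Display()
    root = dpy.screen().root

    # Selecting SubstructureRedirect on the root window is what makes a
    # client "the" window manager; only one client may hold it at a time.
    root.change_attributes(
        event_mask=X.SubstructureRedirectMask | X.SubstructureNotifyMask
    )

    while True:
        ev = dpy.next_event()
        if ev.type == X.MapRequest:
            ev.window.map()  # a client wants its window shown: approve it
        elif ev.type == X.ConfigureRequest:
            # a client wants to move/resize: grant the request verbatim
            ev.window.configure(x=ev.x, y=ev.y, width=ev.width,
                                height=ev.height, border_width=ev.border_width)
        dpy.flush()

Everything interesting (decorations, focus, tiling) is layered on top of that event loop, which is exactly the kind of orientation that's slow to extract from the X11 docs on your own.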
Never touched Node in your life but want to build a simple Electron app? Sure, here's how you get started. A few hours and several follow-up questions later, you're comfortable and productive in the environment.
Getting off the ground with new technologies is so much easier with AI that it's kind of ridiculous. The revolutionary part of AI coding is how much easier it makes it to be a true generalist, capable of working in any environment with any technology, whatever is appropriate.
Exactly. LLMs are gullible. They will believe anything you tell them, including incorrect things they have told themselves. This amplifies errors greatly, because they don't have the capacity to step back and try a different approach, or to introspect on why they failed. They need actual guidance from somebody with common sense; if let loose in the world, they mostly just spin around in circles because they don't have this executive intelligence.
A regular single-pass LLM indeed cannot step back, but newer ones like o1/o3/Marco-o1/QwQ can, and a larger agentic system composed of multiple LLMs definitely can. There is no "fundamental" limitation here. And once we start training these larger systems from the ground up via full reinforcement learning (rather than composing existing models), the sky's the limit. I'd be very bullish about Deepmind, once they fully enter this race.
> And once we start training these larger systems from the ground up via full reinforcement learning (rather than composing existing models),
Agree with this totally.
I wouldn't call what the CoT models are doing exactly being able to step back - their "stepping back" still dumps tokens into the output, so the model is still burdened with seeing all of these failed attempts as it searches for the right one. But my intuition on this could be wrong, and it's a much more advanced reasoning process than what "last-gen" (non-CoT) models do, so I can see your point.
For an agentic system composed of multiple LLMs, I would strongly disagree if the LLMs are last-gen. In my experience, it is very hard to prompt a non-CoT LLM into rejecting an upstream assumption without making it paranoid and rejecting valid assumptions as well. This makes it hard to effectively create a robust agentic system that can self-correct.
I think that's different if the agents are o1-level, but it's hard to appreciate just how costly and slow doing this would be. Agents consume tokens like candy with all the back-and-forth, so a surprising number of tasks become economically infeasible.
(It seems everyone is waiting for an inference perf breakthrough that may or may not come.)
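To make the cost point concrete, here's a toy sketch of a two-model propose/critique loop in Python. It assumes the official OpenAI client (v1+); the model name, prompts, and token budget are all illustrative, not a real recipe:

    from openai import OpenAI  # assumes the official OpenAI Python client, v1+

    client = OpenAI()
    MAX_TOKEN_BUDGET = 50_000  # illustrative; real budgets depend on pricing

    def call_llm(system: str, user: str) -> tuple[str, int]:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return resp.choices[0].message.content, resp.usage.total_tokens

    def solve(task: str) -> str | None:
        attempt, spent = call_llm("Solve the task.", task)
        while spent < MAX_TOKEN_BUDGET:
            # A second model instance plays critic: the "stepping back" is
            # externalized rather than happening inside one forward pass.
            critique, used = call_llm(
                "Critique the attempt. Reply APPROVED if it is correct.",
                f"Task: {task}\n\nAttempt: {attempt}")
            spent += used
            if critique.strip().startswith("APPROVED"):
                return attempt
            # Each revision re-sends task + attempt + critique, so the
            # context grows and every round costs more than the last.
            attempt, used = call_llm(
                "Revise the attempt to address the critique.",
                f"Task: {task}\n\nAttempt: {attempt}\n\nCritique: {critique}")
            spent += used
        return None  # budget exhausted with no approved answer

Even this toy version re-sends the whole growing transcript every round, which is where the cost blowup comes from: the per-round token bill keeps climbing until the budget runs out or the critic approves.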