Comment by daxfohl

8 hours ago

I don't even think that "singularity-level coding agents" get us there. A big part of engineering is working with PMs, working with management, working across teams, working with users, to help distill their disparate wants and needs down into a coherent and usable system.

Knowing when to push back, when to trim down a requirement, when to replace a requirement with something slightly different, when to expand a requirement because you're aware of multiple distinct use cases to which it could apply, or when a new requirement is interesting enough that it might warrant updating your "vision" for the product itself: that's the real engineering work that even a "singularity-level coding agent" alone could not replace.

AI agents almost universally say "yes" to everything. They have to! If OpenAI starts selling tools that refuse to do what you tell them, who would ever buy them? And maybe that's the fundamental distinction: something that says "yes" to everything isn't a partner, it's a tool, and a tool can't replace a partner by itself.

I think that's exactly an example of climbing the abstraction ladder. Given a bad task, an agent that's incapable of reframing the current context will try its best to complete it. An agent capable of generalizing to an overarching goal can figure out when the current objective is at odds with the more important goal.

You're correct in that these aren't really ‘coding agents’ any more, though. Any more than software developers are!

  • Not just the abstraction ladder though. Also the situational awareness ladder, the functionality ladder, and most importantly the trust ladder.

    I can kind of trust the thing to make code changes because the task is fairly well-defined, and there are compile errors, unit tests, code reviews, and other gating factors to catch mistakes. As you move up the abstraction ladder though, how do I know that this thing is actually making sound decisions versus spitting out well-formatted AIorrhea?

    At the very least, they'd need additional functionality: to sit in on and contribute to meetings, write up docs and comment threads, ping relevant people on chat when something changes, and set up meetings to resolve conflicts or uncertainties. They'd also need to generally understand their own role; the people they work with, along with those people's roles, levels, and idiosyncrasies; the relative importance and idiosyncrasies of different partners; the exceptions to supposed invariants, why they exist, what they imply, and when they shouldn't be used; when to escalate vs. when to decide vs. when to defer vs. when to chew on something for a few days while doing other work; etc.

    For example, say you have an authz system and you've got three partners requesting three different features, the combination of which would create an easily identifiable, easily attackable authz backdoor. Unless you specifically ask the AI to look for this, it'll happily implement those three features and sink your company. You can't fault it: it did everything you asked. You just trusted it with an implicit requirement that it didn't meet. It wasn't "situationally aware" enough to read between the lines there. What you really want is something that would preemptively identify the conflicts, schedule meetings with the different parties, get a better understanding of what each request is trying to unblock, and ideally distill everything down into a single feature that unblocks them all. You can't just move up the abstraction ladder without moving up all those other ladders as well.
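
    To make that concrete, here's a toy Python sketch (all names and features are hypothetical, not from any real system): three authz changes that each look reasonable in isolation, but together let any contractor impersonate anyone with no audit trail.

        from dataclasses import dataclass, field

        @dataclass
        class User:
            name: str
            roles: set[str] = field(default_factory=set)

        AUDIT_LOG: list[str] = []

        def log(entry: str, quiet: bool = False) -> None:
            # Feature 3 (requested by the data-migration team): bulk jobs can
            # pass quiet=True so they don't flood the audit log.
            if not quiet:
                AUDIT_LOG.append(entry)

        def onboard_contractor(name: str) -> User:
            # Feature 2 (requested by the partnerships team): contractors get
            # the "support" role automatically so they can triage on day one.
            return User(name, roles={"support"})

        def impersonate(actor: User, target: str, quiet: bool = False) -> str:
            # Feature 1 (requested by the support team): support engineers can
            # act as any user to reproduce bugs.
            if "support" not in actor.roles:
                raise PermissionError(f"{actor.name} may not impersonate anyone")
            log(f"{actor.name} impersonated {target}", quiet=quiet)
            return f"session token for {target}"

        # Each change passes review on its own. Combined: any contractor can
        # impersonate the CFO and leave nothing in the audit log.
        mallory = onboard_contractor("mallory")
        impersonate(mallory, "cfo", quiet=True)
        print(AUDIT_LOG)  # [] -- no trace for anyone to catch

    No single diff looks dangerous in review; the hole only exists in the combination, which is exactly the cross-request reasoning nobody asked the agent to do.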

    Maybe that's possible someday, but right now they're still just okay coders with no understanding of anything beyond the task you just gave them to do. That's fine for single-person hobby projects, but it'll be a while before we see them replacing engineers in the business world.