
Comment by alphazard

11 hours ago

Every time I read something like this, it strikes me as an attempt to convince people that various people-management memes will still be relevant going forward, or even that they work when applied to humans today. The reality is that these roles don't even work in human organizations. Classic "job_description == bottom_of_funnel_competency" fallacy.

If they make the LLMs more productive, it is probably explained by a less complicated phenomenon that has nothing to do with the names of the roles, or their descriptions. Adversarial techniques work well for ensuring quality, parallelism is obviously useful, important decisions should be made by stronger models, and using the weakest model for the job helps keep costs down.
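The "weakest model for the job" point can be sketched as a trivial cost-aware router. The model names, tiers, and relative costs below are made up for illustration only:

```python
# Hypothetical cost-aware router: pick the weakest (cheapest) model
# judged sufficient for the task, reserving strong models for the
# important decisions. Names and costs are illustrative, not real.

MODELS = [          # ordered cheapest-first: (name, relative cost)
    ("small",  1),  # bulk work: summaries, reformatting
    ("medium", 5),  # routine coding tasks
    ("large", 25),  # important decisions, final review
]

TIER = {"trivial": 0, "routine": 1, "critical": 2}

def route(difficulty: str) -> str:
    """Return the cheapest model allowed for this difficulty tier."""
    return MODELS[TIER[difficulty]][0]

print(route("trivial"), route("critical"))  # small large
```

Nothing about role names enters the decision; the productivity win, if any, comes from matching model strength to task importance.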

My understanding is that the main reason splitting up work is effective is context management.

For instance, if an agent only has to be concerned with one task, its context can be massively reduced. Further, the next agent can just be told the outcome; it also has a reduced context load, because it doesn't need the inner workings, just the result.

For instance, a security testing agent just needs to review code against a set of security rules, and then list the problems. The next agent then just gets a list of problems to fix, without needing a full history of working it out.
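A minimal sketch of that two-stage hand-off, with a stubbed-in `call_llm` standing in for a real model API (the function name and canned replies are hypothetical):

```python
# Two-stage pipeline: the reviewer sees only the code plus the security
# rules; the fixer sees only the distilled problem list, never the
# reviewer's full working context. `call_llm` is a placeholder stub.

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for a real LLM API call; returns canned replies here."""
    if "security reviewer" in system_prompt:
        return ("1. SQL built by string concatenation\n"
                "2. Password logged in plain text")
    return "Applied fixes for each listed problem."

def review_security(code: str, rules: list[str]) -> list[str]:
    # The reviewer's context is just the code and the rule list.
    reply = call_llm(
        system_prompt="You are a security reviewer. List violations only.",
        user_prompt="Rules:\n" + "\n".join(rules) + "\n\nCode:\n" + code,
    )
    return [line for line in reply.splitlines() if line.strip()]

def fix_problems(code: str, problems: list[str]) -> str:
    # The fixer never sees the rules or the reviewer's reasoning,
    # only the short problem list -- that is the context saving.
    return call_llm(
        system_prompt="You are a code fixer.",
        user_prompt="Problems:\n" + "\n".join(problems) + "\n\nCode:\n" + code,
    )

problems = review_security(
    "query = 'SELECT * FROM users WHERE id=' + uid",
    ["no string-built SQL", "no secrets in logs"],
)
result = fix_problems("...", problems)
```

The second call's prompt grows with the number of problems, not with the length of the review transcript, which is the whole point.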

  • Which, ultimately, is not such a big difference to the reason we split up work for humans, either. Human job specialization is just context management over the course of 30 years.

    • > Which, ultimately, is not such a big difference to the reason we split up work for humans,

      That's mostly for throughput, and context management.

      It's context management in that no human knows everything, but that's also throughput in a way because of how human learning works.

  • I’ve found that task isolation, rather than preserving your current session’s context budget, is where subagents shine.

    In other words, when I have a task that specifically should not have project context, then subagents are great. Claude will also summon these “swarms” for the same reason. For example, you can ask it to analyze a specific issue from multiple relevant POVs, and it will create multiple specialized agents.

    However, without fail, I’ve found that creating a subagent for a task that requires project context will result in worse outcomes than using “main CC”, because the sub simply doesn’t receive enough context.

  • So, two things. Yes, this helps with context and is a primary reason to break out the sub-agents.

    However, one of the bigger things is that by giving it a focus on a specific task or role, you force the LLM to "pay attention" to certain aspects. The models have finite attention, and if you ask them to pay attention to "all things", they just ignore some.

    The act of forcing the model to pay attention can be accomplished in other ways (a defined process, committee formation in a single prompt, etc.), but defining personas at the sub-agent level is one of the most efficient ways to encode a world view and responsibilities, versus explicitly listing them.

I think it's just the opposite, as LLMs feed on human language. "You are a scrum master" automatically encodes most of what the LLM needs to know. Trying to describe the same role explicitly in a prompt would be a lot more difficult.

Maybe a different separation of roles would be more efficient in theory, but an LLM understands "you are a scrum master" from the get go, while "you are a zhydgry bhnklorts" needs explanation.

  • This has been pretty comprehensively disproven:

    https://arxiv.org/abs/2311.10054

    Key findings:

    - Tested 162 personas across 6 types of interpersonal relationships and 8 domains of expertise, with 4 LLM families and 2,410 factual questions

    - Adding personas in system prompts does not improve model performance compared to the control setting where no persona is added

    - Automatically identifying the best persona is challenging, with predictions often performing no better than random selection

    - While adding a persona may lead to performance gains in certain settings, the effect of each persona can be largely random

    Fun piece of trivia - the paper was originally designed to prove the opposite result (that personas make LLMs better). They revised it when they saw the data completely disproved their original hypothesis.

    • A persona is not the same thing as a role. The point of the role is to limit the work of the agent and to focus it on one or two behaviors.

      What the paper is really addressing is whether keywords like "you are a helpful assistant" give better results.

      The paper is not addressing a role such as "you are a system designer" or "you are a security engineer", which will produce completely different results and focus the LLM's output.

    • In a discussion about LLMs you link to a paper from 2023, when not even GPT-4 was available?

      And then you say:

      > comprehensively disproven

      ? I don't think you understand the scientific method


    • One study has "comprehensively disproven" something for you? You must be getting misled left, right, and centre if that's how you absorb study results.

I suppose it could end up being an LLM variant of Conway's Law.

“Organizations are constrained to produce designs which are copies of the communication structures of these organizations.”

https://en.wikipedia.org/wiki/Conway%27s_law

  • If so, one benefit is you can quickly and safely mix up your set of agents (a la Inverse Conway Manoeuvre) without the downsides that normally entails (people being forced to move teams or change how they work).

Developers actually do want managers, to simplify their daily lives. Otherwise they would self-manage better and keep more of the revenue share for themselves.

  • Unfortunately some managers get lonely and want a friendly face in their org meetings, or can’t answer any technical questions, or aren’t actually tracking what their team is doing. And so they pull in an engineer from their team.

    Being a manager is a hard job but the failure mode usually means an engineer is now doing something extra.