
Comment by lmeyerov

3 days ago

As someone who has been doing hardcore genai for 2+ years, here is my experience, and what we advise internally:

* ~3 weeks to transition from AI pairing to AI delegation to AI multitasking, so the work gains are mostly week 3+. That's 120+ hours in, as someone pretty senior here.

* Speedup is the wrong metric. Think throughput, not latency. Any one piece of work might take longer, but the volume of work should go up, because the AI can do more on a task and on different tasks/projects in parallel.

Both perspectives seem consistent with the paper description...

Have you actually measured this?

Because one of the big takeaways from this study is that people are bad at predicting and observing their own time spent.

  • yes, I keep prompt plan logs

    At the same time... that's not why I'm comfortable writing this. It's pretty obvious when you know what good vs bad feels like here and adjust accordingly:

    1. Good: You are able to generate a long plan and that plan mostly works. These are big wins _as long as you are multitasking_: you are high throughput, even if the AI is slow. Think runs of 5-20 min at a time with pretty good progress, for just a few minutes of the planning you'd largely have to do anyway.

    2. Bad: You are wasting a lot of attention chatting (so 1-2 min runs) and repairing (re-planning from the top rather than progressing). There is no multitasking win.

    It's pretty clear which situation you're in; run duration on its own is a ~10X difference.
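    That run-duration heuristic can be checked mechanically against the prompt/plan logs mentioned above. A minimal sketch, assuming a hypothetical log of per-run start/end timestamps (the log format, function names, and the 5-minute threshold are all illustrative assumptions, not anything from the study or the commenter's actual tooling):

    ```python
    from datetime import datetime

    # Hypothetical prompt-plan log: (run_start, run_end) per autonomous agent run.
    runs = [
        ("2025-01-10T09:00:00", "2025-01-10T09:14:00"),  # long autonomous run
        ("2025-01-10T09:20:00", "2025-01-10T09:21:30"),  # short chatty run
    ]

    def run_minutes(start: str, end: str) -> float:
        """Duration of one agent run, in minutes."""
        fmt = "%Y-%m-%dT%H:%M:%S"
        delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
        return delta.total_seconds() / 60

    durations = sorted(run_minutes(s, e) for s, e in runs)
    median = durations[len(durations) // 2]

    # Crude classification: long median runs suggest the "good" delegation
    # mode; short ones suggest chat-and-repair mode with no multitasking win.
    mode = "delegation" if median >= 5 else "chat-and-repair"
    ```

    The point isn't the specific threshold, just that the two modes separate cleanly enough in the logs that a coarse cut works.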

    Ex: I'll have ~3 projects going at the same time, and/or whatever else I'm doing. I'm not interacting "much" so I know it's a win. If a project is requiring interaction, well, now I need to jump in, and it's no longer agentic coding IMO, but chat assistant stuff.

    At the same time, I power through case #2 in practice because we're investing in AI automation. We're retooling everything to enable long runs, so we'll still do the "hard" tasks via AI in order to identify & smooth the bumps. As with infrastructure-as-code and SDLC tooling, we're investing in automating as much of our stack as we can: figuring out prompt templates, CI tooling, etc. that enable the AI to do these tasks, so we can benefit later.

    • Oh, that's not quite what I was asking about -- I was wondering if you've compared AI vs no-AI for tasks, and kept measurements of that. But it sounds like you're not in a position to do so.
