Comment by dmbche

5 days ago

"The more revealing signal is in the tail. The longest turns tell us the most about the most ambitious uses of Claude Code, and point to where autonomy is heading. Between October 2025 and January 2026, the 99.9th percentile turn duration nearly doubled, from under 25 minutes to over 45 minutes (Figure 1)."

That's just straight up nonsense, no? How much cherry picking do you need?

What do you think is wrong with this? It matches my experience pretty well.

  • Short window, small and unrepresentative data pool, and cherry-picking the 0.1% longest turn times without turn time being demonstrated as a proxy for autonomy.

    Looks to me like fishing for some data that seems good.

    • Most tasks simply don't take that long.

      Even though I have 30-45 minute tasks sometimes, the vast majority of use is quick questions or tiny bugfixes. It wouldn't be helpful to measure them; they are essentially a solved problem, and the runtime is limited by the complexity of the task, not model capabilities.
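
      The p99.9 objection is easy to make concrete: by construction, that statistic is set by a tiny handful of samples. A quick sketch with made-up durations (not Anthropic's data; `percentile` here is a minimal nearest-rank helper, not their methodology):

      ```python
      import random

      random.seed(0)
      # Hypothetical turn durations in minutes: mostly short, heavy tail.
      durations = [random.expovariate(1 / 2) for _ in range(100_000)]

      def percentile(values, p):
          """Nearest-rank percentile: value below which ~p% of samples fall."""
          ordered = sorted(values)
          k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
          return ordered[k]

      p50 = percentile(durations, 50)
      p999 = percentile(durations, 99.9)
      # Only ~100 of the 100,000 turns sit above p99.9, so a handful of
      # unusually long runs can move that number dramatically while the
      # typical (median) turn doesn't change at all.
      print(f"median ≈ {p50:.1f} min, p99.9 ≈ {p999:.1f} min")
      ```

      So "p99.9 nearly doubled" is a claim about roughly one turn in a thousand, which is exactly the cherry-picking concern.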