Comment by arjie

19 hours ago

Useful update. It would be useful to me if they switched to a nightly / release cycle, but I can see why they don't: they want to be able to move fast, and it's not like I'm going to churn over these errors. I can only imagine that the benchmark runs are prohibitively expensive or slow, or don't use their standard harness, because running them on a weekly cadence would be a good smoke test. At the least, they'd know the trade-offs they're making.

Many of these things have bitten me too. Firing off a request that is slow because it's been kicked out of cache, and then getting zero cache hits, makes everything way more expensive, so it makes sense they would do this. I tried skipping tool calls and thinking as well, and it made the agent much stupider. These all seem like natural things to try. Pity.