Comment by debugnik
20 hours ago
It also modified many of the tests to make them pass in mischievous ways. You can't trust a test suite to catch regressions if the new version doesn't use the same test suite.
Do you have some examples?
Ah, I just learnt that you don't. Jarred's comment saying exactly that: https://news.ycombinator.com/item?id=48133806
I'll actually concede that, on a slower skim, some changes to the test suite and fixtures that first seemed suspicious to me indeed align with what those tests were doing previously, and I wish I could retract that comment.
I still think the test suite is not as impressive as is being claimed, which, if this actually works out, should say more about Claude's skill than about the people driving it.
If you have a test from the original suite that fails on the new build, I think demonstrating that broken behavior would be interesting.