Comment by davidatbu
15 hours ago
Ah, I just learnt that you don't. Jarred has a comment saying exactly that: https://news.ycombinator.com/item?id=48133806
I'll actually concede that, on a slower skim, some changes to the test suite and fixtures that first seemed suspicious to me indeed align with what those tests were doing previously, and I wish I could retract that comment.
I still think the test suite isn't as impressive as it's being claimed to be, which, if this actually works out, says more about Claude's skill than about the people driving it.
Gotcha. I'm genuinely curious: by "impressive", are you referring to coverage? I'd be grateful if you could say a few words about how it could be more impressive (e.g., if you did mean coverage, which functionality or edge cases aren't covered as of now).
Our programming languages are bad at specification and verification, so the next best thing is property-testing for modeling (e.g. Hypothesis for Python) or, for the reference implementations, extensive "expect"/snapshot test cases (e.g. Cram).
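For concreteness, here's a minimal sketch of the kind of property-based modeling I mean, using Python's Hypothesis. The round-trip property and the stdlib functions under test are illustrative stand-ins, not anything from Bun's actual suite:

```python
# A sketch of property-based modeling with Hypothesis. The "model" here is
# a round-trip law over the stdlib's query-string codec; the point is that
# one property covers an entire input space, not a single regression case.
from urllib.parse import parse_qs, urlencode

from hypothesis import given, strategies as st


@given(st.dictionaries(st.text(min_size=1), st.text(), min_size=1))
def test_query_string_round_trip(params):
    # Property: encoding a mapping and parsing it back preserves every
    # key/value pair, for any text Hypothesis can throw at it.
    encoded = urlencode(params)
    decoded = parse_qs(encoded, keep_blank_values=True)
    assert decoded == {key: [value] for key, value in params.items()}
```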
Instead, I found a bog-standard suite with a single case per regression and very little actual modeling, although I wasn't expecting more. (I don't care much for JS, let alone Bun, so I can't point to features I'd like to see better tested, but I'm sure the issue tracker can already do that job.)
To be fair, our whole industry is really bad at this; most test suites are verification theatre. But now that machines can fill out implementations on their own, we should strive to properly model our requirements and constraints so they can one-shot what we intended. Otherwise we're left in an awkward middle where we don't add much value over the AI fumbling around.