
Comment by BoredPositron

12 hours ago

It was mainly a jab at the protoscientific nature of it.

Reproducing experimental results across models and vendors is trivial and cheap nowadays.

  • Not if Anthropic goes further in obfuscating the output of Claude Code.

    • Why would you test implementation details? Test what's delivered, not how it's delivered. The thinking portion, synthesized or not, is merely an implementation detail.

      The resulting artefact, that's what is worth testing.
