← Back to context

Comment by BoredPositron

11 hours ago

It was mainly a jab at the protoscientific nature of it.

4 comments

BoredPositron

Reply

vntok 9 hours ago

Reproducing experimental results across models and vendors is trivial and cheap nowadays.

BoredPositron 9 hours ago
Not if anthropic goes further in obfuscating the output of claude code.
- vntok 8 hours ago
  
  Why would you test implementation details? Test what's delivered, not how it's delivered. The thinking portion, synthetized or not, is merely implementation.
  The resulting artefact, that's what is worth testing.
  
  1 reply →