Comment by ssivark
3 months ago
To elaborate — the task definition itself is vague enough that any evaluation will necessarily be vibes based. There is fundamentally no precise definition of correctness/reliability.
3 months ago
To elaborate — the task definition itself is vague enough that any evaluation will necessarily be vibes based. There is fundamentally no precise definition of correctness/reliability.
No comments yet
Contribute on Hacker News ↗