Comment by ryandv

7 months ago

[flagged]

2 comments

ryandv

What you propose isn't a meaningful benchmark: these models already excel at spinning bullshit, so they'd ace it every time.

No need for benchmarks. We already know they can BS better than anyone for 3 hours, make errors in statistical methods, and hallucinate studies.