Comment by ryandv 7 days ago [flagged] 2 comments
lupusreal 7 days ago What you propose isn't a meaningful benchmark because these models already excel at spinning bullshit; they'd ace the benchmark every time.
porridgeraisin 7 days ago No need for benchmarks. We already know they can BS better than anyone for 3 hours, make statistical method errors, and hallucinate studies.