Comment by ryandv 6 months ago [flagged] 2 comments
lupusreal 6 months ago What you propose isn't a meaningful benchmark because these models already excel at spinning bullshit; they'd ace the benchmark every time.
porridgeraisin 6 months ago No need for benchmarks. We already know they can BS better than anyone for 3 hours, make errors in statistical methods, and hallucinate studies.