
Comment by GoatInGrey

13 hours ago

That, and the catastrophic-risk framing, is where this really loses me. We're discussing models that supposedly threaten "global catastrophe" or could "kill or disempower the vast majority of humans." Meanwhile, Opus 4.5 can't successfully call a Python CLI after reading all 160 lines of its code. It confuses itself over escape characters, writes workaround scripts that subsequent instances also can't execute, and after I explicitly tell it "Use header_read.py on Primary_Export.xlsx in the repo root," it latches onto some random test case buried in documentation it read "just in case" and prioritizes running the script on the files mentioned there instead.
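For what it's worth, the escape-character trap here is entirely avoidable by passing arguments as a list instead of building a shell string. A minimal sketch of the difference (the space-containing filename is hypothetical, chosen to trigger the failure; the original Primary_Export.xlsx would not):

```python
import shlex
import sys

# Hypothetical filename containing a space, the kind of input that breaks
# naively string-built shell commands.
fname = "Primary Export.xlsx"

# Fragile: a command built as one string needs correct quoting/escaping.
cmd_str = f"python header_read.py {fname}"
print(shlex.split(cmd_str))  # the unquoted filename splits into two tokens

# Robust: an argument list is passed through verbatim, no escaping needed,
# e.g. subprocess.run(cmd_list) instead of shell=True with a string.
cmd_list = [sys.executable, "header_read.py", fname]
print(cmd_list[2])  # the filename survives as a single argument
```

The list form sidesteps quoting entirely, which is exactly the workaround the model keeps failing to find on its own.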

To me, it's as ridiculous as claiming that my metaphorical son poses a legitimate risk of committing mass murder when he can't even operate a spray bottle.

If they advertised these LLMs as just another tool in your repertoire, like Bash, imagine how that would go.