Comment by SomeUserName432

1 month ago

> Another pattern I’m noticing is strong advocacy for Opus

For agent/planning mode, that's the only one that has seemed reasonably sane to me so far, though I don't have broad experience with every model.

Though the moment you give it access to run tests, import packages, etc., it can quickly go down a rabbit hole. It tries to run a test and then "&& sleep" on mac, sleep does not exist, so it interprets that as the test stalling, then just goes completely bananas.

It really lacks the "ok I'm a bit stuck, can you help me out a bit here?" prompt. You're left to stop it on your own, and god knows what that does to the context.


robwwilliams 1 month ago

Somewhat different type of problem and perhaps a useful cautionary tale. I was using Opus two days ago to run simple statistical tests for epistatic interactions in genetics. I built a project folder with key papers and data for the analysis. Opus knew I was using genuine data and that the work was part of a potentially useful extension of published work. Opus computed all results and generated output tables and PDFs that looked great to me. Results were a firm negative across all tests.

The next morning I realized I had forgotten to upload key genotype files that it absolutely would have required to run the tests. I asked Opus how it had generated the tables and graphs. Answer: “I confabulated the genotype data I needed.” Ouch, dangerous as a table saw.

It is taking my wetware a while to learn how innocent and ignorant I can be. It took me another two hours with Opus to get things right with appropriate diagnostics. I’ll need to validate results myself in JMP. Lessons to learn AND remember.
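One cheap guard against this kind of failure is to have the analysis refuse to start unless every required input is actually present. The following is only a minimal sketch of that idea, not what was used here: the function names and the file names in the usage note are hypothetical, and "present" is assumed to mean "exists as a non-empty file".

```python
from pathlib import Path


def missing_inputs(project_dir, required):
    """Return the required file names that are absent or empty in project_dir."""
    root = Path(project_dir)
    missing = []
    for name in required:
        path = root / name
        # Treat a zero-byte file as missing: it cannot hold real data.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


def require_inputs(project_dir, required):
    """Raise before any analysis runs if a required input is unavailable."""
    missing = missing_inputs(project_dir, required)
    if missing:
        raise FileNotFoundError(f"refusing to run; missing inputs: {missing}")
```

Calling something like `require_inputs("project", ["genotypes.csv", "phenotypes.csv"])` (hypothetical names) at the top of the analysis turns a forgotten upload into an immediate error instead of a plausible-looking table.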

alsetmusic 1 month ago

> It tries to run a test and then "&& sleep" on mac, sleep does not exist

  > type sleep
  > sleep is /bin/sleep

What’s going on on your computer?

Edit: added quote

SomeUserName432 1 month ago

Right you are. Perhaps I'm misremembering and it was a different command. I did try it, and it did not exist. Odd.

pmw 1 month ago

You are probably thinking of `timeout`.
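If `timeout` was indeed the missing command, that would fit: it is part of GNU coreutils and, as far as I know, not included in a stock macOS install. One way to avoid depending on it is to enforce the time limit from the calling process. The sketch below does this with Python's standard `subprocess` module; the helper name `run_with_timeout` is made up for illustration.

```python
import shutil
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd (a list of arguments); return (exit_code, timed_out)."""
    try:
        completed = subprocess.run(cmd, timeout=seconds)
        return completed.returncode, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising,
        # so nothing is left running in the background.
        return None, True


if __name__ == "__main__":
    # Shows whether an external `timeout` binary is available at all.
    print("timeout on PATH:", shutil.which("timeout"))
    # A trivial stand-in for a test command.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], seconds=10))
```

The return value separates "the command failed" from "the command hung", which is the distinction the agent got wrong in the story above.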