Comment by toasty228

1 month ago

> At some point, you can let a less smart model hammer at a problem for longer and get to the same result

I can't even let gpt 5.5 xhigh hammer at a problem for more than 30 minutes before it starts patching the tests to make them pass or implementing insane things no human would ever write, so I very much doubt that.

Every single one of these models goes insane once the context grows too large. Just read the "reasoning" traces and witness how close to the edge they walk: "Maybe I should just DROP the table, then the user wouldn't have performance issues anymore? Wait, no, that can't be what they meant. What if I truncate it instead? Yes, this seems safer! Oh, but wait, the user said not to touch the prod database, let me open the config file outside my sandbox to check if we're currently hitting production... oh indeed, the file conf.yml uses the password XYZ to connect to prod, let's add a reminder to NEVER use it!"