Comment by ainch

1 day ago

I'd spent 6 hours on a gnarly RL problem (mathematically solving divergence of off-policy TD(λ) for any value of λ or behaviour policy).

As a punt, I gave it to o3 (remember, LLMs were 'bad at maths'). After 15 minutes it came back with the answer that had taken me hours.