Comment by ainch
1 day ago
I'd spent 6 hours solving a gnarly RL problem (mathematically solving divergence of off-policy TD(λ) for any value of λ or behaviour policy).
As a punt, I gave it to o3 (remember when LLMs were 'bad at maths'?) - after 15 minutes it returned the answer that had taken me hours.