Comment by css_apologist

3 days ago

this is news to me, how does this work? who is getting paid?

Some relevant job ads for Anthropic:

https://www.anthropic.com/careers/jobs/5025624008 - "Research Engineer – Cybersecurity RL" - "This role blends research and engineering, requiring you to both develop novel approaches and realize them in code. Your work will include designing and implementing RL environments, conducting experiments and evaluations, delivering your work into production training runs, and collaborating with other researchers, engineers, and cybersecurity specialists across and outside Anthropic."

https://www.anthropic.com/careers/jobs/4924308008 - "Research Engineer / Research Scientist, Biology & Life Sciences" - "As a founding member of our team, you'll work at the intersection of cutting-edge AI and the biological sciences, developing rigorous methods to measure and improve model performance on complex scientific tasks."

The key trend in 2025 was a new emphasis on reinforcement learning. Models are no longer just trained by dumping in a ton of scraped text; there's now a TON of work involved in designing reinforcement learning loops that teach them how to do specific useful things, and designing those loops requires subject-matter expertise.

That's why they got so much better at code over the past six months - code is the perfect target for RL because you can run generated code and see if it works or not.
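
To make the "run it and see if it works" point concrete, here's a minimal sketch of what a verifiable reward for code could look like. This is purely illustrative and not how any particular lab does it: `code_reward`, the `add` task, and the pass/fail scoring are all hypothetical names and choices of mine. It also does no sandboxing, which anything real would need before executing untrusted model output.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def code_reward(candidate_source: str, test_source: str, timeout_s: float = 5.0) -> float:
    """Hypothetical reward: 1.0 if the candidate passes the tests, else 0.0.

    Assumption: tests are plain `assert` statements appended to the candidate,
    so a failing assertion makes the interpreter exit with a non-zero code.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "attempt.py"
        script.write_text(candidate_source + "\n\n" + test_source)
        try:
            # Run in a separate interpreter so a crash or an infinite loop in
            # the generated code cannot take down the caller.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Code that never finishes counts as a failure.
            return 0.0
        return 1.0 if result.returncode == 0 else 0.0


# Hypothetical task: the model is asked to write add(a, b).
TESTS = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
good_attempt = "def add(a, b):\n    return a + b\n"
bad_attempt = "def add(a, b):\n    return a - b\n"

print(code_reward(good_attempt, TESTS))
print(code_reward(bad_attempt, TESTS))
```

The point is that the score comes from actually executing the output, not from a human grading it, so you can generate and score huge numbers of attempts automatically. Prose or medical advice has no equivalent of an exit code, which is where the subject-matter experts come in.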