Comment by themanmaran
2 days ago
Anthropic regularly publishes research papers on the subject and details different methods they use to prevent misalignment/jailbreaks/etc. And it's not even about fear of being sued, but needing to deliver some level of resilience and stability for real enterprise use cases. I think there's a pretty clear profit incentive for safer models.
https://arxiv.org/abs/2501.18837
https://arxiv.org/abs/2412.14093
https://transformer-circuits.pub/2025/introspection/index.ht...
Alternative take: this is all marketing. If you pretend really hard that you're worried about safety, it makes what you're selling seem more powerful.
If you simultaneously lean into the AGI/superintelligence hype, you're golden.
Anthropic is investing, conservatively, $100+ billion in AI infrastructure and development. A 20-person research team could put out several papers a year. That would cost them what, $5 million a year, or about five thousandths of one percent of that? They don't have to spend much to get that kind of output.
Not to be cynical about it, but a few safety papers a year, with proper support, is totally within the capabilities of a single PhD student, and it costs about $100-150k to fund one through a university. Not saying that's what Anthropic does; I'm just saying it's chump change for those companies.
Sometimes I think people misunderstand how hard a problem AI safety actually is. It's politics and mathematics wrapped up in a black box of interactions we barely understand.
Moreover, we train them on human behavior, and humans have a lot of rather unstable behaviors.
You are very off (unfortunately) about how little PhD students are being paid
Figure cited is what the company gets charged, not what the student gets. I’m fairly familiar with what gets thrown at students :(
> You are very off (unfortunately) about how little PhD students are being paid
All-in costs for a PhD student include university overheads and tuition fees. The total probably doesn't hit $150k, but it is 2-3x the stipend that the student receives.
Someone currently working in academia might have current figures to hand.