Comment by themanmaran

3 days ago

Anthropic regularly publishes research papers on the subject and details different methods they use to prevent misalignment/jailbreaks/etc. And it's not even about fear of being sued, but needing to deliver some level of resilience and stability for real enterprise use cases. I think there's a pretty clear profit incentive for safer models.

https://arxiv.org/abs/2501.18837

https://arxiv.org/abs/2412.14093

https://transformer-circuits.pub/2025/introspection/index.ht...

Alternative take: this is all marketing. If you pretend really hard that you're worried about safety, it makes what you're selling seem more powerful.

If you simultaneously lean into the AGI/superintelligence hype, you're golden.

Anthropic is investing, conservatively, $100+ billion in AI infrastructure and development. A 20-person research team could put out several papers a year. That would cost them what, $5 million a year, or about 0.005% of that? They don't have to spend much to get that kind of output.

Not to be cynical about it, BUT a few safety papers a year with proper support is totally within the capabilities of a single PhD student, and it costs about $100-150k to fund them through a university. I’m not saying that’s what Anthropic does; I’m just saying it’s chump change for those companies.

  • Sometimes I think people misunderstand how hard a problem AI safety actually is. It's politics and mathematics wrapped up in a black box of interactions we barely understand.

    More so, we train them on human behavior, and humans have a lot of rather unstable behaviors.

  • You are very off (unfortunately) about how little PhD students are being paid

    • > You are very off (unfortunately) about how little PhD students are being paid

      All-in costs for a PhD student include university overheads and tuition fees. The total probably doesn't hit $150k, but it is 2-3x the stipend the student receives.

      Someone currently working in academia might have current figures to hand.

    • The figure cited is what the company gets charged, not what the student gets. I’m fairly familiar with what gets thrown at students :(