Comment by concinds

7 hours ago

These models demonstrably have good vulnerability research capabilities.

I'm sure their marketing department is ecstatic but you guys are far more hype-based than what you're calling out.

Good, but not necessarily better than what is already available pay-as-you-go today. ref. https://www.flyingpenguin.com/the-boy-that-cried-mythos-veri...

This AISLE benchmark is interesting in this regard: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...

And the Copy Fail vulnerability recently discovered by Xint is further evidence that the gating is overblown: https://xint.io/blog/copy-fail-linux-distributions

  • Calling the AISLE experiment a "benchmark" is generous. They tested three code snippets on each model.

> demonstrably

I'm not entirely up to date on each week's LLM hype train/scandal, but last I heard there was no public access to it, nor any publicly trusted third parties that could review the model's capabilities.

  • You are up to date. Mythos suffered unauthorized access because of poor security, but that's it as far as I know. Not exactly a good sign for something being advertised as a weapon...

    • You'd think if Mythos was so good at finding security issues, they could have pointed it at their own setup and found those issues easily...

  • It’s easy to end up with no public-trusted third parties if we arbitrarily distrust third parties who say the capabilities match what’s promised. Mozilla for example says it found hundreds of Firefox vulnerabilities, and I think it’s pretty unlikely they’re lying to cover Anthropic’s back.

    • I think the caveat with the Firefox find is not that they found hundreds of vulnerabilities - they found hundreds of bugs.

      What would be really interesting is a side by side Claude Opus 4.7 and Mythos comparison.