Comment by wslh
5 hours ago
I think it's worth looking at the recent XBOW benchmark: https://xbow.com/blog/mythos-offensive-security-xbow-evaluat... They found that ChatGPT 5.5 performs better, so the secret is in the architecture (including humans in the loop).
'frontier tokens are not fungible'