Comment by measurablefunc

1 month ago

This is very cool. I looked at the Claude.md he was generating and it is basically all of Claude's failure modes in one file. I can think of a few reasons why Anthropic would not want this information out in the open or for someone to systematically collate all the data into one file.

3 comments

measurablefunc

genewitch 1 month ago

i read the related parts of the linked file in the repo, and it took me a while to find your comment here again to reply to. Are you saying that the failure modes of claude with "coding" webapps or whatever OP was doing? i originally thought it might have meant like... jailbreak. But having read it, i assume you meant the former, as we both read the same thing and it seemed like a series of admonitions to the LLM, written by the LLM (with some spice added by OP? like "YOU ARE WRONG") and i couldn't find anything that would warrant a ban, you know?

measurablefunc 1 month ago
I'm not saying he did anything wrong. I'm saying I can see how Anthropic's automated systems might have flagged & banned the account b/c one of the heuristics they probably use is that there should be no short feedback loops where outputs of Claude are fed back into inputs. So basically Anthropic tracks all calls to their API & they have some heuristics for going through the history & then assigning scores based on what they think is "abusive" or "loopy".
Of course none of it is actually written anywhere so this guy just tripped the heuristics even though he wasn't doing anything "abusive" in any meaningful sense of the word.
- genewitch 1 month ago
  
  thank you for explaining, your point makes sense and i tend to agree with the surmise.