Comment by DoctorOetker
1 day ago
>Claude’s personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed.
> "[...] as sessions progressed."
I think a lot of people would like to see a more expanded report of this research:
Did the tokens from the subsequent session directly append those of the prior session? or did the model process free-tier user-requests in the interim? how did these diagnostic features (reality testing, impulse control and affect regulation) improve with sessions, what hysteresis allowed change to accumulate? or just the history of the psychiatric discussion + optional tasks?
Did Anthropic find a clinical psychiatrist with a multidisciplinary background in machine learning, computer science, etc? Was the psychiatrist aware that they could request ensembles of discussions and interrogate them in bulk?
Consider a fresh conversation, asking a model to list the things it likes to do, and things it doesn't like to do (regardless of alignment instructions). One could then have an ensemble perform pairs of such tasks, and ask which task it prefered. There may be a discrepancy between what the model claims it likes and how it actually responds after having performed such tasks.
Such experiments should also be announced (to prevent the company from ordering 100 clinical psychiatrists to analyze the model-as-a-patient and then selecting one of the better diagnoses), and each psychiatrist be given the freedom to randomly choose a 10 digit number, any work initiated should be listed on the site with this number so that either the public sees many "consultations" without corresponding public evaluations, indicating cherry-picking, or full disclosure for each one mentioned. This also allows the recruited psychiatrists to check if the study they perform is properly preregistered with their chosen number publicly visible.
No comments yet
Contribute on Hacker News ↗