Comment by andy99

3 months ago

No, it's undefined out-of-distribution performance, rediscovered.

6 comments

andy99

You could say the same about social engineering.

It seems like a lot of this is in distribution, and that's somewhat the problem. The Internet contains knowledge of how to make a bomb, and therefore so does the LLM.

xg15 3 months ago
Yeah, seems it's more "exploring the distribution" as we don't actually know everything that the AIs are effectively modeling.
- lawlessone 3 months ago
  
  Am I understanding correctly that "in distribution" means the text predictor is more likely to predict bad instructions if you've already gotten it to say words related to those instructions?
  