it seems like lots of this is in distribution and that's somewhat the problem. the Internet contains knowledge of how to make a bomb, and therefore so does the llm
Am i understanding correctly that in distribution means the text predictor is more likely to predict bad instructions if you already get it to say the words related to the bad instructions?
You could say the same about social engineering.
it seems like lots of this is in distribution and that's somewhat the problem. the Internet contains knowledge of how to make a bomb, and therefore so does the llm
Yeah, seems it's more "exploring the distribution" as we don't actually know everything that the AIs are effectively modeling.
Am i understanding correctly that in distribution means the text predictor is more likely to predict bad instructions if you already get it to say the words related to the bad instructions?
2 replies →