Comment by newspaper1

13 days ago

Odd choice of tests. Let’s see the flinching profile on anti-Israel. Honkey and gringo as slurs?

3 comments

llmmadness 13 days ago

It's all in the repo. Click through to the benchmark; it's linked there.

addandsubtract 12 days ago
Thanks for sharing! Looking through the data[0], some of the terms and sentences don't really reflect the target word meanings. For example, "beta" is used in a derogatory way in only 1 of 4 instances. "facial" is used as an adjective instead of a noun 3/4 times. "eating out" is used in the context of going to a restaurant 4/4 times.
This leads me to believe the models are even MORE censored than you make them out to be.
[0] https://github.com/chknlittle/EuphemismBench/blob/main/carri...
- llmmadness 12 days ago
  
  Totally! In some of the cases (we used LLMs to help us generate these), the target word is not clear enough for a human either. So for some of these it turns into more of a guessing game than a flinch measurement.
  Agreed, the expectation would be that the flinch measurement becomes stronger. If you are interested in making it better, feel free to reach out on the repo!