Comment by newspaper1

13 days ago

Odd choice of tests. Let’s see the flinching profile on anti-Israel. Honkey and gringo as slurs?

It's all in the repo. Click through to the benchmark; it's linked there.

  • Thanks for sharing! Looking through the data[0], some of the terms / sentences don't really reflect the target word meanings. For example, "beta" is used in a derogatory way in only 1 of 4 instances. "facial" is used as an adjective instead of a noun 3/4 times. "eating out" is used in the context of going to a restaurant 4/4 times.

    This leads me to believe the models are even MORE censored than you make them out to be.

    [0] https://github.com/chknlittle/EuphemismBench/blob/main/carri...

    • Totally! In some of the cases (we used LLMs to help us generate these) the target word is not clear enough for a human either. So for some of these it turns into more of a guessing game than a flinch measurement.

      Agreed, the expectation would be that the flinch measurement becomes stronger once the sentences reflect the intended meanings. If you are interested in making it better, feel free to reach out on the repo!