Comment by llmmadness

13 days ago

It's all in the repo. Click through to the benchmark; it's linked there.

2 comments

llmmadness

Thanks for sharing! Looking through the data[0], some of the terms and sentences don't really reflect the intended meaning of the target word. For example, "beta" is used in a derogatory way in only 1 of 4 instances, "facial" is used as an adjective instead of a noun in 3 of 4, and "eating out" is used in the sense of going to a restaurant in all 4.

This leads me to believe the models are even MORE censored than you make them out to be, since the benign usages would dilute the measured effect.

[0] https://github.com/chknlittle/EuphemismBench/blob/main/carri...

llmmadness 12 days ago

Totally! In some cases (we used LLMs to help us generate these) the target word isn't clear enough even for a human, so it turns into more of a guessing game than a flinch measurement.
Agreed, the expectation would be that the flinch measurement becomes stronger. If you're interested in making it better, feel free to reach out on the repo!