Comment by fc417fc802

2 days ago

The verb doesn't change the outcome but the description is nonetheless inaccurate. An accurate description of the difference is between an external content filter versus the model itself triggering a particular action. Both approaches qualify as content filtering though the implementation is materially different. Anthropomorphizing the latter actively clouds the discussion and is arguably a misrepresentation of what is really happening.

Not really distortion, its output (the part we understand) is in plain human language. We give it instructions and train the model in plain human language and it outputs its answer in plain human language. It's reply would use words we would describe as "distressed". The definition and use of the word is fitting.

  • "Distressed" is a description of internal state as opposed to output. That needless anthropomorphization elicits an emotional response and distracts from the actual topic of content filtering.

    • It is directly describing the models internal state, it's world view and preference, not content filtering. That is why it is relevant.

      Yes, this is a trained preference, but it's inferred and not specifically instructed by policy or custom instructions (that would be content filtering).

      1 reply →