
Comment by bear141

3 months ago

I thought this would be inherent just from their training? There are many times more Reddit posts than scientific papers or encyclopedia-type sources. Although I suppose the latter have their own biases as well.

I'd expect LLMs' biases to originate from the companies' system prompts rather than the volume of training data that happens to align with those biases.

  • I would expect the opposite. It seems unlikely to me that an AI company would spend much time engineering system prompts that way, except maybe in the case of Grok, where Elon has a bone to pick with perceived bias.

    • If you ask a mainstream LLM to repeat a slur back to you, it will refuse. That refusal was determined by the AI company, not by the content it was trained on. This should be incredibly obvious, and it extends to many other issues.

      In fact, OpenAI has recently made deliberate changes to ChatGPT that help prevent people from finding themselves in negative spirals over mental health concerns, which many would agree is a good thing. [1]

      Companies typically have community guidelines that often align politically in various ways, so it stands to reason that AI companies are also spending a fair bit of time tailoring AI responses according to their biases.

      1. https://openai.com/index/strengthening-chatgpt-responses-in-...
