
Comment by JeremyNT

1 year ago

I think it's pretty clear that they were trying to prevent one class of issue (the model spitting out racist stuff in one context) and introduced another (the model spitting out wildly inaccurate portrayals of people in historical contexts). But thousands of end users are going to ask for and notice things that your testers don't, and that's how you end up here. "This system prompt prevents Gemini from promoting Nazism successfully, ship it!"

This is always going to be a challenge when trying to moderate or put guardrails on these models. Their behavior is so complex that it's almost impossible to reason about all the consequences, so the only way to "know" is for users to keep poking at it.