Comment by politician
1 year ago
The ambient bias in the training data is not my concern. The directional bias that can be inflicted during the RLHF step is what worries me most.
How? Simply by putting the right types of people on the task! Don’t you know that the human participants in RLHF processes are screened? Would feedback from a homogeneous collection of Ultra MAGA Trumpers, Woke Zealots, or WEF Sycophants result in an unbiased model? Would each group produce the same model?
Do we know who provided feedback to Gemini? Do we know what they were told, promised, or paid?
Only Google HR knows.