Comment by ActivePattern

1 year ago

Just want to note that this simple “mimicry” of mistakes seen in the training text can be mitigated to some degree by reinforcement learning (e.g. RLHF), which tunes the LLM toward responses that are “good” (helpful, honest, harmless, etc.) according to some reward function.
