Comment by grej
6 days ago
Related to this, is anyone aware of a benchmark for this kind of thing, maybe broadly the category of "context rot"? Something that tracks both how content not germane to the current question adversely affects responses, and how a large volume of germane but deep context leaves models unable to follow the conversation. I've definitely experienced the latter with coding models.
In computer vision, noise is added to images during training as a form of augmentation. Maybe LLM providers should do the same during RL: inject irrelevant context so the model learns to answer correctly in spite of it.
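A minimal sketch of what that could look like, to make the analogy concrete. The distractor pool and `add_context_noise` function are hypothetical, not anything a provider has published; they just mirror the role pixel noise plays in vision pipelines:

```python
import random

# Hypothetical pool of passages unrelated to any training task,
# playing the role that pixel noise plays in vision augmentation.
DISTRACTOR_POOL = [
    "The 1964 World's Fair was held in Queens, New York.",
    "Sourdough starters need to be fed roughly once a day.",
    "The median lifespan of a fruit fly is about 40 to 50 days.",
]

def add_context_noise(prompt: str, p: float = 0.3, max_distractors: int = 2) -> str:
    """With probability p, prepend irrelevant passages to the prompt.

    Analogous to noise augmentation in CV training: the model is
    rewarded for answering correctly *despite* the junk context.
    """
    if random.random() > p:
        return prompt
    k = random.randint(1, max_distractors)
    noise = "\n\n".join(random.sample(DISTRACTOR_POOL, k))
    return f"{noise}\n\n{prompt}"

# During RL rollout construction, the noisy prompt would replace the clean one:
# rollout_prompt = add_context_noise(task_prompt)
```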
Not sure, but this sounds like a very similar problem to prompt injection.