Comment by fastneutron

2 years ago

100% this. I’ve been party to RLHF jobs before and the instructions nearly always state to prioritize conciseness in the model response.

In aggregate, this is how you wind up with stub functions and narrative descriptions rather than full working implementations. The RLHF is optimizing for correctness within some constrained token count.