Comment by vidarh

2 days ago

A lot of this, I suspect, comes down to providers having invested a lot of money in fine-tuning datasets that sound this way. I say that on the basis of having worked on a supervised fine-tuning project for one of the largest companies in this space.

On the project I worked on, reviewers were not allowed to answer that they didn't know; they had to provide an answer to every prompt. As a result, when auditing responses, a lot of difficult questions had "confidently wrong" answers because the reviewer tried and failed, or all kinds of evasive workarounds because the reviewer knew they couldn't answer.

Presumably these providers will eventually understand (hopefully they already do; this was a year ago) that they also need to train the models to recognise when the correct answer is "I don't know", or "I'm not sure. I think maybe X, but ..."
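
As a sketch of what that would mean in practice: calibration-aware fine-tuning data would include examples where the *target* response is an explicit expression of uncertainty. The JSON Lines schema and the example records below are purely my own illustration, not any provider's actual format.

```python
import json

# Hypothetical SFT records (schema is illustrative only).
# The key idea: some training targets should be "I don't know"
# or a hedged answer, not a confident guess.
records = [
    {
        "prompt": "What is the capital of France?",
        "response": "Paris.",
    },
    {
        "prompt": "What will ACME's closing stock price be tomorrow?",
        "response": "I don't know. Future prices aren't predictable "
                    "from the information I have.",
    },
    {
        "prompt": "Who won the 1907 Tbilisi city chess championship?",
        "response": "I'm not sure. I don't have reliable information "
                    "about that event.",
    },
]

def to_jsonl(recs):
    """Serialise records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r) for r in recs)

jsonl = to_jsonl(records)
```

If reviewers are forbidden from producing records like the second and third, the model simply never sees uncertainty as a correct completion.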

It's not the training/tuning; it's pretty much the nature of LLMs. The whole idea is to give a best guess of the next token. The more complex dynamics behind the meaning of the words, and how those words relate to real-world concepts, aren't learned.
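
The "best guess of the next token" framing being argued about here can be sketched as follows. The vocabulary and logit values are invented for illustration; a real model scores tens of thousands of tokens.

```python
import math

# Toy next-token prediction: the model assigns a logit (score) to each
# candidate token; generation samples from the resulting distribution.
vocab = ["Paris", "London", "I don't know"]
logits = [4.0, 1.5, 0.5]  # hypothetical scores for the next token

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
best = vocab[probs.index(max(probs))]  # greedy "best guess"
```

Note that nothing here decides *which* token gets the high logit; whether "I don't know" ever becomes the best guess depends on what the training data made probable in that context, which is exactly the point in dispute in this thread.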

  • You're not making any sense. The best guess will often be a refusal if the model sees enough of them in the training data, so of course it comes down to training.

    And I literally saw the effect of this first-hand, in seeing how the project I worked on was actively part of training this behaviour into a major model.

    As for your assertion that they don't learn the more complex dynamics: that claim was trite, and already untrue several years ago.