Comment by paradite

8 hours ago

Ironically this is a goldmine for AI labs and AI writer startups to do RL and fine-tuning.

In the case of the big 'foundation models': fine-tune for whom, and how? I doubt it's possible to fine-tune away things like this in a way that satisfies every audience and every instance in the training set. Much of this style probably comes from the training data itself, which contains a lot of propaganda (advertising) or just plain bad writing.

  • I'm pretty sure Mistral is doing fine-tuning for their enterprise clients. OpenAI and Anthropic probably are not?

    I was thinking more of startups doing the fine-tuning.

That's not quite how it works, though. It's possible, for example, that fine-tuning a model to avoid the styles described in the article causes the LLM to stop functioning as well as it otherwise would. It might just be an artefact of the architecture itself that, to be effective, it has to follow these rules. If it were as easy as providing data and having the LLM 'encode' that as a rule, we would be advancing much faster than we currently are.
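For concreteness, here is a minimal sketch of what "providing data" would even mean here: a toy preference pair of the kind a DPO-style fine-tune against the article's styles might consume. The prompt and both completions are invented for illustration, not real training data.

```python
# Illustrative sketch only. All strings below are made up; this just shows
# the shape of one preference record, not an actual dataset or recipe.
import json

record = {
    "prompt": "Summarize why the product launch was delayed.",
    # "rejected": the over-polished register the article complains about
    "rejected": (
        "In today's fast-paced world, it's important to note that the "
        "launch was delayed. Let's delve into the key factors at play."
    ),
    # "chosen": the plainer prose the tuner wants the model to prefer
    "chosen": (
        "The launch slipped two weeks because a supplier missed a "
        "part shipment."
    ),
}

# Preference-tuning toolkits (e.g. TRL's DPOTrainer) typically consume
# JSONL records shaped roughly like this, one per line.
print(json.dumps(record))
```

Even with thousands of clean pairs like this, the open question from the comment above remains: whether pushing the model away from these surface patterns degrades the capabilities that were learned alongside them.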