Comment by bpodgursky

7 months ago

This is a wild misunderstanding of LLMs. Data labeling has nothing to do with generating the astronomical text corpus used to train modern LLMs.

heavyset_go 7 months ago

The HF (human feedback) part of RLHF, used to refine the output of LLMs, also happens in these places.

astrange 7 months ago

Note that RLHF can only perform selection on existing model outputs; adding new data is SFT, or else just more pretraining.
ChatGPT speaking African English was mostly just 3.5. 4o speaks like a TikTok user from LA. 5 seems kind of generic.