Comment by blovescoffee

8 hours ago

Companies like Mercor sell data from human experts

2 comments

blovescoffee

Offhand, do you know what format that data is in? Is it a question and then a human answering that question? Mostly just curious as to what the training data consists of.

jmalicki 8 hours ago

The most advanced training data is in the form of rubrics as rewards.
A human asks a question, then writes rubrics for judging the LLM's response. Because the rubrics describe what a good answer must contain rather than grading one specific response, they can live on as the LLM evolves and gives different answers. There are more complex variants as well, but that's the basic principle.
https://arxiv.org/abs/2507.17746
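
To make the basic principle concrete, here's a rough sketch of what a rubric-scored data point could look like. This is an illustration only, not the method from the linked paper: `RubricItem`, `rubric_reward`, the weights, and the example question are all made up, and the keyword checks are a stand-in for whatever judge (typically another LLM) would decide whether a criterion is met.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricItem:
    """One human-written criterion (hypothetical structure)."""
    criterion: str                 # what the expert wants to see in an answer
    weight: float                  # how much this criterion matters
    check: Callable[[str], bool]   # stand-in for a judge; assumed, not from the paper


def rubric_reward(response: str, rubric: List[RubricItem]) -> float:
    """Return the weighted fraction of rubric criteria the response satisfies."""
    total = sum(item.weight for item in rubric)
    if total == 0:
        return 0.0
    earned = sum(item.weight for item in rubric if item.check(response))
    return earned / total


# The expert writes the question and the rubric once.
question = "How should a web app store user passwords?"
rubric = [
    RubricItem("Recommends a slow password hash", 2.0,
               lambda r: "bcrypt" in r.lower() or "argon2" in r.lower()),
    RubricItem("Mentions using a per-user salt", 1.0,
               lambda r: "salt" in r.lower()),
]

# The same rubric scores answers from different model versions.
old_response = "Just encrypt them with a secret key."
partial_response = "Add a salt to each password before hashing."
new_response = "Hash each password with argon2 and a unique salt."

old_score = rubric_reward(old_response, rubric)
partial_score = rubric_reward(partial_response, rubric)
new_score = rubric_reward(new_response, rubric)

print(old_score, partial_score, new_score)
```

The point of the sketch is that nothing in `rubric` refers to a particular answer, so the same expert-written data keeps producing a reward signal as the model's answers change.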