Comment by bluehark

2 days ago

How large was the dataset used for post-training?

4 comments

bluehark

We used two types of datasets for post-training. Supervised finetuning data and preference data used for RLHF stage. You can actually use less than < 1M samples to significantly boost the aesthetics. Quality matters A LOT. Quantity helps with generalisation and stability of the checkpoints though.

lawlessone 2 days ago
How is the data collected?
- sangwulee 2 days ago
  
  The highest quality finetuning data was hand curated internally. I would say our post training pipeline is quite similar to SeedDream 2.0 ~ 3.0 series from ByteDance. Similar to them, we use extensive quality filters and internal models to get the highest quality possible. Even from there, we still hand curate a hand-picked subset.
  
  1 reply →