Slacker News

Comment by sbassi

5 hours ago

Which data is used for training?

2 comments

simonw  5 hours ago

karpathy/fineweb-edu-100b-shuffle: https://huggingface.co/datasets/karpathy/fineweb-edu-100b-sh...

Which is derived from HuggingFaceFW/fineweb-edu: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu

HuggingFaceTB/smol-smoltalk: https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk

And extra fine-tuning on portions of:

cais/mmlu: https://huggingface.co/datasets/cais/mmlu

openai/gsm8k: https://huggingface.co/datasets/openai/gsm8k

allenai/ai2_arc: https://huggingface.co/datasets/allenai/ai2_arc

eranation  5 hours ago

I think he mentioned somewhere that he used fineweb (I assume this one: https://huggingface.co/datasets/HuggingFaceFW/fineweb)
