sbassi 8 hours ago
Which data is used for training?
simonw 7 hours ago
karpathy/fineweb-edu-100b-shuffle: https://huggingface.co/datasets/karpathy/fineweb-edu-100b-sh...
Which is derived from HuggingFaceFW/fineweb-edu: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu
HuggingFaceTB/smol-smoltalk: https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk
And extra fine-tuning on portions of:
cais/mmlu: https://huggingface.co/datasets/cais/mmlu
openai/gsm8k: https://huggingface.co/datasets/openai/gsm8k
allenai/ai2_arc: https://huggingface.co/datasets/allenai/ai2_arc
eranation 8 hours ago
I think he mentioned somewhere that he used fineweb (I assume this one: https://huggingface.co/datasets/HuggingFaceFW/fineweb)