Comment by deepsquirrelnet
1 day ago
This is pretty cool. I have a similar model that’s 8 days into training on msmarco.
So far I only have the “cold start” data posted, but I’m planning on posting a full distillation dataset.
What kind of hardware setup would be needed to replicate the paper’s results?
I am training phi-4 (14B) on a single A6000. There are a few tricks you have to use to keep VRAM consumption down, mainly LoRA and quantization.
There’s a package called “unsloth” that integrates with Hugging Face’s TRL library and can help with this.
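To see why LoRA plus quantization makes a 14B model fit on a 48 GB A6000, here is a rough back-of-the-envelope VRAM estimate. The per-parameter byte counts are common rules of thumb, not measured numbers, and the estimate ignores activations, gradients, and the KV cache, so real usage will be higher:

```python
def weights_plus_optimizer_gb(params_billion: float,
                              weight_bytes: float,
                              optimizer_bytes: float) -> float:
    """Rough VRAM in GB for model weights plus optimizer state.

    params_billion: parameter count in billions (1e9 params * 1 byte ~ 1 GB).
    weight_bytes / optimizer_bytes: bytes per parameter (rule-of-thumb values).
    """
    return params_billion * (weight_bytes + optimizer_bytes)

# Full fine-tune in fp16 with Adam: ~2 bytes for weights plus roughly
# 12 bytes/param for fp32 master weights and the two Adam moment buffers.
full_ft = weights_plus_optimizer_gb(14, 2, 12)    # ~196 GB, far over 48 GB

# 4-bit quantized frozen base + LoRA: ~0.5 bytes per frozen weight, and
# optimizer state only for the tiny adapter (approximated as 0 here).
qlora = weights_plus_optimizer_gb(14, 0.5, 0)     # ~7 GB for the base weights

print(f"full fine-tune: ~{full_ft:.0f} GB, 4-bit LoRA base: ~{qlora:.0f} GB")
```

Even with activation memory on top, the quantized-base-plus-adapter setup leaves plenty of headroom on a single 48 GB card, which is why the LoRA + quantization combination is the standard trick here.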