Comment by siweizzz
7 years ago
Can you share how large your dataset was (in tokens), what batch size you used, and how many epochs you trained for? By training slowly, do you mean that you used a small learning rate? If so, what was it?
I've been reading up on batch size and people are all over the place. Some say smaller is better and some say larger is better. Mostly when it comes to GPT-2 people say larger is better, but there must come a point when increasing the batch size is no longer beneficial (or do you just use as large a batch as your memory will allow)?
In fact, it’s an open question whether larger batch sizes are better. https://twitter.com/jeremyphoward/status/1189643170377658369...
Seconding all of your questions! Details about successful 1.5B training are really hard to come by.
In case it’s helpful, here are some details of how a Chinese 1.5b GPT-2 was trained: https://github.com/imcaspar/gpt2-ml
It looks like they used a batch size of 2 on a TPUv3-256 pod. It took 50 hours and 99,000 training steps, which works out to about 1.1 examples per second (99,000 steps × 2 examples over 180,000 seconds), assuming the 2 is the global batch size.
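For anyone who wants to redo the throughput arithmetic from those figures, here is a rough sketch. It assumes the batch size of 2 is the global batch size; the per-core variant (2 examples on each of 256 cores) is purely a hypothetical alternative reading, not something the linked repo is confirmed to do.

```python
# Figures quoted in the comment above.
training_steps = 99_000
wall_clock_hours = 50
batch_size = 2  # assumption: global batch size, not per core

seconds = wall_clock_hours * 3600
steps_per_second = training_steps / seconds
examples_per_second = steps_per_second * batch_size

# Hypothetical alternative: batch size 2 on each of 256 TPU cores.
assumed_cores = 256
examples_per_second_per_core_reading = steps_per_second * batch_size * assumed_cores

print(f"{steps_per_second:.2f} steps/s")
print(f"{examples_per_second:.2f} examples/s (global batch of 2)")
print(f"{examples_per_second_per_core_reading:.1f} examples/s (2 per core, hypothetical)")
```

The two readings differ by a factor of 256, so it matters a lot which one the repo actually means.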
Agreed there doesn’t seem to be a consensus. Thanks for the links
Had to go check my training file to remember.
Data size: around 30 MB, so roughly 8,000,000 tokens? Can't remember exactly.
Learning rate: 1e-4, so I guess not that slow.
I trained for around 1000 steps, but ended up liking the model from step 550, which I think works out to around 2 full passes through my data.
There probably is a point where increasing batch size is no longer helpful. My batch size was 32. When I had it lower, I had issues with memorization and bias towards whichever parts of the training data it had most recently trained on.
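As a rough check on the "around 2 full passes" estimate, here is a small sketch using the numbers above. The sequence length is not stated in the comment, so the 1024 tokens per example (GPT-2's full context window) is an assumption, and the token count is only the approximate figure given.

```python
dataset_tokens = 8_000_000  # approximate figure from the comment
batch_size = 32             # from the comment
sequence_length = 1024      # assumption: each example fills GPT-2's context


def passes_through_data(steps: int) -> float:
    """Estimate how many times training has seen the full dataset."""
    tokens_seen = steps * batch_size * sequence_length
    return tokens_seen / dataset_tokens


for steps in (550, 1000):
    print(f"step {steps}: ~{passes_through_data(steps):.2f} passes")
```

Under those assumptions step 550 lands a little over 2 passes and step 1000 a little over 4, which lines up with the estimate in the comment. A shorter sequence length would lower both numbers proportionally.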
Thanks, good to have this data point. I've been training on a roughly similarly sized dataset for tens of thousands of steps (but on 355M). Wondering if I need so many steps.
Only 30MB? If it's based on text adventures, can't you get way more data than that?
I scraped a bunch of stories from chooseyourstory.com, but I did curate them to make sure they had the right second-person format. I couldn't really find anywhere else that had a consistent format that would make scraping easy enough.