Comment by nickwalton00

7 years ago

Had to go check my training file to remember.

Datasize: Around 30 MB, so roughly 8,000,000 tokens? Can't remember exactly.

Learning Rate: 1e-4, so I guess not that slow.

I trained for around 1000 steps, but ended up liking the model from step 550, which I think came out to around 2 full passes through my data.

There's probably a point where increasing batch size is no longer helpful. My batch size was 32. When I had it lower, I had issues with memorization/bias towards the parts of the training data it had most recently trained on.
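For anyone who wants to sanity-check the "around 2 full passes" figure, here is a rough back-of-the-envelope sketch. It assumes every sample in a batch is a full 1024-token sequence (GPT-2's context length); the actual sample length isn't stated above, so treat `seq_len` and the function name as assumptions, and the 8,000,000 token count as the same rough estimate given above.

```python
def passes_over_data(steps, batch_size, seq_len, dataset_tokens):
    """Estimate how many times training has cycled through the dataset."""
    # Tokens consumed so far = steps * samples per step * tokens per sample.
    tokens_seen = steps * batch_size * seq_len
    return tokens_seen / dataset_tokens


# Numbers from the comment above; seq_len=1024 is an assumption.
print(round(passes_over_data(550, 32, 1024, 8_000_000), 2))   # 2.25
print(round(passes_over_data(1000, 32, 1024, 8_000_000), 2))  # 4.1
```

Under those assumptions, step 550 lands at a bit over 2 passes, and the full 1000 steps would be about 4. Shorter samples would lower both numbers proportionally.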

Thanks, good to have this data point. I’ve been training on a roughly similar-sized dataset for tens of thousands of steps (but on 355M). Wondering if I need that many steps.

Only 30MB? If it's based on text adventures, can't you get way more data than that?

  • I scraped a bunch of stories from chooseyourstory.com, but I did curate them to make sure they had the right second-person format. I couldn't really find anywhere else with a consistent format that would make scraping easy enough.