Comment by gwern

2 years ago

> A fourth option would be to simply scale the inversion model up by 10x or 100x, which would give us predictably better performance. We didn't try this because it's just not that interesting.

Scaling is always both important and interesting.

To train a larger inversion model, I think we'd only need 16 A100s for a month or two. We can circle back in December, once the bigger model has finished training, to see whether we've gotten better reconstruction performance. Fascinating!
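
As a minimal sketch of the kind of C ≈ 6ND back-of-envelope behind a figure like that: the model size, token count, and per-GPU throughput below are illustrative guesses on my part, not numbers from the paper.

```python
# Back-of-envelope training-compute estimate using the standard
# C ~= 6 * N * D approximation (FLOPs ~= 6 * parameters * training tokens).
# All concrete numbers here are assumptions for illustration only.

params = 7e9             # assumed size of the scaled-up inversion model (parameters)
tokens = 200e9           # assumed number of training tokens
train_flops = 6 * params * tokens            # ~8.4e21 FLOPs

a100_sustained = 150e12  # assumed sustained FLOP/s per A100 (~50% of BF16 peak)
n_gpus = 16
seconds = train_flops / (a100_sustained * n_gpus)
print(f"~{seconds / 86400:.0f} days on {n_gpus} A100s")  # roughly a month under these assumptions
```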