Comment by sillysaurusx

4 years ago

I'm not sure. I thought it was nothing short of a miracle that it could be done at all. I tried, hard, to make it work in TensorFlow. But there was no way to communicate directly from TPU to TPU; I had to serialize to GCS buckets as a middle step, which added massive complexity to the code.

The Ray solution here is so simple. It was a joy to use. But I don't know anything about Dask.

By the way, if anyone wants to see how the loss graphs turned out: https://twitter.com/theshawwn/status/1406171487988498433

(I wish I'd uploaded the logs to tensorboard.dev for posterity. But the screenshots show quite clearly all the information you'd want to see anyway, with apologies to blind engineers. Oh lord, is there a single blind engineer working in ML? Suddenly it's an appealing idea to try to make TensorBoard accessible... I wonder how blind people could interpret graphs. Averages, probably.)