Comment by gwern
6 years ago
AI is already better (just not evenly distributed). The small GPT-2s are bad at math, but they're not trained for that in the first place; we know Transformers are capable of doing excellent things with math because they do so in other papers that tackle more specialized problems like theorem proving. The shallowness of the GPT-2s is definitely part of it (they get only a few sequential steps of computation to 'think'), as are lousy sampling procedures, and just a general lack of parameters: 'coherency' in general seems to improve drastically as you scale up to Megatron levels. If you combined all of the SOTA pieces, polished it for a while, and plugged it into social media for RL, you'd get something much better than this...
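
A minimal sketch of what "better sampling" could mean here, assuming nucleus (top-p) sampling (Holtzman et al. 2019) as one of the SOTA pieces; the comment does not name a specific procedure, and the threshold and toy logits below are illustrative only:

    import numpy as np

    def nucleus_sample(logits, p=0.9, rng=np.random.default_rng()):
        # Softmax over the vocabulary (shift by max for numerical stability).
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        order = np.argsort(probs)[::-1]              # tokens, most to least likely
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, p) + 1  # smallest set with mass >= p
        keep = order[:cutoff]
        kept = probs[keep] / probs[keep].sum()       # renormalize over the nucleus
        return rng.choice(keep, p=kept)

    # Illustrative use: pick the next token id from toy logits.
    next_token = nucleus_sample(np.array([2.0, 1.0, 0.5, -1.0, -3.0]))

Truncating to the nucleus instead of sampling from the full (or top-k) distribution cuts off the low-probability tail that tends to derail generations, which is one concrete way sampling alone can improve coherency without touching the model.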