Comment by GTP
8 days ago
So, if I understand correctly, this is about finding an optimal (or at least a better) GPT architecture?
Anyway, "1980 experiments, 6 improvements" makes me wonder if this is better than a random search or some simple heuristic.