Comment by macleginn
3 months ago
There has been some experimentation with the use of ReLU^2 in language models in recent years, e.g., here: https://proceedings.neurips.cc/paper_files/paper/2021/file/2...
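For anyone unfamiliar with the notation: ReLU^2 is usually read as the ReLU output squared, i.e. max(0, x)^2. A minimal sketch in plain Python, assuming that definition (the function name `relu_squared` is hypothetical and not taken from the linked paper):

```python
def relu_squared(x: float) -> float:
    """Squared ReLU, assuming the definition max(0, x) ** 2."""
    r = max(0.0, x)  # ordinary ReLU: negative inputs are clipped to zero
    return r * r     # square the clipped value


# Negative inputs stay at zero; positive inputs grow quadratically.
print([relu_squared(v) for v in (-2.0, 0.0, 0.5, 3.0)])  # [0.0, 0.0, 0.25, 9.0]
```

Unlike plain ReLU, this version has a continuous first derivative at zero, since the slope 2 * max(0, x) goes to 0 there.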