Comment by AlexCoventry
6 hours ago
I have a transformer attention mechanism which seems to be more data-efficient than the usual dot product, and I'm trying to write a performant backward kernel for it.
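The custom mechanism isn't shown, but as a reference point for deriving a backward kernel, here is a minimal NumPy sketch of the backward pass for standard scaled dot-product attention (single head, no batching; all names are illustrative). The softmax-Jacobian contraction is the step a fused kernel typically has to handle carefully.

```python
import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_forward(Q, K, V):
    # Q: (n, d), K: (m, d), V: (m, dv)
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)   # scores
    P = softmax(S)             # attention weights
    O = P @ V                  # output
    return O, P

def attention_backward(Q, K, V, P, dO):
    # Given upstream gradient dO, return gradients w.r.t. Q, K, V.
    d = Q.shape[-1]
    dV = P.T @ dO
    dP = dO @ V.T
    # Softmax backward: dS = P * (dP - rowsum(dP * P))
    dS = P * (dP - (dP * P).sum(axis=-1, keepdims=True))
    dQ = dS @ K / np.sqrt(d)
    dK = dS.T @ Q / np.sqrt(d)
    return dQ, dK, dV
```

A performant GPU version (e.g. in the FlashAttention style) would recompute P tile-by-tile rather than materializing the full n×m matrix, but the gradient algebra above is what any such kernel must reproduce.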