Comment by GhostDrift
3 days ago
Background: Softmax attention blends all values weighted by their normalized scores. While effective, this inherently diffuses the signal, which is sometimes undesirable in tasks that require discrete focus.
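For context, here is a minimal NumPy sketch (not from the paper) that makes the "blends all values" point concrete: every value vector receives a strictly positive weight, so the output is always a mixture.

```python
# Minimal sketch of standard softmax attention: the output is a weighted
# blend of *all* value vectors, never a hard selection.
import numpy as np

def softmax_attention(q, K, V):
    # q: (d,) query; K: (n, d) keys; V: (n, d_v) values
    scores = K @ q / np.sqrt(K.shape[1])   # scaled similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # normalized; all entries > 0
    return weights @ V                      # mixture of every value

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=(4,)), rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
print(softmax_attention(q, K, V))  # a blend, not a single selected value
```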
GD-Attention, derived from Ghost Drift Theory, takes a different approach: instead of blending, it selects a single coherence point ∗ along a "jump" direction in the semantic space, determined by an underlying energy function.
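To illustrate only the general shape of that idea, here is a hypothetical sketch of selecting one point along a direction by minimizing an energy. The energy function, the jump direction, and the grid search below are all placeholder assumptions of mine, not the formulation from the paper.

```python
# Hypothetical sketch: pick a single point along a "jump" direction by
# minimizing an energy, rather than blending values with softmax weights.
# The energy and direction here are illustrative placeholders only.
import numpy as np

def gd_attention_sketch(q, K, energy, direction, steps=101):
    # Candidate points along the jump direction in semantic space.
    ts = np.linspace(-3.0, 3.0, steps)
    candidates = q[None, :] + ts[:, None] * direction[None, :]
    # Discrete selection: keep the single candidate with minimal energy.
    energies = np.array([energy(c, K) for c in candidates])
    return candidates[np.argmin(energies)]

def toy_energy(c, K):
    # Placeholder energy: distance to the nearest key (purely illustrative).
    return np.min(np.linalg.norm(K - c, axis=1))

rng = np.random.default_rng(0)
q, K = rng.normal(size=(4,)), rng.normal(size=(6, 4))
direction = rng.normal(size=(4,))
direction /= np.linalg.norm(direction)
print(gd_attention_sketch(q, K, toy_energy, direction))  # one selected point
```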
The attached paper formalizes this mathematically, and the demos let you see and interact with these mechanisms in real time:
Part 1: Energy landscape → find the coherence point ∗
Part 2: Compare Softmax vs. GD-Attention outputs and selectivity
This is still experimental, and feedback from the community — especially on edge cases, real-world applicability, and potential integration with transformer architectures — would be hugely appreciated.