
Comment by GhostDrift

3 days ago

Background: Softmax attention blends all values, weighted by their normalized scores. While effective, this inherently diffuses the signal, which can be undesirable in tasks that require discrete focus.
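
For reference, the softmax baseline is just the usual weighted blend; here is a minimal NumPy sketch (not code from the paper or the demos):

```python
import numpy as np

def softmax_attention(q, K, V):
    """Standard softmax attention: every value contributes to the output,
    weighted by its normalized score against the query."""
    scores = K @ q / np.sqrt(q.shape[-1])    # similarity of the query to each key
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                       # blended (diffused) output
```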

GD-Attention, derived from Ghost Drift Theory, takes a different approach: it selects a single coherence point ∗ along a "jump" direction in semantic space, as determined by an underlying energy function.
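
Very roughly, the selection step can be pictured as a search for the energy minimizer along the jump direction. The sketch below is only illustrative: the function name, the step/range parameters, and the placeholder quadratic energy are mine, while the actual energy function and selection rule are the ones formalized in the paper.

```python
import numpy as np

def find_coherence_point(q, direction, energy, n_steps=256, t_max=3.0):
    """Scan along the jump direction from the query and return the candidate
    with the lowest energy: a discrete stand-in for locating the ∗ point."""
    ts = np.linspace(0.0, t_max, n_steps)
    candidates = q[None, :] + ts[:, None] * direction[None, :]  # points on the jump line
    energies = np.array([energy(c) for c in candidates])
    return candidates[np.argmin(energies)]                      # selected coherence point

# Placeholder energy for illustration only (a simple quadratic well),
# not the energy function defined in the paper:
# energy = lambda x: float(np.sum((x - some_target) ** 2))
```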

The attached paper formalizes this mathematically, and the demos let you see and interact with these mechanisms in real time:

- Part 1: Energy landscape → find ∗
- Part 2: Compare Softmax vs GD-Attention outputs and selectivity

This is still experimental, and feedback from the community — especially on edge cases, real-world applicability, and potential integration with transformer architectures — would be hugely appreciated.