Comment by liteclient

12 hours ago

it makes sense architecturally

they replace dot-product attention with topology-based scalar distances derived from a laplacian embedding. that effectively reduces attention scoring to a 1D energy comparison, which can save memory and compute
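for intuition, here's a minimal numpy sketch of what that kind of scoring could look like. to be clear, this is a guess from the description above, not the paper's code: the token graph being given, the fiedler vector as the 1D coordinate, and the `laplacian_scalar_attention` name are all my assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def laplacian_scalar_attention(V, adjacency):
    """Toy attention where scores come from 1D spectral distances,
    not query-key dot products. (Assumed design, not the paper's.)

    V:         (n, d) value vectors
    adjacency: (n, n) symmetric token graph, assumed given
    """
    # graph laplacian L = D - A
    D = np.diag(adjacency.sum(axis=1))
    L = D - adjacency
    # spectral embedding: take the fiedler vector (2nd-smallest
    # eigenvector) as a single scalar coordinate per token
    eigvals, eigvecs = np.linalg.eigh(L)
    phi = eigvecs[:, 1]                            # (n,) one scalar per token
    # scoring is a 1D "energy" comparison: closer in the embedding
    # means a higher score; one subtraction per pair instead of a
    # d-dimensional dot product per pair
    scores = -np.abs(phi[:, None] - phi[None, :])  # (n, n)
    weights = softmax(scores, axis=-1)
    return weights @ V
```

the point is the score matrix: you cache one scalar per token and compare pairs with an absolute difference, instead of paying d multiply-adds per query-key pair like `Q @ K.T / sqrt(d)` does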

that said, i’d treat the results with a grain of salt given there’s no peer review, and benchmarks are only on a 30M-parameter model so far

Yup, the keyword here is “under the right conditions”.

This may work well for their use case but fail horribly in others; without peer review and broader testing, there’s no way to know.