Comment by cowartc

8 hours ago

Interesting direction. One question: how does this hold up outside the synthetic transformer, on a real downstream task? Reconstruction error is the right measure, but it's one step removed from the end task. I'm curious whether HAE would show a similar gap on a downstream benchmark.