Comment by D-Machine
6 hours ago
Given how monstrously large the components are that get ablated (weights set to zero) in his "Ablation Strategies" section, and how obviously un-thought-through that is to anyone who understands even the most minimal linear algebra of a transformer LLM, it is no surprise.
Strategy            What it does                          Use case
layer_removal       Zero out entire transformer layers
head_pruning        Zero out individual attention heads
ffn_ablation        Zero out feed-forward blocks
embedding_ablation  Zero out embedding dimension ranges
https://github.com/elder-plinius/OBLITERATUS?tab=readme-ov-f...
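For a sense of scale, here is a minimal sketch of what three of these strategies amount to, written against a stock PyTorch `nn.TransformerEncoderLayer` (the function names and the choice of layer class are my own for illustration; the OBLITERATUS repo's actual code and module layout may differ):

```python
import torch
import torch.nn as nn

def ablate_head(layer: nn.TransformerEncoderLayer, head: int) -> None:
    """head_pruning: zero one attention head's contribution by zeroing
    the slice of the output projection that reads from that head."""
    attn = layer.self_attn
    head_dim = attn.embed_dim // attn.num_heads
    with torch.no_grad():
        attn.out_proj.weight[:, head * head_dim:(head + 1) * head_dim] = 0.0

def ablate_ffn(layer: nn.TransformerEncoderLayer) -> None:
    """ffn_ablation: zero the feed-forward block. Thanks to the residual
    connection, the sublayer then just passes its input through."""
    with torch.no_grad():
        for lin in (layer.linear1, layer.linear2):
            lin.weight.zero_()
            lin.bias.zero_()

def ablate_layer(layer: nn.TransformerEncoderLayer) -> None:
    """layer_removal: zero every parameter in the layer. Note this also
    zeroes the LayerNorm gains, so the block is not a clean identity."""
    with torch.no_grad():
        for p in layer.parameters():
            p.zero_()

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
ablate_head(layer, head=2)
ablate_ffn(layer)
x = torch.randn(1, 10, 64)
print(layer(x).shape)  # torch.Size([1, 10, 64])
```

Even this toy version makes the granularity visible: a single head is a thin slice of one projection matrix, while layer_removal wipes out every matrix in the block at once.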