Comment by roywiggins

3 days ago

imho the prose is very ChatGPT unfortunately

> Residual connections are more than a trick to help gradients flow. They’re a conservation law.

> Not a hack, not a trick. A principled constraint that makes the architecture work at scale.

  • OK, I thought I was reading too much into it but those same sentences also jumped out for me

    • pangram thinks the whole thing was LLM generated fwiw, as dodgy as AI detectors are it is probably among the best. I don't doubt the author started with their own text, but I think it's been substantially revised via ChatGPT