Comment by tadfisher

1 day ago

Maybe the article originally featured a 1000-line C implementation.

I was basing this more on the fact that you don't have to look at C code to understand that non-cached transformer inference is going to be super slow.
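A rough way to see why, assuming "non-cached" means no KV cache: without a cache, every decoding step re-runs the model over the whole context so far. With a cache, only the newest token is processed after the prompt. The function below (`positions_processed`, a hypothetical name, not from the article) only counts token positions pushed through the model. It deliberately ignores that per-position attention cost also grows with context length, so it understates the real gap.

```python
def positions_processed(prompt_len: int, new_tokens: int, kv_cache: bool) -> int:
    """Count token positions run through the model while generating
    `new_tokens` tokens after a prompt of `prompt_len` tokens.

    Simplified model: counts positions only; does not account for the
    attention cost per position growing with context length.
    """
    total = 0
    for step in range(new_tokens):
        context = prompt_len + step  # tokens in context before emitting this one
        if kv_cache and step > 0:
            total += 1  # only the newest token; earlier keys/values are reused
        else:
            total += context  # recompute every position from scratch
    return total


if __name__ == "__main__":
    print(positions_processed(100, 1000, kv_cache=False))  # 599500
    print(positions_processed(100, 1000, kv_cache=True))   # 1099
```

For a 100-token prompt and 1,000 generated tokens, that is roughly 545x more positions processed without a cache, and the gap grows quadratically with output length.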

I don't see how that would be possible given the contents of the article.

  • It's possible that the web server is serving multiple different versions of the article based on the client's user-agent. Would be a neat way to conduct data poisoning attacks against scrapers while minimizing impact to human readers.