Comment by itissid

2 years ago

One important thing not to do here, at any cost, is to compare GPTs with KNNs and conclude "GPTs or BERT are meh, just like gzip".

Remember that GPTs, for all their flaws[1], learn the joint distribution P(X, Y), which is what lets them generate text (or images/audio): knowing the joint enables both generation and prediction. Certain prediction tasks on their own can be handled by discriminative models that learn P(Y|X) well, but those models generally have no clue (or much less of a clue) what human text actually looks like.
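To make the contrast concrete, here is a minimal sketch of the gzip+kNN classifier this thread is about, assuming the usual Normalized Compression Distance formulation from the paper; the toy training data is hypothetical. Note that it can assign a label to a string, but there is no joint distribution anywhere to sample from, so it could never generate text:

```python
import gzip

def clen(s: str) -> int:
    """Length of the gzip-compressed bytes of s."""
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized Compression Distance: how much compressing a and b
    together saves relative to compressing them separately."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_predict(x: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Predict a label for x by majority vote over its k nearest
    training examples under NCD."""
    neighbors = sorted(train, key=lambda pair: ncd(x, pair[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

# Hypothetical toy training set, purely for illustration.
train = [
    ("the team won the match in overtime", "sports"),
    ("striker scores twice in the final", "sports"),
    ("new chip doubles memory bandwidth", "tech"),
    ("startup ships open source compiler", "tech"),
]
print(knn_predict("midfielder injured before the final", train))
```

It discriminates between labels purely by compressibility of byte strings; it never models what text *is*, which is exactly the P(Y|X)-vs-P(X, Y) gap described above.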

[1] Which is that they are, in the end, a highly efficient and generalized stochastic 16k* n-gram model on steroids, a model of the human language found on the web, with an RLHF-based lobotomy of the n-gram model's conditional distribution so that it says "non-offensive" things.