Comment by amalcon

2 hours ago

I know, right? It used to be easy: just look for writing in a very long-winded style, almost as if the author were being paid per word, in a place where that sort of writing didn't belong. I think that was because, being so long-winded, that type of writing made up a disproportionate fraction of the tokens in the training data. Somewhere around a year ago, they figured out some way to deal with that problem.