Comment by maxbond
14 hours ago
I don't think there was very much abuse of "not just A, but B" before ChatGPT. I think that's more a product of RLHF than of the initial training. Very few people wrote with the incredibly overwrought, flowery style of AI, and the English-speaking Internet, where most of the (English-language) training data was sourced, mostly consists of casual, everyday language. I imagine other language communities on the Internet are similar, but I wouldn't know.
Don't we all remember five years ago? Did you regularly encounter people who wrote as though every follow-up question was absolutely brilliant and every document was life-changing?
I think about why's (poignant) Guide to Ruby [1], a book explicitly about how learning to program is a beautiful experience. And its language is still pedestrian compared to the language in this book. That's because most people find writing like that saccharine, and so don't write that way, even when they're writing poetically.
Regardless, some people born in England can speak French with a French accent. If someone speaks French to you with a French accent, where are you going to guess they were born?
It's been alleged that major sources of training data for many LLMs were libgen and SciHub - hardly casual.
Even if that were comparable in size to the conversational Internet, how many novels and academic papers have you read that used multiple "not just A, but B" constructions in a single chapter or paper (excluding ones written by or about AI)?