Comment by niyue
5 years ago
I don't know how to define "generated tweets".
I am 100% sure that some of the tweets I got back should be considered "found"/"searched" rather than "generated". For example, I tried "bigdata", and one of the "generated" tweets was "Big data is like teenage sex: everyone is talking about it, but nobody really knows how to do it." I believe this is not AI-generated and is simply a copy of another human being's tweet.
Indeed, that seems to be from at least as early as 2013: https://news.ycombinator.com/item?id=23887405
It's funny: ever since the recent thread about low background steel the other day, it seems to be popping up with some frequency. I was aware of it before, so I'm not sure this is just a case of Baader-Meinhof (when you learn of something and then "start" to hear about it all the time, when really you're just now paying attention to it).
Edit: actually, I think I saw this tweet: https://twitter.com/rantlab/status/1284849214653034497, and remember thinking "I bet this person just read the HN thread about low background steel". It doesn't seem to have come up on HN other than that since the low background steel post.
It's a really fun feeling when you notice that someone posted something because they read the same thing you read and it sparked a similar association.
There are probably already manipulation techniques in play where an actor 'plants' (or 'incepts', if you will) a future post by posting a lot of things that will lead people to organically 'find' the subject they want to promote.
In my case I definitely only included it as an example due to the recent submission on HN, so not Baader-Meinhof in this case. I'm still surprised that @rantlab and myself came up with the exact same analogy independently though.
Clearly it wasn't as creative an analogy as I originally thought. For myself, it was just a very small leap of logic after reading 'jobigoud's comment. Very surprised to see other people making it as well, and a good re-calibration for me!
There are so many discourse fads, especially on HN, like the time there was a lot of logical-fallacy taxonomy. The echo chamber effect is very real. Even the use of "echo chamber" is an echo chamber. I guess it's unavoidable that memes are a very real method we use for group cognition. I also noticed the low background steel thing came up a lot recently.
It'd be interesting to compare plagiarism detector scores of average outputs of various generations of GPT.
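A crude stand-in for such a detector (a sketch, not any particular commercial tool) is word n-gram overlap: score how many of a generated tweet's n-grams also appear verbatim in a reference text. An exact copy scores 1.0, loosely related text scores near 0.

```python
def ngrams(text, n=3):
    """Return the set of word n-grams in a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(candidate, reference, n=3):
    """Fraction of the candidate's n-grams that also appear in the reference.

    1.0 means every n-gram is copied; 0.0 means no n-gram overlap.
    """
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & ngrams(reference, n)) / len(cand)

tweet = ("Big data is like teenage sex: everyone is talking about it, "
         "but nobody really knows how to do it.")
print(overlap_score(tweet, tweet))  # exact copy scores 1.0
print(overlap_score("Big data is everywhere these days.", tweet))
```

Averaging such scores over many samples from different GPT generations would give a rough memorization curve, though a real plagiarism detector would also normalize punctuation and handle paraphrase.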
Memorizing a web corpus and answering questions about it as well as Google would be an interesting result.
Well, it's not close to that, but it's close enough to be amusing. Here are some questions it answered in Q&A:
https://tildes.net/~games/qmc/ai_dungeon_dragon_model_upgrad...
But I think for factual knowledge, it will need to be better about explaining where it got the information.
From 2013: https://mobile.twitter.com/danariely/status/2879522579269713...
Seems like a case of overfitting.
Considering how unoriginal we humans can be, I'm not at all surprised that it might generate already-existing tweets.
A Digital Single Market Directive article 17, section 6 (similarity detector) filter might be useful for that purpose.
Yep. The GPT-3 model is so big it has overfit a lot of the corpus.
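One simple way to test whether a "generated" sample is actually memorized is to search the training text for long verbatim spans of it. A minimal sketch (the real GPT-3 corpus isn't public, so `corpus` here is a hypothetical stand-in):

```python
def longest_copied_span(sample, corpus, min_words=6):
    """Return the longest run of consecutive words from `sample` that
    appears verbatim in `corpus`, if it is at least `min_words` words
    long; otherwise return None."""
    words = sample.split()
    best = None
    for start in range(len(words)):
        # Try the longest span first; stop at min_words.
        for end in range(len(words), start + min_words - 1, -1):
            span = " ".join(words[start:end])
            if span in corpus:
                if best is None or end - start > len(best.split()):
                    best = span
                break
    return best

corpus = ("... Big data is like teenage sex: everyone is talking about it, "
          "but nobody really knows how to do it. ...")
sample = ("Big data is like teenage sex: everyone is talking about it, "
          "but nobody really knows how to do it.")
print(longest_copied_span(sample, corpus))  # the whole tweet is copied
```

A long copied span is strong evidence of memorization rather than generation; at web scale you'd use a suffix array or an inverted index instead of `in`, but the idea is the same.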
Is it a straight copy, or just a case of the surprising effectiveness of stock phrases considered harmful?
A straight copy: https://twitter.com/cobbo3/status/1026527261397344256
Well clearly it's not a straight copy of that.