Comment by stanfordkid
1 month ago
Just look at the domains. Obviously this will get harder to do with social media, but maybe that's okay. I think a simple criterion can be used: could the pre-trained LLM have come up with this itself? If so, it probably doesn't have training value.
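A rough sketch of how that criterion might look in code, under some loud assumptions: "could the LLM have come up with this itself" is read here as "the base model assigns the text a high average log-probability", which is only one possible interpretation. The `SUSPECT_DOMAINS` set, the `mean_logprob` callable, and the `threshold` value are all hypothetical placeholders, not anything from a real library.

```python
from typing import Callable
from urllib.parse import urlparse

# Hypothetical blocklist; no specific domains are named in the comment.
SUSPECT_DOMAINS = {"example-content-farm.com"}


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def has_training_value(
    url: str,
    text: str,
    mean_logprob: Callable[[str], float],
    threshold: float = -2.0,
) -> bool:
    """Keep a document only if its domain is not suspect and the base
    model finds the text surprising enough.

    mean_logprob is an assumed scorer: it should return the average
    per-token log-probability of the text under the pre-trained model.
    Values near 0 mean the model could easily have produced the text.
    The default threshold is arbitrary and would need tuning.
    """
    if domain_of(url) in SUSPECT_DOMAINS:
        return False
    return mean_logprob(text) < threshold
```

In practice the scorer would wrap whatever base model you are training from; for a quick check you can pass a stub such as `lambda text: -3.5`.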