Comment by arrowsmith
19 days ago
Presumably the "honeypot" is an obscured link that humans won't click (e.g. tiny white text on a white background in a forgotten corner of the page) but scrapers will. Then you can determine whether a given IP visited the link.
I know what a honeypot is, but the question is how the know the scraped data was actually used to train llms. I wondered whether they discovered or verified that by getting the llm to regurgitate content from the honeypot.
I interpreted it to mean that a hidden page (linked as u describe) is indexed in Bing or that some "facts" written on a hidden page are regurgitated by ChatGPT.