Comment by jonnycomputer

19 days ago

How do you determine that they know the content of the honeypot?

Presumably the "honeypot" is an obscured link that humans won't click (e.g. tiny white text on a white background in a forgotten corner of the page) but scrapers will. Then you can determine whether a given IP visited the link.
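For what it's worth, the "did this IP visit the link" part could be sketched roughly like this. It is only a minimal illustration: it assumes the server writes Common Log Format access logs, and the path `/hidden-trap-page`, the function name `honeypot_visitors`, and the sample log lines are all made up for the example.

```python
import re

# Hypothetical hidden URL that no human visitor is expected to request.
HONEYPOT_PATH = "/hidden-trap-page"

# Start of a Common Log Format line: client IP, two fields, timestamp, request line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')

def honeypot_visitors(log_lines, path=HONEYPOT_PATH):
    """Return the set of client IPs that requested the honeypot path."""
    ips = set()
    for line in log_lines:
        m = LOG_LINE.match(line)
        # Ignore any query string when comparing the requested path.
        if m and m.group(2).split("?", 1)[0] == path:
            ips.add(m.group(1))
    return ips

# Made-up sample log lines (documentation-range IP addresses).
sample_log = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /hidden-trap-page HTTP/1.1" 200 512',
    '198.51.100.2 - - [10/Oct/2024:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 1024',
]

print(honeypot_visitors(sample_log))
```

This only tells you who fetched the page, not what they did with it afterwards.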

  • I know what a honeypot is, but the question is how they know the scraped data was actually used to train LLMs. I wondered whether they discovered or verified that by getting the LLM to regurgitate content from the honeypot.

  • I interpreted it to mean that a hidden page (linked as you describe) is indexed in Bing, or that some "facts" written on a hidden page are regurgitated by ChatGPT.
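The regurgitation idea from the replies above could be tested with a unique marker string planted on the hidden page. A rough sketch, with assumptions: `make_canary` and `contains_canary` are hypothetical helpers, and how you obtain the model's output is left out entirely. A match would suggest the page's text reached the model somehow, but it would not by itself distinguish training data from live web retrieval.

```python
import secrets

def make_canary(prefix="canary"):
    """Create a random marker string unlikely to appear anywhere else."""
    return f"{prefix}-{secrets.token_hex(8)}"

def contains_canary(model_output, canary):
    """Case-insensitive check for the marker in text produced by a model."""
    return canary.lower() in model_output.lower()

# Plant this string on the hidden page, then later search model output for it.
canary = make_canary()
print(canary)
```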