Comment by vidarh

5 days ago

You'd think so. It seems like there are a lot of odd gaps like that.

I also have a favourite English-language PhD thesis that I ask every new model about, and they still struggle to find it even though there's a Wikipedia article about it that links to a blog post I wrote about it.

Anyone who thinks they've exhausted even publicly crawlable resources should ask them about some obscure stuff.

4 comments


mistrial9 5 days ago

You might be surprised if you take this approach: give key words and phrases in small amounts, with each sentence of the prompt building on the previous one. Take an example that is not very hard, like the original text of Lewis Carroll's Alice in Wonderland. A quick question might get things sort of wrong or miss details, but if you guide the LLM to a certain part of the story, then to a certain set of characters in that part, then to a certain statement or dramatic moment with those characters, you might get very specific detail that is close to line-by-line accurate. On the other hand, if you ask a quick, ordinary question about the same part of the story without supplying context and character names, you get an equally vague answer. YMMV

vidarh 4 days ago

For the PhD thesis in question, I've actually tested a lot of requests about different parts of it, and both Claude and ChatGPT still draw a total blank if you don't let them do searches.

thatcat 5 days ago

the models don't retain their full training data set

vidarh 4 days ago

No, but they do retain enough that it is interesting what they fail to retain.