Comment by simianwords

16 hours ago

Did you see the graph benchmark? I found it quite interesting. It had to do a graph traversal on a natural-language text representation of a graph. Pretty much your problem.

4 comments


stopachka 13 hours ago

Update: I took a corpus of personal chat data (so it couldn't have been seen in training) and asked the model some paraphrased questions. It performed quite poorly.

abraxas 13 hours ago
Which models did you try?
- stopachka 9 hours ago
  
  Claude Sonnet 4.6

stopachka 15 hours ago

Oh, interesting!