Comment by simianwords
16 hours ago
Did you see the graph benchmark? I found it quite interesting. It had to do a graph traversal on a natural text representation of a graph. Pretty much your problem.
16 hours ago
Did you see the graph benchmark? I found it quite interesting. It had to do a graph traversal on a natural text representation of a graph. Pretty much your problem.
Update: I took a corpus of personal chat data (this way it wouldn't be seen in training), and tried asking it some paraphrased questions. It performed quite poorly.
Which models did you try?
Claude Sonnet 4.6
Oh, interesting!