Comment by m101
1 year ago
Perhaps the smaller model used in o1 is overtrained on arXiv and code relative to 4o (or undertrained on legal text).