Comment by nicksrose7224
1 year ago
disagree - I actually think all the problems the author lays out about Deep Research apply just as well to GPT-4o / o3-mini-whatever. These things are just absolutely terrible at precision & recall of information
I think Deep Research shows that these things can be very good at precision and recall of information if you give them access to the right tools... but that's not enough, because of source quality. A model that has great precision and recall but uses flawed reports from Statista and Statcounter is still going to give you bad information.
Deep Research doesn't even give the numbers that are in Statcounter and Statista. It's choosing the wrong sources, and it's also failing to represent them accurately.
Wow, that's really surprising. My experience with much simpler RAG workflows is that once you stick a number in the context, the LLM can reliably parrot that number back out again later on.
Presumably Deep Research has a bunch of weird multi-LLM-agent things going on, maybe there's something about their architecture that makes it more likely for mistakes like that to creep in?
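To make the "simpler RAG workflow" point above concrete, here's a minimal sketch of that pattern: retrieve a passage, stuff it into the prompt, and confirm the figure actually reaches the model verbatim. Everything here is made up for illustration (the documents, the statistic, and the naive keyword retriever); real pipelines use embedding search and an actual LLM call, but the prompt-construction step looks roughly like this.

```python
# Hypothetical sketch of a simple RAG context-stuffing step.
# The docs and the 1.17 billion figure are placeholder data, not real stats.

def retrieve(query: str, docs: list[str]) -> str:
    """Naive keyword retrieval: return the first doc sharing a word with the query."""
    terms = set(query.lower().split())
    for doc in docs:
        if terms & set(doc.lower().split()):
            return doc
    return ""

def build_prompt(query: str, docs: list[str]) -> str:
    """Stuff the retrieved passage into the prompt ahead of the question."""
    context = retrieve(query, docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

docs = [
    "Global smartphone shipments reached 1.17 billion units in 2023.",  # placeholder
    "Tablet shipments declined year over year.",
]
prompt = build_prompt("How many smartphone units shipped in 2023?", docs)

# The number sits verbatim in the prompt, so the model only has to parrot it back.
assert "1.17 billion" in prompt
```

In a single-pass workflow like this, the model sees the exact figure and rarely garbles it; the surprise with Deep Research is that somewhere between retrieval and the final report, that guarantee breaks down.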