Comment by Art9681
3 days ago
It can be summarized as "Did you RTFM?". One shouldn't expect optimal results if the time and effort wasn't invested in learning the tool, any tool. LLMs are no different. GPT-5 isn't one model, it's a family: gpt-5, gpt-5-mini, and gpt-5-nano, each configurable with low, medium, or high reasoning effort. Anyone who is serious about measuring model capability would go for the best configuration, especially in medicine.
I skimmed through the paper and I didn't see any mention of what parameters they used, other than that they called gpt-5 via the API.
What was the reasoning_effort? verbosity? temperature?
These things matter.
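
For concreteness, here is a minimal sketch of the kind of configuration a benchmark should pin down and report, using the OpenAI Python SDK's Responses API. The exact field names (reasoning effort, text verbosity) are my assumption based on the GPT-5 launch docs, since the paper doesn't say what it used:

```python
# Sketch: explicitly setting the GPT-5 knobs a paper should report.
# Assumes the OpenAI Python SDK's Responses API; field names are my
# best understanding of the GPT-5 options, not what the authors did.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",                  # vs. gpt-5-mini / gpt-5-nano
    reasoning={"effort": "high"},   # low | medium | high
    text={"verbosity": "medium"},   # low | medium | high
    input="Summarize the differential diagnosis for acute chest pain.",
)

print(response.output_text)
```

If the authors left all of this at the defaults, that alone would be worth a sentence in the methods section.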