Comment by SequoiaHope
2 days ago
What you describe is a person selecting the best results, but if you can get better results one shot with that option enabled, it’s worth testing and reporting results.
I get that. But then if that option doesn't help, what I've seen is that the next followup is inevitably "have you tried doing/prompting x instead of y"
It can be summarized as "Did you RTFM?". One shouldn't expect optimal results without investing the time and effort to learn the tool - any tool - and LLMs are no different. GPT-5 isn't one model, it's a family: gpt-5, gpt-5 mini, and gpt-5 nano, each of which can run at high|medium|low reasoning effort. Anyone serious about measuring model capability would go for the best configuration, especially in medicine.
I skimmed through the paper and I didn't see any mention of what parameters they used, other than that they used gpt-5 via the API.
What was the reasoning_effort? verbosity? temperature?
These things matter.
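For what it's worth, here is a minimal sketch of the kind of call a paper could report, assuming the OpenAI Python SDK's Responses API; the parameter names (particularly the verbosity setting) reflect my understanding of the current interface, and the values are placeholders, not anything taken from the paper:

    from openai import OpenAI

    client = OpenAI()

    # Hedged sketch: parameter names follow the OpenAI Responses API as I
    # understand it; the values below are placeholders, not the paper's settings.
    resp = client.responses.create(
        model="gpt-5",                    # or "gpt-5-mini" / "gpt-5-nano"
        reasoning={"effort": "high"},     # reasoning effort: "low" | "medium" | "high"
        text={"verbosity": "medium"},     # output verbosity (assumed parameter)
        input="<benchmark question here>",
    )
    print(resp.output_text)

    # temperature is also worth reporting where the chosen model accepts it

Simply stating which model variant, effort level, verbosity, and (where supported) temperature were used would make the benchmark reproducible.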
Something I've experienced with multiple new model releases is that plugging them into my app makes the app worse. Then I do a bunch of work on prompts and the app is better than ever. And it's not that the improved prompts would have helped the old model too - usually the new prompts make the old model worse, or change nothing.
So it makes sense to me that you should keep trying until you get the results you want (or fail to). And it makes sense to ask people what they've tried. I haven't done the work yet to try this with GPT-5 and I'm not that optimistic, but it's possible it will turn out this way again.
> I get that. But then if that option doesn't help, what I've seen is that the next followup is inevitably "have you tried doing/prompting x instead of y"
Maybe I'm misunderstanding, but it sounds like you're framing a completely normal process (try, fail, adjust) as if it's unreasonable?
In reality, when something doesn't work, the obvious next step is to adapt and try again. That hardly seems radical; it's largely how problem solving works.
For example, when I was a kid trying to push-start my motorcycle, it wouldn't fire no matter what I did. Someone suggested a simple tweak: try a different gear. I did, and the bike instantly roared to life. What I was doing wasn't wrong; it just needed a slight adjustment to get the result I was after.
I get trying and improving until you get it right. But I just can't bridge the gap in my head between:
1. this is magic and will one-shot your questions, and
2. if it goes wrong, keep trying until it works.
Plus, knowing it's all probabilistic, how do you know the result is correct without already knowing the answer ahead of time? Is that not the classic halting problem?