Comment by lukeschlather
11 hours ago
Using a mini model for this seems grossly irresponsible. I've been doing some work testing models on similar extraction tasks (nothing where a failure affects someone's grade or anything), and GPT mini-class models and Gemini Flash simply can't handle this kind of extraction reliably. With anything less than a top-tier model with reasoning, you're guaranteed to get failures like this.
The cost difference makes it very tempting, obviously, but it's not worth it. On the other hand, people talk about LLMs with a broad brush, and I don't know, there's still testing to do, but I would be surprised to hear that GPT-5-pro with thinking had an issue like this.
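For what it's worth, the kind of testing I mean is nothing fancy. Here's a minimal sketch of an extraction eval harness using the OpenAI Python SDK; the model names, prompt, and gold data are all placeholders, not my actual setup:

```python
# Minimal sketch of an extraction eval harness. Assumed setup: OpenAI
# Python SDK installed, OPENAI_API_KEY set. Model names, prompt, and
# gold data below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Extract the final grade as a bare number from the text below.\n\n{text}"

# Hypothetical gold set: (document text, expected extraction) pairs.
GOLD = [
    ("Final grade: 87/100. Great improvement over the midterm.", "87"),
    ("Scored 92 on the exam; final grade after the curve is 95.", "95"),
]

def run_eval(model: str) -> float:
    """Return exact-match accuracy of `model` on the gold set."""
    correct = 0
    for text, expected in GOLD:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        )
        answer = resp.choices[0].message.content.strip()
        correct += (answer == expected)
    return correct / len(GOLD)

# Stand-ins for a mini model vs. a top-tier model.
for model in ("gpt-4o-mini", "gpt-4o"):
    print(model, run_eval(model))
```

On tasks like this, the gap between the mini tier and the top tier shows up fast even in a toy harness like that.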