gf000 9 days ago
Not the parent poster, but I did get the wrong answer even with reasoning turned on.
tezza 9 days ago
Thank you all! We needed further data points.
Comparing one-shot results is a foolish way to evaluate a statistical process like LLM answers. We need multiple samples.
For https://generative-ai.review I do at least three samples of output. This often yields very different results, even from the same query.
E.g.: https://generative-ai.review/2025/11/gpt-image-1-mini-vs-gpt...
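A rough sketch of what "multiple samples" could look like in practice, assuming you have some function that sends a prompt to a model and returns its answer as a string. The names `sample_answers`, `agreement`, and `make_fake_model` are made up for illustration, and the fake model just replays canned answers so the snippet runs without any API. This is not how generative-ai.review does it, only one way to tally repeated runs.

```python
from collections import Counter
from typing import Callable, Iterable


def sample_answers(ask: Callable[[str], str], prompt: str, n: int = 3) -> Counter:
    """Ask the same prompt n times and tally the distinct answers."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(ask(prompt).strip() for _ in range(n))


def agreement(tally: Counter) -> float:
    """Share of samples that match the most common answer."""
    total = sum(tally.values())
    return tally.most_common(1)[0][1] / total


def make_fake_model(answers: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model call: replays canned answers."""
    it = iter(answers)
    return lambda prompt: next(it)


# Three samples of the same query; the third one disagrees.
fake = make_fake_model(["42", "42", "41"])
tally = sample_answers(fake, "What is 6 * 7?", n=3)
print(tally.most_common(), round(agreement(tally), 2))
```

With a real model you would pass your own client call as `ask`. A low agreement score across samples is a hint that a single run, right or wrong, tells you little.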