Comment by icepush
16 hours ago
Did you ask the question several times in fresh chat contexts to see if it sometimes gives the right answer?
Nah, n=1 is enough to give evidence that something is entirely broken, of course.
/s
Well, when we had deterministic tools, it would only take a single example of a calculator claiming 1+1=4 for me to throw it in the trash.
And if you can come up with a deterministic tool that can do everything LLMs can, that would be amazing! Until then, we have to accept the non-determinism.