Comment by bwfan123
7 months ago
Problem 6 is puzzling. Neither OpenAI nor DeepMind answered it. Humans would submit partial answers, but here we saw no answer at all, which is odd.
Does that mean the LLMs realized they could not solve it? I thought one of the limitations of LLMs is that they don't know what they don't know, and without a solver it is really impossible to verify the consistency of an argument, i.e., to know that one knows.
I think it probably just means that they exhausted the competition time limit without completing the "thinking" portion and getting to the "output" stage.
That applies only to the most basic use of an LLM: a pre-trained model generating text.
You can do a lot of things on top of that: e.g., train a linear probe to give a confidence score. Yes, it won't be 100% reliable, but it might be reliable enough if you constrain it to a domain like math.
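To illustrate the linear-probe idea: you freeze the LLM, extract a hidden-state vector for each answer, and fit a simple linear classifier predicting whether the answer was correct; its output probability serves as a confidence score. The sketch below uses synthetic random vectors as stand-ins for real activations, so the data and dimensions are purely hypothetical:

```python
# Hypothetical sketch of a linear confidence probe. Assumes we can
# extract a fixed-size hidden-state vector per answer; the synthetic
# Gaussian "activations" below are stand-ins for real model states.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

hidden_dim = 64   # stand-in for the model's hidden size
n = 200           # number of labeled (answer, correct?) examples

# Fake activations: correct answers cluster apart from incorrect ones.
correct = rng.normal(loc=0.5, scale=1.0, size=(n // 2, hidden_dim))
incorrect = rng.normal(loc=-0.5, scale=1.0, size=(n // 2, hidden_dim))
X = np.vstack([correct, incorrect])
y = np.array([1] * (n // 2) + [0] * (n // 2))  # 1 = answer was correct

# The "linear probe": logistic regression on frozen activations.
probe = LogisticRegression(max_iter=1000).fit(X, y)

# Confidence score for a new answer's hidden state.
new_state = rng.normal(loc=0.5, scale=1.0, size=(1, hidden_dim))
confidence = probe.predict_proba(new_state)[0, 1]
print(f"probe confidence: {confidence:.2f}")
```

The probe itself never touches the LLM's weights, which is why it is cheap to train per domain; restricting it to math answers is just a matter of what labeled examples you feed it.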