Comment by bwfan123
7 months ago
Problem 6 is puzzling. Neither OpenAI nor DeepMind answered it. Humans would submit partial answers, but here we saw no answer at all, which is odd.
Does that mean the LLMs realized they could not solve it? I thought one of the limitations of LLMs is that they don't know what they don't know, and without a solver it is really impossible to verify the consistency of an argument, i.e., to know that one knows.
I think it probably just means that they exhausted the competition time limit without completing the "thinking" portion and getting to the "output" stage.
That applies only to the most basic use of an LLM: a pre-trained model generating text.
You can do a lot of things on top of that: e.g., train a linear probe to give a confidence score. Yes, it won't be 100% reliable, but it might be reliable enough if you constrain it to a domain like math.
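To illustrate the linear-probe idea: you freeze the LLM, extract a hidden-state vector for each answer, and fit a simple linear classifier predicting whether the answer was correct; its output probability serves as a confidence score. The sketch below uses synthetic random vectors as stand-ins for real activations, so the data and dimensions are purely hypothetical:

```python
# Hypothetical sketch of a linear confidence probe. Assumes we can
# extract a fixed-size hidden-state vector per answer; the synthetic
# Gaussian "activations" below are stand-ins for real model states.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

hidden_dim = 64   # stand-in for the model's hidden size
n = 200           # number of labeled (answer, correct?) examples

# Fake activations: correct answers cluster apart from incorrect ones.
correct = rng.normal(loc=0.5, scale=1.0, size=(n // 2, hidden_dim))
incorrect = rng.normal(loc=-0.5, scale=1.0, size=(n // 2, hidden_dim))
X = np.vstack([correct, incorrect])
y = np.array([1] * (n // 2) + [0] * (n // 2))  # 1 = answer was correct

# The "linear probe": logistic regression on frozen activations.
probe = LogisticRegression(max_iter=1000).fit(X, y)

# Confidence score for a new answer's hidden state.
new_state = rng.normal(loc=0.5, scale=1.0, size=(1, hidden_dim))
confidence = probe.predict_proba(new_state)[0, 1]
print(f"probe confidence: {confidence:.2f}")
```

The probe itself never touches the LLM's weights, which is why it is cheap to train per domain; restricting it to math answers is just a matter of what labeled examples you feed it.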