Comment by ActivePattern
1 year ago
I suspect it would be unlikely to come up with it, because finding the smallest/largest valid numbers in the range requires executing a fairly lengthy algorithm (or sophisticated mathematical reasoning). You can verify this for yourself with the following ChatGPT prompt: "What is the smallest number in the range (1, 100000) whose digits sum to 30? Do not execute separate code."
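For reference, the brute-force version is only a few lines of Python; the hard part for a model is simulating it mentally rather than running it. A minimal sketch (mine, assuming the range (1, 100000) excludes its endpoints):

    def digit_sum(n):
        # Sum the decimal digits of n.
        return sum(int(d) for d in str(n))

    # First number in the open range (1, 100000) whose digits sum to 30.
    smallest = next(n for n in range(2, 100000) if digit_sum(n) == 30)
    print(smallest)  # 3999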
o1 did find the optimization in a sibling comment (sibling to my GP).
So it's probably time to update your expectations.
Why limit its ability to write separate code?
Because otherwise we are talking about LLMs augmented with external tools (e.g., a Python interpreter). My original comment was pointing to the limitations of LLMs writing code by themselves.
You wouldn't ask a programmer to solve a problem and then not let them write down the source or debug the program as they write it, would you?
Are you asking it not to write down a general algorithm? They are doing a pretty good job on mathematical proofs.
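For illustration, the general algorithm here doesn't even need a search: the smallest number with a given digit sum s (for s >= 1) can be built greedily by packing 9s into the low-order digits. A quick Python sketch (function name is mine):

    def smallest_with_digit_sum(s):
        # Smallest positive integer whose decimal digits sum to s (s >= 1):
        # use as few digits as possible, with every digit after the
        # leading one set to 9.
        k = -(-s // 9)              # ceil(s / 9) = number of digits needed
        lead = s - 9 * (k - 1)     # the remainder becomes the leading digit
        return int(str(lead) + "9" * (k - 1))

    print(smallest_with_digit_sum(30))  # 3999

The largest valid number is the symmetric case: pack 9s into the high-order digits, subject to the range bound.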
I still don't understand why you wouldn't let it use its full reasoning abilities by letting it write code or even delegate to another agent. We should be testing toward the result, not the methods.
But programmers are effectively LLMs augmented with the ability to run code. It seems odd to add that restriction when testing whether an LLM is "as good as" a programmer, because if the LLM knows what it would need to do with the external code, that's just as good.