Comment by gjm11

1 year ago

There's another, arguably even simpler, optimization that makes me smile. (Because it's silly and arises only from the oddity of the task, and because it's such a huge performance gain.)

You're picking 1,000,000 random numbers from 1 to 100,000. That means that any given number is much more likely to appear than not. In particular, it is very likely that the list contains both 3999 (which is the smallest number with digit-sum 30) and 99930 (which is the largest number in the range with digit-sum 30).

Timings on my machine:

Naive implementation (mod+div for digit-sums): 1.6s. Computing digit-sum only when out of range: 0.12s. Checking for the usual case first: 0.0004s.

The probability that the usual-case check doesn't succeed is about 10^-4, so it doesn't make that big a difference to the timings whether in that case we do the "naive" thing or the smarter thing or some super-optimized other thing.

I'm confused about the absolute timings. OP reports 0.66s for naive code using str/int to compute the digit sums; I get about 0.86s, which seems reasonable. For me using mod+div is about 2x slower, which isn't a huge surprise because it involves explicit looping in Python code. But you report 55ms for this case. Your machine can't possibly be 20x faster than mine. Is it possible that you're taking 10^5 numbers up to 10^6 rather than 10^6 numbers up to 10^5? Obviously in that case my hack would be completely useless.)

66 comments

gjm11

ActivePattern 1 year ago

This is actually a great example of an optimization that would be extremely difficult for an LLM to find. It requires a separate computation to find the smallest /largest numbers in the range with digits summing to 30. Hence, an LLM is unlikely to be able to generate them accurately on-the-fly.

cultureulterior 1 year ago
O1 found it.
https://chatgpt.com/share/67782b6b-6248-8012-882d-238b600ef9...
- bennythomsson 1 year ago
  
  Amazing.
  Next step would be to propose hardcoding 99930-3999 as the O(1) result and live with the output just being wrong sometimes. The bug rate is then in the ballpark of most modern software, including LLMs', so I'd say ship it.
  
  7 replies →
- blueaquilae 1 year ago
  
  Did it found it before the HN comment? O1 has access to the web so I'm just asking
  
  3 replies →
- drysine 1 year ago
  
  Sadly it failed to make a separate loop to check for this first.
- twobitshifter 1 year ago
  
  Should we be worried yet?
- fn-mote 1 year ago
  
  This isn’t amazing because it’s a well known contest trick.
batterseapower 1 year ago
I tried it in OpenAI's O1. If I give it minimaxir's original prompt it writes the obvious loop, even if I include the postamble "Look for tricks that will make this function run as fast as possible in the common case".
However, if I then simply ask "What is the most probable result for this function to return?" it figures out the answer and a very good approximation of the probability (4.5e-5). From there it's easily able to rewrite the program to use the trick. So the creative step of spotting that this line of reasoning might be profitable seems missing for now, but 2025's models might solve this :-)
- Vetch 1 year ago
  
  The information on the creative step which you provided to o1, was also the key step and contained almost all the difficulty. The hope is that 2025 models could eventually come up with solutions like this given enough time, but this is also a toy problem. The question is how much clever answers will cost for real world complex problems. At present it looks like, very much.
  
  9 replies →
- pinko 1 year ago
  
  This gets to the old saw, "knowing what question to ask is the most important thing". To the extent that LLMs can answer questions better than formulate which ones to ask, they may be inherently limited. We will see.
  
  2 replies →
Vetch 1 year ago

Excellent point. The hope is reasoning LLMs will make a difference for such problems. But it's also a great example of why the those who think being able to have the LLM iterate more will be crucial to reasoning are off base. There are many computations that a transformers (or humans for that matter) are not well equipped to represent internally, tool use during the reasoning process is unavoidable for all but the artificial or knowledge heavy problems.
Small examples, throwaway but involved calculations, prototypes, notes of what didn't work and what's promising are what's crucial for novel reasoning. It goes beyond just search or iterative refinement; there is no royal road to reasoning.
asah 1 year ago
You guys are picking on the problem statement. Here's a revised prompt, which also skips the silliness of single threading:
Write __fully parallelized__ Python code to solve this problem: __Generate__ 1 million random integers between 1 and 10,000,000, find the difference between the smallest and the largest numbers whose digits sum up to 30.
- ActivePattern 1 year ago
  
  Correct, this optimization no longer works when you change the problem.
  
  1 reply →
- _wire_ 1 year ago
  
  Whose digits sum up to 30, or the sum of whose digits equal 30?
  Btw, _whose_ digits are we talking about?
  —
  I just built a random program generator. After I finish optimizing, I'm gonna test it to see if works!
  —
  "If builders built houses the way programmers build programs, the first woodpecker to come along would destroy civilization"
  https://en.m.wikiquote.org/wiki/Gerald_Weinberg
  
  3 replies →
- dgacmu 1 year ago
  
  But what's interesting about this is that there's a tradeoff in the total computation performed by the "fully parallelized" version of this and a sequential one. Without the user knowing this, it's kind of impossible to get the optimization you want: Do you want a minimum work solution or a minimum wall-clock-time solution?
  If you want a better fully parallelized one, you do this:
  Repeat a few times in exponential progression on k:
  Process, in parallel, the first k entries in the list (let's start with 1000). Find the min and max whose digit sums = 30.
  In parallel, filter the remaining list to eliminate entries that would not improve upon the min/max thus found.
  k *= 10 and repeat until done.
  I would wager against the LLM identifying this solution without prompting from the user (or reading this comment).
Octoth0rpe 1 year ago
> This is actually a great example of an optimization that would be extremely difficult for an LLM to find
It'll be somewhat more likely since the next gen training set includes your comment :)
(disclaimer: I have no personal knowledge of ai companies scraping hacker news, but it wouldn't surprise me at all)
- lblume 1 year ago
  
  It would be very surprising if they would not scrape this site. The content is very high-quality in the general case and there are no giant barriers preventing entry (there even is a clean API!). One might even use us to fine-tune a coding assistant or the alike.
qup 1 year ago
Are you sure it would be hard?
Maybe it only requires asking the LLM to be creative when designing the algorithm. The parent poster spent some time thinking about it, obviously--he didn't generate it accurately "on the fly," either. But he's able to direct his own attention.
I don't see why the LLM couldn't come up with this logic, if prompted to think about a clever algorithm that was highly specific to this problem.
- ActivePattern 1 year ago
  
  I suspect that it would be unlikely to come up with it because it requires execution of a fairly lengthy algorithm (or sophisticated mathematical reasoning) to find the smallest/largest valid numbers in the range. You can verify this for yourself with the following ChatGPT prompt: "What is the smallest number in the range (1, 100000) whose digits sum to 30? Do not execute separate code."
  
  7 replies →

afiodorov 1 year ago

This gave me an idea that we can skip the whole pass over the million draws by noting that the count of draws landing in my precomputed set M (digits-sum=30) is Binomial(n=1mln, p=|M|/100k). Then we sample that count X. If X=0, the difference is not defined. Otherwise, we can directly draw (min,max) from the correct joint distribution of indices (like you’d get if you actually did X draws in M). Finally we return M[max] - M[min]. It’s O(1) at runtime (ignoring the offline step of listing all numbers whose digits sum to 30).

GuB-42 1 year ago

In fact, we could simply check for the 3 smallest and the 3 highest numbers and ignore the rest.

Assuming the numbers are really random, that's a probability of 10^-13. That probability is at the point where we are starting to think about errors caused by cosmic rays. With a bit more numbers, you can get to the point where the only way it can fail is if there is a problem with the random number generation or an external factor.

If it was something like a programming contest, I would just do "return 95931" and hope for the best. But of course, programming contests usually don't just rely on random numbers and test edge cases.

sweezyjeezy 1 year ago

for 10^5, to get the same collision probability (~2 * exp(-10)), you would just need to compute the 10 maximum/minimum candidates and check against those.

ryao 1 year ago

With this trick you can test while generating the random numbers and if you see both values, you can short circuit the generation of random numbers.

bennythomsson 1 year ago
The input generation is outside the scope of this. Otherwise you could directly choose the output values with the apropriate distribution and just skip all the rest.
(Arguably, this criticism applies to exchanging random.randint for a numpy equivalent as well, since that doesn't optimize the solution but only how quickly the question is being generated.)
- ryao 1 year ago
  
  Iterating a precomputed list is a method of generating random numbers. It is used in the one time pad. Whether we iterate a precomputed list or use a pseudo random number generator, we can short circuit the random number generator using this trick. We cannot directly choose the output values, because then it would not be random.
  
  4 replies →

dgacmu 1 year ago

No, you're right, I should have said 550ms and 100ms, I'm having a doof morning about timing. Thank you! Too late to edit my post.

econ 1 year ago

Ahhhh so we can just return the result without doing anything!