← Back to context

Comment by wtracy

15 years ago

Many of the posts here and on 4chan have deconstructed the algorithm and proven that it is actually O(n^2) or O(n * lg n) when you include the work done by the operating system's scheduler.

However, here's a different perspective to look at this from: What if this were implemented on an FPGA?

Let's ignore for a moment that the number of gates needed would increase linearly with the size of the input. I'll also simplify the math by assuming that a list of n numbers contains only values in the range [0 .. n] .

Let's further assume that we're only sorting positive integers, and that each job sleeps for x clock cycles (where x in the input number). We could sort a list in O(n) time.

Digging even deeper: Quicksort performs an integer comparison followed by a branch O(n lg n) times. Even if your processor can compare two integers in one clock cycle, a branch is probably going to screw up your processor's pipelining and take quite a few clock cycles to execute. So, we're possibly looking at a significant speed increase even before we consider the difference in order of growth.

So, assuming a predefined upper bound on the number of items in a list, this just might be a great way to perform hardware-assisted sorts.

Thoughts?

For hardware based sorting, we can do even better using sorting networks[0]. Sorting networks are comprised of a system of wires, one for each input, and comparators between wires which always output the higher number on the bottom wire and the lower on the top. Using this framework we can do massively parallel sorts with excellent time complexity.

The problem is that sorting networks need to be designed for each input size, and designing efficient sorting networks is an open problem. There has been some work in the evolutionary computing community on sorting networks, and some of the best known networks for various sizes were found by EAs, but large networks are still a challenge.

[0] http://en.wikipedia.org/wiki/Sorting_network

I just realized: We would need a circuit polling each of the jobs to see when they finish, and I don't see how it could poll all of jobs every clock cycle. So I don't see it being possible to achieve the per-cycle resolution I suggested above.

Furthermore, the polling circuit would have to poll each of the n tasks n times, leading us finally back to a running time of O(n^2).

Still a fun thought experiment, though.

  • Why would you poll the tasks? Cant the tasks wake up after their time is done, just fire an interrupt witht he number of the task.