4chan discussion aside, the concept here is that you 'sleep' n units of time for each element, where n is linearly related to the element's value relative to the other elements in the list. The 'aha' was that the resulting threads will 'wake up', or 'return', in sorted order.
Basically, if you transform the set L into the time domain, you get it back sorted when you collect it.
It's a fun result, and as an exercise in systems analysis it can be enlightening to look at the impact of timer resolution, thread creation, and ordering, but ultimately the 'trick' is that you've exploited the 'insertion sort' that the kernel does on sleep events. You could try its close cousin, "priority sort": create threads whose priority is related to the value n of the element, each fractionally lower than the parent thread (most UNIXes are not that flexible, but some realtime OSes are). Then, as the last step, you 'yield', the threads print out their contents in priority order, and poof, they are sorted. But again, the sorting happened in the kernel when they were inserted into the runq in priority order.
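[To make the mechanism concrete, here is a minimal sketch of the idea in C with POSIX threads. The original posting was a bash script, so this is an illustrative reconstruction, assuming small non-negative integers and that the kernel's timer queue wakes sleepers in deadline order — which is the actual "sort".]

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *wake_and_print(void *arg) {
        unsigned n = *(unsigned *)arg;
        sleep(n);                     /* kernel queues us by deadline */
        printf("%u\n", n);            /* wakeups arrive in sorted order */
        return NULL;
    }

    int main(void) {
        unsigned values[] = {5, 3, 2, 1, 10};
        enum { N = sizeof values / sizeof values[0] };
        pthread_t tid[N];

        for (int i = 0; i < N; i++)
            pthread_create(&tid[i], NULL, wake_and_print, &values[i]);
        for (int i = 0; i < N; i++)
            pthread_join(tid[i], NULL);   /* wait for every timer to fire */
        return 0;
    }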
I was impressed by the brevity of the C solution using OpenMP (comment #81), especially compared to significantly more verbose Erlang and Haskell examples. I kept reading it, thinking, "Where's the fork?" and seeing nothing but a simple for...and then remembered that # is a preprocessor directive, rather than a comment, in C, and then googled for "parallel omp for". So, I actually learned something today from that ridiculous thread: http://en.wikipedia.org/wiki/OpenMP
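[Comment #81 itself isn't reproduced in this thread; the following is a hypothetical sketch of what an OpenMP sleep sort looks like — the entire "fork" is the one pragma. Assumes GCC with -fopenmp, non-negative integer arguments, and one thread per argument.]

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        if (argc < 2) return 0;
        omp_set_num_threads(argc - 1);   /* one thread per argument */
        #pragma omp parallel for
        for (int i = 1; i < argc; i++) {
            int n = atoi(argv[i]);       /* assumes non-negative integers */
            sleep(n);                    /* each iteration sleeps in parallel */
            printf("%d\n", n);
        }
        return 0;
    }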
Also, the Perl 6 one-liner is sort of laughably cute. I'm not sure I believe it would actually work, but if it does, the conciseness of Perl 6 is even higher than I'd realized: it drops about nine words from the Perl 5 one-liner (which is already very small). But I learned about the hyper operators in Perl 6, which is also cool: http://perl6advent.wordpress.com/2009/12/06/day-6-going-into...
In short, that was awesome reading over my morning tea.
There's a few short implementations of this algorithm on CodeGolf.SE: http://codegolf.stackexchange.com/questions/2722/implement-s...
I'm relieved to note that Haskell and Erlang don't have to be ridiculously verbose. There's even an Erlang one-liner that is competitive with the Perl and Ruby implementations for brevity. I was amused to note that Perl is still the shortest implementation, and could be shorter (though the frighteningly short Perl 6 implementation mentioned in the 4chan thread is either wrong, or the implementation of the <<. operator is still unfinished in Rakudo Star, because it does not actually do the job; I don't understand the hyperoperators well enough yet to know which is true).
significantly more verbose Erlang and Haskell examples
There are less verbose (but also less well-formed) solutions in these languages. Here's a working, hacky but still readable, 3-line one I just banged out for Erlang: http://pastebin.com/eZckJp9p
... and a version that trades a line for calling an actual function named sleep(), given there are rules to this silly game http://pastebin.com/heQ43K0E ...
Also, the Perl 6 one-liner is sort of laughably cute. I'm not sure I believe it would actually work
I don't think any current implementation supports auto-parallelization of hyperops. Regardless, hyper is really only spec'd to remove the result-ordering requirements so that it may be optimized (I wouldn't be surprised if it were executed in chunks in slightly more sophisticated implementations).
From my reading of the docs, it (>>.) doesn't seem to require parallelization in the implementation at all... it just allows it and requires code to be parallel-safe, but doesn't impose any parallel requirements on the Perl 6 implementation. So, something like that Perl one-liner, which would require parallelism to even work, simply isn't valid. It might be possible to build a Perl 6 that would exhibit the right behavior, but it would be unspecified behavior.
So, I think that tiny little implementation is bunk. I'd be curious to see what the actual smallest Perl 6 implementation would look like, or if it would be the smallest Perl 5 version except using "say" to shave off two characters.
It finally happened.
HN cut out the middle man. Instead of linking to something on Reddit about 4chan, we finally just linked straight to 4chan.
I found it to be an absolutely fascinating thread. I don't care where it came from. I learned about OpenMP (concise parallelism in C) and the Perl 6 hyper operators by reading it. That's (much) more than I learn from most Hacker News posts these days.
/prog/ is a really great board. The humor is totally weird but that's the charm of the whole thing. I think that we'd have less infamous blowhards in the online tech world if they'd just read /prog/ once in a while.
Unlike other 4chan boards, posts on /prog/ never die, and there are never any images attached to posts. So not-safe-for-work isn't usually an issue, though it is blocked for a lot of people, including me.
The problem with 4chan is that it's blocked at work (probably for good reason). Now I need to find the reddit summary.
Sleepsort: For each item in a list of items, create a thread that sleeps as long as its number, then prints its number.
E.g. 5,3,2,1,10 makes five threads, one sleeps for one second and prints 1, one sleeps for two seconds and prints 2, one sleeps for three seconds and prints 3, one sleeps for five seconds and prints 5 and the last one sleeps for 10 seconds and prints 10.
The problem with linking to 4chan is that the threads expire so quickly that the links usually die within an hour. At least with /b/. /prog/ is probably different.
This is from the textboards (see the dis. subdomain), not the img. subdomain, which is where /b/ and a ton of other imageboards on 4chan live. Textboard posts never die. There are posts you can find and bump from 2004 till now. Imageboard posts that have merit also last a while; 60-ish posts in, they just sort of last for a few hours, and if they're good enough, they get archived on chanarchive.com (formerly 4chanarchive.com)
The sky really is falling :/
e: Dang, maybe I should have used a snark mark.
Nice, and now the HN home page is likely three clicks away from child pornography.
Wow 14 downvotes and counting (it's already greyed out, but thx guys! ;) for making an inconvenient but almost certainly true statement.
So here's the thing: the chan culture in general can be refreshingly great and diverse in many ways. It has cool, talented, and funny people, including fellow HN members (check out the society meta-boards sometime). I've been there for years, and am even friends with someone who admins a relatively popular subchan.
This is not some ignorant, hate-based, butthurt religious rant - and I will gladly eat the signed integers to prove it... My problem with this submission is that the part of the chan subculture that 4chan represents and helped popularize, the part that resides at 4chan.org, is a culture of desensitization and inaction with regard to the posting of child porn. It's continually posted and often stays up for hours. It's been this way for years. I go back every once in a while hoping it's changed, but as of last week, it hasn't.
We can argue all day about moot's level of involvement (maybe pg has a comment regarding founders setting the expectations and social norms of a community?), moderation strategy, and its associated seemingly "positive" effects on society at large (memes, hacktivism, etc.), but those are not the issue. The issue is that by posting this to HN, we are supporting - both indirectly via the network effect and directly via advertising revenue - a community with less than ideal standards relating to the sharing of pictures of sexually abused children. And yes, ad-free text boards count too, due to spillover.
Next time just post the story/example or something in a Tell thread, and have the discussion here.
my $0.02
also, here's an archived image of the linked discussion for those still at work:
http://i.imgur.com/LdgLc.jpg
That was a brilliant joke, hardest I've laughed all day. Some choice comments:

I don't like to wait 218382 seconds to sort '(0 218382)

If you slept exp(n) instead of n it could easily include negative integers too!

Oh shit, we're busted, there's a REALISTICALLY THINKING PERSON in this thread!
For those few of you not conversant with 4chan, the dis subdomain is text only, like the original 2ch. The textboards tend to be disdainful of the imageboards, and incidentally, they get much less traffic, since it's impossible to post porn on them.
The mailto:sage links in some of the posts are essentially public downvotes; from the Japanese "下げ", meaning "down", and pronounced "sah ge".
I consider myself conversant with 4chan, and I had no idea that the dis subdomain existed.
I always wondered if there was a chan for hackers...now I realize there is. And oh boy, what a chan it is.
Most decent sized chans have their /prog/ board. See shii's Overchan[1] for a comprehensive list of chans on the WWW.
Shii personally runs a pretty nice chan called 4-ch[2] with a large and decent DQN[3] textboard. (S)he founded this chan after being banned from 4chan and SomethingAwful. I don't know whether shii still frequents 2ch or Futaba.
I don't consider the English-language programming chans to be all that great. Most (if not all) are textboards with few postings and lots of trolling over the last 3 years. I don't know the state of the huge Japanese ones, however, as I don't frequent them or read the language.
[1] http://shii.org/2ch/
[2] http://4-ch.net/4ch
[3] 4-ch.net/dqn/
Have you read your SICP today? You'll fit right in if you have.
It should be noted, though, that on /prog/ it is considered good manners to use sage so as not to bump the thread endlessly to the top of the list. So one can tell which posts are made by outsiders by their overt... impoliteness.
Many of the posts here and on 4chan have deconstructed the algorithm and proven that it is actually O(n^2) or O(n * lg n) when you include the work done by the operating system's scheduler.
However, here's a different perspective to look at this from: What if this were implemented on an FPGA?
Let's ignore for a moment that the number of gates needed would increase linearly with the size of the input. I'll also simplify the math by assuming that a list of n numbers contains only values in the range [0 .. n].
Let's further assume that we're only sorting positive integers, and that each job sleeps for x clock cycles (where x is the input number). We could sort a list in O(n) time.
Digging even deeper: Quicksort performs an integer comparison followed by a branch O(n lg n) times. Even if your processor can compare two integers in one clock cycle, a branch is probably going to screw up your processor's pipelining and take quite a few clock cycles to execute. So, we're possibly looking at a significant speed increase even before we consider the difference in order of growth.
So, assuming a predefined upper bound on the number of items in a list, this just might be a great way to perform hardware-assisted sorts.
Thoughts?
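[As a software stand-in for that hardware, here is a sketch that simulates one down-counter per input, all ticking "in parallel"; an element is emitted on the tick its counter expires. In this simulation the scan costs O(n) per tick, but on an FPGA each counter would be its own circuit, so a tick is O(1) wall time. A real design would of course be HDL, not C.]

    #include <stdio.h>

    int main(void) {
        int value[] = {5, 3, 2, 1, 10};
        enum { N = sizeof value / sizeof value[0] };
        int remaining = N;

        for (int tick = 0; remaining > 0; tick++)     /* the "clock" */
            for (int i = 0; i < N; i++)               /* all counters at once */
                if (value[i] == tick) {               /* counter just expired */
                    printf("%d\n", value[i]);
                    remaining--;
                }
        return 0;
    }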
For hardware based sorting, we can do even better using sorting networks[0]. Sorting networks are comprised of a system of wires, one for each input, and comparators between wires which always output the higher number on the bottom wire and the lower on the top. Using this framework we can do massively parallel sorts with excellent time complexity.
The problem is that sorting networks need to be designed for each input size, and designing efficient sorting networks is an open problem. There has been some work in the evolutionary computing community on sorting networks, and some of the best known networks for various sizes were found by EAs, but large networks are still a challenge.
[0] http://en.wikipedia.org/wiki/Sorting_network
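[For illustration, here is the standard 5-comparator network for 4 inputs, written as plain C compare-exchanges. In hardware, the comparators within each stage act on disjoint wire pairs and run in parallel, so this network sorts in 3 stages.]

    #include <stdio.h>

    static void cmpswap(int *a, int *b) {
        if (*a > *b) { int t = *a; *a = *b; *b = t; }  /* low value to top wire */
    }

    static void sort4(int v[4]) {
        cmpswap(&v[0], &v[1]); cmpswap(&v[2], &v[3]);  /* stage 1 (parallel) */
        cmpswap(&v[0], &v[2]); cmpswap(&v[1], &v[3]);  /* stage 2 (parallel) */
        cmpswap(&v[1], &v[2]);                         /* stage 3 */
    }

    int main(void) {
        int v[4] = {3, 1, 4, 2};
        sort4(v);
        for (int i = 0; i < 4; i++) printf("%d ", v[i]);
        printf("\n");
        return 0;
    }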
I just realized: We would need a circuit polling each of the jobs to see when they finish, and I don't see how it could poll all of jobs every clock cycle. So I don't see it being possible to achieve the per-cycle resolution I suggested above.
Furthermore, the polling circuit would have to poll each of the n tasks n times, leading us finally back to a running time of O(n^2).
Still a fun thought experiment, though.
Why would you poll the tasks? Can't each task, when its time is done, just fire an interrupt with the number of the task?
Reminds me of "ListBox sort" story I heard once; it was about a programmer who, wanting to sort strings and not knowing how to do this, put strings into a hidden ListBox GUI control and then read them back in sorted order.
When I was, like, 10, I wanted to find the distance between two points in a Visual Basic program but didn't know how, so I created a Line control between the points and tried to read back the Width. (It didn't work, since Width was just the thickness of the line.)
It's actually an interesting solution. If you reduced the time waited to a microsecond, then it would take about a second to sort 1M elements. Definitely not an optimal solution, and it has a bit of potential for race conditions (if one thread/process got hung up on I/O), but interesting nonetheless.
At that resolution a modern OS is likely to round up the delay to some energy-saving number, and the run queue is going to be so contended that even if the timers fired at the right time, there would be little guarantee the processes awoke in the desired order.
I think on Linux the granularity thing is true for e.g. nanosleep(), but there is also the POSIX clock_*() functions, which I think guarantee higher resolution.
Higher resolution, but I don't think higher precision (especially in the face of I/O).
Don't forget the setup time, and the need to spawn one million threads, and then have them all wait until you've spawned the last one before they start their timers.....
_lots_ of potential for race conditions.
Easy to solve, just start them in order.
As long as all the threads are running before the first one finishes... You would have to measure the speed of spawning new threads in your runtime and adjust the linear timestep dt * x (where x is the element) for that. E.g., for 2 1 1 1 1 1 1 1 1 1 0, you would have to make the waiting time for 2 be 2 * dt + epsilon, where epsilon is a function of the element's position in the list to be sorted.
Even if you did that perfectly, it is still only a probabilistic sorting algorithm, with potential for error.
The problem, I think, is that the running time of the algorithm is measured not by the number of inputs, but by the size of the largest input—so, as Estragon (http://news.ycombinator.com/item?id=2657959) (transitively) points out, there's not an appreciable difference between sorting `(1..10^3, 10^6)` and `(1, 10^6)`.
This isn't interesting at all: every thread just sleeps in proportion to its value. When you sleep, something like this happens:
The OS is told to wake up your thread after x time has elapsed. It puts this request into some data structure which it polls from time to time, so it knows which thread to wake up at what time. You are offloading your sort to the OS, which is probably putting it into a linked list of registered timeouts (maybe a heap?). After approximately the allotted time, some mechanism akin to a signal is used to tell your thread to wake up.
Also, usually the thread will sleep AT LEAST as long as you tell it to, but if the system is busy it could sleep much longer. Threads can be woken up at the same time or slightly out of order, or 'woken up' only to find that all the CPUs are working on something else, and thus wait for a time slice, get one, then wait again, and not get to use stdout for weeks. It's uncommon to see machines with really high load these days, but a 1-core computer with 25000 processes all competing for CPU time, and which is swapping heavily, will potentially run each process millions of times slower than running one process.
TL;DR: This is curious, but it will fail much of the time and has huge memory overhead. Plenty of good in-place sorts exist; this one requires something like 8 MB per thread on Linux due to the stack allocated for each thread. At worst this potentially takes millions of years, and it just does some other sort anyway.
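[Following up on the "maybe a heap?" above: a toy sketch — emphatically not kernel code — of sleep timeouts kept in a min-heap. Registering n sleeps and harvesting the expirations in order is heapsort in disguise, with the O(n log n) comparisons done on your behalf.]

    #include <stdio.h>

    #define MAXT 64
    static int heap[MAXT], len;

    static void push(int deadline) {                   /* register a timeout */
        int i = len++;
        heap[i] = deadline;
        while (i > 0 && heap[(i - 1) / 2] > heap[i]) { /* sift up */
            int p = (i - 1) / 2, t = heap[p];
            heap[p] = heap[i]; heap[i] = t; i = p;
        }
    }

    static int pop(void) {                             /* next timer to expire */
        int top = heap[0], i = 0;
        heap[0] = heap[--len];
        for (;;) {                                     /* sift down */
            int l = 2 * i + 1, r = l + 1, m = i;
            if (l < len && heap[l] < heap[m]) m = l;
            if (r < len && heap[r] < heap[m]) m = r;
            if (m == i) break;
            int t = heap[m]; heap[m] = heap[i]; heap[i] = t; i = m;
        }
        return top;
    }

    int main(void) {
        int sleeps[] = {5, 3, 2, 1, 10};
        for (int i = 0; i < 5; i++) push(sleeps[i]);
        while (len > 0) printf("%d\n", pop());         /* "wakeups" in order */
        return 0;
    }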
I think it's more of a novelty than a serious sorting algorithm. Seeing as this comes from 4chan, I think the author did it purely for the 'lulz'.
You've just done a pretty great job of convincing me the opposite of your original "isn't interesting at all" thesis.
"But that's a tautology, of course the chicken wanted to get to the other side if it crossed the road."
I doubt Linux actually allocates physical pages for all 8MB of available stack space immediately after launching a program. The original sort post uses bash, which launches a separate process for every external command.
However, if you're using a shared address space and kernel-level threads, you could easily run out of virtual addresses.
8 MB per thread is just constant overhead.
constant in the number of elements, sub-constant in the size of the input
I used to work as an editor for freshmeat.net and we would get hilarious submissions like this quite frequently. For example, the "revolutionary new file compression algorithm" that quite literally converted a file into binary and removed all the 0s. Such a shame that the author "hadn't gotten around to" writing a de-compression tool!
We were often quite sad that we couldn't publish some of these gems, they were brilliant.
One of my less well received interview questions is: "Okay, we both know theoretical lower limits on sorting. Can you come up with a terminating algorithm that is most /pessimal/?"
Usually I get a blank stare. Sometimes I get a great answer.
If we're talking about actual sorting algorithms that aren't pathologically inefficient (i.e., if we're not allowing algorithms that deliberately make negative progress or do random stuff instead of trying to find the solution), I'd think the "pessimal" deterministic solution (though, TBH, I'm not 100% clear on the definition - worst-case runtime, worst best-case runtime, average, etc.) would have to be essentially to enumerate the permutation group of N elements, checking each time whether the permutation creates a sorted list. There are factorial(N) members in this group, which is pretty bad worst-case running time for a sort.
There are many ways to enumerate permutations, and we could probably pick a terrible one there, too; especially with a really naive implementation, without any memoization or anything like that, we could probably waste all sorts of time.
Bonus points for generating the whole list of permutations fresh for each isSorted(applyPermutation(i, list)) test.
I think to do much worse you'd probably have to start cheating, throwing in garbage calculations that are merely used to waste time, not steps that anyone would actually think of as progress towards the solution (i.e., I could certainly dream up a nasty brute-force problem to solve for the index integers used in the inner loop, but that's in the category of deliberate time-wasting). The nice thing about this solution is that it's 100% conceivable that someone who wasn't thinking straight could come up with it and implement it just like this. Hell, a poorly written Prolog program would probably come close to implementing this algorithm if the sorting problem were specified in the right (wrong?) way...
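[A sketch of that enumerate-permutations sort in C: step through permutations in lexicographic order until one tests as sorted. Deterministic, so it always terminates; worst case it visits all n! permutations.]

    #include <stdbool.h>
    #include <stdio.h>

    static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    static bool is_sorted(const int *v, int n) {
        for (int i = 1; i < n; i++)
            if (v[i - 1] > v[i]) return false;
        return true;
    }

    /* Advance v to its next lexicographic permutation (classic algorithm);
     * wraps from the last (descending) permutation to the first (sorted). */
    static void next_permutation(int *v, int n) {
        int i = n - 2;
        while (i >= 0 && v[i] >= v[i + 1]) i--;        /* find the pivot */
        if (i >= 0) {
            int j = n - 1;
            while (v[j] <= v[i]) j--;                  /* smallest larger element */
            swap(&v[i], &v[j]);
        }
        for (int l = i + 1, r = n - 1; l < r; l++, r--) /* reverse the tail */
            swap(&v[l], &v[r]);
    }

    int main(void) {
        int v[] = {5, 3, 2, 1, 10};
        int n = sizeof v / sizeof v[0];
        while (!is_sorted(v, n))
            next_permutation(v, n);
        for (int i = 0; i < n; i++) printf("%d ", v[i]);
        printf("\n");
        return 0;
    }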
There's also always SlowSort: http://c2.com/cgi/wiki?SlowSort, which works by finding the maximum element, then recursively sorting the rest of the list. That wouldn't be so terrible if it weren't for the twist: SlowSort finds each maximum element by splitting the list in half, sorting both halves (using SlowSort, of course) to read off their maxima, and then picking the bigger one. I think this still runs faster than N!, but its runtime is guaranteed, whereas when generating permutations you might accidentally hit the right one in the first step (you could always work around that by calculating all the lists first and then checking them, but again, that seems like cheating).
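[A sketch of SlowSort as just described, following the formulation on the linked page: recursively "sort" both halves merely to read off their maxima, move the larger maximum to the end, then SlowSort everything but the last element.]

    #include <stdio.h>

    static void slowsort(int *v, int i, int j) {
        if (i >= j) return;
        int m = (i + j) / 2;
        slowsort(v, i, m);          /* sort first half (overkill: we only */
        slowsort(v, m + 1, j);      /* need each half's maximum)          */
        if (v[m] > v[j]) {          /* larger maximum goes to the end */
            int t = v[m]; v[m] = v[j]; v[j] = t;
        }
        slowsort(v, i, j - 1);      /* recurse on everything but the max */
    }

    int main(void) {
        int v[] = {5, 3, 2, 1, 10};
        int n = sizeof v / sizeof v[0];
        slowsort(v, 0, n - 1);
        for (int i = 0; i < n; i++) printf("%d ", v[i]);
        printf("\n");
        return 0;
    }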
I think you get the blank stares because half the interviewees don't know what "pessimal" means.
Cmd+Ctrl+d: no entries found
I guess it's a mix of optimal and pessimistic. Pessimistically optimal ("This may be optimal, but I doubt it") or optimally pessimistic ("This is absolutely the worst possible outcome ever") :)
We're clearly not talking about comparison sorts here, which blows the options wide open. O(keyspace) sorts (especially a naive counting-sort) are truly terrible for short lists with large keyspaces.
Absolute worst non-probabilistic for large inputs? Enumerate all possible lists given the keyspace, check if it contains the same values as the input, check if it's sorted. This is O((k^n)n!) worst-case (and I think average as well), where k is the size of the keyspace and n is the size of the input. Even if you take the obvious optimizations (enumerate only sorted lists, enumerate only lists of the correct length, etc) it's still terrible. The former improves it substantially, but not enough to finish before the heat-death of the universe on a substantial input size. The latter has no effect on the complexity.
edit: Corrected the complexity. It's even worse.
Bogobogosort (http://www.dangermouse.net/esoteric/bogobogosort.html) would be a good candidate except that it uses random permutation and therefore might not terminate. Maybe if you replaced the random step with a deterministic one that always yielded a not-yet-seen permutation...
I always like the "shaking box" algorithm:
1. Check if list ordered. If it is, we're done!
2. Randomize order of list. Go back to step 1.
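[The shaking box in C: check, shuffle (Fisher-Yates as the "shake"), repeat. Expected work is roughly O(n * n!) for distinct elements, and there is no termination guarantee.]

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static bool is_sorted(const int *v, int n) {
        for (int i = 1; i < n; i++)
            if (v[i - 1] > v[i]) return false;
        return true;
    }

    static void shake(int *v, int n) {             /* Fisher-Yates shuffle */
        for (int i = n - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int t = v[i]; v[i] = v[j]; v[j] = t;
        }
    }

    int main(void) {
        int v[] = {5, 3, 2, 1, 10};
        int n = sizeof v / sizeof v[0];
        srand((unsigned)time(NULL));
        while (!is_sorted(v, n))                   /* step 1: done yet? */
            shake(v, n);                           /* step 2: shake the box */
        for (int i = 0; i < n; i++) printf("%d ", v[i]);
        printf("\n");
        return 0;
    }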
N!
Worst case O(∞); it's known as 'random sort' or 'bogosort'.
Funny thing, it can theoretically be O(1) on quantum computers, assuming the multiple-universe interpretation of quantum mechanics: http://en.wikipedia.org/wiki/Bogosort#Quantum_bogosort ;).
This is usually known as bogosort. http://en.wikipedia.org/wiki/Bogosort
This is more commonly known as bogosort (http://en.wikipedia.org/wiki/Bogosort), and it's a good start for running time but isn't guaranteed to terminate. See my other comment for a link to something that goes one better than bogosort on running time.
Start by encoding each item to sort into an expression consisting of factors of prime numbers so that it's reversible. Conway chain notation of the factors would make a nice pessimistic next step, though that might violate your termination requirement as there won't be enough Planck volumes in the known universe to represent anything beyond the first few terms. :)
The word to google is "simplexity", though that gets used for other less amusing things too, so adding the word "sort" helps. Two examples:
http://web.cs.wpi.edu/~dd/2223/broder_pessimal_algorithms.pd...
http://www.hermann-gruber.com/data/fun07-final.pdf
I don't like this question. Comparison- or value-based? Worst in the average or the best case? What's to stop me from mapping the inputs into, say, busy beaver space?
Just the fact that you object and have something interesting to say puts you ahead of ninety percent of the candidates I see.
[One guy, no foolin', told a cow-orker that a byte, on an Intel platform, contained four bits. Wow.]
Ask a silly question, get a silly answer:
Would someone please summarize the article?
There's no way I'm clicking a 4chan link from here at work.
It is currently SFW (it seems to have images disabled).
But, the post that started it all is:
Man, am I a genius. Check out this sorting algorithm I just invented.
It is currently SFW (it seems to have images disabled).
/prog/ is a text-only board.
That's actually a very clever hack. I'm not sure how I'll put it to use, but I'll definitely keep it in my mental store.
Someone posted an "algorithm" for sorting integers in bash by calling a function f, which sleeps for n seconds and then prints n. Smaller numbers cause f to wake up and print earlier than larger numbers, so the result is that you get a sorted list printed back.
Unless you have small numbers, or negative numbers, or very closely grouped numbers, or...
It was a joke, I think. (I hope)
What's interesting is that in theory, assuming an appropriate scaling factor to divide the input by, the time and space complexity of Sleep sort are O(1). This is basically like hashing numbers, then iterating over the buckets - except that the iteration is done over time. Given that, one could envision degenerate cases for which Sleep sort could in theory outperform a standard algorithm like quicksort.
Interesting idea I hadn't seen before. It's basically bucket sort, but using the scheduler to hold the bins.
Correcting myself: while that's what I think of it conceptually (everything is placed into integer-valued "bins" via the integer parameter taken by sleep()), implementation-wise it's probably equivalent to heapsort or something similar, depending on whether the scheduler stores sleeping threads in a heap or some other data structure.
Optimize it by making the sleep time be log(x) instead of x or a fraction of x. That makes it O(log n) complexity :). Use any monotone increasing function really.
Of course it's brilliant, it just delegates the sorting to the scheduler. The OP on 4chan is ready for a PHB position at a large corporation.
Hilarious, though not strictly correct. There is no guarantee 'sleep n' sleeps for n seconds, or that 'sleep n' wakes up before 'sleep n+1'. Granted, in reality if scheduling is so far off that things like this on the order of seconds become a problem, you've probably got bigger problems...
Guys, isn't this just another sorting problem? Aren't you just relying on the thread scheduler to sort them?
Are those of you who are saying there is anything at all good or usable about this approach just trolling? Pleaaase just be trolling; if my opinion of humanity goes any lower, I'll have to start building a 'compound'.
Note to future historians: In the age before commercialized internet access humans found humor in absurdism, irony, satire and sometimes even sarcasm. Today these forms are no longer considered humorous but malicious and offensive, and are now classified under the blanket term trolling.
Yikes, I guess you got me. I didn't realize that attempting to piss someone off anonymously for your own amusement was satire. Your ideas intrigue me and I am interested in subscribing to your newsletter.
I think the more interesting solutions are deeper in the thread mainly #43, #44, #83.
Mostly just for fun, but it's neat to see an Erlang solution, since this should be right up its alley.
Please share for the net-nannied among us.
Erlang version for you:
Edit: finally got the formatting right.
javascript version for you:
Works with negative numbers.
Python version: http://code.activestate.com/recipes/577756-sleepsort/
I don't think this is meant to be serious.
Maybe on a quantum computer...
Wow, 4chan on HN frontpage is a milestone. I recall reading somewhere 4chan (among few other sites) articles are auto-flagged when submitted.
It's a joke, guys.
Too bad it's not a funny one
Wow. An order 0 sorting algorithm (it is O(0) not O(n) because it doesn't make _any_ comparisons). Amazing.
Just because a sorting algorithm doesn't use any comparisons doesn't make it O(0).
For example, Radix Sort (http://en.wikipedia.org/wiki/Radix_sort) is not a comparison-based sort, and its complexity is O(k·n), where k is the max size of the elements.
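[For instance, a sketch of LSD radix sort on 32-bit values: k = 4 byte-sized digit passes, each a stable counting pass, with no element-to-element comparisons. The fixed scratch-buffer size is an assumption of this sketch.]

    #include <stdio.h>
    #include <string.h>

    static void radix_sort(unsigned *v, int n) {
        unsigned buf[64];                     /* scratch; assumes n <= 64 */
        for (int shift = 0; shift < 32; shift += 8) {   /* k = 4 byte-digits */
            int count[257] = {0};
            for (int i = 0; i < n; i++)       /* histogram this digit */
                count[((v[i] >> shift) & 0xff) + 1]++;
            for (int d = 1; d < 257; d++)     /* prefix sums -> start offsets */
                count[d] += count[d - 1];
            for (int i = 0; i < n; i++)       /* stable scatter by digit */
                buf[count[(v[i] >> shift) & 0xff]++] = v[i];
            memcpy(v, buf, n * sizeof *v);
        }
    }

    int main(void) {
        unsigned v[] = {5, 3, 2, 1, 10};
        int n = sizeof v / sizeof v[0];
        radix_sort(v, n);
        for (int i = 0; i < n; i++) printf("%u ", v[i]);
        printf("\n");
        return 0;
    }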
The OS scheduler still does the comparisons (and sorting). How otherwise would the threads wake up in order if they weren't sorted by somebody?
I don't think this sort is dependable for sorting floats in close ranges. I tried it with ./sleepsort.bash 1 1.01 1.01 1.02
and output was
1
1.01
1.02
1.01
Clever hack, nonetheless.
My first reaction was that it's going to be O(9^n) worst case - but there is & after f "$1", which spawns its own process for each number. Not bad - maybe it's a good algo if all are single digits.
Of course, the proper algorithm, when all are single digits, is to use an array [10]int.
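[Concretely, that array-of-ten-counters approach is just a counting sort over the ten possible digits, O(n + 10), no sleeping involved — a sketch:]

    #include <stdio.h>

    int main(void) {
        int digits[] = {3, 1, 4, 1, 5, 9, 2, 6};
        int n = sizeof digits / sizeof digits[0];
        int count[10] = {0};

        for (int i = 0; i < n; i++)
            count[digits[i]]++;           /* tally each digit */
        for (int d = 0; d < 10; d++)
            for (int c = 0; c < count[d]; c++)
                printf("%d\n", d);        /* emit digits in order */
        return 0;
    }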
Sorry, can't open the link here (blocked). Could anyone please post a cached version (or even a link to a full-page screenshot, a la 4chan)? Thanks.
Do you see the hidden narrative?