Here's a Google BigQuery query that lists the most common PDFs referenced in the GitHub sample dataset, along with the top 100 results: https://gist.github.com/llimllib/3f1877eab06208958060f491cf3...
It's possible to run this query against the full GitHub dataset, but I couldn't figure out how to pay for it, so if somebody wants to do that, it would be excellent.
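The gist's query is truncated above, so this is not it. As a rough local equivalent, assuming you already have the file contents as strings (e.g. from a checkout), you can pull out http(s) URLs ending in `.pdf` and count how many files mention each one. The `PDF_URL` pattern and the `top_pdf_links` name are hypothetical, and the pattern will miss links that are split across comment lines or have no scheme.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical pattern: an http(s) URL ending in ".pdf", case-insensitive.
# It stops at whitespace, quotes, angle brackets and parentheses, which
# usually delimit URLs in source comments.
PDF_URL = re.compile(r"https?://[^\s\"'<>()]+?\.pdf\b", re.IGNORECASE)


def top_pdf_links(texts: Iterable[str], n: int = 100) -> list[tuple[str, int]]:
    """Return the n PDF URLs mentioned by the most files, with their counts."""
    counts: Counter[str] = Counter()
    for text in texts:
        # set(): a file citing the same PDF twice still counts once
        counts.update(set(PDF_URL.findall(text)))
    return counts.most_common(n)
```

Counting per file rather than per occurrence keeps one heavily commented file from dominating the ranking.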
Just a note: it's bizarre that I absolutely cannot find a way to determine a) how much it would cost to run, or b) how I would pay for it if I wanted to run it.
I changed it to query from [bigquery-public-data:github_repos.contents] instead, and before I execute the query it says "Valid: This query will process 1.68 TB when run.".
Queries are $5/TB [0].
So a bit less than 10 bucks. :)
Edit: brb, that's totally worth it.
[0]: https://cloud.google.com/bigquery/pricing
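The arithmetic here is 1.68 TB × $5/TB = $8.40. The "Valid: This query will process…" message is essentially a dry-run estimate of bytes scanned; the `google-cloud-bigquery` Python client can get the same number without running anything by passing `QueryJobConfig(dry_run=True)` to `client.query` and reading the job's `total_bytes_processed`. A minimal sketch of the cost estimate, assuming the $5/TB on-demand rate quoted above (pricing may well have changed since):

```python
# Rough BigQuery on-demand cost: terabytes scanned x price per TB.
# PRICE_PER_TB is the $5/TB figure quoted in this thread (an assumption;
# check the current pricing page, which may bill per TiB at a different rate).
PRICE_PER_TB = 5.0


def estimate_cost_usd(tb_processed: float, price_per_tb: float = PRICE_PER_TB) -> float:
    """Return the estimated query cost in US dollars, rounded to cents."""
    if tb_processed < 0:
        raise ValueError("tb_processed must be non-negative")
    return round(tb_processed * price_per_tb, 2)


print(estimate_cost_usd(1.68))  # 8.4
```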
Since the Java source is open, it's all there to be peer-reviewed. If a paper it's based on isn't the best, you can make some noise about it. This is a good situation for Java.
Some more found by a quick grep for "et al.", "Proceedings", "Proc. ", "Symposium", "Conference", "Conf. ", "PPoPP" (a conference with an easy-to-grep-for name), and "acm.org":
hotspot/src/cpu/ppc/vm/ppc.ad: See J.M.Tendler et al. "Power4 system microarchitecture", IBM J. Res. & Dev., No. 1, Jan. 2002.
hotspot/src/cpu/x86/vm/crc32c.h: V. Gopal et al. / Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction April 2011 8
hotspot/src/share/vm/gc/shared/taskqueue.hpp: Le, N. M., Pop, A., Cohen, A., and Nardelli, F. Z.: Correct and efficient work-stealing for weak memory models. Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP 2013), 69-80
jdk/src/java.base/share/classes/java/util/Arrays.java: Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993
jdk/src/jdk.crypto.ec/share/native/libsunec/impl/mpmontg.c: "A Cryptographic Library for the Motorola DSP56000" by Stephen R. Dussé and Burton S. Kaliski Jr. published in "Advances in Cryptology: Proceedings of EUROCRYPT '90", LNCS volume 473, 1991, pg 230-244
hotspot/src/share/vm/opto/superword.hpp: "Exploiting SuperWord Level Parallelism with Multimedia Instruction Sets" by Samuel Larsen and Saman Amarasinghe [...] published in ACM SIGPLAN Notices, Proceedings of ACM PLDI '00, Volume 35 Issue 5
jdk/src/java.base/share/classes/java/util/SplittableRandom.java: Leiserson, Schardl, and Sukha "Deterministic Parallel Random-Number Generation for Dynamic-Multithreading Platforms", PPoPP 2012
jdk/src/java.base/share/classes/java/util/SplittableRandom.java: "Parallel random numbers: as easy as 1, 2, 3" by Salmon, Moraes, Dror, and Shaw, SC 2011
jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java: "Dynamic Circular Work-Stealing Deque" by Chase and Lev, SPAA 2005
jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java: "Idempotent work stealing" by Michael, Saraswat, and Vechev, PPoPP 2009
jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java: "Leapfrogging: a portable technique for implementing efficient futures" by D.B. Wagner and B.G. Calder, PPoPP '93, http://dl.acm.org/citation.cfm?id=155354
jdk/src/java.base/share/classes/java/util/concurrent/LinkedTransferQueue.java: Using elimination to implement scalable and lock-free FIFO queues, Moir et al., http://portal.acm.org/citation.cfm?id=1074013
jdk/src/java.base/share/classes/java/util/concurrent/LinkedTransferQueue.java: "Bounding space usage of conservative garbage collectors", HJ Boehm, http://portal.acm.org/citation.cfm?doid=503272.503282 (this is the Boehm GC paper)
jdk/src/java.base/share/classes/java/util/concurrent/locks/StampedLock.java: Design, verification and applications of a new read-write lock algorithm, Shirako et al., SPAA 2012
hotspot/src/share/vm/opto/escape.hpp: Jong-Deok Choi, Manish Gupta, Mauricio Serrano, Vugranam C. Sreedhar, Sam Midkiff: "Escape Analysis for Java", Proceedings of ACM SIGPLAN OOPSLA Conference, November 1, 1999
hotspot/src/share/vm/runtime/os.cpp: Gilad Bracha and David Ungar: "Mirrors: Design Principles for Meta-level Facilities of Object-Oriented Programming Languages", in Proc. of the ACM Conf. on Object-Oriented Programming, Systems, Languages and Applications, October 2004
jdk/src/jdk.crypto.ec/share/native/libsunec/impl/ec_naf.c: D. Hankerson, J. Hernandez and A. Menezes, "Software implementation of elliptic curve cryptography over binary fields", Proc. CHES 2000
jdk/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java: "Nonblocking Concurrent Objects with Condition Synchronization", by W. N. Scherer III and M. L. Scott. 18th Annual Conf. on Distributed Computing, Oct. 2004
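For anyone who wants to repeat this on another checkout, here's a rough Python stand-in for that grep, using the same search strings as plain case-sensitive substring matches. The extension filter and the function names are hypothetical; expect plenty of false positives (e.g. "Conference" in unrelated comments) that still need a manual pass.

```python
import os

# The search strings from the comment above: plain, case-sensitive
# substring matches, like a basic grep.
PATTERNS = ("et al.", "Proceedings", "Proc. ", "Symposium",
            "Conference", "Conf. ", "PPoPP", "acm.org")

# Hypothetical filter: file types that show up in the hits above.
SOURCE_EXTS = (".java", ".c", ".h", ".cpp", ".hpp", ".ad")


def citation_lines(text: str) -> list[str]:
    """Return the stripped lines of text that contain any of PATTERNS."""
    return [line.strip() for line in text.splitlines()
            if any(p in line for p in PATTERNS)]


def find_citations(root: str):
    """Walk root and yield (relative path, matching line) pairs."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if not name.endswith(SOURCE_EXTS):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in citation_lines(f.read()):
                    yield os.path.relpath(path, root), line
```

Run from the repository root, `for path, line in find_citations("."): print(f"{path}: {line}")` prints hits in the same `path: comment` shape as the list above.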
That's a lot :) Some of your findings are actually listed in the original article, but not all of them obviously.
Ah, sorry, I didn't really check for dupes; I just skipped the ones with a pdf link in the vicinity. I'm just glad that sometimes the clever things that academics churn out are actually used in practice. Far too rarely if you ask me, but I'm biased of course ;)
I had to cite sources while implementing an artificial immune system (real-valued negative selection and clonal selection algorithms). I read through a few papers for each algorithm and cited the clearest one as a source.
It would be great if it also mentioned which files the links were found in.
Seconded! I really like this compilation. Very interesting to see the algorithms and data structures behind the implementation of a language, especially one of the more popular ones.
You can just grep by PDF name/URL and find the code.
To be fair, sometimes code and comments get moved around, and any of us can use grep (or whatever other search tool you prefer) to find a specific link in the source.
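To answer the which-files question without grepping one link at a time, the search can be inverted: map each PDF URL to the files that mention it. A sketch, assuming the files are already loaded into a `{path: contents}` dict; `files_by_link` and the URL pattern are hypothetical, not the tooling behind the original list.

```python
import re
from collections import defaultdict

# Hypothetical pattern: an http(s) URL ending in ".pdf", case-insensitive.
PDF_URL = re.compile(r"https?://[^\s\"'<>()]+?\.pdf\b", re.IGNORECASE)


def files_by_link(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each PDF URL to the sorted paths of the files that mention it.

    `files` maps a file path to that file's contents.
    """
    index: defaultdict[str, set[str]] = defaultdict(set)
    for path, text in files.items():
        for url in PDF_URL.findall(text):
            index[url].add(path)
    return {url: sorted(paths) for url, paths in sorted(index.items())}
```

Since it is rebuilt from the current sources each time, moved code and comments are picked up automatically.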
Please, do it for the Linux source code.
About 99% of Linux (or even more) is drivers. But indeed there should be useful references in the scheduler, locking primitives, memory management and core networking code.
To be more precise, it's actually a list of scientific papers referenced in the OpenJDK source code.
... as direct pdf links found via grep.
There might be more references without a pdf link.
I'm surprised the author didn't search for DOI links.
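A DOI search is easy to bolt on. A sketch, assuming the regular expression Crossref has suggested for modern DOIs (`10.` + a 4 to 9 digit registrant code + `/` + suffix, case-insensitive). It will miss some older DOIs with unusual characters, and trimming trailing `.,;)` is a heuristic for DOIs that end a sentence or sit inside parentheses; `find_dois` is a hypothetical name.

```python
import re

# Based on Crossref's suggested pattern for modern DOIs; "\b" keeps it from
# matching inside a longer number.
DOI = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_dois(text: str) -> list[str]:
    """Return DOIs found in text, in order, minus trailing . , ; or )."""
    return [m.rstrip(".,;)") for m in DOI.findall(text)]
```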
Thanks, we've updated the title to clarify.
Are you saying you can't run a PDF? :)