Comment by tamnd

20 hours ago

I am working on the little book of algorithms: https://github.com/little-book-of/algorithms

A project to implement 1000 algorithms. I have finished around 400 so far and I am now focusing on adding test cases, writing implementations in Python and C, and creating formal proofs in Lean.

It has been a fun way to dive deeper into how algorithms work and to see the differences between practical coding and formal reasoning. The long-term goal is to make it a solid reference and learning resource that covers correctness, performance, and theory in one place.

The project is still in its draft phase and will be heavily edited over the next few months and years as it grows and improves.

If anyone has thoughts on how to structure the proofs or improve the testing setup, I would love to hear ideas or feedback.

35 comments

tamnd

0x00cl 12 hours ago

Wow, that looks fun and probably get to learn a lot about algorithms.

I don't have any feedback, but rather a question, as I've seen many repositories with people sharing their algorithms, at least on GitHub for many different languages (e.g. https://github.com/TheAlgorithms), what did you find that was missing from those repositories that you wanted to write a book and implement hundreds of algorithms, what did you find that was lacking?

tamnd 11 hours ago

Another reason for writing the book is that many developers see "algorithms" as something only needed for FAANG interviews, not for real work. For beginners and even seniors, learning algorithms often just means doing LeetCode problems, which most people dislike but feel forced to do.
I want to change that view and show that algorithms are beautiful and useful beyond interviews. They appear everywhere, from compilers to databases to the Linux kernel, where I found many interesting data structures worth exploring. (i will share more about this topic later)
I hope to share more of these insights and connect with others who enjoy discussing real world algorithm design, which is what I love most about the Hacker News community (except for the occasional trolls that show up from time to time).
tamnd 11 hours ago

Those algorithms implement so random to me, with lack of explanation, no test cases, no formal proof, and often inconsistent naming or structure across languages. Many repositories like TheAlgorithms are great collections, but they feel more like code dumps than true learning resources. You can find an implementation of Dijkstra or QuickSort, but often there is no context: why it works, how to prove it correct, what the complexity is, or how to test it against edge cases. For someone who wants to learn algorithms deeply, that missing layer of reasoning and validation is critical.
No organization for learners either. It jumps straight into implementations without a logical flow from fundamentals. I want to build something more structured: start from the very foundation (like data structures, recursion, and complexity analysis), then move to classical algorithms (search, sort, graph, dynamic programming), and eventually extend to database internals, optimization, and even machine learning or AI algorithms. Basically, a single consistent roadmap from beginner to researcher level, where every algorithm connects to the next and builds intuition step by step.
Another very good resource for beginners is https://www.hello-algo.com. At first, i actually wanted to contribute there, since it explains algorithms visually and in simple language. But it mostly covers the basics and stops before more advanced or applied topics. I want to go deeper and treat algorithms as both code and theory, with mathematical rigor and formal proofs where possible. That is something I really liked about Introduction to Algorithms (CLRS) and of course The Art of Computer Programming (TAOCP) by Knuth. They combine reasoning, math, and practice. My goal is to make something in that spirit, but more practical and modern, bridging the gap between academic books and messy open source repos.
tamnd 11 hours ago

For more context, I actually used The Algorithms as a reference when working on my own programming language, Mochi, which includes around 150–300 algorithms (I don't remember exactly) implemented directly in Mochi. These are then transpiled to over 25 programming languages such as C, Haskell, Java, Go, Scala, and more: https://github.com/mochilang/mochi/tree/main/tests/algorithm...
The VM and transpiler were originally implemented by hand, and later I used Codex to help polish the code. The generated output works, though it is a bit messy in places. Hopefully, after finishing a few books, I can return to the project with more experience and add better use cases for it.

MASNeo 14 hours ago

Great idea. I had been thinking about pretty much the same but perhaps targeted at executives and perhaps including AI/Cloud.

I usually feel to many people wildly through around terms they hardly understand, in the belief they cannot possibly understand. That’s so wrong, every executive should understand some of what determines button line. It’s not like people skip economics because it’s hard.

Would love to perhaps contribute sometime next year. Stared and until then good luck - perhaps add a donation link!

tamnd 13 hours ago

Thanks! I completely agree. For more than ten years consulting, training and architecting systems for clients across government and enterprise, I have seen the same pattern. Long before "big data", "cloud" and now with "AI" and "GenAI" these buzzwords have often been misunderstood by most of the C-suite. In my entire career, explaining the basics and setting the right expectations has always been the hardest part.
I really like your idea of targeting executives and connecting it to real business outcomes. Getting decision makers to truly understand the fundamentals behind the technology would make a huge difference.
tamnd 13 hours ago

I hope the next generation learns to love "C" and Algorithms again. I have rediscovered my appreciation for C recently, even though Go is my main professional programming language.

kragen 19 hours ago

That's a cool project!

I feel like the presentation of Lomuto's algorithm on p.110 would be improved by moving the i++ after the swap and making the corresponding adjustments to the accesses to i outside the loop. Also mentioning that it's Lomuto's algorithm.

These comments are probably too broad in scope to be useful this late in the project, so consider them a note to myself. C as the language for presenting the algorithms has the advantage of wide availability, not sweeping performance-relevant issues like GC under the rug, and stability, but it ends up making the implementations overly monomorphic. And some data visualizations as in Sedgewick's book would also be helpful.

tamnd 16 hours ago
My biggest inspiration for this project, though, is The Art of Computer Programming (TAOCP), that level of depth and precision is the ultimate goal. I'm also planning to include formal proofs of all algorithms in Lean, though that could easily turn into a 10-year project.
- kragen 7 hours ago
  
  Python probably is not suitable for any project that you want to retain value for 10 years or more; the community has a policy against maintaining backward compatibility.
  
  4 replies →
tamnd 16 hours ago
Sedgewick's Algorithms book is great for practical learning but too tied to Java and implementation details. It is a bit shallow on theory, though the community and resources for other languages help.
That said, I personally prefer Introduction to Algorithms (CLRS) for its formal rigor and clear proofs, and Grokking Algorithms for building intuition.
The broader goal of this project is to build a well tested, reference quality set of implementations in C, Python, and Go. That is the next milestone.
- kragen 9 hours ago
  
  Possibly you mean "too tied to Pascal" ;-)
  
  6 replies →
- amelius 13 hours ago
  
  Sedgewick is available for C also.
  
  1 reply →
tamnd 16 hours ago

For visualization, I'm considering using the awesome Manim library by 3Blue1Brown (https://3b1b.github.io/manim/getting_started/quickstart.html). I'm not very good at visual design, but let see what I can come up with.
tamnd 16 hours ago
To reduce the current monomorphism, I might add a generic version using void* and a comparator, or generate code for a few key types, while keeping the simple monomorphic listings for readability. (Though this would make the code a bit more complex)
- kragen 6 hours ago
  
  Sedgewick published a C++ version of his book using templates for parametric polymorphism, but C++ is kind of a nightmare of a language. Unfortunately algorithms like quicksort really seem to demand parametric polymorphism, and the alternative languages with parametric polymorphism such as Haskell and ML mostly make memory allocation implicit, which runs counter to your goals. Rust offers both explicit memory management and parametric polymorphism, but Rust code is not very clean. Digital Mars D might be a possible alternative? But none of Rust, D, Pony, Zig, Nim, Odin, or even C++ are as widely understood as C, Python, and JS.
  Sedgewick's C edition is the book that first showed me that programs can be beautiful. His favorite algorithm is of course Quicksort. His version of the partition function from Algorithms in C (01998 edition) is, I think, worth studying:
  int partition(Item a[], int l, int r) { int i = l-1, j = r; Item v = a[r]; for (;;) { while (less(a[++i], v)) ; while (less(v, a[--j])) if (j == l) break; if (i >= j) break; exch(a[i], a[j]); } exch(a[i], a[r]); return i; }
  There are some things to dislike in this presentation, like naming a variable "l", the rather excessively horizontal layout, and the brace style. And of course he picked Hoare's original branchy partition algorithm rather than Lomuto's simpler one. But I think there are some other choices that are beneficial:
  • He's using short variable names to let the algorithm speak for itself.
  • He uses a type Item (perhaps a typedef) to make the distinction clear between the type of array elements and the type of array indices.
  • Similarly, he's using exch and less functions to abstract away the comparison and exchange operations. In the Golang standard library implementation, these are actually methods on the container to be sorted taking indices as arguments, so the concrete item type doesn't appear in the algorithm at all, which I think is an important insight. Of course the numerous
  It would be possible to use the C preprocessor to compile specialized versions of the function for particular data types and comparators in a type-safe way by redefining Item, exch, and less, but of course when you do that the error messages are difficult to decipher. (But exch can't be just a regular function, because C passes arguments by value.)
  Sedgewick's original presentation in the book (from the 01983 edition) inlines the partitioning into the Quicksort loop:
  procedure quicksort(l, r: integer) var v, t, i, j: integer; begin if r > l then begin v := a[r]; i := l-1; j := r; repeat repeat i := i+1 until a[i] >= v; repeat j := j-1 until a[j] <= v; t := a[i]; a[i] := a[j]; a[r] := t; until j <= i; a[j] := a[i]; a[i] := a[r]; a[r] := t; quicksort(l, i-l); quicksort(i+1, r) end end;
  From my perspective the C edition is an enormous improvement. I don't have a copy of the (Pascal) second edition to compare.
  The third C++ edition (01998) looks almost identical to the C version, except that it's templatized on the Item type, which in C++ also permits you to use static ad-hoc polymorphism (funciton overloading) for the exch function and the comparison operator:
  template <class Item> int partition(Item a[], int l, int r) { int i = l-1, j = r; Item v = a[r]; for (;;) { while (a[++i] < v) ; while (v < a[--j]) if (j == l) break; if (i >= j) break; exch(a[i], a[j]); } exch(a[i], a[r]); return i; }
  He illustrates the dynamic behavior of sorting algorithms in multiple ways. One is a vertical time sequence of horizontal arrays of letters, circling the pointed-to elements and graying out the unchanged ones. Another, which can scale to larger arrays, is showing a small-multiple time sequence of scatterplots, like a comic strip, where the X coordinate is the array index and the Y coordinate is the element value at that array index. (A variant of this draws bars whose heights are the element values.) From my perspective, static visualizations like this should be preferred over animation when applicable (https://worrydream.com/MagicInk/), but this case is maybe on the borderline where both animation and static visualization have unique merits.
  My own terse presentation of Lomuto's algorithm is http://canonical.org/~kragen/sw/dev3/paperalgo#addtoc_21. I don't think it reaches the clarity of an exposition in C or Python, but it is much easier to write on a whiteboard. I do think that expressing the partition subroutine in terms of an operation on a Golang-like slice improves the clarity significantly. In C or C++ you could pass a base pointer and a length; perhaps Sedgewick didn't want to expose students to this aspect of C arrays, or perhaps he was still constrained by Pascal thinking.
  
  5 replies →

teiferer 17 hours ago

Nice to see that you are still around with this after your https://news.ycombinator.com/item?id=45448525 was flagged because of LLM slop issues of your work. How are you addressing those?

vivzkestrel 18 hours ago

i am working on something pretty radical in this space. it is a book of algorithms that derives all the algorithms without telling what the algorithm is. for example for binary search your book quickly went into the low + high / 2 = mid thing. my method is radically different. i take an even sized array then try to actually find it step by step , then take an odd sized array , find it step by step, derive a general hypothesis and then create the formula from it for that algorithm. this is going to be orders of magnitude above any data structures and algorithms books and courses when it comes out. pinky promise