Comment by benjoffe

6 days ago

A write-up of a new Gregorian date conversion algorithm.

It achieves a 30–40% speed improvement on x86-64 and ARM64 (Apple M4 Pro) by reversing the direction of the year count and reducing the operation count (4 multiplications instead of the usual 7+).

Paper-style explanation, benchmarks on multiple architectures, and full open-source C++ implementation.

Very cool algorithm and great write-up!

I was initially a bit confused about what your algorithm actually did until I got to the pseudo-code. Ideally there would be a high-level description of what the algorithm is supposed to do before that point.

Something as simple as: “a date algorithm converts a number of days elapsed since the UNIX epoch (1970-01-01) to a Gregorian calendar date consisting of day, month, and year” would help readers understand what they're about to read.

  • Thanks, that is a good idea. This was originally a blog post series, and the first article gave a bit of an introduction.

    When I started the blog series, I expected the first article to be the most noteworthy, with the 2nd and 3rd being lesser supplementary topics.

    Now that the 3rd blog post ended up with a much larger result than I was expecting, it stands on its own and could do with some editing as you suggest.

How would this algorithm change on 16-bit or 8-bit devices? Or does some variety of the traditional naïve algorithm turn out to be optimal in that case? There's quite a bit of microcontroller software that might have to do date conversions, where performance might also matter. It's also worth exploring alternative epochs and how they would affect the calculation.

  • That is an interesting question.

    It might also come into play when developing SIMD alternatives for batch date processing, since 16-bit lanes allow more dates per vector. I plan to write a blog post covering SIMD, and if 16-bit algorithms turn out to have reasonable performance, that will be covered there.

Very nice writeup!

> Years are calculated backwards

How did that insight come about?

  • Thanks.

    I was fortunate enough to be programming on an ARM-based device, which meant that the term (x * 4 + 3) strongly stood out to me as inefficient: two cycles of preparation for the more important division. On x86-64, those two operations compile down to a single 'LEA' instruction (which I wasn't aware of at the time), so others using that type of computer might not have felt this step needed any simplification.

    I tried everything under the sun to get rid of these steps. The technique noted in the article of using the year 101 BC was for a long time my strongest candidate; you can view the implementation of that attempt at the link below [1].

    An epoch of 101 BC still meant that extra work was required to re-normalise the timeline after the century calculation, but it amounted to only a single addition of 365 in the calculation of `jul`. Explaining how this works would probably be a whole blog post in itself, but now that the approach has been discarded it isn't worth the time to explain fully.

    I had also developed the year-modulus-bitshift technique by that time, but it couldn't yet be integrated cleanly with any of my algorithm attempts. My plan was to simply document it as an interesting but slower concept.

    I don't know what sparked the idea of going backwards, other than immersing myself deeply in the problem in my spare time for about a month. It finally came to me one evening. I thought it would only save one cycle, but when it also allowed the year-modulus-bitshift to be used, the entire thing fit together like a glove and the improvement jumped from a 20% time saving to 40%.

    [1] https://github.com/benjoffe/fast-date-benchmarks/blob/218356...
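    To illustrate the (x * 4 + 3) term discussed above, here is a hedged sketch (not the article's final code) of the pattern as it typically appears: dividing (4*d + 3) by 1461 maps a day offset within a 4-year Julian cycle to its year, since the average year length is 1461/4 = 365.25 days. On x86-64 the multiply-add folds into a single LEA; on ARM64 it costs extra instructions before the division.

    ```cpp
    #include <cassert>

    // Year within a 4-year Julian cycle (years 0-2 have 365 days, year 3 has 366).
    // The (4*d + 3) multiply-add is the prep work discussed above: x86-64 folds it
    // into one LEA, while a typical ARM64 sequence spends extra cycles on it.
    unsigned year_of_cycle(unsigned day_of_cycle) {  // day_of_cycle in [0, 1460]
        return (4 * day_of_cycle + 3) / 1461;
    }

    int main() {
        assert(year_of_cycle(0)    == 0);  // first day of the cycle
        assert(year_of_cycle(364)  == 0);  // last day of year 0
        assert(year_of_cycle(365)  == 1);  // first day of year 1
        assert(year_of_cycle(1460) == 3);  // last day of the leap year
        return 0;
    }
    ```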