πFS

3 hours ago (github.com)

Reminds me of when I tried to use the library of babel as a data compression tool. It led me down a fun rabbit hole and was my first introduction to information theory.

The conclusion was that you basically need the same amount of data to represent the address of your data as the data itself, so it's not really effective at compression, just a fun thought experiment.

The cool part in modern times is that LLMs are basically a form of lossy compression that actually achieves the gist of what these tools fail at, although it is lossy and requires a massive substrate. This is related to the idea of AI/LLMs being a form of language compression.

Related. Others?

πfs – A data-free filesystem - https://news.ycombinator.com/item?id=6698852 - Nov 2013 (32 comments)

(Reposts are fine after a year or so; links to past threads are just to satisfy extra-curious readers)

It is worth noting that as the length of data increases it becomes extremely unlikely that the index and length of the sequence within pi would actually be smaller than the data.
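
A rough way to see this for yourself (a sketch, not anything taken from the πfs repo): compute some decimal digits of pi, record where every three-digit string first appears, and count how many of those addresses are shorter than the data they point at. The helper names `pi_decimal_digits` and `first_occurrences` and the 20,000-digit cutoff are made up for illustration.

```python
def pi_decimal_digits(n):
    """Return the first n decimal digits of pi after the decimal point."""
    guard = 10                      # extra digits to absorb rounding error
    scale = 10 ** (n + guard)

    def arctan_inv(x):
        # Fixed-point arctan(1/x) from its Taylor series, integers only.
        term = scale // x
        total = term
        x2, k, sign = x * x, 3, -1
        while term:
            term //= x2
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_fixed = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    return str(pi_fixed)[1:n + 1]   # drop the leading "3"


def first_occurrences(digits, length):
    """Map each length-digit string that occurs to the index of its first occurrence."""
    first = {}
    for i in range(len(digits) - length + 1):
        first.setdefault(digits[i:i + length], i)
    return first


digits = pi_decimal_digits(20000)
first = first_occurrences(digits, 3)

# An address only saves space if it has fewer digits than the data,
# which for three-digit data means an index below 100.
shorter = sum(1 for index in first.values() if index < 100)
print(f"{len(first)} of 1000 three-digit strings found in {len(digits)} digits")
print(f"{shorter} of them have an address shorter than the data")
```

Whatever the digits turn out to be, there are only 100 indices below 100, so at most 100 of the 1000 possible strings can ever get a shorter address, and that is before paying for the length field as well.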

Reminds me of: https://www.spronck.net/sloot.html

Further reading: https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System

  • > The SDCS is only possible if keys are allowed to become infinite, or the data store is allowed to become infinite (...) This would, of course, make the idea useless.

    But Pi is infinite. And thus this genius contraption will work as long as we have Moore's law on our side :)

I vaguely remember an entry to a compression benchmark that gamed the benchmark by treating the filename as part of the input to the decompression algorithm, thus beating a metric that only measured the size of the file.

It is disturbing to realize that pi then contains all past and future knowledge, including when I'll pass away.

  • So does every other random infinite sequence of bits. The unintuitive part comes from infinity, not pi.

    It also doesn't contain all past and future knowledge because it also contains all possible falsehoods about the past and future in a way that's indiscernible from the truth.

    Encoding information as an offset into a pseudorandom sequence is no more storage efficient than storing the information directly.

  • If it makes you feel better, consider that it also contains all plausible and implausible falsehoods about your demise as well.

  • The worst part is that it contains Star Wars 4-6 from an alternate timeline where Disney did a reboot casting Chris Pratt as Han Solo.

    (Fun fact: "Chrispratt" is an ancient Californian word that means "Joel McHale didn't want the role.")

  • It also contains all possible falsehoods and comes with no way to distinguish what's true from what isn't.

  • And also all the days you don’t, so by itself it’s not very meaningful, especially since you can’t tell which one is right in advance. In some sense, so does a calendar.

  • Fear not! It’s probably so deep in pi that you’d pass away listening to someone tell you where!

https://cs.stackexchange.com/a/53737/1704

> Matches that occur early enough in π to attain significant compression will not be varied. That is, it isn't possible to use π to compress interesting, real-world data because real-world strings are unlikely to arise early.

  • > Since the file is 128 bits long, one would expect this place to be around the 2^128th bit.

    > Calculate the number of bits to encode that value using log2(938933556), which is ~29.8

    Can someone explain these two statements to me?
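
One possible reading of the two quoted statements, sketched in Python. It assumes the first quote means 2 to the power 128 (there are 2^128 different 128-bit strings, so in a random-looking bit stream a particular one is expected to first show up at a position on that order), and that the second is just the usual rule that writing down a number N takes about log2(N) bits. The variable names are made up for illustration.

```python
import math

offset = 938933556      # the position quoted in the answer
file_bits = 128         # the file length quoted in the answer

# Bits needed to write down the quoted offset.
bits_for_offset = math.log2(offset)
print(round(bits_for_offset, 1))     # 29.8
print(offset.bit_length())           # 30, rounding up to whole bits

# Assumed reading of the first quote: expected position on the order of 2**128,
# so the address of a typical 128-bit file is itself about 128 bits long.
expected_position = 2 ** file_bits
print(math.log2(expected_position))  # 128.0
```

So an offset that fits in about 30 bits would be a real saving for a 128-bit file, but a typical 128-bit file is expected to need an offset of about 128 bits, which saves nothing.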

> Why is this thing so slow? It took me five minutes to store a 400 line text file!

> Well, this is just an initial prototype, and don't worry, there's always Moore's law!

Seriously? They're only storing individual bytes in pi:

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

So the whole transformation should be trivially reducible to a 256-element lookup table from source byte to location in pi and a similar table used to convert back the other way. Maybe a fancy formula could be used for the (never actually encountered) case in which a byte is encoded by one of the infinite available noncanonical encodings.
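
A minimal sketch of that lookup-table idea, under loud assumptions: offsets here count hexadecimal digits of the fractional part of pi and a byte is two consecutive hex digits starting at any offset, which may not match how πfs actually addresses bytes. All function and variable names are hypothetical.

```python
def pi_hex_digits(n):
    """Return the first n hexadecimal digits of the fractional part of pi."""
    guard = 8                                  # extra digits to absorb rounding error
    scale = 16 ** (n + guard)

    def arctan_inv(x):
        # Fixed-point arctan(1/x) from its Taylor series, integers only.
        term = scale // x
        total = term
        x2, k, sign = x * x, 3, -1
        while term:
            term //= x2
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_fixed = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    frac = pi_fixed - 3 * scale                # keep only the fractional part
    return format(frac, "x").zfill(n + guard)[:n]


def build_encode_table(start_digits=1024):
    """Map each byte value to the first hex-digit offset where it appears in pi."""
    n = start_digits
    while True:
        digits = pi_hex_digits(n)
        table = {}
        for i in range(len(digits) - 1):
            table.setdefault(int(digits[i:i + 2], 16), i)
        if len(table) == 256:                  # every byte value has been located
            return table
        n *= 2                                 # otherwise look at more digits


ENCODE = build_encode_table()                          # byte -> canonical offset
DECODE = {off: byte for byte, off in ENCODE.items()}   # canonical offset -> byte


def to_offsets(data):
    return [ENCODE[b] for b in data]


def from_offsets(offsets):
    return bytes(DECODE[o] for o in offsets)


stored = to_offsets(b"hello pi")
print(stored)
print(from_offsets(stored))
print("largest offset needs", max(ENCODE.values()).bit_length(), "bits")
```

Since the 256 byte values need 256 distinct offsets, the largest offset is at least 255, so the "metadata" for a byte can never be smaller than the byte itself.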

Looked at the repo but it says NOTHING about what value this project offers.

I mean, I get that it's "fun" to store information within the digits of pi. But is this just amusement, or is there a value prop for production use here?

(Speaking as a math major, by the way. I'm sympathetic to the cause.)

This is probably a dumb question, but do we actually know that pi has an infinite number of decimal digits or are we assuming that it does because we haven’t developed a sufficiently powerful computer to calculate the last digit of pi?

I’m guessing this is something that could be formally proven?

I'm intrigued that π was capitalized to Π presumably automatically in the HN headline.

  •     jshell> "πfs".toUpperCase()
        $1 ==> "ΠFS"
    
        Welcome to Node.js v26.3.0.
        Type ".help" for more information.
        > "πfs".toUpperCase()
        'ΠFS'
    
        Python 3.14.5 (main, May 10 2026, 10:21:34) [Clang 21.0.0 (clang-2100.0.123.102)] on darwin
        Type "help", "copyright", "credits" or "license" for more information.
        >>> "πfs".upper()
        'ΠFS'
    
        echo 'πfs' | awk '{print toupper($0)}'
        ΠFS

Short Storage Number - SSN

0x123456789ABCDEF0

use this number as a shorter nibble storage alternative...

At what point is the metadata larger than the actual file?

  • Part of the joke is that, in this implementation, the metadata is guaranteed to be larger than the file:

    > Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found.

    > In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

Why would anyone need πfs, since you can already build such a system yourself quite trivially on Linux?

What a brilliant idea! Of course, of course, it’s not in the repository so I can’t apt-get install it. Debian...always so far behind.

This is interesting, but I feel like my use cases would better align with a different irrational number. Could I get an option to do this with e instead? /s