Comment by vbezhenar

3 years ago

In my opinion, a database as a filesystem was a noble goal, and someone will pull it off successfully in the future. Just because MS failed at it does not mean the idea is impossible.

I think we have been stuck on 80s/90s foundational tech for too long, and there are many new ideas that will shape future computing.

BeFS “includes support for extended file attributes, with indexing and querying characteristics to provide functionality similar to that of a relational database.”

The BeOS file system, an OS geek retrospective | Ars Technica

[https://arstechnica.com/information-technology/2018/07/the-b...]
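
(The closest mainstream analogue today is probably Linux extended attributes, which let you attach key/value pairs to files, though unlike BeFS the kernel doesn't index them, so there is no filesystem-level query over them. A minimal sketch, assuming a Linux filesystem with user xattrs enabled and a hypothetical file name:)

    import os

    # Attach and read BeFS-style key/value attributes via Linux xattrs.
    # Unlike BeFS, nothing indexes these, so you cannot query across files.
    path = "resume.pdf"                        # hypothetical file
    os.setxattr(path, "user.status", b"sent")  # attach a key/value pair
    print(os.getxattr(path, "user.status"))    # b'sent'
    print(os.listxattr(path))                  # ['user.status']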

In a way, this is already happening. Documents are moving to the cloud, where they exist mostly in a database-like structure. End-user computers are becoming more and more just an interface to documents that live in an online database, or (if the app supports offline) in local SQLite databases.

The endgame is a machine with only an OS. The OS is where all the hard edge cases are, which prevented the db-as-fs idea from succeeding in the first place.

  • > The endgame is a machine with only an OS.

    This was tried multiple times already, and in each case, it turned out that the ability to easily store and organise data locally was both desired and required among the wider audience. Cloud storage is a nice feature to complement local storage, but will not replace it any time soon.

    • Typically this has been because file sizes caused either network issues or storage cost issues on the server side. We are approaching (or have already reached) the point where the biggest consumer file formats (videos) don’t have this constraint. At my house it is already very nebulous what is stored locally and what is stored on a remote server.

      I’m sure there is some consumer technology coming around the bend that will make local storage compelling again, but I think we are on the cusp of another server-centric file storage era, not the reverse.

      I can easily see a db file system being the default for remote resources in consumer devices, while traditional local file systems are reserved for specialized use cases.

  • Speed of light is a hard physical limit. Local storage will always have a major edge for this reason alone.

    That said, I'm now in this weird mid-life confusion about the fact that my new WAN (a 3 Gbps fiber connection) is faster than any reasonable home LAN I can set up. I needed to move some movies from one machine to another the other day, and it was way faster to just bittorrent them again from upstream than to actually copy them, since even USB removable media was slower than my WAN. So I think we might be, at least temporarily, in an awkward situation where cloud storage outcompetes LAN storage.
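
    For rough scale (order-of-magnitude arithmetic; these are nominal link rates, not measured throughput):

        # Nominal link rates converted to MB/s; real-world numbers are lower.
        wan  = 3000 / 8   # 3 Gbps fiber      -> ~375 MB/s
        lan  = 1000 / 8   # gigabit Ethernet  -> ~125 MB/s
        usb2 = 480 / 8    # USB 2.0 stick     ->  ~60 MB/s
        print(wan, lan, usb2)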

    But truly local storage (NVMe etc.) has an extreme edge over anything networked, and always will.

    • This doesn't make any sense to me; I must be missing something. If your WAN is 3 Gb/s and your LAN is less than that, how does pulling things to a local machine, presumably on the LAN and therefore limited to the LAN speed, end up being faster?

    • > But truly local storage (NVMe etc.) has an extreme edge over anything networked, and always will.

      Sure, for specific use cases that don't much resemble what most people spend their time doing.

  • The recent popularity of Obsidian and similar knowledge management software is part of the same trend. They're databases in their own way: local, but organized as a graph you can run queries against rather than a loose pile of text files.

  • > The endgame is a machine with only an OS.

    What does this mean?

    • In context, it appears to mean "a machine stripped down to a terminal for accessing external storage".

      But of course, even in the context of the cloud a DB does not (necessarily?) replace an FS. More fundamentally, what the poster presented reads more like a sociological narrative of trends (or "big ideas" for the sales force), oblivious to the axiological ground that progress does not lie in decreased freedom; the model presented is that of a crippled machine, when wanting to store data locally is a most basic demand.

If you look at the on-disk structure of NTFS, you see that its design is rather similar to a database's. Metadata and file data are treated in the same way: every 'special' data structure, such as the record of which disk blocks are free, is actually stored as a [hidden] file.

If a file is small, its data doesn't need a whole disk block allocated to it; it can live directly inside the file's MFT record. Likewise, if a file name is really long, that string doesn't need to be stored inline with the rest of the metadata.
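
A toy sketch of that resident/non-resident split (the names, sizes, and allocator below are invented for illustration; real MFT records are more involved):

    # Toy model: a fixed-size file record stores small data inline
    # ("resident") and only block runs for large data ("non-resident").
    RECORD_SIZE = 1024       # each file record is a fixed-size slot
    HEADER_OVERHEAD = 200    # space reserved for the record's own metadata

    def allocate_blocks(nbytes, block=4096):
        # Stub allocator: a single contiguous run as (start, count).
        return [(0, -(-nbytes // block))]

    class FileRecord:
        def __init__(self, name, data):
            self.name = name
            if len(data) <= RECORD_SIZE - HEADER_OVERHEAD:
                # Small file: bytes live inside the record; no blocks used.
                self.resident_data, self.extents = data, None
            else:
                # Large file: the record only stores where the blocks are.
                self.resident_data = None
                self.extents = allocate_blocks(len(data))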

I'd love to use a db for a filesystem for many of my directories.

Granted, it might just be sub-optimal in certain scenarios, but I already have to logically partition my data into big blobs that are rarely accessed and stuff that must go on the SSDs. Maybe giving that FS an escape hatch for the data that has to be special and odd/weird is OK. I don't really want the DB magic for my swap files anyway.

I'd like to see a different abstraction and UX for "files", something that feels like a database but with looser usage patterns from a user's perspective, more intuitive to use and reason about.

I don't want to remember paths and folders and all that crap. I don't want to depend on my discipline to organize and I'm willing to trade some flexibility in return.

There are some alternatives, like tags, but none of them feel natural and all require setup. I feel services like Dropbox had a real chance to try something, but the best they came up with was showing me "recent documents" for a homepage; a wasted opportunity if you ask me.
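
For what it's worth, the storage half of that idea is almost trivial; the hard part is the UX. A minimal sketch over SQLite, with an invented schema, where "finding" a file is a query rather than a path:

    import sqlite3

    # Hypothetical tag store: files are opaque ids, organization lives in
    # tags, and lookup is declarative instead of path navigation.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE files (id INTEGER PRIMARY KEY, blob_ref TEXT);
        CREATE TABLE tags  (file_id INTEGER, key TEXT, value TEXT);
        CREATE INDEX tags_kv ON tags (key, value);
    """)
    fid = db.execute("INSERT INTO files (blob_ref) VALUES ('sha256:ab12cd')").lastrowid
    db.executemany("INSERT INTO tags VALUES (?, ?, ?)",
                   [(fid, "type", "photo"), (fid, "event", "wedding")])

    # "Everything from the wedding" -- no folders, no remembered paths.
    print(db.execute("""
        SELECT f.blob_ref FROM files f
        JOIN tags t ON t.file_id = f.id
        WHERE t.key = 'event' AND t.value = 'wedding'
    """).fetchall())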

This is why I set out to build a completely new system from scratch that can perform both file system and database operations extremely well. It lets you store hundreds of millions of files in a single container (a pod), attach a multitude of contextual tags to each one, and then find subsets of those files using database-like queries (e.g. find all JPEG photos where event = 'wedding' AND camera = 'Canon EOS' AND size > 10 MB AND photographer = 'John'). It can find everything (even when there are millions of matches) in less than a second, and it even beats Postgres and SQLite at most DB table queries.

https://www.youtube.com/watch?v=dWIo6sia_hw (file system functions)
https://www.youtube.com/watch?v=Va5ZqfwQXWI (benchmark vs SQLite)
https://www.youtube.com/watch?v=OVICKCkWMZE (benchmark vs Postgres)

Unfortunately I don't agree. I say unfortunately because I personally believe the knowledge model underlying a relational database system is a beautiful and fundamentally superior method of information management.

But I also have traveled enough roads in this industry to know that heterogeneity of approaches ends up defining actual practice -- for better and for worse -- and the "universality" of fopen/fread/fwrite/fclose of blobs of data is going to be hard to move away from.

I'd say the only way the "database" takes over the filesystem is by dropping to the lowest common denominator and becoming more like a filesystem, i.e. a key-value store, throwing away the benefits of what a good database offers. Mainstream databases (in the form of SQL) only do a subset of what's possible with the relational model that Codd and Date laid out decades ago. I fear a move into a lower level of the stack would only worsen this.
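
Concretely, the key-value floor amounts to little more than this (a sketch with made-up keys); everything else a relational engine would do falls back onto application code:

    # An opaque key -> bytes mapping: get and put are the only "queries".
    # Relationships, integrity, and search all become the app's problem.
    store = {}
    store["/photos/img_0412.jpg"] = b"jpeg bytes here"
    data = store.get("/photos/img_0412.jpg")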

In fact, I think the trend is going the other way: filesystems are slowly adopting more and more of the storage techniques that came out of DB research, but applying them to the fairly anemic Unix FS storage model, because that's basically what the industry is asking for.

Further, I think there remains a poverty of knowledge about databases generally in the industry. I don't have a CS degree myself, but almost everyone I know (outside of my current employer) who does have one has said some variant of this to me: "I didn't take any/many DB classes in school." Knowledge of the relational data model is weak/thin in our industry (the low point being the "NoSQL" wave a decade ago). And within CS academia, R&D in this field seems mostly centred in Europe, with the notable North American exception being the excellent programme @ CMU.

Disclaimer: Employed by a database company.

The practical problem is that filesystems and databases are highly optimized toward different objectives at an architectural level. You can treat one like the other but the performance will be comparatively terrible.

This is made even more difficult by the practical necessity of backward compatibility, within the same implementation, with standards like POSIX and SQL that were never intended to be interchangeable.

In principle you could design a database to be used as an effective filesystem. In practice, the implementation wouldn't be very compatible with any other software you need to work with and the ability to integrate matters.

It seems like a solution looking for a problem. Indexers with file system hooks like APFS + Spotlight and the thing Haiku has seem like a more practical approach for similar results.