Comment by ChuckMcM

3 years ago

That a database is faster than a nominal file system has been known for quite a while. It can't quite replace them, though. Microsoft tried really hard to use a database as their root filesystem, investing a lot of time and effort, but ultimately it fizzled. Why? Important edge cases (like swap) that databases handle really poorly.

That said, if you're using a file system abstraction for complex and compound documents, a database is a really stellar way to go. In part because it avoids the "chunking" problem: in a file system, allocation blocks are used for both data and metadata, so you pick a size that is least bad for both. A database can instead use one "optimum" block size for disk I/Os, keeping the disk channel bandwidth highly utilized, and handle the naming/chunking as records inside that.

I wrote a YAML <=> SQLite tool so that apps could use an efficient API to get at component types while the file could still be "exported" to pure text. It worked well in non-UTF string applications (which was fine for the app, which was orchestrating back-end processes). At some point it would be interesting to move it to UTF-8 to see how that works out.
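A minimal sketch of that kind of round-trip tool, using JSON instead of YAML to stay in the standard library; the table layout and path-flattening scheme here are invented for illustration, not the commenter's actual design:

```python
import json
import sqlite3

def import_doc(conn, doc):
    """Flatten a nested dict into (path, value) rows for fast component access."""
    conn.execute("CREATE TABLE IF NOT EXISTS components (path TEXT PRIMARY KEY, value TEXT)")
    def walk(prefix, node):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(prefix + "/" + key, child)
        else:
            conn.execute("INSERT OR REPLACE INTO components VALUES (?, ?)",
                         (prefix, json.dumps(node)))
    walk("", doc)
    conn.commit()

def export_doc(conn):
    """Rebuild the nested structure so it can be dumped back out as plain text."""
    root = {}
    for path, value in conn.execute("SELECT path, value FROM components"):
        parts = path.strip("/").split("/")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = json.loads(value)
    return root

conn = sqlite3.connect(":memory:")
doc = {"pipeline": {"stage": "extract", "retries": 3}}
import_doc(conn, doc)
assert export_doc(conn) == doc  # round trip preserves the document
```

Apps query individual components by path through SQLite, while the export path produces plain text for tooling that expects it.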

In my opinion, database-as-a-filesystem was a noble goal, and someone will repeat it successfully in the future. Just because MS failed at it does not mean the idea is impossible.

I think we've been stuck with '80s/'90s foundational tech for too long, and there are many new ideas that will shape future computing.

  • BeFS “includes support for extended file attributes, with indexing and querying characteristics to provide functionality similar to that of a relational database.”

    The BeOS file system, an OS geek retrospective | Ars Technica

    [https://arstechnica.com/information-technology/2018/07/the-b...]

  • In a way, this is already happening. Documents are moving to the cloud, where they exist mostly in a database-like structure. End-user computers are becoming more and more just an interface for documents that live in an online database, or (if the app supports offline) in local SQLite databases.

    The endgame is a machine with only an OS. The OS is where all the hard edge cases are, which prevented the db-as-fs idea from succeeding in the first place.

    • > The endgame is a machine with only an OS.

      This was tried multiple times already, and in each case, it turned out that the ability to easily store and organise data locally was both desired and required among the wider audience. Cloud storage is a nice feature to complement local storage, but will not replace it any time soon.


    • Speed of light is a hard physical limit. Local storage will always have a major edge for this reason alone.

      That said, I'm now in this weird mid-life confusion about the fact that my new WAN (a 3 Gbps fiber connection) is now faster than any reasonable home LAN I can set up. I needed to move some movies from one machine to another the other day, and it was way faster to just BitTorrent them again from upstream than to actually copy them, since even USB removable media was slower than my WAN. So I think we might be, at least temporarily, in an awkward situation where cloud storage outcompetes LAN storage.

      But truly local storage (NVMe etc.) has an extreme edge over anything networked, and always will.


    • The recent popularity of Obsidian and similar knowledge management software is part of the same trend. They're databases in their own way: localized, but better organized, like a graph you can run queries against rather than a bunch of text files.

  • If you look at the on-disk structure of NTFS, you see that its design is rather similar to a database's. Metadata and file data are treated in the same way. Every 'special' data structure, like the record of which disk blocks are free, is actually stored as a [hidden] file.

    If a file is short, the data doesn't need a whole disk block allocated to it. Likewise, if a file name is really long, that string doesn't need to be stored together with other metadata.
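That resident-data idea can be sketched as a toy record type: small payloads live inline in the record, and only larger ones spill out to whole allocation blocks. The threshold and structure below are illustrative, not NTFS's actual on-disk format:

```python
BLOCK_SIZE = 4096      # one "optimum" allocation block size for bulk data
RESIDENT_LIMIT = 700   # illustrative; NTFS keeps data resident only while it fits in the MFT record

class Record:
    """Toy file record: short data is stored inline, long data in whole blocks."""
    def __init__(self, name, data):
        self.name = name
        if len(data) <= RESIDENT_LIMIT:
            self.resident = data   # lives inside the record itself, no block allocated
            self.blocks = []
        else:
            self.resident = None   # spilled out to separately allocated blocks
            self.blocks = [data[i:i + BLOCK_SIZE]
                           for i in range(0, len(data), BLOCK_SIZE)]

    def read(self):
        return self.resident if self.resident is not None else b"".join(self.blocks)

small = Record("note.txt", b"hello")
big = Record("movie.mkv", b"x" * 10000)
assert small.read() == b"hello" and small.blocks == []   # no block spent on 5 bytes
assert big.read() == b"x" * 10000 and len(big.blocks) == 3
```

The same trick applies to long file names: variable-length strings become just another record payload instead of a fixed slot in a directory entry.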

  • I'd love to use a db for a filesystem for many of my directories.

    Granted, it might just be sub-optimal in certain scenarios, but I already have to logically partition my data into big blobs that are rarely accessed and stuff that must go on the SSDs. Maybe giving that FS a hatch to deal with the stuff that must be special and odd/weird is OK. I don't really want the DB magic for my swap files anyway.

  • I'd like to see a different abstraction and UX for "files", something that feels like a database but with looser usage patterns from a user's perspective, more intuitive to use and reason about.

    I don't want to remember paths and folders and all that crap. I don't want to depend on my discipline to organize and I'm willing to trade some flexibility in return.

    There are some alternatives, like tags, but none of them feel natural and they require setup. I feel services like Dropbox had a real chance to try something, but the best they came up with was showing me "recent documents" on a homepage. A wasted opportunity, if you ask me.

  • This is why I set out to build a completely new system from scratch that could perform both file system and database operations extremely well. It will let you store hundreds of millions of files in a single container (a pod); attach a multitude of contextual tags to each one; and then find subsets of those files using database-like queries (e.g. find all JPEG photos where event = 'wedding' AND camera = 'Canon EOS' AND size > 10 MB AND photographer = 'John'). It can find everything (even when there are millions of matches) in less than a second, and even beats out Postgres and SQLite at most DB table queries.

    https://www.youtube.com/watch?v=dWIo6sia_hw (file system functions) https://www.youtube.com/watch?v=Va5ZqfwQXWI (benchmark vs SQLite) https://www.youtube.com/watch?v=OVICKCkWMZE (benchmark vs Postgres)
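The pod system above uses its own engine, but the query style it describes maps naturally onto SQL. A sketch with an ordinary SQLite schema (table and column names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, type TEXT, size INTEGER);
    CREATE TABLE tags  (file_id INTEGER REFERENCES files(id), key TEXT, value TEXT);
    CREATE INDEX tags_kv ON tags (key, value, file_id);  -- makes tag lookups index scans
""")
conn.execute("INSERT INTO files VALUES (1, 'IMG_0042.jpg', 'jpeg', 12000000)")
conn.executemany("INSERT INTO tags VALUES (1, ?, ?)",
                 [("event", "wedding"), ("camera", "Canon EOS"), ("photographer", "John")])

# "Find all JPEG photos where event = 'wedding' AND camera = 'Canon EOS' AND size > 10 MB"
rows = conn.execute("""
    SELECT f.name
    FROM files f
    JOIN tags e ON e.file_id = f.id AND e.key = 'event'  AND e.value = 'wedding'
    JOIN tags c ON c.file_id = f.id AND c.key = 'camera' AND c.value = 'Canon EOS'
    WHERE f.type = 'jpeg' AND f.size > 10000000
""").fetchall()
assert rows == [("IMG_0042.jpg",)]
```

The key-value tags table is the standard way to attach open-ended metadata without changing the schema per application; each extra tag condition is just another self-join.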

  • Unfortunately I don't agree. I say unfortunately because I personally believe the knowledge model underlying a relational database system is a beautiful and fundamentally superior method of information management.

    But I also have traveled enough roads in this industry to know that heterogeneity of approaches ends up defining actual practice -- for better and for worse -- and the "universality" of fopen/fread/fwrite/fclose of blobs of data is going to be hard to move away from.

    I'd say that the only way the "database" takes over the filesystem is by dropping to the lowest common denominator and becoming more like a filesystem: basically a key-value store, throwing away the benefits of what a good database offers. Mainstream databases (in the form of SQL) only do a subset of what's possible with the relational model that Date & Codd laid out decades ago. I fear a move into a lower level of the stack would only worsen this.

    In fact, I think the trend is going the other way. Filesystems are slowly adopting more and more of the storage techniques that came out of DB research, but using them for the fairly anemic Unix FS storage model. Because that's basically what the industry is asking for.

    Further, I think there remains a poverty of knowledge about databases generally in the industry. I don't have a CS degree myself, but almost everyone I know (outside of my current employer) who does have one has said some variant of this to me: "I didn't take any/many DB classes in school." Knowledge of the relational data model is weak/thin in our industry (the low point being the "NoSQL" wave a decade ago). And within CS academia, it seems that R&D in this field is mostly centred around Europe (with the notable exception in North America being the excellent programme @ CMU.)

    Disclaimer: Employed by a database company.

  • The practical problem is that filesystems and databases are highly optimized toward different objectives at an architectural level. You can treat one like the other but the performance will be comparatively terrible.

    This is made even more difficult by the practical necessity of backward compatibility with standards like POSIX, SQL, et al in the same implementation that were never intended to be interchangeable at the implementation level.

    In principle you could design a database to be used as an effective filesystem. In practice, the implementation wouldn't be very compatible with any other software you need to work with and the ability to integrate matters.

  • It seems like a solution looking for a problem. Indexers with file system hooks like APFS + Spotlight and the thing Haiku has seem like a more practical approach for similar results.

Microsoft most assuredly did not try to replace the filesystem with a database. WinFS was a relational database service that served as a central metadata store for Longhorn. It was slow, but most importantly it never provided any value. The idea was that applications would store their structured data inside SQL Server, *cough*, I mean WinFS, and then somehow they would be able to easily link their data. There was a complicated system of loading application schemas to accomplish this. Of course, if applications want to share data, they don't need to involve a brittle system SQL database.

Nothing is stopping you from storing blobs of data inside a database, exporting a POSIX API, and calling that a new and improved filesystem. But once again, it's hard to see what value you get from all this complexity. A filesystem is complex enough without having to store structured data. You generally don't want the OS to handle this complexity; you have just added another failure mode to a part of the system that you really don't want to fail.

> Important edge cases (like swap) that databases do really poorly.

Wouldn't it have been possible to not use a one-size-fits-all approach? I'm thinking about things like moving the swap to a dedicated partition, like on Linux, and ditto for the other edge cases.

  • In earlier Windows versions, users sometimes had to fiddle with the swap file size. Even on newer versions, letting the system determine the size of the swap file may be the default option (I don't have a Windows machine nearby right now). Resizing a partition is a much more cumbersome process than resizing a file.

    • Come on, you do it like on Unix: you set it to twice the RAM and you're done. Though these days it should probably be smaller, much smaller.

I don't think replacing the root FS with a database is a good idea. There are two options here:

1. Make it a database with schemas. In that case, you'll never be able to change or extend the language used to define schemas, since every application and OS will need to keep supporting older versions. It will also be hard to have multiple competing implementations of a complete database system, particularly because applications will come to rely on the performance characteristics of the reference implementation.

2. Make it schemaless, like a key-value store. That basically just is the file system, but non-hierarchical and with a faster implementation. In that case, your DB could just be a faster implementation of the existing filesystem interface, or a natural extension thereof. The hierarchical nature of the filesystem is fairly essential if you want multiple applications to avoid trampling on each other.
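The hierarchy-versus-key-value tension can be seen in how object stores fake directories over a flat namespace: keys are path-like strings, and "directories" are recovered by prefix scans. A minimal sketch of the idea:

```python
# Flat key-value store; hierarchy is recovered by prefix scans over path-like keys,
# the same trick object stores use to present "directories" over a flat namespace.
store = {
    "/home/alice/notes.txt": b"...",
    "/home/alice/todo.txt":  b"...",
    "/home/bob/report.pdf":  b"...",
}

def listdir(store, prefix):
    """Return the immediate children of a 'directory' prefix."""
    prefix = prefix.rstrip("/") + "/"
    return sorted({key[len(prefix):].split("/")[0]
                   for key in store if key.startswith(prefix)})

assert listdir(store, "/home") == ["alice", "bob"]
assert listdir(store, "/home/alice") == ["notes.txt", "todo.txt"]
```

Prefixes also give applications disjoint namespaces, which is the "avoid trampling on each other" property the hierarchy provides.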

  • This pretty succinctly captures it.

    A file system is, at its heart, a naming system: given this name, return me a handle to "do things" with that object. In SQL it is "SELECT * FROM files WHERE name = '<name>';". The next debate is what the schema for the files table should be.

    Well known things that would appear in that schema are things like "access rights", "ownership", "access times", Etc. Additionally related things like the "consuming application" might be there, and the "editing application" Etc.

    If these things were all relations/schema, then a lot of "warts" like MAGICNUMBER or extension type, security certificates, integrity digests, OS requirement, symbol tables, or character encoding could become schema elements, and the names get "purified" (scare quotes because architecture astronauts really get all hot and bothered by overloading names with multiple semantics).

    Then you can do "views": when someone types a name at the shell, you can select from files that are executable on this OS and match the name.

    There are basically a whole bunch of things that are "grafted on" to the file system, and they can (and do) get out of date relative to the files and cause problems. (The canonical cases: moving an executable from one directory to another makes it no longer visible for execution; changing the extension or magic number results in the wrong thing being used to run or edit it.)

    The architecture reasoning goes, "If there was just one source of truth about these things, a database, then a whole bunch of bugs and user annoyances would vanish."

    Anyway, I've witnessed people start down this grand vision, devolve into your #2 above (key/value store) and then finally throw their hands up in surrender.

    One of the systems questions I ponder sometimes is at what point is memory so cheap and plentiful that replacing the buffer cache in the kernel with a giant interconnected schema like this gives equivalent "time to record" values. An RDMA accessible victim cache for "blob" buffering might help too.

    A couple of people have taken runs at it as "object storage" or "storage as a service" but it isn't quite there yet.

    Still, for a complex file format (say MPEG-4), having it be a database gives you some advantages and makes writing file component parsers unnecessary. So that's a win.
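The "views" idea above can be sketched against a hypothetical files table in SQLite: the shell's executable lookup becomes a query over metadata held in one source of truth, rather than a PATH walk plus magic numbers. Schema and column names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (
        name TEXT, owner TEXT, executable INTEGER, target_os TEXT, data BLOB
    );
    -- The shell's lookup "view": one source of truth in the schema, so nothing
    -- (extension, magic number, PATH membership) can silently go stale.
    CREATE VIEW runnable AS
        SELECT name FROM files WHERE executable = 1 AND target_os = 'linux';
""")
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", [
    ("grep",     "root",  1, "linux", b""),
    ("grep.exe", "root",  1, "nt",    b""),   # wrong OS: filtered out by the view
    ("notes",    "alice", 0, "linux", b""),   # not executable: filtered out
])
hits = conn.execute("SELECT name FROM runnable WHERE name = 'grep'").fetchall()
assert hits == [("grep",)]
```

Moving a row or renaming it cannot make it invisible to the view, which is exactly the class of "grafted-on" staleness bugs the comment describes.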

Q: Isn’t the filesystem of a system a sort of database? Not in the sense of a query back-and-forth, but in the sense of holding all file records and locations.

  • The cool part about DBs is interleaving storing/manipulating and querying. A raw hard drive already qualifies as a DB in that sense, just by allowing you to write to it, but no one would consider it a decent DB.

    • Not all databases are relational, even though many are. Filesystems are definitely strongly overlapping with what you'd consider a document-oriented NoSQL database.

      The underlying datastructures are really the same in a filesystem and a database in every way that matters.

    • But you can store, manipulate, and query data in a filesystem. That's what it's for.

A filesystem is a database. It is not relational and has a different API but the main difference is in different design goals and different set of tradeoffs.

What is the backend for the storage buckets we see in cloud (and increasingly, HPC) computing? Seems like a database filesystem would be a good fit there.

  • MinIO has millions of installations, according to its documentation. Whether it is the building block in your favorite cloud is mostly hidden and unknown to users.

I don't understand how this is possible, as a database is an abstraction on top of the filesystem. Perhaps for certain workloads it may hit RAM more frequently than disk, making it faster. But for large binary blobs, SQLite is way slower than the file system on Linux.
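A rough way to check this claim yourself: write one large blob both as a plain file and as a SQLite BLOB, then time reading each back. The timings depend heavily on the OS page cache and on blob size (the size here is arbitrary), and SQLite's own documentation reports the opposite result for many small blobs, so the crossover point is worth measuring rather than assuming:

```python
import os
import sqlite3
import tempfile
import time

blob = os.urandom(8 * 1024 * 1024)  # 8 MB of random data; size is arbitrary

with tempfile.TemporaryDirectory() as d:
    # Plain-file copy of the blob.
    path = os.path.join(d, "blob.bin")
    with open(path, "wb") as f:
        f.write(blob)

    # Same blob stored as a single SQLite row.
    conn = sqlite3.connect(os.path.join(d, "blob.db"))
    conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
    conn.execute("INSERT INTO blobs VALUES (1, ?)", (blob,))
    conn.commit()

    t0 = time.perf_counter()
    with open(path, "rb") as f:
        from_fs = f.read()
    t1 = time.perf_counter()
    from_db = conn.execute("SELECT data FROM blobs WHERE id = 1").fetchone()[0]
    t2 = time.perf_counter()
    conn.close()

    assert from_fs == blob and from_db == blob
    print(f"file: {t1 - t0:.4f}s  sqlite: {t2 - t1:.4f}s")
```

Run it a few times to let caches warm up; a single cold run mostly measures the disk, not the two code paths.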