Comment by perbu
3 years ago
No. This is a microbenchmark. They are really great for writing blogposts that get a lot of attention and not much more.
In reality there are other operations you want to do on your files. Concurrent access, updates, deletes are probably not as great. Backing up a database with terabytes of binaries inside is also non-trivial. Backing up terabytes of files is a lot simpler.
Can you elaborate on why it would be simpler to back up terabytes of files instead of just one?
Not GP, but one disadvantage of updating one huge file is that it's harder to do efficient incremental backups. Theoretically it can still be done if your backup software supports e.g. content-defined chunking (there was a recent HN thread about Google's rsync-with-fastcdc tool). If you choose to store your assets as separate files instead, though, you can trivially have incremental backups using off-the-shelf software like plain old rsync [1].
[1]: https://www.cyberciti.biz/faq/linux-unix-apple-osx-bsd-rsync...
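To illustrate why content-defined chunking makes incremental backups of one huge file feasible: boundaries are chosen from the data itself (via a rolling hash), so inserting a byte in the middle only changes the chunks around the edit, and all later chunks still match the previous backup. A minimal sketch below, assuming a toy rolling hash and no min/max chunk sizes; real tools like FastCDC use tuned gear hashes:

```python
import hashlib

def chunk_boundaries(data: bytes, mask: int = 0x0FFF) -> list:
    """Split data into variable-size chunks with a toy rolling hash.

    A boundary is declared when the low bits of the hash are all zero
    (on average every ~4 KiB with a 12-bit mask). Because the masked
    bits depend only on the last few bytes, chunking resynchronizes
    shortly after an insertion or deletion.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF  # toy rolling hash
        if (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def new_chunk_count(old: bytes, new: bytes) -> int:
    """Count chunks of `new` that an old backup would not already have."""
    seen = {hashlib.sha256(c).digest() for c in chunk_boundaries(old)}
    return sum(1 for c in chunk_boundaries(new)
               if hashlib.sha256(c).digest() not in seen)
```

With fixed-size chunks, the same one-byte insertion would shift every subsequent chunk and force re-uploading the rest of the file; here only the chunk(s) at the edit point need to be stored again.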
> there was a recent HN thread about Google's rsync-with-fastcdc tool
Was this the tool & thread you meant? https://news.ycombinator.com/item?id=34303497
Wow, that is actually an amazing performance curiosity, adding parallelism to the mix. I guess this would depend on the M.2 spec?
If you're using 16 PCIe 4.0 lanes you max out at 32GB/s, although commercial drives tend to have much lower throughput than that maximum (~7.5GB/s for a good NVMe drive). Cat6a Ethernet tops out at 10 gigabits per second, but plenty of earlier versions have lower caps, e.g. 1 gigabit. My guess is you'll most likely be limited by either disk or network hardware before needing CPU parallelism, if all you're doing is copying bytes from one to the other.
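To put those numbers side by side, here is the back-of-envelope arithmetic (figures taken from the comment above; all approximate, and real-world throughput will be lower due to protocol overhead):

```python
# Rough bandwidth comparison, in GB/s (gigabytes per second).
PCIE4_LANE = 2.0                  # ~2 GB/s usable per PCIe 4.0 lane
pcie4_x16 = 16 * PCIE4_LANE      # ~32 GB/s for a full x16 slot
nvme_drive = 7.5                  # a good consumer NVMe drive
eth_10g = 10 / 8                  # 10 Gbit/s Cat6a link = 1.25 GB/s
eth_1g = 1 / 8                    # 1 Gbit/s link = 0.125 GB/s

# When copying disk -> network, the slower side sets the ceiling.
bottleneck = min(nvme_drive, eth_10g)
print(f"disk -> 10GbE copy is capped at ~{bottleneck:.2f} GB/s")
```

Even a fast NVMe drive outruns a 10 Gbit link by roughly 6x, so the network is the bottleneck long before the CPU is.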