Comment by duskwuff
9 days ago
Tape is also an extraordinarily poor option for a service like Internet Archive which intends to provide interactive, on-demand access to its holdings.
Back in the day, if you loaded a page from the web archive that wasn’t in cache, it’d tell you to come back in a couple of minutes. If it was in cache, it was reasonably speedy.
Cache in this case was the hard drives. If I recall correctly, we were using SAM-FS, which worked fairly well for the purpose even though it was slow as dirt: we could effectively mount the tape drive on Solaris servers and access the file system transparently.
Things have gotten better. I’m not sure if there were better affordable options in the late 1990s, though. I went from Alexa/IA to AltaVista, which solved the problem of storing web crawl data by being owned by DEC and installing dozens of refrigerator-sized Alpha servers. That was not an option open to Alexa/IA.
This is a common use for tape: tools like HPSS can put a couple of petabytes of disk in front of it and present the whole archive as a single POSIX filesystem namespace, handling data migration transparently and making sure hot data is kept on low-latency storage.
Yeah, it was like this (except not petabytes).
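To make the disk-in-front-of-tape idea concrete, here is a toy sketch of a read path with a small disk cache over a slow archive. It is not how HPSS or SAM-FS are implemented; the `TieredStore` class, its LRU eviction policy, and the dict standing in for the tape library are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model of a disk cache in front of a tape archive.

    Hypothetical illustration only; not HPSS or SAM-FS code.
    """

    def __init__(self, tape, disk_capacity):
        self.tape = tape                    # dict standing in for the tape library
        self.disk = OrderedDict()           # LRU-ordered dict standing in for the disk tier
        self.disk_capacity = disk_capacity  # max number of objects kept on disk
        self.recalls = 0                    # count of slow reads that had to go to tape

    def read(self, path):
        if path in self.disk:
            # Fast path: data is already on disk; mark it most recently used.
            self.disk.move_to_end(path)
            return self.disk[path]
        # Slow path: recall from tape, then stage the data onto disk.
        data = self.tape[path]
        self.recalls += 1
        self.disk[path] = data
        if len(self.disk) > self.disk_capacity:
            # Evict the least recently used entry; tape still holds a copy.
            self.disk.popitem(last=False)
        return data


store = TieredStore({"a": b"1", "b": b"2", "c": b"3"}, disk_capacity=2)
for p in ["a", "b", "a", "c", "b"]:
    store.read(p)
print(store.recalls, list(store.disk))
```

The caller only ever sees one namespace (`read(path)`); whether the bytes came from disk or tape shows up only as latency, which is the "come back in a couple of minutes" behaviour described upthread.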
I presume backing up the archive is a desirable thing. That's a place where I would see tape fitting well for them.
Perhaps? But unless tape, and the infrastructure to support it, is dramatically cheaper than disk, they might still be better served by more disk - having two or more copies of data on disk means that both of them can service load, whereas a tape backup is only passively useful as a backup.
This turns out to be the case, with the cost difference growing as the archive size scales. Once you hit petascale, it's not even close. However, most large-scale tape deployments also have disk involved, so it's usually not one or the other.
A lot of people, me included, consider anything online not to be backup. Being disconnected and completely at-rest is a very desirable property.
That's exactly what it is.
You also don't want your true backups online at all - that's the whole point.
Tape is almost always used for cold-storage backups that are kept offline in case of ransomware attacks. Using it for on-demand access would be insanely slow.