Wow, this part makes my blood boil, emphasis mine:
> This issue doesn't affect tapes written with the ADR-50 drive, but all the tapes I have tested written with the OnStream SC-50 do NOT restore from tape unless the PC which wrote the tape is the PC which restores the tape. This is because the PC which writes the tape stores a catalog of tape information such as tape file listing locally, which the ARCserve is supposed to be able to restore without the catalog because it's something which only the PC which wrote the backup has, defeating the purpose of a backup.
Holy crap. A tape backup solution that doesn't allow the tape to be read by any other PC? That's madness.
Companies do shitty things and programmers write bad code, but this one really takes the prize. I can only imagine someone inexperienced wrote the code, nobody ever did code review, and then the company only ever tested reading tapes from the same computer that wrote them, because it never occurred to them to do otherwise?
But yikes.
> Holy crap. A tape backup solution that doesn't allow the tape to be read by any other PC? That's madness.
What is needed is the backup catalog. This is fairly standard in a lot of tape-related software, even open source; see for example "Bacula Tape Restore Without Database":
* http://www.dayaro.com/?p=122
When I was still doing tape backups the (commercial) backup software we were using would e-mail us the bootstrap information daily in case we had to do a from-scratch data centre restore.
The first step would get a base OS going, then install the backup software, then import the catalog. From there you can restore everything else. (The software in question allowed restores even without a license (key?), so that even if you lost that, you could still get going.)
Obviously, to know what to restore, you need to index the data on the tapes. Tape is not a random-access medium; there is no way around this.
This is only for a complete disaster scenario; if you’re restoring one PC or one file, you would still have the backup server and the database. But if you don’t, you need to run the command to reconstruct the database.
Storing the catalogue on the PC is standard. But being able to rebuild that catalogue from scratch is also standard. I’ve not used any tapes before now where you couldn’t recover the catalogue.
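To make "rebuild the catalogue from scratch" concrete, here's a minimal sketch in Python. It assumes a made-up sequential layout (length-prefixed name, length-prefixed data, repeated) and hypothetical file names, not ARCserve's or Bacula's actual on-tape format; the point is just that one linear pass over the medium is enough to reconstruct the index:

```python
import io
import struct

def rebuild_catalog(stream):
    """One linear pass over a sequential archive, recording (name, offset, size)."""
    catalog = []
    while True:
        hdr = stream.read(2)
        if len(hdr) < 2:
            break                                    # end of medium
        (name_len,) = struct.unpack(">H", hdr)
        name = stream.read(name_len).decode("utf-8")
        (size,) = struct.unpack(">Q", stream.read(8))
        catalog.append((name, stream.tell(), size))  # where the data starts
        stream.seek(size, io.SEEK_CUR)               # skip the file body
    return catalog

# Tiny self-test with two records in the made-up layout.
def record(name, data):
    n = name.encode("utf-8")
    return struct.pack(">H", len(n)) + n + struct.pack(">Q", len(data)) + data

tape = io.BytesIO(record("frogger2/main.cpp", b"x" * 1000) +
                  record("frogger2/assets.bin", b"y" * 5000))
print(rebuild_catalog(tape))
```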
This type of thing is a surprisingly common mistake; I've come across it several times in industry.
An example of this done right: If you disconnect a SAN volume from VMware and attach it to a completely different cluster, it's readable. You can see the VM configs and disks in named folders. This can be used for DR scenarios, PRD->TST clones, etc...
Done wrong: XenServer. If you move a SAN volume to a new cluster, it gets shredded, with every file name replaced by a GUID instead. The file GUID to display name mapping is stored in a database that's only on the hosts! That database is replicated host-to-host and can become corrupted. Backing up just the SAN arrays is not enough!
I’d like to believe maybe that’s why the company went out of business, but that’s just wishful thinking - a lot of incompetence is often ignored if not outright rewarded in business nowadays. Regardless, it’s at least somewhat of a consolation that those idiots did go out of business in the end, even if that wasn’t the root cause.
I'm familiar with needing to re-index a backup if it's accessed from a 'foreign' machine, and sometimes the procedure is non-obvious, but just not having that option seems pretty bad.
I worked for an MSP a million years ago and we had a customer that thought they had lost everything. They had backup tapes, but the backup server itself had died. After I showed them the 'catalog tape' operation and they kept their fingers crossed for a few hours, they bought me many beers.
That is not terribly surprising. The cheap tape drives of that era were very picky like that. Even if I had the same tape drive as a friend it was not always certain that I could read back my tape on his box and the other way around. These drives were very affordable and the tapes were low priced as well. However, they were really designed for the 'oh no I messed up my computer let me restore it' or 'I deleted a file I should not have' scenarios. Not server side backup rotation solutions. Unless that was the backup/restore computer. Long term storage or off site type drives were decently more pricy.
My guess is they lacked the RAM buffer and CPU to keep up properly, with a side of assumptions on the software side.
This does not sound like a junior programmer error. This is not the kind of thing companies let inexperienced people come up with, at least not on their own. This is a lack of testing. Any real-world backup test would have caught this, and I would expect the more senior engineers on the project to ensure this was covered.
If you’re making an “ought to be” argument, I agree.
If you’re making an “is” argument, I completely disagree. I see companies (including my own) regularly having junior programmers responsible for decisions that cast inadvertently or unexpectedly long shadows.
I guess that's why the .zip format chucks its catalog index at the end of the archive. It's still unnatural to use in a streaming medium like tape, though.
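For the curious, the zip "catalog" is the central directory, and readers find it by scanning backwards from the end of the file for the End Of Central Directory record, which is exactly why a zip can't be finalized until the writer knows where everything landed. A minimal sketch that locates it (ignores Zip64 and multi-disk archives):

```python
import struct

EOCD_SIG = b"PK\x05\x06"    # End Of Central Directory signature
EOCD_LEN = 22               # fixed part of the record

def find_central_directory(path):
    """Return (total entries, central directory size, central directory offset)."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        file_size = f.tell()
        # EOCD may be followed by a comment of up to 64 KiB, so read that much tail.
        tail_size = min(file_size, EOCD_LEN + 0xFFFF)
        f.seek(file_size - tail_size)
        tail = f.read(tail_size)

    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no End Of Central Directory record found")

    # Fields at offsets 10, 12 and 16 of the record: entry count, CD size, CD offset.
    total_entries, cd_size, cd_offset = struct.unpack_from("<HII", tail, pos + 10)
    return total_entries, cd_size, cd_offset
```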
In The Singularity Is Near (2005) Ray Kurzweil discussed an idea for the “Document Image and Storage Invention”, or DAISI for short, but concluded it wouldn't work out. I interviewed him a few years later about this and here's what he said:
The big challenge, which I think is actually important almost philosophical challenge — it might sound like a dull issue, like how do you format a database, so you can retrieve information, that sounds pretty technical. The real key issue is that software formats are constantly changing.
People say, “well, gee, if we could backup our brains,” and I talk about how that will be feasible some decades from now. Then the digital version of you could be immortal, but software doesn’t live forever, in fact it doesn’t live very long at all if you don’t care about it if you don’t continually update it to new formats.
Try going back 20 years to some old formats, some old programming language. Try resuscitating some information on some PDP1 magnetic tapes. I mean even if you could get the hardware to work, the software formats are completely alien and [using] a different operating system and nobody is there to support these formats anymore. And that continues. There is this continual change in how that information is formatted.
I think this is actually fundamentally a philosophical issue. I don’t think there’s any technical solution to it. Information actually will die if you don’t continually update it. Which means, it will die if you don’t care about it. ...
We do use standard formats, and the standard formats are continually changed, and the formats are not always backwards compatible. It’s a nice goal, but it actually doesn’t work.
I have in fact electronic information that in fact goes back through many different computer systems. Some of it now I cannot access. In theory I could, or with enough effort, find people to decipher it, but it’s not readily accessible. The more backwards you go, the more of a challenge it becomes.
And despite the goal of maintaining standards, or maintaining forward compatibility, or backwards compatibility, it doesn’t really work out that way. Maybe we will improve that. Hard documents are actually the easiest to access. Fairly crude technologies like microfilm or microfiche which basically has documents are very easy to access.
So ironically, the most primitive formats are the ones that are easiest.
In 2005 the computing world was much more in flux than it is now.
PNG is 26 years old and basically unchanged since then. Same with 30 year old JPEG, or for those with more advanced needs the 36 year old TIFF (though there is a newer 21 year old revision). All three have stood the test of time against countless technologically superior formats by virtue of their ubiquity and the value of interoperability. The same could be said about 34 year old zip or 30 year old gzip. For executable code, the wine-supported subset of PE/WIN32 seems to be with us for the foreseeable future, even as Windows slowly drops compatibility.
The latest Office365 Word version still supports opening Word97 files as well as the slightly older WordPerfect 5 files, not to mention 36 year old RTF files. HTML1.0 is 30 years old and is still supported by modern browsers. PDF has also got constant updates, but I suspect 29 year old PDF files would still display fine.
In 2005 you could look back 15 years and see a completely different computing landscape with different file formats. Look back 15 years today and not that much has changed. Lots of exciting new competitors as always (webp, avif, zstd) but only time will tell whether they will earn a place among the others or go the way of JPEG2000 and RAR. But if you store something today in a format that's survived the last 25 years, you have a good chance of still being able to open it in common software 50 years down the line.
This is too shortsighted by archival standards. Even Word itself doesn't offer full compatibility. VB? 3rd party active components? Other Office software integration? It's a mess. HTML and other web formats are only readable by virtue of being constantly evolved while keeping backwards compatibility, which is nowhere near complete and is hardware-dependent (e.g. aspect ratios, colors, pixel densities). The standards will be pruned sooner or later, due to tech debt or being sidestepped by something else. And I'm pretty sure there are plenty of obscure PDF features that will prevent many documents from being readable in a mere half a century. I'm not even starting on the code and binaries. And cloud storage is simply extremely volatile by nature.
Even 50 years (laughable for a clay tablet) is still pretty darn long in the tech world. We'll still probably see the entire computing landscape, including the underlying hardware, changing fundamentally in 50 years.
Future-proofing anything is a completely different dimension. You have to provide an independent way to bootstrap, without relying on an unbroken chain of software standards, business/legal entities, and public demand for certain hardware platforms/architectures. This is unfeasible for the vast majority of knowledge/artifacts, so you also have to have a good mechanism to separate signal from noise and to transform volatile formats like JPEG or machine-executable code into more or less future-proof representations, at least basic descriptions of what the notable thing did and what impact it had.
While it's true that these standards are X years old, the software that encoded those formats yesteryear is very different from the software that decodes it today. It's a Ship of Theseus problem. They can claim an unbroken lineage since the distant future, the year 2000, but encoders and decoders had defects and opinions that were relied on--both intentionally and unintentionally--that are different from the defects and opinions of today.
I have JPEGs and MP3s from 20 years ago that don't open today.
"The roots of Creo Parametric. Probably one of the last running installations of PTC's revolutionary Pro/ENGINEER Release 7 datecode 9135 installed from tape. Release 7 was released in 1991 and is - as all versions of Pro/ENGINEER - fully parametric. Files created with this version can still - directly - be opened and edited in Creo Parametric 5.0 (currently the latest version for production).
This is a raw video, no edits, and shows a bit of the original interface (menu manager, blue background, yellow and red datum planes, no modeltree)."
I can't help but think bad thoughts whenever I see another "static site maker" posted on here, or a brand new way of using JavaScript to render a web page.
Talk about taking the simplest and most durable of (web) formats and creating a hellscape of tangled complexity which becomes less and less likely to be maintainable or easy to archive the more layers of hipster js faddishness you add...
One of the claimed benefits of the JVM (and obviously later VMs) was that it would solve this issue: Java programs written in 2000 should still be able to run in 2100. And as far as I know the JVM has continued to fulfill this promise.
An honest question: If you are writing a program that you want to survive for 100+ years, shouldn't you specifically target a well-maintained and well-documented VM that has backward compatibility as a top priority? What other options are there?
People routinely boot DOS in e.g. qemu. The x86 ISA is 45 years old, older if you consider the 8008/8080 part of the lineage. It's not pretty, but it's probably the most widespread backwards compatible system out there.
While I love the JVM, and I also think it's one of the better runtimes in terms of backwards compatibility, there have been breakages. Most of the ones I've dealt with were easy to fix. But the ease of fixing is related to the access to source code. When something in a data stream is broken, be it an MP3 or a JPEG, I guess you almost inherently need special tooling to fix it (realistically). I imagine that with an SVG it'd be easier to hand-fix it.
> An honest question: If you are writing a program that you want to survive for 100+ years, shouldn't you specifically target a well-maintained and well-documented VM that has backward compatibility as a top priority? What other options are there?
I'd be tempted to target a well-known physical machine - build a bootable image of some sort as a unikernel - although in the age of VMWare etc. there's not a huge difference.
IMO the "right" way to do this would be to endow an institution to keep the program running, including keeping it updated to the "live" version of the language it's written in, or even porting it between languages as and when that becomes necessary.
But he seems to have written this before virtual machines became widespread.
I think the concern is becoming increasingly irrelevant now, because if I really need to access a file I created in Word 4.0 for the Mac back in 1990, it's not too hard to fire up System 6 with that version of Word and read my file. In fact it's much easier now than it was in 2005 when he was writing. Sure it might take half an hour to get it all working, but that's really not too bad.
Most of this is probably technically illegal and will sometimes even have to rely on cracked versions, but also nobody cares, and all the OSes and programs are still around and easy to find on the internet.
Not to mention that while file formats changed all the time early on, these days they're remarkably long-lived -- used for decades, not years.
The outdated hardware concern was more of a concern (as the original post illustrates), but so much of everything important we create today is in the cloud. It's ultimately being saved in redundant copies on something like S3 or Dropbox or Drive or similar, that are kept up to date. As older hardware dies, the bits are moved to newer hardware without the user even knowing.
So the problem Kurzweil talked about has basically become less of an issue as time has marched on, not more. Which is kind of nice!
>I think the concern is becoming increasingly irrelevant now, because if I really need to access a file I created in Word 4.0 for the Mac back in 1990, it's not too hard to fire up System 6 with that version of Word and read my file. In fact it's much easier now than it was in 2005 when he was writing. Sure it might take half an hour to get it all working, but that's really not too bad.
> I think the concern is becoming increasingly irrelevant
I fear we may be at the high point of that trend. With "cloudification", where more and more software is run on servers one doesn't control, there is no way to run that software in a VM, as you don't have access to the software anymore. And even getting the plain data out for a custom backup becomes harder and harder.
I'm certain that 100 years from now, when the collapse really gets rolling, we'll still have cuneiform clay tablets complaining about Ea-Nassir's shitty copper but most of the digital information and culture we've created and tried to archive will be lost forever. Eventually, we're going to lose the infrastructure and knowledge base we need to keep updating everything, people will be too busy just trying to find food and fighting off mutants from the badlands to care.
Well, almost all early tablets are destroyed or otherwise lost now. Do you think we will lose virtually all digital age information within a century? Maybe from a massive CME, I suppose.
I was able to backup/restore an old COBOL system via cpio between modern GNU cpio (man page last updated June 2018), and SCO's cpio (c. 1989). This is neither to affirm nor contradict Kurzweil, but rather to praise the GNU userland for its solid legacy support.
This is very, very true. I have archived a number of books and magazines that were scanned and converted into "simplified" PDF, and archived them on DVD discs with C source code.
There are external dependencies but one hopes that the descriptions are sufficient to figure out how to make those work.
Actually, I'd argue it's wrong precisely because we do manage to retrieve even such old artifacts. The only problem is that nobody cared for 30 years, so the process was harder than it should have been, but in the end it was possible.
Sure, there is a risk that at some point, for example, every PNG or H.264 decoder gets lost, so re-creating a decoder would be significantly more complicated, but the chances of that are pretty slim; looking at `ffmpeg -codecs`, I'm not really worried about that ever happening.
> Hard documents are actually the easiest to access. Fairly crude technologies like microfilm or microfiche which basically has documents are very easy to access.
Honestly, the backup space is weirdly sparse for anything at enterprise scale.
For anything more than a few machines there is bacula/bareos (which pretends everything is a tape, with mostly miserable results), backuppc (which pretends tapes are not a thing, with miserable results), and that's about it; everything else seems to be point-to-point backups only, with no real central management.
Are we talking about open source only? Because there are loads of options available. Veritas has two products (netbackup and backupexec). There is also commvault, veeam, ibm spectrum protect and hp data protector. Admittedly only netbackup and commvault are what I would truly call enterprise, but your options are certainly not limited.
I've long been stunned by the propensity of proprietary backup software to use undocumented, proprietary formats. It seems to me that the first thing one should solve when designing a backup format is ensuring it can be read in the future even if all copies of the backup software are lost.
I may be wrong but I think some open source tape backup software (Amanda, I think?) does the right thing and actually starts its backup format with emergency restoration instructions in ASCII. I really like this kind of "Dear future civilization, if you are reading this..." approach.
Frankly nobody should agree to use a backup system which generates output in a proprietary and undocumented format, but also I want a pony...
It's interesting to note that the suitability of file formats for archiving is also a specialised field of consideration. I recall some article by someone investigating this very issue who argued formats like .xz or similar weren't very suited to archiving. Relevant concerns include, how screwed you are if the archive is partly corrupted, for example. The more sophisticated your compression algorithm (and thus the more state it records from longer before a given block), the more a single bit flip can result in massive amounts of run-on data corruption, so better compression essentially makes things worse if you assume some amount of data might be damaged. You also have the option of adding parity data to allow for some recovery from damage, of course. Though as this article shows, it seems like all of this is nothing compared to the challenge of ensuring you'll even be able to read the media at all in the future.
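To make the run-on corruption point concrete, here's a toy demonstration using zlib as a stand-in for a "sophisticated" stream compressor; the payload, the flipped bit position, and the block size are all arbitrary:

```python
import zlib

data = b"important archival data " * 4096        # ~96 KiB of sample payload

# One compressed stream: a single flipped bit typically ruins everything after it.
stream = bytearray(zlib.compress(data))
stream[len(stream) // 2] ^= 0x01                  # flip one bit in the middle
try:
    zlib.decompress(bytes(stream))
    print("this particular flip happened to survive")
except zlib.error as exc:
    print("whole stream unreadable:", exc)

# Independently compressed blocks: the same flip costs only the block it landed in.
BLOCK = 8192
blocks = [zlib.compress(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]
damaged = bytearray(blocks[len(blocks) // 2])
damaged[len(damaged) // 2] ^= 0x01
blocks[len(blocks) // 2] = bytes(damaged)

recovered = 0
for b in blocks:
    try:
        recovered += len(zlib.decompress(b))
    except zlib.error:
        pass                                      # lose just this block
print(f"recovered {recovered} of {len(data)} bytes from the per-block archive")
```

Adding parity data, as mentioned above, is how you buy back some of that lost block, at the cost of extra space.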
At some point the design lifespan of the proprietary ASICs in these tape drives will presumably just expire(?). I don't know what will happen then. Maybe people will start using advanced FPGAs to reverse engineer the tape format and read the signals off, but the amount of effort to do that would be astronomical, far more even than the amazing effort the author here went to.
To add, thinking a bit more about it: Designing formats to be understandable by future civilizations actually reduces to a surprising degree to the same set of problems which METI has to face. As in, sending signals designed to be intelligible to extraterrestrials - Carl Sagan's Contact, etc.
Even if you write an ASCII message directly to a tape, that data is obviously going to be encoded before being written to the tape, and you have no idea if anyone will be able to figure out that encoding in future. Trouble.
What makes this particularly pernicious is the fact that LTO nowadays is a proprietary format(!!). I believe the spec for the first generation or two of LTO might be available, but last I checked, it's been proprietary for some time. The spec is only available to the (very small) consortium of companies which make the drives and media. And the number of companies which make the drives is now... two, I think? (They're often rebadged.) Wouldn't surprise me to see it drop to one in the future.
This seems to make LTO a very untrustworthy format for archiving, which is deeply unfortunate.
Make an LTO tape... But also make a Bluray... And also store it on some hard drives... And also upload it to a web archive...
The same for the actual file format... Upload PDF's... But also upload word documents.. And also ASCII...
And same for the location... Try to get diversity of continents... Diversity of geopolitics (ie. some in USA, some in Russia). Diversity of custodians (friends, businesses, charities).
Even ASCII itself is a strange encoding that could be lost with enough time and need to be recovered through cryptographic analysis and signals processing. That doesn't look at all likely today given UTF-8's promised and mostly accomplished ubiquity and its permanent grandfathering of ASCII. But ASCII is still only one of a number of potential encoding schemes, and isn't necessarily obvious from first principles.
Past generations thought EBCDIC would last longer than it did.
Again, not that there are any indications now that ASCII won't survive nearly as long as the English language does at this point, just that when we're talking about sending signals to the future, even assuming ASCII encoding is an assumption to question.
These things make more sense because LTO is used for backup, not archival. Companies don't want to be able to read the tape data in 50 years, they want to be able to read it tomorrow, after the entire business campus burns down.
Yeah. If I ever wrote a backup system I'd do this too, write the whole spec for the format first to every medium. A 100k specification describing the format is nothing to waste on a medium which can store 10TB.
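A sketch of what that could look like: a plain-ASCII preamble written ahead of the data, describing the (made-up) record layout that follows. The layout here is hypothetical, just to show the idea of a self-describing medium:

```python
SPEC = b"""\
DEAR FUTURE READER: this medium holds a backup in the following layout.
1. This ASCII preamble, ending at the line 'END-OF-SPEC'.
2. Repeated records: 2-byte big-endian name length, UTF-8 file name,
   8-byte big-endian data length, raw file data. No compression, no encryption.
END-OF-SPEC
"""

def write_backup(out, files):
    """Write the self-describing preamble, then the records it promises.

    `files` is an iterable of (name, bytes) pairs; `out` is any binary stream,
    e.g. an ordinary file or a raw tape device node.
    """
    out.write(SPEC)
    for name, data in files:
        encoded = name.encode("utf-8")
        out.write(len(encoded).to_bytes(2, "big"))
        out.write(encoded)
        out.write(len(data).to_bytes(8, "big"))
        out.write(data)
```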
It's kinda strange that we still don't have a technology that would allow one to scan a magnetic medium at high resolution and then process it in software. This would be nice for all kinds of things that use magnetic tapes and platters — data recovery, perfect analog tape digitization, etc. The closest I've seen to it is that project that captures the raw signal from the video head of a VCR and then decodes it into a picture.
Isn't there a subset of that at least for floppy discs with Kryoflux or GreaseWeazle style controllers? They read the raw flux transitions off the drive head, and then it's up to software to figure out that it's a Commodore GCR disc or a Kaypro MFM one.
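For what it's worth, the first software step with those flux captures is fairly approachable. Given the timestamps of flux transitions, you bucket the gaps between them against the nominal bit-cell time. A crude sketch (the 2 µs figure is the nominal double-density MFM cell; real decoders track drift with a software PLL instead of a fixed divisor, and nothing here is specific to Kryoflux or Greaseweazle):

```python
def flux_to_intervals(timestamps_ns):
    """Gaps between successive flux transitions, in nanoseconds."""
    return [b - a for a, b in zip(timestamps_ns, timestamps_ns[1:])]

def classify_intervals(intervals_ns, cell_ns=2000):
    """Round each gap to a whole number of bit cells and clamp to the 2T-4T
    range MFM allows; the resulting symbol stream is what the sector decoder
    (sync marks, CRC checks) works on next."""
    symbols = []
    for gap in intervals_ns:
        cells = round(gap / cell_ns)
        symbols.append(min(max(cells, 2), 4))
    return symbols

# Gaps of roughly 4/6/8 µs on a double-density disk decode to 2, 3 and 4 cells:
print(classify_intervals([4050, 6100, 7900, 3980]))   # -> [2, 3, 4, 2]
```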
I've always admired the tenacity of people who reverse engineer stuff. To be able to spend multiple months figuring out barely documented technologies with no promise of success takes a lot a willpower and discipline. It's something I wish I could improve more in myself.
I think you could. In some sense "easily". It may be about finding that thing you're naturally so interested in or otherwise drawn to, that the months figuring out become a type of driven joy, and so the willpower kinda automatic.
And if you find it, don't judge what it is or worry what others might think - or even necessarily tell anyone. Sometimes the most motivating things are highly personal, as with the OP; a significant part of their childhood.
You definitely have a point there; looking at some of my previous work, I was able to stick to projects for many months if I found the work interesting. I'll have to admit that in the past 5 or so years, any time I've tried to start a project there was always the thought in the back of my mind of 'will this benefit my career' or 'how can I make money on this in the future'. It seems having such thoughts adds additional anxiety whenever I try to start working on something for fun.
Looks like that is what I need to start looking for again, projects which I find interesting or fun to do in my spare time, without thinking about how it would affect my career or trying to find ways to monetize it.
Fascinating read that unlocked some childhood memories.
I'm secondhand pissed at the recovery company. I have a couple of ancient SD cards lying around, and this just reinforces my fear that if I send them away for recovery they'll be destroyed (the cards aren't recognized/readable by the readers built into MacBooks, at least).
My understanding is that flash memory does not do very well at all for long-term unpowered data retention. A flash cell is basically a capacitor (it is not really a capacitor, but close to one) and will lose its charge after a few years.
And magnetic drives will seize up, and optical disks get oxidized, and tapes stick together. Long-term archiving is a tricky endeavor.
It is, however, an interesting challenge. I think I would get acquainted with the low-level protocol used by SD cards, then modify a microcontroller SD/MMC driver to get me an image of the card (errors and all) - that is, without all the SCSI disk layers doing their best to give you normalized access to the device. Or, more realistically, hope someone more talented than me does the above.
Tapes hold up really well if they're not in absolutely awful storage conditions. And the claim at least was that the early CD-ROMs were quite durable, being a straight up laser carved diffraction grating. CDRs on the other hand rely on dye which will degrade rapidly.
> My understanding is that flash memory does not do very well at all for long term unpowered data retention
You need to let flash cells rest before writing again if you want to achieve long retention periods, see section 4 in [1]. The same document says 100 years is expected if you only cycle it once a day, 10k times over 20 years (Fig 8).
Last year I helped a friend recover photos from a portable WD HDD. It was formatted as FAT32 and I was forced to run R-Studio to get reliable results. There were a lot of damaged (readable, with artifacts) and corrupted (don't render, have the wrong size) files.
This is giving me some anxiety about my tape backups.
I have backed up my blu-ray collection to a dozen or so LTO-6 tapes, and it's worked great, but I have no idea how long the drives are going to last, or how easy it will be to repair them.
Granted, the LTO format is probably one of the more popular formats, but articles like this still keep me up at night.
The only surefire method to keep the bits readable is to continue moving them onto new media every few years. Data has a built-in recurring cost. I'd love to see a solution to that problem, but I think it's unlikely. It's at least possible, though, that we'll come up with a storage medium with sufficient density and durability that it'll be good enough.
I don't even want to think about the hairy issues associated with keeping the bits able to be interpreted. That's a human behavior problem more than a technology problem.
LTO is one of the best choices for compatibility. I remember just how awful DDS (same sort of media as DAT) tape backups were - due to differences in head alignments, it was a real lottery as to whether any given tape could be read on a different drive than the one that wrote it.
https://news.ycombinator.com/item?id=36062785 had been edited to censor the information, so I'm duplicating it here. Caveat that I don't know if that's actually correct, since efforts to suppress it are only circumstantial evidence in favor.
> Over the span of about a month, I received very infrequent and vague communications from the company despite me providing extremely detailed technical information and questions.
Ahh the business model of "just tell them to send us the tape and we'll buy the drive on eBay"
To be honest, as long as they are very careful about not doing any damage to the original media, it might work and be a win-win for both sides in a "no fix, no fee" model where the customer only pays if the data is successfully recovered.
Their cardinal sin was that they irreparably damaged the tape without prior customer approval.
The OP explicitly didn't name them (despite many people recommending it, even preservationists in this field on Reddit and Discord), but it's easy to find just by googling the text in the screenshots.
>> The tape was the only backup for those things, and it completes Frogger 2's development archives, which will be released publicly.
In cases like this I can imagine some company yelling "copyright infringement" even though they don't possess a copy themselves. It's a really odd situation.
As a kid, I got this game as a gift and really, really wanted to play it. But after beating the second level, the game would always crash on my computer with an Illegal Operation exception. I remember sending a crash report to the developer, and even updating the computer, but I never got it working.
I work in the tape restoration space. My biggest piece of advice is never NEVER encrypt your tapes. If you think restoring data from an unknown format tape is hard, trying to do it when the drive will not let you read the blocks off the tape without a long lost decryption key is impossible.
TIL there are three completely different games named "Frogger 2". I assumed this was about the 1984 game, but this is the 2000 game (there is also a 2008 game).
> the ADR-50e drive was advertised as compatible, but there was a cave-at
I'm assuming the use of "cave-at" means the author has inferred an etymology of "caveat" being made up of "cave" and "at", as in: this guarantee has a limit beyond which we cannot keep our promises, if we ever find ourselves AT that point then we're going to CAVE. (As in cave in, meaning give up.) I can't think of any other explanation of the odd punctuation. Really quite charming, I'm sure I've made similar inferences in the past and ended up spelling or pronouncing a word completely wrong until I found out where it really comes from. There's an introverted cosiness to this kind of usage, like someone who has gained a whole load of knowledge and vocabulary from quietly reading books without having someone else around to speak things out loud.
It's not just limited to tape, most archiving and backup software is proprietary.
It's impossible to open Acronis or Macrium Reflect images without their Windows software. In Acronis's case they even make it impossible to use offline or on a server OS without paying for a license.
NTBackup is awfully slow and doesn't work past Vista, and it's not even part of XP POSReady for whatever reason, so I had to rip the exe from an XP ISO and unpack it (NTBACKUP._EX... I forget Microsoft's term for that) because the Vista version available on Microsoft's site specifically checks for Longhorn or Vista.
Then there's slightly more obscure formats that didn't take off in the western world, and the physical mediums too. Not many people had the pleasure of having to extract hundreds of "GCA" files off of MO disks using obscure Japanese freeware from 2002. The English version of the software even has a bunch of flags on virustotal that the standard one doesn't. And there's obscure LZH compression algorithms that no tool available now can handle.
I've found myself setting up one-time Windows 2000/XP VMs just to access backups made after 2000.
I have at various times considered a tape backup solution for my home, but always give up when it seems every tape vendor is only interested in business clients. It was a race to stay ahead of hard drives and oftentimes they seemed to be losing. The price points were clearly aimed at business customers, especially on the larger capacity tapes. In the end I do backup to hard drives instead because it's much cheaper and faster.
ARCServe was a Computer Associates product. That's all you need to know.
It had a great reputation on Novell Netware but the Windows product was a mess. I never had a piece of backup management software cause blue screens (e.g. kernel panics) before an unfortunate Customer introduced me to ARCServe on Windows.
My favorite ArcServe bug which they released a patch for (and which didn’t actually fix the issue, as I recall) had a KB article called something along the lines of “The Open Database Backup Agent for Lotus Notes Cannot Backup Open Databases”.
IIRC tar has some Unixisms that don't necessarily map to Windows/NTFS. Not saying reinventing tar is appropriate, but there are Windows/NTFS features that a Windows-based tape backup needs to support.
Most of what makes NTFS different than FAT probably doesn't need to be backed up. Complex ACLs, alternative data streams, shadow copies, etc, are largely irrelevant when it comes to making a backup. Just a simple warning "The data being backed up includes alternative data streams. These aren't supported and won't be included in the backup" would suffice.
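To illustrate what an alternate data stream actually is (this only runs on Windows against an NTFS volume; the file name and the stream name "notes" are made up):

```python
import os

path = "report.txt"

with open(path, "w") as f:
    f.write("visible contents")                   # the normal, unnamed stream

with open(path + ":notes", "w") as f:             # NTFS alternate data stream
    f.write("metadata hiding in an ADS")

print(os.path.getsize(path))                      # counts only the unnamed stream
with open(path + ":notes") as f:
    print(f.read())                               # but the ADS is really there

# A naive copy made by reading `path` and writing the bytes elsewhere carries
# only the unnamed stream - which is why a backup tool should either enumerate
# streams or at least warn that they're being skipped.
```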
Vendor lock in for backup and archival products is so ridiculous. It increases R&D to ensure the lock-in, and the company won't exist by the time the lock-in takes effect.
Is there a way to read magnetic tapes like these in such a way as to get the raw magnetic flux at high resolution?
It seems like it would be easier to process old magnetic tapes by imaging them and then applying signal processing rather than finding working tape drives with functioning rollers. Most of the time, you're not worried about tape speed since you're just doing recovery read rather than read/write operations. So, a slow but accurate operation seems like it would be a boon for these kinds of things.
For anybody who is into this, this is a good excuse to share a presentation from Vintage Computer Fest West 2020 re: magnetic tape restoration: https://www.youtube.com/watch?v=sKvwjYwvN2U
The presentation explores using software-defined signal processing to analyze a digitized version of the analog signal generated from the flux transitions. It's basically moving the digital portion of the tape drive into software (a lot like software-defined radio). This is also very similar to efforts in floppy disk preservation. Floppies are amazingly like tape drives, just with tiny circular tapes.
OP here! Yes I'd highly recommend this video, I stumbled across it early on when trying to familiarize myself with what the options were-- and it's a good video!
At the very least (and the cost for this would perhaps be prohibitive), some mechanism to duplicate the raw flux off the tape onto another tape in an identical format - a backup of the backup - would allow attempts to read the data that may be potentially destructive to the media (for example, breaking the tape accidentally) without losing the original signal.
Sounds like at least in this case that ASIC in the drive was doing some (non trivial) signal processing. Would be interesting to know how hard it would be to get from the flux pattern back to zeros and ones. I guess with a working drive you can at least write as many test patterns as you want until you maybe figure it out.
At the very least the drive needs to be able to lock onto the signal. It's probably encoded in a helix on the drive and if the head isn't synchronized properly you won't get anything useful, even with a high sampling rate.
You still need to know where to look and what the format is, and you'd be using specialized equipment whose cost wasn't driven down by mass manufacturing - so in theory yes, in practice not really.
(Completely guessing here with absolute no knowledge of the real state of things)
Yes. There’s some guy on YouTube who does stuff like that (he reverse engineered the audio recordings from a 747 tape array) but it can be quite complicated.
F2 was a really neat game. It almost invented Crypt of the Necrodancer’s genre decades early.
It’s a little sad that it took such a monumental effort to bring the source code back from the brink of loss. It’s times like that that should inspire lawmakers to void copyright in the case that the copyright holders can’t produce the thing they’re claiming copyright over.
Heh, I remember playing .mp3 files directly from QIC-80 tapes, somewhere around 1996. One tape could store about 120 MB, which is equal to about two compact discs' worth of audio. The noise of the tape drive was slightly annoying, though. And it made me appreciate what the 't' in 'tar' stands for.
No, it was really only 120 MB. I was referring to the length of an audio compact disc, not the capacity of a CD-ROM. At 128 kbps, you'd get about 2 hours of play time.
Of course it didn't really make sense to use digital tapes for that use case, even back then. It was just for fun, and the article sparked some nostalgic joy, which felt worth sharing :)
They reference MP3, and a CD ripped down to MP3 probably fits in the 50-100MB envelope for size. It has been a very long time since I last ripped an album, but that size jibes with my memory.
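For reference, the arithmetic roughly checks out: 128 kbps is 16 kB/s, so a 120 MB tape holds about 120,000 / 16 ≈ 7,500 seconds, a bit over two hours of audio (in the same ballpark as two audio CDs), and a 60-minute album rips to about 3,600 × 16 kB ≈ 58 MB at that bitrate, squarely in the 50-100MB range.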
This is just random, but reading this and the backup discussion made me think about SGI IRIX and how it could do incremental backups.
One option was to specify a set of files, and that spec could just be a directory. Once done, the system built a mini filesystem and would write that to tape.
XFS was the filesystem in use at the time I was doing systems level archival.
On restores, each tape, each record was a complete filesystem.
One could do it in place and literally see the whole filesystem build up and change as each record was added. Or, restore to an empty directory and you get whatever was in that record.
That design was not as information-dense as others could be, but it was nice, and as easy to use as it was robust.
What our team did to back up some data managed engineering software was perform a full system backup every week, maybe two. Then incrementals every day, written twice to the tape.
Over time, full backups were made and sent off site. One made on a fresh tape, another made on a tape that needed to be cycled out of the system before it aged out. New, fresh tapes entered the cycle every time one aged out.
Restores were done to temp storage and rather than try and get a specific file, it was almost always easier to just restore the whole filesystem and then copy the desired file from there into its home location. The incrementals were not huge usually. Once in a while they got really big due to some maintenance type operation touching a ton of files.
The nifty thing was no real need for a catalog. All one needed was the date to know which tapes were needed.
Given the date, grab the tapes, run a script and go get coffee and then talk to the user needing data recovery to better understand what might be needed. Most of the time the tapes were read and the partial filesystem was sitting there ready to go right about the time those processes completed.
Having each archive, even if it were a single file, contain a filesystem data set was really easy to use and manage. Loved it.
A few months ago I was looking for an external backup drive and thought that an SSD would be great because it's fast and shock resistant. Years ago I killed a MacBook Pro HD by throwing it on my bed from a few inches high. Then I read a comment on Amazon about SSDs losing information when unpowered for a long time. I couldn't find any quick confirmation on the product page; it took me a few hours of research to find a paper about this phenomenon. If I remember correctly, it takes a few weeks for a stored SSD to start losing its data. So I bought a mechanical HD.
Another tech tip is not to buy two backup devices from the same batch or even the same model. Chances are they will fail in the same way.
To the last bit, I've seen this first hand. Had a whole RAID array of the infamous IBM DeathStar drives fail one after the other while we frantically copied data off.
Last time I ever had the same model drives in an array.
Heh, I remember in the early 1990s having a RAID array with a bunch of 4Gb IBM drives come up dead after a weekend powerdown for a physical move due to "stiction". I was on the phone with IBM, and they were telling me to physically bang the drives on the edge of desk to loosen them up. Didn't seem to be working, so their advice was "hit it harder!" When I protested, they said, "hey, it already doesn't work, what have you got to lose?" So I hit it harder. Eventually got enough drives to start up to get the array on line, and you better believe the first thing I did after that was create a fresh backup (not that we didn't have a recent backup anyway), and the 2nd thing I did was replace those drives, iirc, with Seagate Barracudas.
When I was still relatively familiar with flash memory technologies (in particular NAND flash, the type used in SSDs and USB drives), the retention specs were something like 10 years at 20C after 100K cycles for SLC, and 5 years at 20C after 5-10K cycles for MLC. The more flash is worn, the leakier it becomes. I believe the "few weeks" number for modern TLC/QLC flash, but I suspect that is still after the specified endurance has been reached. In theory, if you only write to the flash once, then the retention should still be many decades.
Indeed. The paper everyone gets the "flash loses its data in a few years" claim from wasn't dealing with consumer flash and consumer use patterns. Remember that having the drive powered up wouldn't stop that kind of degradation without explicitly reading and re-writing the data. Surely you have a file on an SSD somewhere that hasn't been re-written in several years, go check yourself whether it's still good.
Even the utter trash that is thumb drives and SD cards seem to hold data just fine for many years in actual use.
IIRC, the paper was explicitly about heavily used and abused storage.
Only if you kept the disk in a refrigerator. Bits are stored by melting the plastic slightly and the dye seeping in. Over time, the warmth of "room temperature" will cause the pits to become less well-defined so the decoder has to spend more time calculating "well, is that really a 1 or is it a sloppy 0". There's a lot of error detection/correction built into the CD specs, but eventually, there will be more error than can be corrected for. If you've ever heard the term "annealing" when used in machine learning, this is equivalent.
Living in South Florida, ambient temperatures were enough to erase CD-Rs - typically in less than a year. I quickly started buying the much more expensive "archival" discs, but that wasn't enough. One fascinating "garage band" sold their music on CD-Rs and all of my discs died (it was a surfer band from Alabama).
The recording is made in the dye layer, a chemical change, and the dye degrades (particularly in sunlight) so the discs have a limited shelf life. Checking Wikipedia, it appears azo dye formulations can be good for tens of years.
Melting polycarbonate would call for an absurdly powerful laser, a glacial pace, or both, and you wouldn't have to use dye at all. I'd guess such a scheme would be extremely durable, though.
I recently tried a couple of CD-Rs that were stored in a dry, closed drawer for most of the last 20 years, and they seemed to at least initially come up. Now I can reinstall Windows 2000 with Service Pack 4 slipstreamed!
...."What do you mean nobody paid for the bucket for the last 5 years?"
There is some chance someone might stash an old hard drive or tape with a backup somewhere in a closet. There is no chance there will be anything left when someone stops paying for the cloud.
I'm pretty sure that even with the substantial damage done by the recovery company, a professional team like Kroll Ontrack can still recover the complete tape data, although it probably won't be cheap.
As the other comment here says, any company claiming to do data recovery, and damaging the original media to that extent, should be named and shamed. I can believe that DR companies have generic drives and heads to read tapes of any format they come across, but even if they couldn't figure out how the data was encoded, there was absolutely no need to cut and splice the tape. I suspect they did that just out of anger at not likely being able to recover anything (and thus having spent a bunch of time for no profit.)
Melted pinch rollers are not uncommon and there are plenty of other (mostly audio) equipment with similar problems and solutions --- dimensions are not absolutely critical and suitable replacements/substitutes are available.
As an aside, I think that prominent "50 Gigabytes" capacity on the tape cartridge, with a small asterisk-note at the bottom saying "Assumes 2:1 compression", should be outlawed as a deceptive marketing practice. It's a good thing HDD and other storage media didn't go down that route.
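For what it's worth, that asterisk works out to 50 / 2 = 25 GB native: the 50 GB figure only materializes if your data happens to compress 2:1, which already-compressed files (video, images, archives) won't.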
Name and shame the company, you had a personal experience, you have proof. Name and shame. It helps nobody if you don't publicize it. Let them defend it, let them say whatever excuse, but your review will stand.
I don't want to even remotely tempt them to sue. They have no grounds, but I'm not taking risks-- companies are notorious for suing when they know they'll lose. Others who have posted it here have identified the right company though.
This is a masterful recovery effort. The README should be shared as an object lesson far and wide to every data restoration and archival service around.
I’ve been suffering through something similar with a DLT IV tape from 1999. Luckily I didn’t send out to the data recovery company. But still unsuccessful.
Nice catch! I think it was a little less juvenile than it might sound. I believe this was for a different game, Fusion Frenzy, which was a party minigame collection.
While I didn't understand the parent you are replying to (not your answer), your mention of Fusion Frenzy caught my eye. I've had a soft spot for that game since spending hours playing the "xbox magazine" demo with a childhood friend. Could you clarify? Is there any history gem about that one? I'd dig a PC port!
> Yet, despite ARCserve showing a popup which says "Restoration Successful", it restores up to the first 32KB of every file on the tape, but NO MORE.
From 10,000 feet, this sounds suspiciously like ARCserve is reading a single tape block or transfer buffer's worth of data for each file, writing out the result, then failing and proceeding to the next file.
Success popup notwithstanding, I'd expect to find errors in either the ARCserve or Windows event logs in this case — were there none?
While it's been decades since I've dealt with ARCserve specifically, I've seen similar behavior caused by any number of things. Off the top of my head,
(1) Incompatibilities between OS / backup software / HBA driver / tape driver.
In particular, if you're using a version of Windows much newer than Windows 2000, try a newer version of ARCserve.
In the absence of specific guidance, I'd probably start with the second* ARCserve version that officially supports Windows Server 2003:
(a) Server 2003 made changes to the SCSI driver architecture that may not be 100% compatible with older software.
(b) The second release will likely fix any serious Server 2003 feature-related bugs the first compatible version may have shipped with, without needing to install post-release patches that may be hard to find today.
(c) Significantly newer ARCserve versions are more likely to introduce tape drive / tape format incompatibilities of their own.
(2) Backup software or HBA driver settings incompatible with the hardware configuration (e.g., if ARCserve allows it, try reducing the tape drive transfer buffer size or switching from fixed block (= multiple tape blocks per transfer) to variable block (= single tape block per transfer) mode; if using an Adaptec HBA, try increasing the value of /MAXIMUMSGLIST[1]).
(3) Shitty modern HBA driver support for tape (and, more generally, non-disk) devices.
For example, modern Adaptec Windows HBA drivers have trouble with large tape block sizes that AFAIK cannot be resolved with configuration changes (though 32 kB blocks, as likely seen here, should be fine).
In my experience with PCIe SCSI HBAs, LSI adapters are more likely to work with arbitrary non-disk devices and software out-of-the-box, whereas Adaptec HBAs often require registry tweaks for "unusual" circumstances (large transfer sizes; concurrent I/O to >>2 tape devices; using passthrough to support devices that lack Windows drivers, especially older, pre-SCSI 2 devices), assuming they can be made to work at all.
LSI20320IE PCIe adapters are readily available for $50 or less on eBay and, in my experience, work well for most "legacy" applications.
(To be fair to Adaptec, I've had nothing but good experiences using their adapters for "typical" applications: arbitrary disk I/O, tape backup to popular drive types, CD/DVD-R applications not involving concurrent I/O to many targets, etc.)
(4) Misconfigured or otherwise flaky SCSI bus.
In particular, if you're connecting a tape drive with a narrow (50-pin) SCSI interface to a wide (68-pin) port on the HBA, make sure the entire bus, including the unused pins, is properly terminated.
The easiest way to ensure this is to use a standard 68-pin Ultra320 cable with built-in active LVD/SE termination, make sure termination is enabled on the HBA, disabled on the drive, that the opposite end of the cable from the built-in terminator is connected to the HBA, and, ideally, that the 68-to-50-pin adapter you're using to connect the drive to the cable is unterminated.
You can also use a 50-pin cable connected to the HBA through a 68-to-50-pin adapter, but then you're either relying on the drive properly terminating the bus — which it may or may not do — or else you need an additional (50-pin) terminator for the drive end, which will probably cost as much as a Ultra320 cable with built-in termination (because the latter is a bog-standard part that was commonly bundled with both systems and retail HBA kits).
Note that I have seen cases where an incorrect SCSI cable configuration works fine in one application, but fails spectacularly in another, seemingly similar application, or even the same application if the HBA manages to negotiate a faster transfer mode. While this should be far less likely to occur with a modern Ultra160 or Ultra320 HBA, assume nothing until you're certain the bus configuration is to spec (and if you're using an Ultra2 or lower HBA, consider replacing it).
With all that said, reversing the tape format may well be easier than finding a compatible OS / ARCserve / driver / HBA combination.
In any case, good job with that, and thanks for publishing source code!
On a related note, I own a few older tape drives[1], have access to many more[2], and would be happy to volunteer my time and equipment to small-scale hobbyist / retrocomputing projects such as this — tape format conversions were a considerable part of my day job for several years, and tape drives are now a minor hobby.
See my profile for contact information.
[1] 9-track reel, IBM 3570, IBM 3590, early IBM 3592, early LTO, DLT ranging from TK-50 to DLT8000.
[2] IBM 3480/3490/3490E, most 4mm and 8mm formats, most full-sized QIC formats including HP 9144/9145, several QIC MC/Travan drives with floppy controllers of some description, a Benchmark DLT1 assuming it still works, probably a few others I'm forgetting about.
> This is where the story should probably have stopped. Given up and called it a day, right? Maybe, but I care about this data, and I happen to know a thing or two about computers.
Right, the on-PC database acts as an index to the data on the tape. That's pretty standard.
But having a format where you can't recreate the index from the data easily is just abhorrently bad coding...
Wouldn't it make sense to also write the backup catalog to the tape though? Seems like a very obvious thing to do to me.
Storing the catalogue on the PC is standard. But being able to rebuild that catalogue from scratch is also standard. I’ve not used any tapes before now where you couldn’t recover the catalogue.
This type of thing a surprisingly common mistake, I've come across it several times in industry.
An example of this done right: If you disconnect a SAN volume from VMware and attach it to a completely different cluster, it's readable. You can see the VM configs and disks in named folders. This can be used for DR scenarios, PRD->TST clones, etc...
Done wrong: XenServer. If you move a SAN volume to a new cluster, it gets shredded, with every file name replaced by a GUID instead. The file GUID to display name mapping is stored in a database that's only on the hosts! That database is replicated host-to-host and can become corrupted. Backing up just the SAN arrays is not enough!
I’d like to believe maybe that’s why the company went out of business but that’s just wishful thinking - a lot of incompetence is often ignored if not outright rewarded in business nowadays. Regardless, it’s at least somewhat of a consolation those idiots did go out of business in the end, even if that’s wasn’t the root cause.
I'm familiar with needing to re-index a backup if it's accessed from a 'foreign' machine and sometimes the procedure is non-obvious but just not having that option seems pretty bad.
I worked for an MSP a million years ago and we had a customer that thought they had lost everything. They had backup tapes but the backup server itself had died, after showing them the 'catalog tape' operation, and keeping their fingers crossed for a few hours, they bought me many beers.
That is not terribly surprising. The cheap tape drives of that era were very picky like that. Even if I had the same tape drive as a friend it was not always certain that I could read back my tape on his box and the other way around. These drives were very affordable and the tapes were low priced as well. However, they were really designed for the 'oh no I messed up my computer let me restore it' or 'I deleted a file I should not have' scenarios. Not server side backup rotation solutions. Unless that was the backup/restore computer. Long term storage or off site type drives were decently more pricy.
My guess is they lacked the RAM buffer and CPU to keep up properly, plus a side of bad assumptions on the software side.
This does not sound like a junior programmer error. This is not the kind of thing companies let inexperienced people come up with, at least not on their own. This is a lack of testing. Any real-world backup test would have caught this, and I would expect the more senior engineers on the project to ensure this was covered.
If you’re making an “ought to be” argument, I agree.
If you’re making an “is” argument, I completely disagree. I see companies (including my own) regularly having junior programmers responsible for decisions that cast inadvertently or unexpectedly long shadows.
It's basically an index stored on faster media. You would have redundancy on that media, too.
I guess that's why the .zip format chucks its catalog index at the end of the archive. But that's still awkward on a streaming medium like tape.
In The Singularity Is Near (2005) Ray Kurzweil discussed an idea for the “Document Image and Storage Invention”, or DAISI for short, but concluded it wouldn't work out. I interviewed him a few years later about this and here's what he said:
The big challenge, which I think is actually important almost philosophical challenge — it might sound like a dull issue, like how do you format a database, so you can retrieve information, that sounds pretty technical. The real key issue is that software formats are constantly changing.
People say, “well, gee, if we could backup our brains,” and I talk about how that will be feasible some decades from now. Then the digital version of you could be immortal, but software doesn’t live forever, in fact it doesn’t live very long at all if you don’t care about it if you don’t continually update it to new formats.
Try going back 20 years to some old formats, some old programming language. Try resuscitating some information on some PDP1 magnetic tapes. I mean even if you could get the hardware to work, the software formats are completely alien and [using] a different operating system and nobody is there to support these formats anymore. And that continues. There is this continual change in how that information is formatted.
I think this is actually fundamentally a philosophical issue. I don’t think there’s any technical solution to it. Information actually will die if you don’t continually update it. Which means, it will die if you don’t care about it. ...
We do use standard formats, and the standard formats are continually changed, and the formats are not always backwards compatible. It’s a nice goal, but it actually doesn’t work.
I have in fact electronic information that in fact goes back through many different computer systems. Some of it now I cannot access. In theory I could, or with enough effort, find people to decipher it, but it’s not readily accessible. The more backwards you go, the more of a challenge it becomes.
And despite the goal of maintaining standards, or maintaining forward compatibility, or backwards compatibility, it doesn’t really work out that way. Maybe we will improve that. Hard documents are actually the easiest to access. Fairly crude technologies like microfilm or microfiche which basically has documents are very easy to access.
So ironically, the most primitive formats are the ones that are easiest.
In 2005 the computing world was much more in flux than it is now.
PNG is 26 years old and basically unchanged since then. Same with 30 year old JPEG, or for those with more advanced needs the 36 year old TIFF (though there is a newer 21 year old revision). All three have stood the test of time against countless technologically superior formats by virtue of their ubiquity and the value of interoperability. The same could be said about 34 year old zip or 30 year old gzip. For executable code, the wine-supported subset of PE/WIN32 seems to be with us for the foreseeable future, even as Windows slowly drops compatibility.
The latest Office365 Word version still supports opening Word97 files as well as the slightly older WordPerfect 5 files, not to mention 36 year old RTF files. HTML1.0 is 30 years old and is still supported by modern browsers. PDF has also got constant updates, but I suspect 29 year old PDF files would still display fine.
In 2005 you could look back 15 years and see a completely different computing landscape with different file formats. Look back 15 years today and not that much changed. Lots of exciting new competitors as always (webp, avif, zstd) but only time will tell whether they will earn a place among the others or go the way of JPEG2000 and RAR. But if you store something today in a format that's survived the last 25 years, you have good chances to still be able to open it in common software 50 years down the line.
This is too shortsighted by archival standards. Even Word itself doesn't offer full compatibility. VB? 3rd party active components? Other Office software integration? It's a mess. HTML and other web formats are only readable by virtue of being constantly evolved while keeping backwards compatibility, which is nowhere near complete and is hardware-dependent (e.g. aspect ratios, colors, pixel densities). The standards will be pruned sooner or later, due to the tech debt or being sidestepped by something else. And I'm pretty sure there are plenty of obscure PDF features that will prevent many documents from being readable in a mere half century. I'm not even starting on the code and binaries. And cloud storage is simply extremely volatile by nature.
Even 50 years (laughable for a clay tablet) is still pretty darn long in the tech world. We'll still probably see the entire computing landscape, including the underlying hardware, changing fundamentally in 50 years.
Future-proofing anything is a completely different dimension. You have to provide an independent way to bootstrap, without relying on an unbroken chain of software standards, business/legal entities, and public demand for certain hardware platforms/architectures. This is unfeasible for the vast majority of knowledge/artifacts, so you also have to have a good mechanism to separate signal from noise and to transform volatile formats like JPEG or machine-executable code into more or less future-proof representations, at least basic descriptions of what the notable thing did and what impact it had.
There is something called the Lindy effect, which states that a format's expected remaining lifespan is proportional to its current age.
I try to take advantage of this by only using older, open, and free things (or the most stable subsets of them) in my "stack".
For example, I stick to HTML that works across 20+ years of mainstream browsers.
While it's true that these standards are X years old, the software that encoded those formats yesteryear is very different from the software that decodes it today. It's a Ship of Theseus problem. They can claim an unbroken lineage since the distant future, the year 2000, but encoders and decoders had defects and opinions that were relied on--both intentionally and unintentionally--that are different from the defects and opinions of today.
I have JPEGs and MP3s from 20 years ago that don't open today.
Just going to mention Pro/E forward compatibility here: https://youtu.be/tY_Gy-EElc0
"The roots of Creo Parametric. Probably one of the last running installations of PTC's revolutionary Pro/ENGINEER Release 7 datecode 9135 installed from tape. Release 7 was released in 1991 and is - as all versions of Pro/ENGINEER - fully parametric. Files created with this version can still - directly - be opened and edited in Creo Parametric 5.0 (currently the latest version for production).
This is a raw video, no edits, and shows a bit of the original interface (menu manager, blue background, yellow and red datum planes, no modeltree).
Hardware used: Sun SparcStation 5 running SunOS 4.1.3 (not OpenWindows), 128MB RAM
Video created on january 6, 2019."
I can't help but think bad thoughts whenever I see another "static site maker" posted on here, or a brand new way of using JavaScript to render a web page.
Talk about taking the simplest and most durable of (web) formats and creating a hellscape of tangled complexity which becomes less and less likely to be maintainable or easy to archive the more layers of hipster js faddishness you add...
One of the claimed benefits of the JVM (and obviously later VMs) was that it would solve this issue: Java programs written in 2000 should still be able to run in 2100. And as far as I know the JVM has continued to fulfill this promise.
An honest question: If you are writing a program that you want to survive for 100+ years, shouldn't you specifically target a well-maintained and well-documented VM that has backward compatibility as a top priority? What other options are there?
People routinely boot DOS in e.g. qemu. The x86 ISA is 45 years old, older if you consider the 8008/8080 part of the lineage. It's not pretty, but it's probably the most widespread backwards compatible system out there.
While I love the JVM, and I also think it's one of the better runtimes in terms of backwards compatibility, there have been breakages. Most of the ones I've dealt with were easy to fix. But the ease of fixing is related to the access to source code. When something in a data stream is broken, be it an MP3 or a JPEG, I guess you almost inherently need special tooling to fix it (realistically). I imagine that with an SVG it'd be easier to hand-fix it.
> An honest question: If you are writing a program that you want to survive for 100+ years, shouldn't you specifically target a well-maintained and well-documented VM that has backward compatibility as a top priority? What other options are there?
I'd be tempted to target a well-known physical machine - build a bootable image of some sort as a unikernel - although in the age of VMWare etc. there's not a huge difference.
IMO the "right" way to do this would be to endow an institution to keep the program running, including keeping it updated to the "live" version of the language it's writen in, or even porting it between languages as and when that becomes necessary.
But he seems to have written this before virtual machines became widespread.
I think the concern is becoming increasingly irrelevant now, because if I really need to access a file I created in Word 4.0 for the Mac back in 1990, it's not too hard to fire up System 6 with that version of Word and read my file. In fact it's much easier now than it was in 2005 when he was writing. Sure it might take half an hour to get it all working, but that's really not too bad.
Most of this is probably technically illegal and will sometimes even have to rely on cracked versions, but also nobody cares, and all the OSes and programs are still around and easy to find on the internet.
Not to mention that while file formats changed all the time early on, these days they're remarkably long-lived -- used for decades, not years.
The outdated-hardware problem was more of a concern (as the original post illustrates), but so much of everything important we create today is in the cloud. It's ultimately being saved in redundant copies on something like S3 or Dropbox or Drive or similar, which are kept up to date. As older hardware dies, the bits are moved to newer hardware without the user even knowing.
So the problem Kurzweil talked about has basically become less of an issue as time has marched on, not more. Which is kind of nice!
>I think the concern is becoming increasingly irrelevant now, because if I really need to access a file I created in Word 4.0 for the Mac back in 1990, it's not too hard to fire up System 6 with that version of Word and read my file. In fact it's much easier now than it was in 2005 when he was writing. Sure it might take half an hour to get it all working, but that's really not too bad.
And that was easy years ago.
Now you can WASM it and run it in a browser
> I think the concern is becoming increasingly irrelevant
I fear we may be at the peak of that. With the "cloudification" where more and more software runs on servers one doesn't control, there is no way to run that software in a VM, as you don't have access to the software anymore. And even getting the raw data out for a custom backup becomes harder and harder.
I'm certain that 100 years from now, when the collapse really gets rolling, we'll still have cuneiform clay tablets complaining about Ea-Nassir's shitty copper but most of the digital information and culture we've created and tried to archive will be lost forever. Eventually, we're going to lose the infrastructure and knowledge base we need to keep updating everything, people will be too busy just trying to find food and fighting off mutants from the badlands to care.
Well, almost all early tablets are destroyed or otherwise lost now. Do you think we will lose virtually all digital age information within a century? Maybe from a massive CME, I suppose.
I was able to backup/restore an old COBOL system via cpio between modern GNU cpio (man page last updated June 2018), and SCO's cpio (c. 1989). This is neither to affirm nor contradict Kurzweil, but rather to praise the GNU userland for its solid legacy support.
Interview? https://www.computerworld.com/article/2477417/the-kurzweil-i...
This is very very true. I have archived a number of books and magazines that were scanned and converted into "simplified" PDF, and archived on DVD disks together with the C source code.
There are external dependencies but one hopes that the descriptions are sufficient to figure out how to make those work.
Actually I'd argue it's wrong precisely because we do manage to retrieve even such old artifacts. The only problem is that nobody cared for 30 years, so the process was harder than it should have been, but in the end it was possible.
Sure, there is a risk that at some point, for example, every PNG or H.264 decoder gets lost, and re-creating a decoder would be significantly more complicated, but the chances of that are pretty slim; looking at `ffmpeg -codecs`, I'm not really worried that will ever happen.
> Hard documents are actually the easiest to access. Fairly crude technologies like microfilm or microfiche which basically has documents are very easy to access.
Maybe it isn't crude after all if it wins.
I do not consider microfiche or film crude at all.
They are just simple.
And what they do is fully exploit the analog physics to yield a high data density that mere mortals can make effective use of.
And they make sense.
In my life, text, bitmaps and perhaps raw audio endure. Writing software to make use of this data is not difficult.
A quick scan later, microfiche-type data ends up as a bitmap.
Prior to computing, audio tape, pictures on film and ordinary paper, bonus points for things like vellum, had similar endurance and utility.
My own archives are photos, papers and film.
A modern backup would simply state "API keys and settings are here:", followed by a link to a collaboration platform that shut down after 3 years of existence.
Hey, it's the cloud. Backups are "someone else's problem". That is until they are your problem, then you're up a creek.
> Hey, it's the cloud. Backups are "someone else's problem". That is until they are your problem, then you're up a creek.
The FSF used to sell these wonderful stickers that said "There is no cloud. It's just someone else's computer."
Honestly, the backup space is weirdly sparse for anything at enterprise scale.
For anything more than a few machines there is bacula/bareos (which pretends everything is a tape, with mostly miserable results), backuppc (which pretends tapes are not a thing, with miserable results), and that's about it; everything else seems to be point-to-point backups only, with no real central management.
Are we talking about open source only? Because there are loads of options available. Veritas has two products (NetBackup and Backup Exec). There is also Commvault, Veeam, IBM Spectrum Protect and HP Data Protector. Admittedly only NetBackup and Commvault are what I would truly call enterprise, but your options are certainly not limited.
You can add amanda to the "pretends everything is tape with mostly miserable results" list.
Absolutely amazing story. Fantastic!
I've long been stunned by the propensity of proprietary backup software to use undocumented, proprietary formats. It seems to me like the first thing one should solve when designing a backup format is to ensure it can be read in the future even if all copies of the backup software are lost.
I may be wrong but I think some open source tape backup software (Amanda, I think?) does the right thing and actually starts its backup format with emergency restoration instructions in ASCII. I really like this kind of "Dear future civilization, if you are reading this..." approach.
Frankly nobody should agree to use a backup system which generates output in a proprietary and undocumented format, but also I want a pony...
It's interesting to note that the suitability of file formats for archiving is also a specialised field of consideration. I recall some article by someone investigating this very issue who argued formats like .xz or similar weren't very suited to archiving. Relevant concerns include, how screwed you are if the archive is partly corrupted, for example. The more sophisticated your compression algorithm (and thus the more state it records from longer before a given block), the more a single bit flip can result in massive amounts of run-on data corruption, so better compression essentially makes things worse if you assume some amount of data might be damaged. You also have the option of adding parity data to allow for some recovery from damage, of course. Though as this article shows, it seems like all of this is nothing compared to the challenge of ensuring you'll even be able to read the media at all in the future.
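A tiny experiment shows the propagation problem; depending on where the flip lands, the decompressor either errors out entirely or returns data that's only intact up to that point (plain Python, no other assumptions):

    import zlib

    data = b"The quick brown fox jumps over the lazy dog. " * 200
    packed = bytearray(zlib.compress(data, 9))
    packed[20] ^= 0x01            # flip one bit early in the compressed stream

    try:
        out = zlib.decompress(bytes(packed))
        good = next((i for i, (a, b) in enumerate(zip(data, out)) if a != b), len(out))
        print(f"decompressed {len(out)} bytes, intact up to byte {good}")
    except zlib.error as e:
        print(f"stream unrecoverable after a single bit flip: {e}")

Parity formats like PAR2 exist precisely to paper over this, at the cost of extra space.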
At some point the design lifespan of the proprietary ASICs in these tape drives will presumably just expire(?). I don't know what will happen then. Maybe people will start using advanced FPGAs to reverse engineer the tape format and read the signals off, but the amount of effort to do that would be astronomical, far more even than the amazing effort the author here went to.
To add, thinking a bit more about it: Designing formats to be understandable by future civilizations actually reduces to a surprising degree to the same set of problems which METI has to face. As in, sending signals designed to be intelligible to extraterrestrials - Carl Sagan's Contact, etc.
Even if you write an ASCII message directly to a tape, that data is obviously going to be encoded before being written to the tape, and you have no idea if anyone will be able to figure out that encoding in future. Trouble.
What makes this particularly pernicious is the fact that LTO nowadays is a proprietary format(!!). I believe the spec for the first generation or two of LTO might be available, but last I checked, it's been proprietary for some time. The spec is only available to the (very small) consortium of companies which make the drives and media. And the number of companies which make the drives is now... two, I think? (They're often rebadged.) Wouldn't surprise me to see it drop to one in the future.
This seems to make LTO a very untrustworthy format for archiving, which is deeply unfortunate.
The best format for archiving is many formats.
Make an LTO tape... But also make a Bluray... And also store it on some hard drives... And also upload it to a web archive...
The same for the actual file format... Upload PDF's... But also upload word documents.. And also ASCII...
And same for the location... Try to get diversity of continents... Diversity of geopolitics (ie. some in USA, some in Russia). Diversity of custodians (friends, businesses, charities).
Even ASCII itself is a strange encoding that could be lost with enough time and need to be recovered through cryptographic analysis and signals processing. That doesn't look at all likely today given UTF-8's promised and mostly accomplished ubiquity and its permanent grandfathering of ASCII. But ASCII is still only one of a number of potential encoding schemes, and isn't necessarily obvious from first principles.
Past generations thought EBCDIC would last longer than it did.
Again, not that there are any indications now that ASCII won't survive nearly as long as the English language does at this point; just that when we're talking about sending signals to the future, even assuming ASCII encoding is an assumption to question.
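For what it's worth, the arbitrariness is easy to see side by side; cp500 is one EBCDIC code page that happens to ship in Python's standard library:

    text = "BACKUP"
    print(text.encode("ascii").hex(" "))   # 42 41 43 4b 55 50
    print(text.encode("cp500").hex(" "))   # c2 c1 c3 d2 e4 d7

Same six characters, two encodings; without a key, neither mapping is more "obvious" than the other.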
These things make more sense because LTO is used for backup, not archival. Companies don't want to be able to read the tape data in 50 years, they want to be able to read it tomorrow, after the entire business campus burns down.
You mean the "if you are reading this in the distant future" instructions are written to the medium first? And are straight up ASCII?
Nice. That kind of thing makes too much sense. Wow. Such cheap insurance. Nice work from that team.
Yeah. If I ever wrote a backup system I'd do this too: write the whole spec for the format first to every medium. A 100k specification describing the format costs essentially nothing on a medium which can store 10TB.
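A minimal sketch of the idea, assuming a plain tar container and made-up filenames; the point is just that the human-readable instructions land at the very front of the medium:

    import io, tarfile, time

    SPEC = (
        "DEAR FUTURE READER:\n"
        "This is a POSIX pax (tar) archive. The rest of this file documents\n"
        "the layout of the backup so it can be restored without our software.\n"
    )

    def add_spec_first(tar):
        data = SPEC.encode("ascii")
        info = tarfile.TarInfo("000_README_RESTORE_INSTRUCTIONS.txt")
        info.size = len(data)
        info.mtime = int(time.time())
        tar.addfile(info, io.BytesIO(data))

    with tarfile.open("backup.tar", "w", format=tarfile.PAX_FORMAT) as tar:
        add_spec_first(tar)       # instructions become the first thing on the medium
        tar.add("data/")          # hypothetical payload directory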
It's kinda strange that we still don't have a technology that would allow one to scan a magnetic medium at high resolution and then process it in software. This would be nice for all kinds of things that use magnetic tapes and platters — data recovery, perfect analog tape digitization, etc. The closest I've seen to it is that project that captures the raw signal from the video head of a VCR and then decodes it into a picture.
Isn't there a subset of that at least for floppy discs with Kryoflux or GreaseWeazle style controllers? They read the raw flux transitions off the drive head, and then it's up to software to figure out that it's a Commodore GCR disc or a Kaypro MFM one.
LTO tape media itself is typically only rated at 30 years, so I suspect the tapes will die before the drives do.
I've always admired the tenacity of people who reverse engineer stuff. To be able to spend multiple months figuring out barely documented technologies with no promise of success takes a lot a willpower and discipline. It's something I wish I could improve more in myself.
I think you could. In some sense "easily". It may be about finding that thing you're naturally so interested in or otherwise drawn to, that the months figuring out become a type of driven joy, and so the willpower kinda automatic.
And if you find it, don't judge what it is or worry what others might think - or even necessarily tell anyone. Sometimes the most motivating things are highly personal, as with the OP; a significant part of their childhood.
You definitely have a point there, looking at some of my previous work I was able to stick to projects for many months if I found the work interesting. I'll have to admit in the past 5 or so years any time I've tried to start a project there was always the thought in the back of my mind of 'will this benefit my career' or 'how can I make money on this in the future'. It seems having such thoughts adds additional anxiety to whenever I try and start to work on something for fun.
Looks like that is what I need to start looking for again, projects which I find interesting or fun to do in my spare time, without thinking about how it would affect my career or trying to find ways to monetize it.
Fascinating read that unlocked some childhood memories.
I'm secondhand pissed at the recovery company, I have a couple of ancient SD cards laying around and this just reinforces my fear that if I send them away for recovery they'll be destroyed (the cards aren't recognized/readable by the readers built into MacBooks, at least)
My understanding is that flash memory does not do very well at all for long-term unpowered data retention. A flash cell is basically a capacitor (not literally, but close to one) and will lose its charge after a few years.
And magnetic drives will seize up, and optical disks get oxidized, and tapes stick together. Long-term archiving is a tricky endeavor.
It is however an interesting challenge. I think I would get acquainted with the low-level protocol used by SD cards, then modify a microcontroller SD/MMC driver to get me an image of the card (errors and all); that is, without all the SCSI disk layers doing their best to give you normalized access to the device. Or more realistically, hope someone more talented than me does the above.
Tapes hold up really well if they're not in absolutely awful storage conditions. And the claim at least was that the early CD-ROMs were quite durable, being a straight up laser carved diffraction grating. CDRs on the other hand rely on dye which will degrade rapidly.
> My understanding is that flash memory does not do very well at all for long term unpowered data retention
You need to let flash cells rest before writing again if you want to achieve long retention periods, see section 4 in [1]. The same document says 100 years is expected if you only cycle it once a day, 10k times over 20 years (Fig 8).
[1]: https://www.infineon.com/dgdl/Infineon-AN217979_Endurance_an...
Last year I helped a friend recover photos from a portable WD HDD. It was formatted as FAT32 and I was forced to run R-Studio to get reliable results. There were a lot of damaged files (readable, with artifacts) and corrupted ones (don't render, have the wrong size).
Painful lesson I've learned myself the hard way - don't rush something that doesn't need to be rushed.
This is giving me some anxiety about my tape backups.
I have backed up my blu-ray collection to a dozen or so LTO-6 tapes, and it's worked great, but I have no idea how long the drives are going to last for, and how easy it will be to repair them either.
Granted, the LTO format is probably one of the more popular formats, but articles like this still keep me up at night.
The only surefire method to keep the bits readable is to continue moving them onto new media every few years. Data has a built-in recurring cost. I'd love to see a solution to that problem but I think it's unlikely. It's at least possible, though, that we'll come up with a storage medium with sufficient density and durability that it'll be good enough.
I don't even want to think about the hairy issues associated with keeping the bits able to be interpreted. That's a human behavior problem more than a technology problem.
LTO is one of the best choices for compatibility. I remember just how awful DDS (same sort of media as DAT) tape backups were - due to differences in head alignments, it was a real lottery as to whether any given tape could be read on a different drive than the one that wrote it.
Do test restores. LTO is very good but without verification some will fail at some point.
But your original bluray disk are also a backup.
LTO-7 drives read LTO-6, and will be available for quite a while.
In 2016 I've used an LTO-3 drive to restore a bunch (150 or 200) of LTO-1/2 tapes from 2000-2003, and almost all but one or two worked fine.
I really wish they would name the data recovery company so that I can never darken their door with my business.
https://news.ycombinator.com/item?id=36062785 had been edited to censor the information, so I'm duplicating it here. Caveat that I don't know if that's actually correct, since efforts to suppress it are only circumstantial evidence in favor.
> Over the span of about a month, I received very infrequent and vague communications from the company despite me providing extremely detailed technical information and questions.
Ahh the business model of "just tell them to send us the tape and we'll buy the drive on eBay"
To be honest as long as they are very careful about not doing any damage to the original media then it might work and be a win-win for both sides in a "no fix no fee" model where the customer only pays if the data is successfully recovered.
Their cardinal sin was that they irreparably damaged the tape without prior customer approval.
It’s not too hard to find with the following search, “we can recover data from tape formats including onstream”
The OP explicitly didn't name them (despite many people recommending it, even preservationists in this field on Reddit and Discord), but it's easy to find just by googling the text in the screenshots
>> The tape was the only backup for those things, and it completes Frogger 2's development archives, which will be released publicly.
In cases like this I can imagine some company yelling "copyright infringement" even though they don't possess a copy themselves. It's a really odd situation.
As a kid, I got this game as a gift and really, really wanted to play it. But after beating the second level, the game would always crash on my computer with an Illegal Operation exception. I remember sending a crash report to the developer, and even updating the computer, but I never got it working.
I adored this game as a kid, and I think I do have a faint memory of some stability issues, but I believe I was able to beat the game.
I work in the tape restoration space. My biggest piece of advice is never NEVER encrypt your tapes. If you think restoring data from an unknown format tape is hard, trying to do it when the drive will not let you read the blocks off the tape without a long lost decryption key is impossible.
TIL there are three completely different games named "Frogger 2". I assumed this was for the 1984 game, but this is for the 2000 game (there is also a 2008 game).
Thanks for that, it seems like a surprisingly modern format for such an old game.
Links for the games referenced:
- Frogger II: ThreeeDeep! (1984)
https://www.mobygames.com/game/7265/frogger-ii-threeedeep/
- Frogger 2: Swampy's Revenge (2000) [1]
https://www.mobygames.com/game/2492/frogger-2-swampys-reveng...
- Frogger 2 (2008) [2]
https://www.mobygames.com/game/47641/frogger-2/
> the ADR-50e drive was advertised as compatible, but there was a cave-at
I'm assuming the use of "cave-at" means the author has inferred an etymology of "caveat" being made up of "cave" and "at", as in: this guarantee has a limit beyond which we cannot keep our promises, if we ever find ourselves AT that point then we're going to CAVE. (As in cave in, meaning give up.) I can't think of any other explanation of the odd punctuation. Really quite charming, I'm sure I've made similar inferences in the past and ended up spelling or pronouncing a word completely wrong until I found out where it really comes from. There's an introverted cosiness to this kind of usage, like someone who has gained a whole load of knowledge and vocabulary from quietly reading books without having someone else around to speak things out loud.
Dang it. OP here, I saw this typo and swear I fixed this typo before posting it!!
I thought it might have been a transcription error of “carve out,” but your theory is more logical.
Truly noble effort. Hopefully the writeup and the tools will save others much heartbreak.
Wow, that backup software sounds like garbage. Why not just use tar? Why would anyone reinvent that wheel?
The world of tape backup was (is?) absolutely filled with all sorts of vendor-lock in projects and tools. It's a complete mess.
And even various versions of tar aren't compatible, and that's not even starting with star and friends.
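For what it's worth, even staying inside "plain tar" you have to pin the dialect deliberately; a quick sketch (Python, made-up paths):

    import tarfile

    # USTAR is the 1988 POSIX format essentially every tar implementation reads;
    # PAX (POSIX.1-2001) adds long paths, files over 8 GB and UTF-8 names.
    with tarfile.open("portable.tar", "w", format=tarfile.USTAR_FORMAT) as tar:
        tar.add("project/")       # hypothetical directory to archive

GNU-specific extensions (sparse files, long-name headers) are exactly the sort of thing another vendor's tar may choke on.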
It's not just limited to tape, most archiving and backup software is proprietary. It's impossible to open Acronis or Macrium Reflect images without their Windows software. In Acronis's case they even make it impossible to use offline or on a server OS without paying for a license. NTBackup is awfully slow and doesn't work past Vista, and it's not even part of XP POSReady for whatever reason, so I had to rip the exe from an XP ISO and unpack it (NTBACKUP._EX... I forgot Microsoft's term for that) because the Vista version available on Microsoft's site specifically checks for Longhorn or Vista.
Then there's slightly more obscure formats that didn't take off in the western world, and the physical mediums too. Not many people had the pleasure of having to extract hundreds of "GCA" files off of MO disks using obscure Japanese freeware from 2002. The English version of the software even has a bunch of flags on virustotal that the standard one doesn't. And there's obscure LZH compression algorithms that no tool available now can handle.
I've found myself setting up one-time Windows 2000/XP VMs just to access backups made after 2000.
I have at various times considered a tape backup solution for my home, but always give up when it seems every tape vendor is only interested in business clients. It was a race to stay ahead of hard drives and oftentimes they seemed to be losing. The price points were clearly aimed at business customers, especially on the larger capacity tapes. In the end I do backup to hard drives instead because it's much cheaper and faster.
ARCServe was a Computer Associates product. That's all you need to know.
It had a great reputation on Novell Netware but the Windows product was a mess. I never had a piece of backup management software cause blue screens (e.g. kernel panics) before an unfortunate Customer introduced me to ARCServe on Windows.
My favorite ArcServe bug which they released a patch for (and which didn’t actually fix the issue, as I recall) had a KB article called something along the lines of “The Open Database Backup Agent for Lotus Notes Cannot Backup Open Databases”.
IIRC tar has some Unixisms that don't necessarily work for Windows/NTFS. Not saying reinventing tar is appropriate, but there are Windows/NTFS features that a Windows-based tape backup needs to support.
Most of what makes NTFS different from FAT probably doesn't need to be backed up. Complex ACLs, alternate data streams, shadow copies, etc., are largely irrelevant when it comes to making a backup. Just a simple warning "The data being backed up includes alternate data streams. These aren't supported and won't be included in the backup" would suffice.
That’s fair; I wasn’t really considering windows. It seems like there ought to be some equivalent by now though.
The format is extensible enough that it could be added
The company that made it probably was hoping for vendor lock-in
Vendor lock-in for backup and archival products is so ridiculous. It increases R&D cost just to ensure the lock-in, and the company won't even exist by the time the lock-in takes effect.
Is there a way to read magnetic tapes like these such that you get the raw magnetic flux at high resolution?
It seems like it would be easier to process old magnetic tapes by imaging them and then applying signal processing rather than finding working tape drives with functioning rollers. Most of the time you're not worried about tape speed, since you're just doing a recovery read rather than read/write operations. So a slow but accurate operation seems like it would be a boon for these kinds of things.
For anybody who is into this, it's a good excuse to share a presentation from Vintage Computer Fest West 2020 re: magnetic tape restoration: https://www.youtube.com/watch?v=sKvwjYwvN2U
The presentation explores using software-defined signal processing to analyze a digitized version of the analog signal generated from the flux transitions. It's basically moving the digital portion of the tape drive into software (a lot like software-defined radio). This is also very similar to efforts in floppy disk preservation. Floppies are amazingly like tape drives, just with tiny circular tapes.
OP here! Yes I'd highly recommend this video, I stumbled across it early on when trying to familiarize myself with what the options were-- and it's a good video!
At the very least (and the cost for this would perhaps be prohibitive), some mechanism to duplicate the raw flux off the tape onto another tape in an identical format would give you a backup of the backup. That would allow read attempts that are potentially destructive to the media (for example, accidentally breaking the tape) without losing the original signal.
Sounds like at least in this case the ASIC in the drive was doing some (non-trivial) signal processing. Would be interesting to know how hard it would be to get from the flux pattern back to zeros and ones. I guess with a working drive you can at least write as many test patterns as you want until you maybe figure it out.
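The first step is conceptually not too bad; a deliberately oversimplified sketch of what any flux decoder starts with (Python, made-up numbers):

    # Bucket the time between flux transitions into whole numbers of bit cells.
    # Real decoders also track speed drift with a PLL, undo the RLL/GCR/MFM
    # line code, descramble, and check ECC; that is what the ASIC embodies.
    def bucket(intervals_ns, cell_ns=100):
        return [max(1, round(dt / cell_ns)) for dt in intervals_ns]

    # A noisy capture of run lengths 1,1,2,3,1 at a nominal 100 ns cell:
    print(bucket([103, 96, 188, 310, 99]))    # -> [1, 1, 2, 3, 1]

Getting from those run lengths back to user bytes is where it gets hard, because you need the line code and the block format, which is exactly the part that was never published.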
At the very least the drive needs to be able to lock onto the signal. It's probably encoded in a helix on the drive and if the head isn't synchronized properly you won't get anything useful, even with a high sampling rate.
You still need to know where to look, know the format, and use specialized equipment whose cost wasn't driven down by mass manufacturing. So in theory yes, in practice not really.
(Completely guessing here with absolutely no knowledge of the real state of things)
Yes. There’s some guy on YouTube who does stuff like that (he reverse engineered the audio recordings from a 747 tape array) but it can be quite complicated.
Would you have a link by any chance? Thanks!
F2 was a really neat game. It almost invented Crypt of the Necrodancer’s genre decades early.
It’s a little sad that it took such a monumental effort to bring the source code back from the brink of loss. It’s times like that that should inspire lawmakers to void copyright in the case that the copyright holders can’t produce the thing they’re claiming copyright over.
Heh, I remember playing .mp3 files directly from QIC-80 tapes, somewhere around 1996. One tape could store about 120 MB, which is equal to about two compact discs' worth of audio. The noise of the tape drive was slightly annoying, though. And it made me appreciate what the 't' in 'tar' stands for.
Did you mean 1200 MB? That would make sense wrt. 2x CD capacity.
No, it was really only 120 MB. I was referring to the length of an audio compact disc, not the capacity of a CD-ROM. At 128 kbps, you'd get about 2 hours of play time.
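The arithmetic roughly checks out (a quick back-of-the-envelope in Python):

    # 120 MB of 128 kbit/s MP3, give or take the exact MB definition:
    seconds = 120e6 * 8 / 128e3
    print(seconds / 3600)          # ~2.08 hours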
Of course it didn't really make sense to use digital tapes for that use case, even back then. It was just for fun, and the article sparked some nostalgic joy, which felt worth sharing :)
They reference MP3, and a CD ripped down to MP3 probably fits in the 50-100MB envelope for size. It has been a very long time since I last ripped an album, but that size jibes with my memory.
This is just random, but reading this and the backup discussion made me think about SGI IRIX and how it could do incremental backups.
One option was to specify a set of files, and that spec could just be a directory. Once done, the system built a mini filesystem and would write that to tape.
XFS was the filesystem in use at the time I was doing systems level archival.
On restores, each tape, each record was a complete filesystem.
One could do it in place and literally see the whole filesystem build up and change as each record was added. Or, restore to an empty directory and you get whatever was in that record.
That decision was not as information dense as others could be, but it was nice and as easy as it was robust.
What our team did to back up some engineering data management software was perform a full system backup every week, maybe two. Then incrementals every day, written twice to the tape.
Over time, full backups were made and sent off site. One made on a fresh tape, another made on a tape that needed to be cycled out of the system before it aged out. New, fresh tapes entered the cycle every time one aged out.
Restores were done to temp storage and rather than try and get a specific file, it was almost always easier to just restore the whole filesystem and then copy the desired file from there into its home location. The incrementals were not huge usually. Once in a while they got really big due to some maintenance type operation touching a ton of files.
The nifty thing was no real need for a catalog. All one needed was the date to know which tapes were needed.
Given the date, grab the tapes, run a script and go get coffee and then talk to the user needing data recovery to better understand what might be needed. Most of the time the tapes were read and the partial filesystem was sitting there ready to go right about the time those processes completed.
Having each archive, even if it were a single file, contain a filesystem data set was really easy to use and manage. Loved it.
A few months ago I was looking for an external backup drive and thought that an SSD would be great because it's fast and shock resistant. Years ago I killed a MacBook Pro HDD by throwing it onto my bed from a few inches up. Then I read a comment on Amazon about SSDs losing information when unpowered for a long time. I couldn't find any quick confirmation on the product page; it took me a few hours of research to find a paper about this phenomenon. If I remember correctly it takes a few weeks for a stored SSD to start losing its data. So I bought a mechanical HDD.
Another tech tip: don't buy 2 backup devices from the same batch or even the same model. Chances are they will fail in the same way.
To the last bit, I've seen this first hand. Had a whole RAID array of the infamous IBM DeathStar drives fail one after the other while we frantically copied data off.
Last time I ever had the same model drives in an array.
Heh, I remember in the early 1990s having a RAID array with a bunch of 4Gb IBM drives come up dead after a weekend powerdown for a physical move due to "stiction". I was on the phone with IBM, and they were telling me to physically bang the drives on the edge of desk to loosen them up. Didn't seem to be working, so their advice was "hit it harder!" When I protested, they said, "hey, it already doesn't work, what have you got to lose?" So I hit it harder. Eventually got enough drives to start up to get the array on line, and you better believe the first thing I did after that was create a fresh backup (not that we didn't have a recent backup anyway), and the 2nd thing I did was replace those drives, iirc, with Seagate Barracudas.
When I was still relatively familiar with flash memory technologies (in particular NAND flash, the type used in SSDs and USB drives), the retention specs were something like 10 years at 20C after 100K cycles for SLC, and 5 years at 20C after 5-10K cycles for MLC. The more flash is worn, the leakier it becomes. I believe the "few weeks" number for modern TLC/QLC flash, but I suspect that is still after the specified endurance has been reached. In theory, if you only write to the flash once, then the retention should still be many decades.
Someone is trying to find out with an experiment, however: https://news.ycombinator.com/item?id=35382252
Indeed. The paper everyone gets the "flash loses its data in a few years" claim from wasn't dealing with consumer flash and consumer use patterns. Remember that having the drive powered up wouldn't stop that kind of degradation without explicitly reading and re-writing the data. Surely you have a file on an SSD somewhere that hasn't been re-written in several years, go check yourself whether it's still good.
Even the utter trash that is thumb drives and SD cards seem to hold data just fine for many years in actual use.
IIRC, the paper was explicitly about heavily used and abused storage.
CD-R drives were already common in 2001: https://en.wikipedia.org/wiki/CD-R
I wonder, would a CD-R disc retain data for these 22 years?
Only if you kept the disk in a refrigerator. Bits are stored by melting the plastic slightly and the dye seeping in. Over time, the warmth of "room temperature" will cause the pits to become less well-defined so the decoder has to spend more time calculating "well, is that really a 1 or is it a sloppy 0". There's a lot of error detection/correction built into the CD specs, but eventually, there will be more error than can be corrected for. If you've ever heard the term "annealing" when used in machine learning, this is equivalent.
Living in South Florida, ambient temperatures were enough to erase CD-Rs - typically in less than a year. I quickly started buying the much more expensive "archival" discs, but that wasn't enough. One fascinating "garage band" sold their music on CD-Rs and all of my discs died (it was a surfer band from Alabama).
The recording is made in the dye layer, a chemical change, and the dye degrades (particularly in sunlight) so the discs have a limited shelf life. Checking Wikipedia, it appears azo dye formulations can be good for tens of years.
Melting polycarbonate would call for an absurdly powerful laser, a glacial pace, or both, and you wouldn't have to use dye at all. I'd guess such a scheme would be extremely durable, though.
I tried a couple of CD-Rs that were stored in a dry closed drawer for most of the last 20 years recently, and they seemed to at least initially come up. Now I can reinstall Windows 2000 with Service Pack 4 slipstreamed!
On the topic of Froggers, I enjoyed https://www.youtube.com/watch?v=FCnjMWhCOcA
This brings back (unpleasant) memories. I remember trying to get those tape drives working with FreeBSD back in 1999, and it going nowhere.
This will be fun in 20 years, trying recover 'cloud' backups from servers found in some warehouse.
Nah it will be very simple:
....What do you mean, "nobody paid for the bucket for the last 5 years"?
There is some chance someone might stash an old hard drive or tape with a backup somewhere in a closet. There is no chance there will be anything left when someone stops paying for the cloud.
Those drives will all be encrypted and most likely shredded.
I'm pretty sure that even with the substantial damage done by the recovery company, a professional team like Kroll Ontrack can still recover the complete tape data, although it probably won't be cheap.
As the other comment here says, any company claiming to do data recovery, and damaging the original media to that extent, should be named and shamed. I can believe that DR companies have generic drives and heads to read tapes of any format they come across, but even if they couldn't figure out how the data was encoded, there was absolutely no need to cut and splice the tape. I suspect they did that just out of anger at not likely being able to recover anything (and thus having spent a bunch of time for no profit.)
Melted pinch rollers are not uncommon and there are plenty of other (mostly audio) equipment with similar problems and solutions --- dimensions are not absolutely critical and suitable replacements/substitutes are available.
As an aside, I think that prominent "50 Gigabytes" capacity on the tape cartridge, with a small asterisk-note at the bottom saying "Assumes 2:1 compression", should be outlawed as a deceptive marketing practice. It's a good thing HDD and other storage media didn't go down that route.
Name and shame the company, you had a personal experience, you have proof. Name and shame. It helps nobody if you don't publicize it. Let them defend it, let them say whatever excuse, but your review will stand.
I don't want to even remotely tempt them to sue. They have no grounds, but I'm not taking risks-- companies are notorious for suing when they know they'll lose. Others who have posted it here have identified the right company though.
This is a masterful recovery effort. The README should be shared as an object lesson far and wide to every data restoration and archival service around.
I’ve been suffering through something similar with a DLT IV tape from 1999. Luckily I didn’t send out to the data recovery company. But still unsuccessful.
Is anyone else calling it “froggering/to frogger” if they have to cross a bigger street by foot without a dedicated crossing?
DVDs should not be overlooked for backup. The Millennium type have been simulated to withstand 1,000 years.
The author has fantastic endurance, what a marathon to get the files off the tape.
Someone was wise enough to erase the evidence in Party.
Nice catch! I think it was a little less juvenile than it might sound. I believe this was for a different game, Fusion Frenzy, which was a party minigame collection.
While I didn't understand the parent you are replying to (not your answer), your mention of Fusion Frenzy caught my eye. I've had a soft spot for that game since spending hours playing the "xbox magazine" demo with a childhood friend. Could you clarify? Is there any history gem about that one? I'd dig a PC port!
> Yet, despite ARCserve showing a popup which says "Restoration Successful", it restores up to the first 32KB of every file on the tape, but NO MORE.
From 10,000 feet, this sounds suspiciously like ARCserve is reading a single tape block or transfer buffer's worth of data for each file, writing out the result, then failing and proceeding to the next file.
Success popup notwithstanding, I'd expect to find errors in either the ARCserve or Windows event logs in this case — were there none?
While it's been decades since I've dealt with ARCserve specifically, I've seen similar behavior caused by any number of things. Off the top of my head,
(1) Incompatibilities between OS / backup software / HBA driver / tape driver.
In particular, if you're using a version of Windows much newer than Windows 2000, try a newer version of ARCserve.
In the absence of specific guidance, I'd probably start with the second* ARCserve version that officially supports Windows Server 2003:
(a) Server 2003 made changes to the SCSI driver architecture that may not be 100% compatible with older software.
(b) The second release will likely fix any serious Server 2003 feature-related bugs the first compatible version may have shipped with, without needing to install post-release patches that may be hard to find today.
(c) Significantly newer ARCserve versions are more likely to introduce tape drive / tape format incompatibilities of their own.
(2) Backup software or HBA driver settings incompatible with the hardware configuration (e.g., if ARCserve allows it, try reducing the tape drive transfer buffer size or switching from fixed block (= multiple tape blocks per transfer) to variable block (= single tape block per transfer) mode; if using an Adaptec HBA, try increasing the value of /MAXIMUMSGLIST[1]).
(3) Shitty modern HBA driver support for tape (and, more generally, non-disk) devices.
For example, modern Adaptec Windows HBA drivers have trouble with large tape block sizes that AFAIK cannot be resolved with configuration changes (though 32 kB blocks, as likely seen here, should be fine).
In my experience with PCIe SCSI HBAs, LSI adapters are more likely to work with arbitrary non-disk devices and software out-of-the-box, whereas Adaptec HBAs often require registry tweaks for "unusual" circumstances (large transfer sizes; concurrent I/O to >>2 tape devices; using passthrough to support devices that lack Windows drivers, especially older, pre-SCSI 2 devices), assuming they can be made to work at all.
LSI20320IE PCIe adapters are readily available for $50 or less on eBay and, in my experience, work well for most "legacy" applications.
(To be fair to Adaptec, I've had nothing but good experiences using their adapters for "typical" applications: arbitrary disk I/O, tape backup to popular drive types, CD/DVD-R applications not involving concurrent I/O to many targets, etc.)
(4) Misconfigured or otherwise flaky SCSI bus.
In particular, if you're connecting a tape drive with a narrow (50-pin) SCSI interface to a wide (68-pin) port on the HBA, make sure the entire bus, including the unused pins, is properly terminated.
The easiest way to ensure this is to use a standard 68-pin Ultra320 cable with built-in active LVD/SE termination, make sure termination is enabled on the HBA, disabled on the drive, that the opposite end of the cable from the built-in terminator is connected to the HBA, and, ideally, that the 68-to-50-pin adapter you're using to connect the drive to the cable is unterminated.
You can also use a 50-pin cable connected to the HBA through a 68-to-50-pin adapter, but then you're either relying on the drive properly terminating the bus — which it may or may not do — or else you need an additional (50-pin) terminator for the drive end, which will probably cost as much as an Ultra320 cable with built-in termination (because the latter is a bog-standard part that was commonly bundled with both systems and retail HBA kits).
Note that I have seen cases where an incorrect SCSI cable configuration works fine in one application, but fails spectacularly in another, seemingly similar application, or even the same application if the HBA manages to negotiate a faster transfer mode. While this should be far less likely to occur with a modern Ultra160 or Ultra320 HBA, assume nothing until you're certain the bus configuration is to spec (and if you're using an Ultra2 or lower HBA, consider replacing it).
With all that said, reversing the tape format may well be easier than finding a compatible OS / ARCserve / driver / HBA combination.
In any case, good job with that, and thanks for publishing source code!
[1] http://download.adaptec.com/pdfs/readme/relnotes_29320lpe.pd...
On a related note, I own a few older tape drives[1], have access to many more[2], and would be happy to volunteer my time and equipment to small-scale hobbyist / retrocomputing projects such as this — tape format conversions were a considerable part of my day job for several years, and tape drives are now a minor hobby.
See my profile for contact information.
[1] 9-track reel, IBM 3570, IBM 3590, early IBM 3592, early LTO, DLT ranging from TK-50 to DLT8000.
[2] IBM 3480/3490/3490E, most 4mm and 8mm formats, most full-sized QIC formats including HP 9144/9145, several QIC MC/Travan drives with floppy controllers of some description, a Benchmark DLT1 assuming it still works, probably a few others I'm forgetting about.
At some point, I feel as if it may be easier just to rewrite the code from the ground up vs. going through all that computational archaeology....
Or in a few years, just have an AI write the code...
> This is where the story should probably have stopped. Given up and called it a day, right? Maybe, but I care about this data, and I happen to know a thing or two about computers.
Hahaha awwwww yeah :muscle: