Comment by TacticalCoder
12 hours ago
The USB stick hints at a big problem in our trade though: how do you "reboot" your IT infrastructure if it literally burns to the ground? I'm not talking about Google-scale systems (which still couldn't restart from scratch IIUC but they're actually working on it?) but only about SMEs.
What does a medium-sized SME where all the payroll depends on Sara and her USB stick do if, literally, their servers do catch fire?
You've got backups, then what? How automated is the reinstallation of your typical SME's infra?
The closest I saw to that scenario was some documentary where some little trading firm had just enough time to fetch the backup hard drives before leaving the building on fire after a plane crashed into it on 9/11. The CEO (I think it was the CEO) was explaining that had he not grabbed a HDD with the backups, the company was done (not that I advise onsite/offline backups on HDDs that you must not forget to grab when the shit hits the fan as a solution btw).
I understand the "just drink the cloud kool-aid" angle: but are SMEs typically doing that?
How many SMEs out there are depending on Sara's knowledge of the USB memory stick and how to use it?
I've definitely seen similar things. And I'm sure many of you have too.
Many houses of cards?
When I took charge of solving backups for the single important box with unique, irreplaceable data -- the accounting system -- at an SME a long time ago, I think I approached it with the right amount of correctness. Therein, losing a day or three of recent data would have been recoverable; losing all of it would have been catastrophic.
I devised a system to perform bare-metal backups onto an easily-swapped, external 2.5" hard drive, using Acronis. I provided a plurality of these hard drives, and they were to be rotated off-site. The system was tolerant of human error and would proceed with making valid, current backups even if the drives were rotated incorrectly, or if not rotated at all on any given day. The backup drives each had complete file history (yay shadow copies) from an ever-advancing date, so any given drive could be used as a time machine of varying resolution, and also as the single source from which to independently start fresh.
I'd watch the logs to see that it was done, and for the most part whoever was assigned to that role did it properly enough.
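For anyone wanting to automate the "watch the logs" part, here is a minimal sketch of a freshness check. It is not how the Acronis setup above worked; it just assumes the backups land as ordinary files under some directory (the `backup_dir` argument is hypothetical) and uses the "a day or three" tolerance mentioned earlier as the default threshold.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def newest_backup_age(backup_dir: Path, now: Optional[datetime] = None) -> Optional[timedelta]:
    """Return the age of the most recently modified file under backup_dir.

    Returns None when the directory contains no files at all.
    """
    now = now or datetime.now()
    mtimes = [p.stat().st_mtime for p in backup_dir.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return now - datetime.fromtimestamp(max(mtimes))


def backup_is_fresh(backup_dir: Path, max_age: timedelta = timedelta(days=3)) -> bool:
    """True if at least one file under backup_dir was written within max_age.

    The 3-day default is an assumption taken from the "a day or three"
    of tolerable data loss; pick whatever your own tolerance is.
    """
    age = newest_backup_age(backup_dir)
    return age is not None and age <= max_age
```

Run from a scheduler, a `False` result is the cue to go and find out why nobody swapped the drive. It only proves something was written recently, not that it can be restored.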
I documented it and showed the other technical folks how it works.
Sometimes I'd wander back and make sure the backup drives weren't accumulating on-site (there should never be more than 2 on-site). I'd periodically test these backups by restoring them completely onto identical hardware, to make sure the system hadn't got crufted up somehow and that it still continued to perform its task of restoring a working system from zero.
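A rough file-level stand-in for that restore test, for setups where restoring onto identical hardware isn't practical: hash every file in a reference copy and in the restored copy, then report anything missing, extra, or different. This is only a sketch under assumptions. The `source` and `restored` paths are hypothetical, and the reference has to be a frozen snapshot taken at backup time, since live data will have changed since.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def hash_tree(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def restore_mismatches(source: Path, restored: Path) -> List[str]:
    """List every difference between the reference tree and the restored tree.

    An empty list means every file is present on both sides with
    identical content.
    """
    src = hash_tree(source)
    dst = hash_tree(restored)
    problems: List[str] = []
    for rel in sorted(src.keys() | dst.keys()):
        if rel not in dst:
            problems.append(f"missing from restore: {rel}")
        elif rel not in src:
            problems.append(f"unexpected in restore: {rel}")
        elif src[rel] != dst[rel]:
            problems.append(f"content differs: {rel}")
    return problems
```

An empty result says the files came back intact. It says nothing about whether the restored machine boots or the application starts, which is what the full bare-metal test covers.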
It worked fine for years and years. We never had to use that backup, but I had every confidence that it would be useful if that ever became necessary.
Eventually, my role changed and those things rather officially became Not My Problem.
Later, they moved the accounting system from that lineage of stout ProLiant boxes to a trash-tier small-form 1U Lenovo machine that someone found used, on eBay, for cheap.
Backups are handled by the clown, somehow. The last I heard anything about it, the person doing the talking was very pleased with the money they'd saved and that they'd no longer have to pay "extortion" to Acronis.
I have every expectation that nobody has ever restored these backups. They're probably relying on the sheer hope that they'll never have to restore them, much less from zero.
And I also hope they never have to restore them, lest they find out exactly what that data is worth to them.
> What does a medium-sized SME where all the payroll depends on Sara and her USB stick do if, literally, their servers do catch fire?
Like every job, we overestimate our importance.
What do they do? They pay everyone the same as last month as a temporary measure, ask you to talk to your manager if your pay should be more this month, and warn everyone that they're going to recalculate the payroll and adjust any differences next month. Then they calculate everyone's pay from the inputs, which really isn't such a hard problem when the alternative is failure. Maybe they pay some fancy consultants or a SaaS provider for a few months. Maybe they have to cut a few corners. Maybe they even get fined by their state's DoL. Life goes on.
> How many SMEs out there are depending on Sara's knowledge of the USB memory stick and how to use it?
I think at least in part, that is the point: orgs are missing the part of the equation where the institutional and organizational knowledge is critical. Sure, the code to accomplish parts B and C can be re-duct-taped together in a month or so by off-shore, or maybe an agent... but part A, its plumbing, and why it does what it does the way it does it due to historical failures and the knowledge behind that is probably what keeps it going.
Those things are learned starting at the ground level by bumping into them in the trenches.
The company just shuts down and its customers switch to competitors. This is economically efficient. The redundancy of a company is another company. It's a bit like how we don't insist on every server running two CPUs in lockstep in case one fails, because we have more than one server to handle requests.
>The USB stick hints at a big problem in our trade though: how do you "reboot" your IT infrastructure if it literally burns to the ground? I'm not talking about Google-scale systems (which still couldn't restart from scratch IIUC but they're actually working on it?) but only about SMEs.
Maersk ground to a halt because it got done nearly 100% by cryptolocker. IIRC they went to hard copy records, called everyone, got all of IT together with some company credit cards to get new laptops and flash drives and shit and literally rebuilt their infra from scratch.
https://www.itnews.com.au/news/maersk-had-to-reinstall-all-i...
I read a better post mortem but those are the highlights.
>How many SMEs out there are depending on Sara's knowledge of the USB memory stick and how to use it?
Part of my day job is finding, documenting and remediating these sorts of issues.
"The CEO coded this application in VB5 15 years ago, the entire business relies on it, there's no source code, there's no binary backups and the one computer it runs on just had its PSU fail"
"There's a cron somewhere that compresses, zips and transports the payroll database interstate, outside of our network, before our weekly pay run"
"There's been no documentation of this environment for 20 years, most of the hardware is that old, and the team that developed it just sold all their shares and left"
This shit is my life lmao.
There's obviously some bias, because the good companies aren't asking me to do it for them. But I make a decent living examining, documenting and remediating this shit.
How did you get into that line of work? Sounds really interesting.
Refusal to pick a silo, having a knack for troubleshooting, falling into consulting. It just sort of happened. Helps to be extremely jaded too. My knee-jerk disbelief that something is good, documented or even functional makes me well suited to taking over new clients and finding where all their bodies are buried.
One of my favorite jobs early in my career was working for a really shonky wireless ISP. The majority of the network was built by sales people using terrible tools with no documentation. I actually can't overstate how bad they were originally, they had entire areas of network with no recorded network config or credentials. My daily workflow was getting a ticket from a customer I had never heard of > trying to figure out where they were and what services they had (2 of their 3 billing systems were offline, and I often had to grep out information from a SQL dump to find this stuff) > performing a discovery, L2 upwards, of their infrastructure > semi-offensively trying to authenticate into their infrastructure > resolving and documenting so that other people can reliably service them. All while pretending this was absolutely normal to the customer. Turns out there were lots of ISPs in the same boat, and turns out there's lots of non-ISP businesses in the same boat.
> What does a medium-sized SME where all the payroll depends on Sara and her USB stick do if, literally, their servers do catch fire?
The SpecOps guys have the following bit of wisdom on offer: "Two is one and one is none".
And a backup you haven't verified you can restore from isn't one.