← Back to context

Comment by Twirrim

6 days ago

Years ago I worked for a company that provided managed hosting services. That included some level of alarm watching for customers.

We used to rotate the "person of contact" (POC) each shift, and they were responsible for reaching out to customers, and doing initial ticket triage.

One customer kept having a CPU usage alarm go off on their Windows instances not long after midnight. The overnight POC reached out to the customer to let them know that they had investigated and noticed that "system idle processes" were taking up 99% of CPU time and the customer should probably investigate, and then closed the ticket.

I saw the ticket within a minute or two of it reopening as the customer responded with a barely diplomatic message to the tune of "WTF". I picked up that ticket, and within 2 minutes had figured out the high CPU alarm was being caused by the backup service we provided, apologised to the customer and had that ticket closed... but not before someone not in the team saw the ticket and started sharing it around.

I would love to say that particular support staff never lived that incident down, but sadly that particular incident was par for the course with them, and the team spent inordinate amount of time doing damage control with customers.

In the 90s I worked for a retail chain where the CIO proposed to spend millions to upgrade the point-of-sale hardware. The old hardware was only a year old, but the CPU was pegged at 100% on every device and scanning barcodes was very sluggish.

He justified the capex by saying if cashiers could scan products faster, customers would spend less time in line and sales would go up.

A little digging showed that the CIO wrote the point-of-sale software himself in an ancient version of Visual Basic.

I didn't know VB, but it didn't take long to find the loops that do nothing except count to large numbers to soak up CPU cycles since VB didn't have a sleep() function.

  • That's hilarious. I had a similar situation, also back in the 90s, when a developer shipped some code that kept pegging the CPU on a production server. He insisted it was the server, and the company should spend $$$ on a new one to fix the problem. We went back-and-forth for a while: his code was crap versus the server hardware was inadequate, and I was losing the battle, because I was just a lowly sysadmin, while he was a great software engineer. Also, it was Java code, and back then, Java was kinda new, and everyone thought it could do no wrong. I wasn't a developer at all back then, but I decided to take a quick look at his code. It was basically this:

    1. take input from a web form

    2. do an expensive database lookup

    3. do an expensive network request, wait for response

    4. do another expensive network request, wait for response

    5. and, of course, another expensive network request, wait for response

    6. fuck it, another expensive network request, wait for response

    7. a couple more database lookups for customer data

    8. store the data in a table

    9. store the same data in another table. and, of course, another one.

    10. now, check to see if the form was submitted with valid data. if not, repeat all steps above to back-out the data from where it was written.

    11. finally, check to see if the customer is a valid/paying customer. if not, once again, repeat all the steps above to back-out the data.

    I looked at the logs, and something like 90% of the requests were invalid data from the web form or invalid/non-paying customers (this service was provided only to paying customers).

    I was so upset from this dude convincing management that my server was the problem that I sent an email to pretty much everyone that said, basically, "This code sucks. Here's the problem: check for invalid data/customers first.", and I included a snippet from the code. The dude replied-to-all immediately, claiming I didn't know anything about Java code, and I should stay in my lane. Well, throughout the day, other emails started to trickle in, saying, "Yeah, the code is the problem here. Please fit it ASAP." The dude was so upset that he just left, he went completely AWOL, he didn't show up to work for a week or so. We were all worried, like he jumped off a bridge or something. It turned into an HR incident. When he finally returned, he complained to HR that I stabbed him in the back, that he couldn't work with me because I was so rude. I didn't really care; I was a kid. Oh yeah, his nickname became AWOL Wang. LOL

    • Hehe, being a Java dev since the late 90’s meant seeing a lot of bad code. My favorite was when I was working for a large life insurance company.

      The company’s customer-facing website was servlet based. The main servlet was performing horribly, time outs, spinners, errors etc. Our team looked at the code and found that the original team implementing the logic had a problem they couldn’t figure out how to solve, so they decided to apply the big hammer: they synchronized the doService() method… oh dear…

      1 reply →

  • I am a little confused. He was intentionally sabotaging performance?

    • People write code that does sleep statements when waiting for something else to happen. It makes sense in some contexts. Think of it like async/await with an event loop. Except you are using the OS scheduler like your “event loop”. And you sleep instead of awaiting.

      Now, if your language lacks the sleep statement or some other way to yield execution, what should you do instead when your program has no work to do? Actually, I don’t know what the answer is.

      2 replies →