Yesterday, we had a story about Microsoft's disk management service using lots of CPU time if the username contained "user". Microsoft's official reply was not to do that.
I once found a bug in Coyote Systems' load balancers where, if the USER-AGENT ended with "m", all packets were dropped. They use regular expressions for various rules, and I suspect someone typed "\m" where they meant "\n". Vendor denied problem, even after I submitted a test case which failed on their own web site's load balancer.
Many, many years ago, I found a bug in 4.3BSD which prevented TCP connections from establishing with certain other systems during odd numbered 4 hour periods. It took three days to find the bug in BSD's sequence number arithmetic. A combination of signed and unsigned casts was doing the wrong thing.
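For readers who haven't run into this class of bug: below is a minimal sketch of wraparound-safe sequence comparison next to the naive version. It is not the 4.3BSD code, and the names `seq_lt` and `naive_lt` are made up for illustration. The point is only that the subtraction has to happen in unsigned arithmetic and the result then be read as signed; get one of those casts wrong and the comparison flips for part of the sequence space.

```c
#include <stdint.h>
#include <stdbool.h>

/* Wraparound-safe "a comes before b" for 32-bit sequence numbers.
 * The subtraction is done in unsigned arithmetic (wraps modulo 2^32),
 * then the difference is reinterpreted as a signed value.
 * Assumes a two's-complement platform for the final conversion. */
bool seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Naive comparison: gives the wrong answer once the sequence space wraps. */
bool naive_lt(uint32_t a, uint32_t b)
{
    return a < b;
}
```

With a = 0xFFFFFFF0 and b = 0x00000010 (b is 32 steps "after" a, across the wrap), `seq_lt` says a is earlier while `naive_lt` says it is not.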
My favorite was a story from the 1980s of a program which would crash depending on the phase of the Moon!
Turned out to be because it was generating a date by calling a general purpose astronomical routine, then parsing the date out of that. The astronomical routine among other things included the phase of the moon, and during some phases you would overflow the buffer that was passed in.
Another classic was a tech support call from the 1990s where the person's computer rebooted every time they flushed the toilet. Turns out that the person was at the end of the electrical line... and on a septic system. Flush the toilet, the septic system came online, causing a power dip, and that was enough to reboot the computer. A UPS fixed that.
I heard a story about a terminal in a public terminal room that a user was able to consistently log in to if they were sitting down in a chair in front of the terminal, but never if they were standing up.
They thought it might be static electricity, or some mechanical problem, or "problem exists between keyboard and chair", but finally they noticed something else was amiss...
It turns out some joker had re-arranged the 1234567890 keys to be 0123456789, so when the user was standing up, they looked down at the keyboard and typed their password (which contained a digit, of course) by looking at the keys. But when they were sitting down, they touch typed without looking at the keys, and got their password correct!
A BeOS bug story similar to the phase of the Moon:
Two [BeOS] test engineers were in a crunch. The floppy drive they were currently
testing would work all day while they ran a variety of stress tests, but the
exact same tests would run for only eight hours at night.
After a few days of double-checking the hardware, the testing procedure, and the
recording devices, they decided to stay the night and watch what happened. For
eight hours they stared at the floppy drive and drank espresso. The long dark
night slowly turned into day and the sun shone in the window. The angled sunlight
triggered the write-protection mechanism, which caused a write failure. A new
casing was designed and the problem was solved.
> My favorite was a story from the 1980s of a program which would crash depending on the phase of the Moon!
Do you have a name for this program? It sounds a lot like an urban legend - when would you ever find it easier to parse a date out of an astronomical program than to use actual date handling capabilities from the system or a library?
I have an Acer laptop (2008 model) which doesn't boot when you press the power button, and hasn't for a few years.
Interestingly, after pressing the power button you have to close the lid and then gently press the closed lid once or twice (top center of the lid, where the back of the camera is positioned), and then it immediately boots :)
The power button is on the opposite side (at the bottom left of the lid), so it can't be getting any pressure when the closed lid is pressed.
These kinds of issues fascinate me, aside from how frustrating they can be. They highlight the immense complexity that underlies what I, a lowly web developer, do on a daily basis, and what effect some decisions can have. It definitely motivates me to be a lot more careful and thorough about the code I write, and that's probably a good thing considering that I'm a painfully self-aware javascript coder who eschews, among other things, writing tests for my code and who thoughtlessly uses modules that solve whatever problem I'm facing.
I could go on to write a love-letter to how a number of HN posts make me want to be a better programmer, but I'll keep it as simple as possible.
Being here, even now, and even though I do sort of feel like one of the 'older crew' suspiciously eyeing what appears to be an influx of 'others', to put it vaguely, ultimately what I love most about this place is reading about the 'old-timers' and how they worked, the 'oldest code that is still in use', the argument for and against LISPs, the problems of the JS/Node ecosystem (bit sick of that), the crazy shit some programmer created, and so on.
I'm pretty much addicted to HN, while I managed to cut out Facebook and Reddit. What keeps me here is the intense desire to be a hacker, and how, even now, HN fuels it. To not just make websites or work with the latest framework, but to geek out on things and become better at what I truly believe is a craft.
If at some point the noise overpowers the substance that I care for, I hereby request anyone who takes pity to let me know where to move on to. But so far, with dang's good moderation, I'm impressed with how well HN is holding out.
I had a bug with a Bank of America payment system which wouldn't accept registrations with upper-case Zs in the company name. I went into super sleuth mode and somehow found the company they hired to make it and got a number for their development department. After explaining the problem to the guy who answered his only response was, "How did you get this number??"
A few years ago I discovered that the Wells Fargo website would log you in if you typed the correct password followed by n additional characters. I reported it to the security group, and it still worked until I stopped banking with them a year or so later.
Reminds me of my favorite bug story from my own career. It was in my first year or two out of college. We were using a commercial C++ library for making HTTP calls out to another service. The initial symptom of the bug was that random requests would appear to come back with empty responses -- not just empty bodies, but the entire response was empty (not even any headers).
After a fair amount of testing, I was somehow able to determine that it wasn't actually random. The empty response occurred whenever the size in bytes of the entire request (headers and body together) was exactly 10 modulo 256, for example 266 bytes or 1034 bytes or 4106 bytes. Weird, right?
I went ahead and worked around the problem by putting in a heuristic when constructing the request: If the body size was such that the total request size would end up being close to 10 modulo 256, based on empirical knowledge of the typical size of our request headers, then add a dummy header to get out of the danger zone. That got us past the problem, but made me queasy.
At the time, I had looked at the code and noticed an uninitialized variable in the response parsing function, but it didn't really hit me until much later. The code was something like this:
void read_status_line(char *line) {
    char c;
    while (c != '\n') {
        c = read_next_byte();
        *(line++) = c;
    }
}
Obviously this is wrong because it's checking c before reading it! But why the 10 modulo 256 condition? Of course, the ASCII code for newline is 10. Duh. So there must have been an earlier call stack where some other function had a local variable storing the length of the request, and this function's c variable landed smack-dab on the least-significant byte of that earlier value. Arrrrgh!
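To make the arithmetic concrete, here is a small sketch. None of it is the vendor's code: `low_byte`, `struct byte_source` and `next_byte` are hypothetical stand-ins, and the reader below is just one way the loop could have been written so that it reads before it tests. The first function shows why 266, 1034 and 4106 all look like a newline to a one-byte variable sitting on the low byte of a stored length.

```c
#include <stddef.h>

/* Low 8 bits of a length: what a one-byte variable would hold if it
 * overlapped the least-significant byte of a stored size. */
unsigned char low_byte(size_t n)
{
    return (unsigned char)(n & 0xFF);
}

/* Hypothetical in-memory byte source standing in for read_next_byte(). */
struct byte_source {
    const char *data;
    size_t pos;
    size_t len;
};

/* Returns the next byte, or -1 at end of input. */
int next_byte(struct byte_source *src)
{
    if (src->pos >= src->len)
        return -1;
    return (unsigned char)src->data[src->pos++];
}

/* Reads one line, including its '\n', into line and NUL-terminates it.
 * c starts with a known value, and the loop also stops at end of input
 * or when the buffer is full. Returns the number of bytes stored. */
size_t read_status_line(struct byte_source *src, char *line, size_t cap)
{
    size_t n = 0;
    int c = 0;

    while (n + 1 < cap && c != '\n') {
        c = next_byte(src);
        if (c < 0)
            break;
        line[n++] = (char)c;
    }
    if (cap > 0)
        line[n] = '\0';
    return n;
}
```

`low_byte(266)`, `low_byte(1034)` and `low_byte(4106)` all equal `'\n'`, which is exactly the value the broken loop was comparing its uninitialized `c` against.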
That sounds shockingly like a bug I remember one of our best developers finding when I worked at Homestead in the late 90's (and I remember being in awe then of his ability to deduce the pattern out of the seeming randomness).
The most interesting part of this story to me is actually that his wife noticed that the printer didn't work on Tuesdays. I'd have never, ever put that together, no matter how many times I saw it succeed or fail. I'd actually be more likely to figure it out by debugging the CUPS script than I would be observing my printer's behavior. Can a lot of people pick up on correlations like that? "Ever notice how it's always Tuesday when the printer won't work?"
I remember a time when I was a relatively computer-illiterate youngling and my favourite game was not particularly stable. I remember churning through many progressively more elaborate superstitions in search of some correlation I could exploit to prevent the dreaded crashes.
Of course, I never succeeded. But the point should stand: people are built around finding and exploiting correlations to their benefit, and we're actually quite good at it. It is not terribly surprising to me that somebody quickly noticed that the annoying bad thing that is consistently happening today also happened consistently for a day precisely one week ago. When the pattern repeats again for a third time...
You'd figure it out eventually, especially if it was exceedingly annoying and/or your job depended on it.
Example: what if your internet drops out for an hour every Sunday morning at 2am? You'd notice. (Mine does. Damn it Comcast!) If the guy's wife relied on the printer similarly, it might be as acute as losing internet for you.
Every Tuesday morning I have to re-authenticate the 2FA that lets me connect to various things at work, and it isn't documented anywhere, but you'd better believe I figured out the pattern real fast. Being annoyed at the same time of the work week is very noticeable.
I have a Google Hangout on Tuesday mornings. Once a month, reliably, I'd suddenly be kicked off my own Hangout because Google had decided it was time for me to type my password back in.
Of course I do, reliably pushing the problem another month down the road.
Not having preconceptions about likely causes surely helped but it's possible that her use of the printer was not uniform across weekdays which might make it stand out more.
Yes, I would have never put those together either. One of the duplicate bugs has a similar thread, where a user describes reinstalling his system three times and failing to print every time. But after running updates after the third install, it suddenly worked again.
A helpful response then pointed out that after the third reinstall, his system clock must have gone past midnight, which means it was no longer Tuesday...
My most memorable bugfix was when I was temporarily on a team that did email encryption/decryption.
They had one customer where some mails could not be decrypted. They had been fighting with this for a year, and no one could figure out what was going on.
I told them to do a dump for a week with the good and bad emails.
After one week I was given the dump of files, looked at the count of bad vs good, did some math in my head and said:
"Hmm, it appears that about 1/256 mails is bad. That could indicate that the problem is related to a specific byte having a specific value in the random 256 bit AES key.
If there is a specific value giving problems it is probably 0x00, and I would guess the position is the last or first byte."
I did a check by decoding all S/MIME mails to readable text with openssl. Sure enough, all bad emails had 0x00 as the least significant byte.
Then I looked at the ASN.1 spec and discovered it was a bit vague about whether the least significant byte had to be there if it was 0x00.
I inserted a line into the custom-written IBM 4764 CCA driver, written in C and called via JNI.
Then all emails decrypted.
The team dropped their jaws: they had been fighting with it for a year, and I diagnosed the bug only by looking at the good/bad ratio :)
I might remember some details wrong, but the big picture is correct :)
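A hedged sketch of what a one-line fix of that kind can look like. This is not the CCA driver code, and `restore_key` and `KEY_LEN` are invented names. It assumes the missing byte is a leading zero of a big-endian value, which is what minimal integer encodings tend to drop; the story above doesn't pin down which end was affected in the real system. A uniformly random key has 0x00 in any given byte position with probability 1/256, which matches the observed failure rate.

```c
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32  /* 256-bit AES key */

/* Restores a fixed-length key from an encoding whose leading zero bytes
 * may have been dropped. Writes exactly KEY_LEN bytes to out.
 * Returns 0 on success, -1 if the input is longer than a key can be. */
int restore_key(const uint8_t *in, size_t in_len, uint8_t out[KEY_LEN])
{
    if (in_len > KEY_LEN)
        return -1;
    memset(out, 0, KEY_LEN - in_len);             /* put the zeros back */
    memcpy(out + (KEY_LEN - in_len), in, in_len); /* right-align the rest */
    return 0;
}
```

A 31-byte input comes back as 32 bytes with 0x00 in front; a full 32-byte input is copied unchanged.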
I think the singular power of an experienced programmer is being able to reason about high-level structure, while being able to deep-dive into any lower level detail, anywhere in the overall system, and therefore be able to carve out large swaths of problem space just by /thinking/. That's much faster than typing-and-executing-and-reading-logs.
The TL;DR is that the "file" utility was miscategorising files that had "Tue" in the first few bytes as an Erlang JAM file, with knock-on effects for PostScript files generated with a header comment that had Tue in the date.
The circumstances under which I discovered/reported the bug were totally coincidental too. I noticed a friend use the file command and thought to myself "Hmm I've never actually used that command before. Let me try it out." Ran it on a TODO list text file I had lying around, scratched my head over the "Erlang JAM file" output, and went from there.
Because it was trying to identify that type of file by looking for a particular full date string (bizarre in itself) and the wrong syntax was used to specify the string (unescaped spaces).
The weirdest case at my tenure as a neighborhood computer tech was a personal notebook computer that would not boot up at the customer's apartment. Of course we assumed user error, but further investigation revealed that if the computer were running as it approached the home, it would bluescreen about a block away.
We guessed it was due to some kind of RF interference from a transmitter on the apartment building. Removing the WiFi module and the optical drive had no effect, so we further guessed it was interference within the motherboard or display. Rather than investigate further, we replaced the notebook at that point.
My car radio goes from everything working fine to complete static on every station in like 10 feet as I drive into my local Safeway parking lot. I'm pretty sure they are jamming the signal, I just don't know why.
Some places with shopping carts have an "invisible fence" that locks up one of the wheels of the cart when you push the cart outside the property. These are usually marked with a yellow line on the ground. I presume they use some kind of RF field with an underground wire. Maybe that Safeway has some issues with theirs.
I have an anecdote, which isn't mine but comes from someone I know personally. This guy was working as a service tech, and was called out to diagnose a problem with a computer that had been recently moved. It worked most of the time, but any attempt to use the tape drive failed within a certain number of seconds (this was long ago, when tape drives were still a thing). Everything had worked fine before the move, and diagnostics didn't show anything out of place. Then he happened to look out the window - this was a military installation, and there was a radar dish rotating nearby. The failures occurred exactly when the radar dish was pointed their direction. It turns out the computer had been moved up one floor, which strengthened the interference just enough to cause the failure.
But "Tue" is not at the fourth byte in the example, which has:
%%CreationDate: (Tue Mar 3 19:47:42 2009)
Something munged the data. Perhaps some step which removes all characters after %%, except those in parentheses?
%%(Tue Mar 3 ...)
Now we're at the fourth byte. Another hypothesis is that the second incorrect match is kicking in.
That is to say, some fields are added above %%CreationDate such that the Tue lands on position 79. The bug that was fixed in the magic database is this:
-+4 string Tue Jan 22 14:32:44 MET 1991 Erlang JAM file - version 4.2
-+79 string Tue Jan 22 14:32:44 MET 1991 Erlang JAM file - version 4.2
++4 string Tue\ Jan\ 22\ 14:32:44\ MET\ 1991 Erlang JAM file - version 4.2
++79 string Tue\ Jan\ 22\ 14:32:44\ MET\ 1991 Erlang JAM file - version 4.2
(This is a patch of a patch: a fix to an incorrect patch.) There are two matches for this special date which identifies JAM files: one at offset 4, but a possible other one at offset 79 which will cause the same problem.
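Why the missing backslashes matter: fields in a magic line are separated by whitespace, so an unescaped space ends the test string early. Here is a minimal sketch of that splitting rule. It is not the real `file` parser, `parse_test_string` is a made-up name, and it only handles the backslash-space escape.

```c
#include <stddef.h>

/* Copies the test-string field of a magic-style line into out.
 * Reading stops at the first unescaped space or tab; a backslash
 * followed by a space is stored as a plain space.
 * out is NUL-terminated; returns the number of characters stored. */
size_t parse_test_string(const char *field, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return 0;
    while (*field != '\0' && *field != ' ' && *field != '\t' && n + 1 < cap) {
        if (field[0] == '\\' && field[1] == ' ')
            field++;              /* drop the backslash, keep the space */
        out[n++] = *field++;
    }
    out[n] = '\0';
    return n;
}
```

Fed the unescaped form, the pattern comes out as just "Tue", so anything with Tue at that offset matches. Fed the escaped form, it comes out as the full 28-character date string.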
The real bug here is arguably the CUPS script. It should identify the file's type before munging it. And it shouldn't use a completely general, highly configurable utility whose data-driven file classification system is a moving target from release to release! This is a print script, so there is no reason to suspect that an input file is a Doom WAD file, or a Sun OS 4 MC68000 executable. The possibilities are quite limited, and can be handled with a bit of custom logic.
Did Brother people write this? If so, I'm not surprised.
Nobody should ever write code whose correct execution depends on the "file" utility classifying something. That is, not unless you write your own "magic" file and use only that file; then you're taking proper ownership of the classification logic, such that any bugs are likely to be your own.
The fact that file got something wrong here is a red herring; the file utility is wrong once in a while, as anyone knows who has been using various versions of it regularly for a few decades. Installations of the utility are only suitable for one-off interactive use. You got a mystery file from out of the blue, and need a clue as to what it is. Run file on it to get an often useful opinion. It is only usable in an advisory role, not in an authoritative role.
I've noticed that printing is still one of the poorest UX aspects of *nix/OSS and regularly seems to suffer from errors so egregious that they can only be attributed to OSS devs not dogfooding these features. I'm assuming they just don't print much (I mean, we ALL print less than 20 years ago, but all the more reason to test these features which, when you need them to work you REALLY need them to work).
Perhaps you're thinking back to the days of manually configuring CUPS?
Any recent printer I've used has just been plug it in and hit print. A better experience than Windows in terms of included drivers and bonjour support too.
I agree completely. I set up a relative with Linux Mint KDE 17.3 a couple weeks ago and even I was surprised at how easy it was to set up the two printers he wanted to use: one was an old 2003-vintage LaserJet 1012 personal-sized laser printer with USB, the other a newer (I'm guessing 4yo) HP color inkjet of some kind that was WiFi-connected. For the first, I just plugged in the USB cable and a print queue was immediately and automatically set up; I didn't have to do anything. For the latter, I just went into the printer configuration utility, let it search the network, it found the printer and told me its model name/number, then I selected the appropriate driver and printed a test page. No driver downloads, no problems.
By contrast, I had a contract job a year or so ago at a large company where I was given a Win7 laptop and tried to connect to a big Ricoh laser printer. I spent hours messing around with driver downloads trying to get that to work. I finally had to call IT and they sent someone over, and he couldn't get it to work either; he finally found some crazy work-around which I've totally forgotten the details of now.
The only real problem I see with printing on Linux now is that, sometimes, there's multiple CUPS drivers for the same printer (foomatic, hpijs, Postscript, etc.), so it won't automatically pick one and it's not clear which is the best so you might have to just try one and see if it works. Most likely, they all work, but some might have additional features. HP printers are probably the best, though, since they seem to explicitly support Linux (such as with their hpijs drivers). If all printer makers had this level of support, and they cleaned up the redundant/competing drivers, there wouldn't be complaints.
CUPS is better, no doubt. And from a sysadmin perspective CUPS is great, but there are still crazy gotchas.
Part of this is the byzantine config of network discovery, CUPS, driver sources, etc. A big part of the problem is the pieces may be there but distros have done a mediocre job getting everything together on this in the best way for the user.
One of these days, someone may give me a credible explanation of why printing involves a systemwide daemon, but I kind of doubt it. I'd love to see a rearchitecting of the whole mess such that the whole print daemon runs in a sandbox with user privileges.
In my experience it is the exact opposite. My Linux computers are always able to quickly connect and print to any printer I point them towards. I have never had as many problems as I have had when using Windows or OS X.
I can't say it's been any "easier" to connect to my Brother MFC-J4420DW on Fedora than it is on Windows, but it's no harder. Download the installation script from Brother, run it, it asks me for the hostname of my printer and I'm up and running.
In my experience printing in Linux has been pretty solid for at least the last 5 years or so. CUPS is CUPS, drivers are plentiful and one apt-get away, UIs get better.
Of course I'm limited to the occasional document or a few tickets.
During my studies I had a course called "Advanced Network Administration". I learnt about the OSPF routing protocol and its Quagga [1] implementation and I had to prepare a simple installation that consisted of 3 Linux machines. They were connected with cheap USB network adapters.
After everything was configured I started the Quagga daemons and somehow they just didn't want to talk to each other. I opened tcpdump to see what was happening, and the OSPF packets were being exchanged properly. After a while the communication and routing was established. I thought that maybe the services just needed some time to discover the topology.
I restarted the system to see if it would come up automatically, but the problem recurred: the daemons just didn't see each other. Again, I launched tcpdump, tweaked some settings and now it worked, until it didn't a few minutes later.
It took me a long time to find out that the diagnostic tool I was using had actually changed the observed infrastructure (like in the quantum world). tcpdump enables promiscuous mode on the network interfaces, and apparently this was required for Quagga to run on the cheap USB Ethernet adapters. I enabled it with ifconfig promisc, and after that OSPF was stable.
Surely the real bug is the reliance on the 'file' utility in the first place? It attempts to quickly identify a file that could be literally anything, so it's not surprising (and indeed should be expected) that sometimes it gets it wrong.
I don't know the details of the CUPS script, but presumably it can only deal with a small number of different file types. Implementing its own detection to positively identify PS vs whatever other formats it deals with vs everything else would be far more robust.
One of our users complained that she could no longer print PDF documents. Everything else, Word, Excel, graphics, worked fine, but when she printed a PDF ... the printer did emit a page that - layout-wise - pretty much looked like it was supposed to, except all the text was complete and utter nonsense.
Or was it? I took one of the pages back to my desk, and later in the day I had an idle moment, and my eyes wandered across the page. The funny thing is, if I had not known what text was supposed to be on the page, I would not have noticed, but the text was not random at all. Instead, all the letters had been shifted by one place in the alphabet (i.e. "ABCD" became "BCDE").
I went back to the user and told her to check the little box that said "Print text as graphics" in the PDF viewer's printing dialog, and voila - the page came out of the printer looking the way it was supposed to.
Printing that way did take longer than usual (a lot longer), but at least the results were correct.
To this day, I have no clue where the problem came from, and unfortunately, I did not have the time to investigate the issue further. I had never seen such a problem before or after.
In a way it's part of what I like about my job: These weird problems that seem to come out of nowhere for no apparent reason, and that just as often disappear back into the void before I really understand what is going on. It can be oh-so frustrating at times, but I cannot deny that I am totally into weird things, so some part of me really enjoyed the whole experience.
I once had a case with a desktop system that would often hard reset when you sat down and started typing. It turned out Dell had left a piece of metal in the case, hanging in the few millimeters between the case and the motherboard, and stronger desk vibration made it cause a short circuit.
My 6502 based FORTH systems would sometimes crash for no apparent reason after I tweaked some code and recompiled it. Whenever it got into crashy mode, it would crash in a completely different way, on a randomly different word. I'd put some debugging code in to diagnose the problem, and it would either disappear or move to another word! It was an infuriating Heisenbug!
It turns out that the 6502 has a bug [1] that when you do an indirect JMP ($xxFF) through a two byte address that straddles a page boundary, it would wrap around to the first byte of the same page instead of incrementing the high half of the address to get the first byte of the next page.
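A small emulation-style sketch of that fetch, for anyone who wants to see the wraparound spelled out. The function names are made up, and the 64K `mem` array is just a stand-in for the address space.

```c
#include <stdint.h>

/* Target of JMP (ptr) on an NMOS 6502: the high byte of the target is
 * fetched from the same page as the low byte, so a pointer at $xxFF
 * takes its high byte from $xx00 instead of the next page. */
uint16_t jmp_indirect_nmos(const uint8_t mem[65536], uint16_t ptr)
{
    uint16_t hi_addr = (uint16_t)((ptr & 0xFF00) | ((ptr + 1) & 0x00FF));
    return (uint16_t)(mem[ptr] | (mem[hi_addr] << 8));
}

/* What the programmer expects: the high byte comes from ptr + 1,
 * carrying into the next page when needed. */
uint16_t jmp_indirect_expected(const uint8_t mem[65536], uint16_t ptr)
{
    uint16_t hi_addr = (uint16_t)(ptr + 1);
    return (uint16_t)(mem[ptr] | (mem[hi_addr] << 8));
}
```

With $34 at $10FF, $12 at $1100 and $56 at $1000, the expected jump goes to $1234 but the NMOS part goes to $5634. For any pointer that doesn't end in $FF the two functions agree.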
And of course the way that an indirect threaded FORTH system works is that each word has a "code field address" that the FORTH inner loop jumps through indirectly. So if a word's CFA just happened to straddle a page boundary, that word would crash!
6502 FORTH systems typically implemented the NEXT indirect threaded code inner interpreter efficiently by using self modifying code that patched an indirect JMP instruction on page zero whose operand was the W code field pointer. [2]
JMP indirect is a relatively rare instruction, and it's quite rare that it's triggered by normal static code (since you can usually catch the problem during testing), but self modifying code has a 1/256 chance of triggering it!
A later version of the chip, the 65C02, fixed that bug.
It could manifest in either compiled FORTH code, or the assembly kernel. The FIG FORTH compiler [3] worked around it at compile time by allocating an extra byte before defining a new word if its CFA would straddle a page boundary.
I defined an assembler macro for compiling words in the kernel that automatically padded in the special case, but the original 6502 FIG FORTH kernel had to be "checked and altered on any alteration" manually.
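The padding rule itself is tiny. A sketch of the idea follows, with hypothetical names. The real FIG FORTH CREATE checks the dictionary pointer before the word's header is laid out, so its test looks different; this version takes the would-be code field address directly.

```c
#include <stdint.h>
#include <stdbool.h>

/* A two-byte code field starting at $xxFF has its second byte on the
 * next page, which is the case the NMOS indirect JMP gets wrong. */
bool cfa_straddles_page(uint16_t cfa)
{
    return (cfa & 0x00FF) == 0x00FF;
}

/* Returns where the code field should actually go: one byte later if
 * the proposed address would straddle a page boundary, else unchanged. */
uint16_t place_cfa(uint16_t proposed)
{
    return cfa_straddles_page(proposed) ? (uint16_t)(proposed + 1) : proposed;
}
```

So a code field that would land at $20FF gets bumped to $2100, at the cost of one wasted byte and no run-time check.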
"I'm sure some of you noticed my code will break if the bytes of the word addressed by IP straddle a page boundary, but luckily that's a direct parallel to the NMOS 6502's buggy JMP-Indirect instruction. An effective solution can be found in Fig-Forth 6502, available in the "Monitors, Assemblers, and Interpreters" section here. (The issue is dealt with at compile time; there is no run-time cost. The word CREATE pre-pads the dictionary with an unused byte in the rare cases when the word about to be CREATEd would otherwise end up with a code-field straddling a page boundary.)"
; The following offset adjusts all code fields to avoid an
; address ending $XXFF. This must be checked and altered on
; any alteration , for the indirect jump at W-1 to operate !
;
.ORIGIN *+2
[...]
.WORD DP ;)
.WORD CAT ;| 6502 only. The code field
.WORD CLIT ;| must not straddle page
.BYTE $FD ;| boundaries
.WORD EQUAL ;|
.WORD ALLOT ;)
I vote this as the best heisenbug I've read so far. That sounds like such a nightmare to debug. I might have never found it due to throwing the thing in the trash and buying a different machine. Forth is easy to port after all. :)
Funny, I remember that bug very well, it was used on the earlier Apple IIs, sometimes on purpose (!) (mostly by game protections). This was fixed on the 65C02.
I once found a bug in a weather applet that only occurred when the temperature exceeded 100 degrees. The 3-digit temperature caused a cascade of formatting issues that rendered part of the applet unreadable. I believe the author used Celsius, and so would never have encountered this bug on their own.
And this is why we won't ever get AI. Humans seem to only manage to get to a certain level of complex before it all gets too much.
There are supposedly people in Boeing who understand literally every part of a 747, the wiring and the funny holes in the windows. But there is probably no one who understands all parts of Windows 10.
We're doomed to keep leaping like dolphins to reach a fish held too high by a sadistic Orlando world trainer
As I've said and been downvoted for before, there will never be Artificial Intelligence because a machine merely does what its program tells it. There will be plenty of Artificial Stupidity, however, as evidenced by this thread.
I've never seen it before to my recollection, and I spend a possibly unhealthy amount of time checking up on the stories here, so my guess is no it doesn't.
Did this get fixed, 7 years later?
Yesterday, we had a story about Microsoft's disk management service using lots of CPU time if the username contained "user". Microsoft's official reply was not to do that.
I once found a bug in Coyote Systems' load balancers where, if the USER-AGENT ended with "m", all packets were dropped. They use regular expressions for various rules, and I suspect someone typed "\m" where they meant "\n". Vendor denied problem, even after I submitted a test case which failed on their own web site's load balancer.
Many, many years ago, I found a bug in 4.3BSD which prevented TCP connections from establishing with certain other systems during odd numbered 4 hour periods. It took three days to find the bug in BSD's sequence number arithmetic. A combination of signed and unsigned casts was doing the wrong thing.
My favorite was a story from the 1980s of a program which would crash depending on the phase of the Moon!
Turned out to be because it was generating a date by calling a general purpose astronomical routine, then parsing the date out of that. The astronomical routine among other things included the phase of the moon, and during some phases you would overflow the buffer that was passed in.
Another classic was a tech support call from the 1990s where the person's computer rebooted every time they flushed the toilet. Turns out that the person was at the end of the electrical line..and on a septic system. Flush the toilet, the septic system came online, causing a power dip, and that was enough to reboot the computer. A UPS fixed that.
I heard a story about a terminal in a public terminal room that a user was able to consistently log in to if they were sitting down in a chair in front if the terminal, but never if they were standing up.
They thought it might be static electricity, or some mechanical problem, or "problem exists between keyboard and chair", but finally they noticed something else was amiss...
It turns out some joker had re-arranged the 1234567890 keys to be 0123456789, so when the user was standing up, they looked down at the keyboard and typed their password (which contained a digit, of course) by looking at the keys. But when they were sitting down, they touch typed without looking at the keys, and got their password correct!
My favorite was the server that couldn't send email further than 500 miles away.
http://www.ibiblio.org/harris/500milemail.html
A BeOS bug story similar to the phase of the Moon:
https://www.haiku-os.org/legacy-docs/benewsletter/Issue4-22....
> My favorite was a story from the 1980s of a program which would crash depending on the phase of the Moon!
Do you have a name for this program? It sounds a lot like an urban legend - when would you ever find it easier to parse a date out of an astronomical program than to use actual date handling capabilities from the system or a library?
Yes, it got fixed: https://bugs.launchpad.net/ubuntu/+source/file/+bug/248619
The `file` utility failed to recognize PostScript files due to conflicting signature data in the magic file. The issue was resolved long ago.
If it was my software, I would not have accepted that as a proper fix though. For me, the real bug is to rely on the "file" command (a user diagnostic tool) to choose different code paths in a printing program.
Yep, a month and a half after the bug was filed. Not too shabby for OSS!
Used to own a PCMCIA wi-fi adapter that predated finalization of the 802.11b standard and which could reliably be induced to blue screen if I caused an HTTP request to be sent over the interface containing a lowercase x.
This reminds me of that old joke: "How many Microsoft engineers does it take to replace a broken light-bulb?"
The answer is "None - Microsoft simply change the standard to darkness"
------------------------------------------
Also reminds me of the fact that there are no bugs in Microsoft software - only "Features".
I think these are re-purposings of jokes about Unix, which is justly infamous for having lists of bugs that would've been easier to fix than to reproduce, for canonizing its bugs as standards of behavior to work around, and for having startlingly lazy components to begin with (especially the POSIX string-handling libraries -- strcmp() can be written in minutes).
See _The UNIX-HATERS Handbook_ for more. https://en.wikipedia.org/wiki/The_Unix-Haters_Handbook
I have an Acer laptop (2008 model) which doesn't boot when pressing the power button (since a few years). Interestingly after pressing the power button you've to close the lid and then gentally press the closed lid once/twice (top center of lid where back of the camera is positioned) and then it immediately boots :)
The power button is on the opposite side (at the bottom left of the lid) so it can't be getting a pressure when lid is pressed after closed.
You might want to fix that "gentally" because it is easy to misread, and the instruction becomes definitely not office-safe.
These kinds of issues fascinate me, aside from how frustrating they can be. They highlight the immense complexity that underlies what I, a lowly web developer, do on a daily basis, and what effect some decisions can have. It definitely motivates me to be a lot more careful and thorough about the code I write, and that's probably a good thing considering that I'm a painfully self-aware javascript coder who eschews, among other things, writing tests for my code and who thoughtlessly uses modules that solve whatever problem I'm facing.
I could go on to write a love-letter to how a number of HN posts make me want to be a better programmer, but I'll keep it as simple as possible.
Being here, even now, and even though I do sort of feel like one of the 'older crew' suspiciously eyeing what appears to be an influx of 'others', to put it vaguely, ultimately what I love most about this place is reading about the 'old-timers' and how they worked, the 'oldest code that is still in use', the argument for and against LISPs, the problems of the JS/Node ecosystem (bit sick of that), the crazy shit some programmer created, and so on.
I'm pretty much addicted to HN, while I managed to cut out Facebook and Reddit. What keeps me here is the intense desire to be a hacker, and how, even now, HN fuels it. To not just make websites or work with the latest framework, but to geek out on things and become better at what I truly believe is a craft.
If at some point the noise overpowers the substance that I care for, I hereby request anyone who takes pity to let me know where to move on to. But so far, with dang's good moderation, I'm impressed with how well HN is holding out.
Do you have the link to the MS disk management story?
https://support.microsoft.com/en-us/kb/3053711
(from the link)
Resolution
To resolve the issue, do not create a user account contains the string "user" on the computer.
https://news.ycombinator.com/item?id=11710829
I had a bug with a Bank of America payment system which wouldn't accept registrations with upper-case Zs in the company name. I went into super sleuth mode and somehow found the company they hired to make it and got a number for their development department. After explaining the problem to the guy who answered his only response was, "How did you get this number??"
A few years ago I discovered that the Wells Fargo website would log you in if you typed the correct password followed by any number of additional characters. I reported it to the security group, and it still worked until I stopped banking with them a year or so later.
How did you get that number?
Reminds me of my favorite bug story from my own career. It was in my first year or two out of college. We were using a commercial C++ library for making HTTP calls out to another service. The initial symptom of the bug was that random requests would appear to come back with empty responses -- not just empty bodies, but the entire response was empty (not even any headers).
After a fair amount of testing, I was somehow able to determine that it wasn't actually random. The empty response occurred whenever the size in bytes of the entire request (headers and body together) was exactly 10 modulo 256, for example 266 bytes or 1034 bytes or 4106 bytes. Weird, right?
I went ahead and worked around the problem by putting in a heuristic when constructing the request: If the body size was such that the total request size would end up being close to 10 modulo 256, based on empirical knowledge of the typical size of our request headers, then add a dummy header to get out of the danger zone. That got us past the problem, but made me queasy.
At the time, I had looked at the code and noticed an uninitialized variable in the response parsing function, but it didn't really hit me until much later. The code was something like this:
Obviously this is wrong because it's checking c before reading it! But why the 10 modulo 256 condition? Of course, the ASCII code for newline is 10. Duh. So there must have been an earlier call stack where some other function had a local variable storing the length of the request, and this function's c variable landed smack-dab on the least-significant byte of that earlier value. Arrrrgh!
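The "10 modulo 256" arithmetic can be checked with a small Python sketch. This does not reproduce the C++ library's code; it only models the assumption described above, that the uninitialized variable happened to hold the least-significant byte of a previously stored request length. The names `stale_byte` and `parser_sees_empty_response` are hypothetical.

```python
NEWLINE = 10  # ASCII code for '\n'


def stale_byte(request_size):
    """Least-significant byte of the request length, standing in for the
    leftover stack value that the uninitialized variable inherited."""
    return request_size & 0xFF


def parser_sees_empty_response(request_size):
    """Model of the bug: the parser tests its character variable before
    reading into it, so a stale newline looks like an immediate end of input."""
    return stale_byte(request_size) == NEWLINE


if __name__ == "__main__":
    for size in (266, 1034, 4106, 267):
        print(size, size % 256, parser_sees_empty_response(size))
```

The three sizes from the story all leave a remainder of 10, which is why they tripped the check while a neighbouring size such as 267 did not.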
That sounds shockingly like a bug I remember one of our best developers finding when I worked at Homestead in the late 90's (and I remember being in awe then of his ability to deduce the pattern out of the seeming randomness).
The title reminds me of "the 500 mile email"
http://www.ibiblio.org/harris/500milemail.html
This is one story I always enjoy when it is brought up.
Me too. Are there any others you could recommend?
Great story. Also, never knew about units.
The most interesting part of this story to me is actually that his wife noticed that the printer didn't work on Tuesdays. I'd have never, ever put that together, no matter how many times I saw it succeed or fail. I'd actually be more likely to figure it out by debugging the CUPS script than I would be observing my printer's behavior. Can a lot of people pick up on correlations like that? "Ever notice how it's always Tuesday when the printer won't work?"
I remember a time when I was a relatively computer-illiterate youngling and my favourite game was not particularly stable. I remember churning through many progressively more elaborate superstitions in search of some correlation I could exploit to prevent the dreaded crashes.
Of course, I never succeeded. But the point should stand: people are built around finding and exploiting correlations to their benefit, and we're actually quite good at it. It is not terribly surprising to me that somebody quickly noticed that the annoying bad thing that is consistently happening today also happened consistently for a day precisely one week ago. When the pattern repeats again for a third time...
only someone who doesn't understand computers would notice it, programmers would say "that doesn't matter" :)
You'd figure it out eventually, especially if it was exceedingly annoying and/or your job depended on it.
Example: what if your internet drops out for an hour every Sunday morning at 2am? You'd notice. (Mine does. Damn it Comcast!) If the guy's wife relied on the printer similarly, it might be as acute as losing internet for you.
You wouldn't notice if you printed things on average once a week. But if you printed multiple times a day you'd soon notice the pattern, I suspect.
Every Tuesday morning I have to re-authenticate the 2FA that lets me connect to various things at work, and it isn't documented anywhere, but you'd better believe I figured out the pattern real fast. Being annoyed at the same time of the work week is very noticeable.
I have a Google Hangout on Tuesday mornings. Once a month, reliably, I'd suddenly be kicked off my own Hangout because Google had decided it was time for me to type my password back in.
Of course I do, reliably pushing the problem another month down the road.
Not having preconceptions about likely causes surely helped but it's possible that her use of the printer was not uniform across weekdays which might make it stand out more.
Yes, I would have never put those together either. One of the duplicate bugs has a similar thread, where a user describes reinstalling his system three times and failing to print every time. But after running updates after the third install, it suddenly worked again.
A helpful response then pointed out that after the third reinstall, his system clock must have gone past midnight, which means it was no longer Tuesday...
My most memorable bugfix was when I was temporarily on a team that did email encryption/decryption. They had one customer where some mails could not be decrypted. They had been fighting with this for a year, and no one could figure out what was going on. I told them to dump the good and bad emails for a week. After one week I was given the dump of files, looked at the count of bad vs good, did some math in my head and said: "Hmm, it appears that about 1/256 mails is bad. That could indicate that the problem is related to a specific byte having a specific value in the random 256 bit AES key. If there is a specific value giving problems it is probably 0x00, and I would guess the position is the last or first byte."
I did a check by decoding all S/MIME mails to readable text with openssl. Sure enough, all bad emails had 0x00 as the least significant byte. Then I looked at the ASN.1 spec and discovered it was a bit vague about whether the least significant byte had to be there if it was 0x00. I inserted a line into the custom-written IBM 4764 CCA driver, written in C and called via JNI. Then all emails decrypted.
The team's jaws dropped: they had been fighting with it for a year and I diagnosed the bug only by looking at the good/bad ratio :)
I might remember some details wrong, but the big picture is correct :)
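The 1/256 reasoning is easy to sanity-check with a rough Python sketch. It assumes, as in the story, that a mail fails exactly when the last byte of its random 256-bit key is 0x00; the function names and the simulation are illustrative only and have nothing to do with the actual driver.

```python
import random


def key_triggers_bug(key):
    """Assumed failure condition: the last byte of the key is 0x00."""
    return key[-1] == 0


def failure_rate(samples, seed=0):
    """Fraction of randomly generated 32-byte keys that hit the condition."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(samples):
        key = rng.getrandbits(256).to_bytes(32, "big")
        if key_triggers_bug(key):
            bad += 1
    return bad / samples


if __name__ == "__main__":
    print("expected:", 1 / 256)
    print("observed:", failure_rate(100_000))
```

A single byte takes each of its 256 values with equal probability, so any bug tied to one specific value at one specific position shows up in roughly 0.4% of keys.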
Respect.
I think the singular power of an experienced programmer is being able to reason about high-level structure, while being able to deep-dive into any lower level detail, anywhere in the overall system, and therefore be able to carve out large swaths of problem space just by /thinking/. That's much faster than typing-and-executing-and-reading-logs.
The TL;DR is that the "file" utility was miscategorising files that had "Tue" in the first few bytes of a file as an Erlang JAM file, with knock-on effects for PostScript files generated with a header comment with Tue in the date.
The circumstances under which I discovered/reported the bug were totally coincidental too. I noticed a friend use the file command and thought to myself "Hmm I've never actually used that command before. Let me try it out." Ran it on a TODO list text file I had lying around, scratched my head over the "Erlang JAM file" output, and went from there.
Because it was trying to identify that type of file by looking for a particular full date string (bizarre in itself) and the wrong syntax was used to specify the string (unescaped spaces).
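To make the failure mode concrete, here is a toy Python sketch of a fixed-offset string match. It is not how `file` is implemented, and the date string and sample text are illustrative stand-ins. It assumes the effect of the unescaped spaces was that only the first word of the intended string was used as the pattern.

```python
def matches_at(data, offset, pattern):
    """True if pattern occurs in data starting exactly at offset."""
    return data[offset:offset + len(pattern)] == pattern


# Illustrative stand-in for the full date string the rule meant to match.
INTENDED = b"Tue Jan 22 14:32:44 MET 1991"
# Assumption: with unescaped spaces, the pattern ends at the first space.
TRUNCATED = b"Tue"

if __name__ == "__main__":
    todo = b"Due Tue: return library books\n"  # "Tue" happens to sit at offset 4
    print(matches_at(todo, 4, INTENDED))   # the intended rule does not fire
    print(matches_at(todo, 4, TRUNCATED))  # the truncated rule does
```

A three-byte pattern at a fixed offset is weak evidence of anything, which is why ordinary text with a weekday in the right spot got misclassified.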
The weirdest case at my tenure as a neighborhood computer tech was a personal notebook computer that would not boot up at the customer's apartment. Of course we assumed user error, but further investigation revealed that if the computer were running as it approached the home, it would bluescreen about a block away.
We guessed it was due to some kind of RF interference from a transmitter on the apartment building. Removing the WiFi module and the optical drive had no effect, so we further guessed it was interference within the motherboard or display. Rather than investigate further, we replaced the notebook at that point.
My car radio goes from everything working fine to complete static on every station in like 10 feet as I drive into my local Safeway parking lot. I'm pretty sure they are jamming the signal, I just don't know why.
Some places with shopping carts have an "invisible fence" that locks up one of the wheels of the cart when you push the cart outside the property. These are usually marked with a yellow line on the ground. I presume they use some kind of RF field with an underground wire. Maybe that Safeway has some issues with theirs.
I have an anecdote, which isn't mine but comes from someone I know personally. This guy was working as a service tech, and was called out to diagnose a problem with a computer that had been recently moved. It worked most of the time, but any attempt to use the tape drive failed within a certain number of seconds (this was long ago, when tape drives were still a thing). Everything had worked fine before the move, and diagnostics didn't show anything out of place. Then he happened to look out the window - this was a military installation, and there was a radar dish rotating nearby. The failures occurred exactly when the radar dish was pointed their direction. It turns out the computer had been moved up one floor, which strengthened the interference just enough to cause the failure.
But "Tue" is not at the fourth byte in the example, which has:
Something munged the data. Perhaps some step which removes all characters after %%, except those in parentheses?
Now we're at the fourth byte. Another hypothesis is that the second incorrect match is kicking in. That is to say, some fields are added above %% CreationDate such that the Tue lands on position 79. The bug that was fixed in the magic database is this:
(This is a patch of a patch: a fix to an incorrect patch.) There are two matches for this special date which identifies JAM files: one at offset 4, but a possible other one at offset 79 which will cause the same problem.
The real bug here is arguably the CUPS script. It should identify the file's type before munging it. And it shouldn't use a completely general, highly configurable utility whose data-driven file classification system is a moving target from release to release! This is a print script, so there is no reason to suspect that an input file is a Doom WAD file, or a Sun OS 4 MC68000 executable. The possibilities are quite limited, and can be handled with a bit of custom logic.
Did Brother people write this? If so, I'm not surprised.
Nobody should ever write code whose correct execution depends on the "file" utility classifying something. That is, not unless you write your own "magic" file and use only that file; then you're taking proper ownership of the classification logic, such that any bugs are likely to be your own.
The fact that file got something wrong here is a red herring; the file utility is wrong once in a while, as anyone knows who has been using various versions of it regularly for a few decades. Installations of the utility are only suitable for one-off interactive use. You got a mystery file from out of the blue, and need a clue as to what it is. Run file on it to get an often useful opinion. It is only usable in an advisory role, not in an authoritative role.
I've noticed that printing is still one of the poorest UX aspects of *nix/OSS and regularly seems to suffer from errors so egregious that they can only be attributed to OSS devs not dogfooding these features. I'm assuming they just don't print much (I mean, we ALL print less than 20 years ago, but all the more reason to test these features which, when you need them to work you REALLY need them to work).
Perhaps you're thinking back to the days of manually configuring CUPS?
Any recent printer I've used has just been plug it in and hit print. A better experience than Windows in terms of included drivers and bonjour support too.
No, you've simply been lucky.
I agree completely. I set up a relative with Linux Mint KDE 17.3 a couple weeks ago and even I was surprised at how easy it was to set up the two printers he wanted to use: one was an old 2003-vintage LaserJet 1012 personal-sized laser printer with USB, the other a newer (I'm guessing 4yo) HP color inkjet of some kind that was WiFi-connected. For the first, I just plugged in the USB cable and a print queue was immediately and automatically set up; I didn't have to do anything. For the latter, I just went into the printer configuration utility, let it search the network, it found the printer and told me its model name/number, then I selected the appropriate driver and printed a test page. No driver downloads, no problems.
By contrast, I had a contract job a year or so ago at a large company where I was given a Win7 laptop and tried to connect to a big Ricoh laser printer. I spent hours messing around with driver downloads trying to get that to work. I finally had to call IT and they sent someone over, and he couldn't get it to work either; he finally found some crazy work-around which I've totally forgotten the details of now.
The only real problem I see with printing on Linux now is that, sometimes, there's multiple CUPS drivers for the same printer (foomatic, hpijs, Postscript, etc.), so it won't automatically pick one and it's not clear which is the best so you might have to just try one and see if it works. Most likely, they all work, but some might have additional features. HP printers are probably the best, though, since they seem to explicitly support Linux (such as with their hpijs drivers). If all printer makers had this level of support, and they cleaned up the redundant/competing drivers, there wouldn't be complaints.
CUPS is better, no doubt. And from a sysadmin perspective CUPS is great, but there are still crazy gotchas.
Part of this is the byzantine config of network discovery, CUPS, driver sources, etc. A big part of the problem is the pieces may be there but distros have done a mediocre job getting everything together on this in the best way for the user.
One of these days, someone may give me a credible explanation of why printing involves a systemwide daemon, but I kind of doubt it. I'd love to see a rearchitecting of the whole mess such that the whole print daemon runs in a sandbox with user privileges.
How would you avoid having a common print queue, when all the documents can't fit in the printer memory?
In my experience it is the exact opposite. My Linux computers are always able to quickly connect and print to any printer I point them towards. I have never had as many problems as I have had when using Windows or OS X.
I can't say it's been any "easier" to connect to my Brother MFC-J4420DW on Fedora as it is on Windows, but it's no harder. Download the installation script from Brother, run it, it asks me for the hostname of my printer and I'm up and running.
In my experience printing in Linux has been pretty solid for at least the last 5 years or so. CUPS is CUPS, drivers are plentiful and one apt-get away, UIs get better.
Of course I'm limited to the occasional document or a few tickets.
Stuff like this is why I find "Synthetic Biology" so fucking scary.
Don't worry, the natural stuff is probably just as messy, or worse.
If it helps, remember that grey-goo already took over the world, and you were born part of it.
During my studies I had a course called "Advanced Network Administration". I learnt about the OSPF routing protocol and its Quagga [1] implementation and I had to prepare a simple installation that consisted of 3 Linux machines. They were connected with cheap USB network adapters.
After everything was configured I started the Quagga daemons and somehow they just didn't want to talk to each other. I've opened tcpdump to see what happens and the OSPF packets were exchanged properly. After a while the communication and routing was established. I thought that maybe the services just needed some time to discover the topology.
I restarted the system to see if it was able to come up automatically, but the problem recurred: the daemons just didn't see each other. Again, I launched tcpdump, tweaked some settings and now it worked, until it didn't a few minutes later.
It took me a long time to find out that the diagnostic tool I had used actually changed the observed infrastructure (like in the quantum world). tcpdump enables promiscuous mode on the network interfaces, and apparently this was required for Quagga to run on the cheap USB Ethernet adapters. I used ifconfig promisc, and after that OSPF was stable.
[1] http://www.nongnu.org/quagga/
CERN: LEP data confirm train time tables http://cds.cern.ch/record/1726241
CERN: Is the moon full? Just ask the LHC operators http://www.quantumdiaries.org/2012/06/07/is-the-moon-full-ju...
Near the end of that post, the commenter suggested a fix that includes the most qualified Useless Use of Cat entry[0] that I've ever seen!
[0] http://porkmail.org/era/unix/award.html#cat
Wait till you see where they found the print server!
http://www.informationweek.com/server-54-where-are-you/d/d-i...?
Surely the real bug is the reliance on the 'file' utility in the first place? It attempts to quickly identify a file that could be literally anything so it's not surprising (and indeed should be expected) that sometimes it gets it wrong.
I don't know the details of the CUPS script but presumably it can only deal with a small number of different file types. Implementing its own detection to positively identify PS vs whatever other formats it deals with vs everything else would be far more robust.
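As a rough illustration of what "its own detection" could look like, here is a minimal Python sketch. It is an assumption about what such a check might be, not what the CUPS script or the Brother driver does. It relies only on two well-known signatures: PostScript files conventionally begin with `%!` and PDF files with `%PDF-`. The function name is made up.

```python
def classify_print_job(data):
    """Positively identify the few formats a print filter cares about;
    everything else is reported as unknown instead of being guessed at."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"%!"):
        return "postscript"
    return "unknown"


if __name__ == "__main__":
    job = b"%!PS-Adobe-3.0\n%%CreationDate: (Tue Mar  3 19:47:42 2009)\n"
    print(classify_print_job(job))
    print(classify_print_job(b"%PDF-1.4\n"))
    print(classify_print_job(b"Due Tue: return library books\n"))
```

Because the check only looks for the formats the script can handle, a weekday in a header comment cannot push a PostScript job down the wrong code path.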
One of our users complained that she could no longer print PDF documents. Everything else, Word, Excel, graphics, worked fine, but when she printed a PDF ... the printer did emit a page that - layout-wise - pretty much looked like it was supposed to, except all the text was complete and utter nonsense.
Or was it? I took one of the pages back to my desk, and later in the day I had an idle moment, and my eyes wandered across the page. The funny thing is, if I had not known what text was supposed to be on the page, I would not have noticed, but the text was not random at all. Instead, all the letters had been shifted by one place in the alphabet (i.e. "ABCD" became "BCDE").
I went back to the user and told her to check the little box that said "Print text as graphics" in the PDF viewer's printing dialog, and voila - the page came out of the printer looking the way it was supposed to.
Printing that way did take longer than usual (a lot longer), but at least the results were correct.
To this day, I have no clue where the problem came from, and unfortunately, I did not have the time to investigate the issue further. I had never seen such a problem before or after.
In a way it's part of what I like about my job: These weird problems that seem to come out of nowhere for no apparent reason, and that just as often disappear back into the void before I really understand what is going on. It can be oh-so frustrating at times, but I cannot deny that I am totally into weird things, so some part of me really enjoyed the whole experience.
I love the modification that pipes the output of cat into sed; doesn't he realize that cat is redundant at that point?
I'll admit to being a superfluous cat user. I just like typing cat.
I once had a case with a desktop system that would often hard reset when you sat down and started typing. It turned out Dell had left a piece of metal in the case, hanging in the few millimeters between the case and the motherboard, and a stronger desk vibration caused a short circuit.
Here's a great collection of classic bug reports (including the never-printing-on-tuesdays): https://news.ycombinator.com/item?id=10309401
My 6502 based FORTH systems would sometimes crash for no apparent reason after I tweaked some code and recompiled it. Whenever it got into crashy mode, it would crash in a completely different way, on a randomly different word. I'd put some debugging code in to diagnose the problem, and it would either disappear or move to another word! It was an infuriating Heisenbug!
It turns out that the 6502 has a bug [1] that when you do an indirect JMP ($xxFF) through a two byte address that straddles a page boundary, it would wrap around to the first byte of the same page instead of incrementing the high half of the address to get the first byte of the next page.
And of course the way that an indirect threaded FORTH system works is that each word has a "code field address" that the FORTH inner loop jumps through indirectly. So if a word's CFA just happened to straddle a page boundary, that word would crash!
6502 FORTH systems typically implemented the NEXT indirect threaded code inner interpreter efficiently by using self modifying code that patched an indirect JMP instruction on page zero whose operand was the W code field pointer. [2]
JMP indirect is a relatively rare instruction, and it's quite rare that it's triggered by normal static code (since you can usually catch the problem during testing), but self modifying code has a 1/256 chance of triggering it!
A later version, the 65C02, fixed that bug. It could manifest in either compiled FORTH code, or the assembly kernel. The FIG FORTH compiler [3] worked around it at compile time by allocating an extra byte before defining a new word if its CFA would straddle a page boundary. I defined an assembler macro for compiling words in the kernel that automatically padded in the special case, but the original 6502 FIG FORTH kernel had to be "checked and altered on any alteration" manually.
[1] http://everything2.com/title/6502+indirect+JMP+bug
[2] http://forum.6502.org/viewtopic.php?t=1619
"I'm sure some of you noticed my code will break if the bytes of the word addressed by IP straddle a page boundary, but luckily that's a direct parallel to the NMOS 6502's buggy JMP-Indirect instruction. An effective solution can be found in Fig-Forth 6502, available in the "Monitors, Assemblers, and Interpreters" section here. (The issue is dealt with at compile time; there is no run-time cost. The word CREATE pre-pads the dictionary with an unused byte in the rare cases when the word about to be CREATEd would otherwise end up with a code-field straddling a page boundary.)"
[3] http://www.dwheeler.com/6502/FIG6502.ASM
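For readers who have not met the JMP indirect bug, here is a small Python model of the address fetch described above. It is a sketch of the documented behaviour, not an emulator; the function names and the memory contents are made up for the example.

```python
def read_pointer_expected(memory, addr):
    """What a programmer expects: low byte at addr, high byte at addr + 1."""
    lo = memory[addr & 0xFFFF]
    hi = memory[(addr + 1) & 0xFFFF]
    return lo | (hi << 8)


def read_pointer_nmos_6502(memory, addr):
    """NMOS 6502 JMP (addr): only the low byte of the pointer address is
    incremented, so the high byte is fetched from the start of the same page."""
    lo = memory[addr & 0xFFFF]
    hi_addr = (addr & 0xFF00) | ((addr + 1) & 0x00FF)
    hi = memory[hi_addr]
    return lo | (hi << 8)


if __name__ == "__main__":
    memory = bytearray(65536)
    memory[0x12FF] = 0x34  # low byte of the jump target
    memory[0x1300] = 0x56  # intended high byte, on the next page
    memory[0x1200] = 0x78  # what the buggy fetch reads instead
    print(hex(read_pointer_expected(memory, 0x12FF)))
    print(hex(read_pointer_nmos_6502(memory, 0x12FF)))
```

The two functions agree for every pointer address except those whose low byte is $FF, which is one value out of 256 and matches the 1/256 odds mentioned above.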
[...]
I vote this as the best heisenbug I've read so far. That sounds like such a nightmare to debug. I might have never found it due to throwing the thing in the trash and buying a different machine. Forth is easy to port after all. :)
Funny, I remember that bug very well. It was used on the earlier Apple IIs, sometimes on purpose (!), mostly by game protections. This was fixed on the 65C02.
I once found a bug in a weather applet that only occurred when the temperature exceeded 100 degrees. The 3-digit temperature caused a cascade of formatting issues that rendered part of the applet unreadable. I believe the author used Celsius, and so would never have encountered this bug on their own.
Relevant: https://gyrovague.com/2015/07/29/crashes-only-on-wednesdays/
"tue" means "kill" in French... I wonder if a French programmer somewhere had something to do with this?
And this is why we won't ever get AI. Humans seem to only manage to get to a certain level of complexity before it all gets too much.
There are supposedly people in Boeing who understand literally every part of a 747, the wiring and the funny holes in the windows. But there is probably no one who understands all parts of Windows 10.
We're doomed to keep leaping like dolphins to reach a fish held too high by a sadistic Orlando world trainer
People die all the time from "bugs" in the genes, or from acquired defects. This doesn't stop the rest of the people and the race going on.
Same thing with AI. Some will crash and burn. Others will manage to continue.
Nothing is eternal. We only need stuff to work for a while for it to be useful.
As I've said and been downvoted for before, there will never be Artificial Intelligence because a machine merely does what its program tells it. There will be plenty of Artificial Stupidity, however, as evidenced by this thread.
And you do what the neurons in your brain make you do.
Article from today - http://www.theatlantic.com/magazine/archive/2016/06/theres-n...
Off topic and no arguments to support your case, sounds like downvote material to me.
So what's the lesson here? What should we learn from that?
Make sure you escape spaces? Or: https://news.ycombinator.com/item?id=11718716
Lesson is: Not every post here should teach a lesson?
Is it just me or does this get posted every month?
I've never seen it before to my recollection, and I spend a possibly unhealthy amount of time checking up on the stories here, so my guess is no it doesn't.
It was a rhetorical question that was incorrect. Every month is a little bit too frequent, but not by much...
https://hn.algolia.com/?query=openoffice%20tuesday
Probably because it gets mentioned often in addition to being reposted often.
No, it is a CUPS bug indeed. File was never guaranteed to be precise in the first place, so it is not a good idea to rely on it.
Yet another reason I don't let OpenOffice or any Linux UIs slow me down. It's all about the command line and always will be.
> So it's not a problem w/ openoffice.org, cups, or the brother printer drivers. It is a bug in the `file` utility, and documented at https://bugs.launchpad.net/ubuntu/+source/file/+bug/248619.
This bug had nothing to do with OO. This was just a cute manifestation of the actual bug.
Fascinating bug :)