“My wife has complained that OpenOffice will never print on Tuesdays” (2009)

10 years ago (bugs.launchpad.net)

160 comments

hardmath123

Did this get fixed, 7 years later?

Yesterday, we had a story about Microsoft's disk management service using lots of CPU time if the username contained "user". Microsoft's official reply was not to do that.

I once found a bug in Coyote Systems' load balancers where, if the User-Agent ended with "m", all packets were dropped. They use regular expressions for various rules, and I suspect someone typed "\m" where they meant "\n". The vendor denied the problem, even after I submitted a test case that failed on their own website's load balancer.

Many, many years ago, I found a bug in 4.3BSD that prevented TCP connections from being established with certain other systems during odd-numbered four-hour periods. It took three days to find the bug in BSD's sequence number arithmetic: a combination of signed and unsigned casts was doing the wrong thing.

btilly 10 years ago
My favorite was a story from the 1980s of a program which would crash depending on the phase of the Moon!
Turned out to be because it was generating a date by calling a general purpose astronomical routine, then parsing the date out of that. The astronomical routine among other things included the phase of the moon, and during some phases you would overflow the buffer that was passed in.
Another classic was a tech support call from the 1990s where the person's computer rebooted every time they flushed the toilet. It turned out that the person was at the end of the electrical line... and on a septic system. Flush the toilet, the septic system came online, causing a power dip, and that was enough to reboot the computer. A UPS fixed that.
- DonHopkins 10 years ago
  
  I heard a story about a terminal in a public terminal room that a user was able to consistently log in to if they were sitting down in a chair in front of the terminal, but never if they were standing up.
  They thought it might be static electricity, or some mechanical problem, or "problem exists between keyboard and chair", but finally they noticed something else was amiss...
  It turns out some joker had re-arranged the 1234567890 keys to be 0123456789, so when the user was standing up, they looked down at the keyboard and typed their password (which contained a digit, of course) by looking at the keys. But when they were sitting down, they touch typed without looking at the keys, and got their password correct!
  
- wdr1 10 years ago
  
  My favorite was the server that couldn't send email further than 500 miles away.
  http://www.ibiblio.org/harris/500milemail.html
  
- cpeterso 10 years ago
  
  A BeOS bug story similar to the phase of the Moon:
  Two [BeOS] test engineers were in a crunch. The floppy drive they were currently testing would work all day while they ran a variety of stress tests, but the exact same tests would run for only eight hours at night. After a few days of double-checking the hardware, the testing procedure, and the recording devices, they decided to stay the night and watch what happened. For eight hours they stared at the floppy drive and drank espresso. The long dark night slowly turned into day and the sun shone in the window. The angled sunlight triggered the write-protection mechanism, which caused a write failure. A new casing was designed and the problem was solved.
  https://www.haiku-os.org/legacy-docs/benewsletter/Issue4-22....
- x1798DE 10 years ago
  
  > My favorite was a story from the 1980s of a program which would crash depending on the phase of the Moon!
  Do you have a name for this program? It sounds a lot like an urban legend - when would you ever find it easier to parse a date out of an astronomical program than to use actual date-handling capabilities from the system or a library?
  
raldu 10 years ago
Yes, it got fixed: https://bugs.launchpad.net/ubuntu/+source/file/+bug/248619
The `file` utility failed to recognize PostScript files due to conflicting signature data in the magic file. The issue was resolved long ago.
- tremon 10 years ago
  
  If it were my software, I would not have accepted that as a proper fix, though. For me, the real bug is relying on the "file" command (a user diagnostic tool) to choose different code paths in a printing program.
  
- BrandonM 10 years ago
  
  Yep, a month and a half after the bug was filed. Not too shabby for OSS!
jameshart 10 years ago

Used to own a PCMCIA wi-fi adapter that predated finalization of the 802.11b standard and which could reliably be induced to blue screen if I caused an HTTP request to be sent over the interface containing a lowercase x.
SeanDav 10 years ago
This reminds me of that old joke: "How many Microsoft engineers does it take to replace a broken light-bulb?"
The answer is "None - Microsoft simply changes the standard to darkness"
------------------------------------------
Also reminds me of the fact that there are no bugs in Microsoft software - only "Features".
- clock_tower 10 years ago
  
  I think these are re-purposings of jokes about Unix, which is justly infamous for having lists of bugs that would've been easier to fix than to reproduce, for canonizing its bugs as standards of behavior to work around, and for having startlingly lazy components to begin with (especially the POSIX string-handling libraries -- strcmp() can be written in minutes).
  See _The UNIX-HATERS Handbook_ for more. https://en.wikipedia.org/wiki/The_Unix-Haters_Handbook
varunom 10 years ago
I have an Acer laptop (2008 model) that, for a few years now, hasn't booted when I press the power button. Interestingly, after pressing the power button you have to close the lid and then gentally press the closed lid once or twice (at the top center of the lid, where the back of the camera is) and then it immediately boots :)
The power button is on the opposite side (at the bottom left of the lid), so it can't be getting any pressure when the closed lid is pressed.
- ptaipale 10 years ago
  
  You might want to fix that "gentally" because it is easy to misread, and the instruction becomes definitely not office-safe.
mercer 10 years ago

These kinds of issues fascinate me, aside from how frustrating they can be. They highlight the immense complexity that underlies what I, a lowly web developer, do on a daily basis, and what effect some decisions can have. It definitely motivates me to be a lot more careful and thorough about the code I write, and that's probably a good thing considering that I'm a painfully self-aware JavaScript coder who eschews, among other things, writing tests for my code and who thoughtlessly uses modules that solve whatever problem I'm facing.
I could go on to write a love letter to how a number of HN posts make me want to be a better programmer, but I'll keep it as simple as possible.
Being here, even now, and even though I do sort of feel like one of the 'older crew' suspiciously eyeing what appears to be an influx of 'others', to put it vaguely, ultimately what I love most about this place is reading about the 'old-timers' and how they worked, the 'oldest code that is still in use', the argument for and against LISPs, the problems of the JS/Node ecosystem (bit sick of that), the crazy shit some programmer created, and so on.
I'm pretty much addicted to HN, while I managed to cut out Facebook and Reddit. What keeps me here is the intense desire to be a hacker, and how, even now, HN fuels it. To not just make websites or work with the latest framework, but to geek out on things and become better at what I truly believe is a craft.
If at some point the noise overpowers the substance that I care for, I hereby request that anyone who takes pity let me know where to move on to. But so far, with dang's good moderation, I'm impressed with how well HN is holding out.
askvictor 10 years ago
Do you have the link to the MS disk management story?
- iamaaditya 10 years ago
  
  https://support.microsoft.com/en-us/kb/3053711
  (from the link)
  Resolution
  To resolve the issue, do not create a user account contains the string "user" on the computer.
  
- pessimizer 10 years ago
  
  https://news.ycombinator.com/item?id=11710829
jrcii 10 years ago
I had a bug with a Bank of America payment system which wouldn't accept registrations with upper-case Zs in the company name. I went into super sleuth mode and somehow found the company they hired to make it and got a number for their development department. After I explained the problem to the guy who answered, his only response was, "How did you get this number??"
- S_A_P 10 years ago
  
  A few years ago I discovered that the Wells Fargo website would log you in if you typed the correct password followed by some additional characters. I reported it to the security group, and it still worked until I stopped banking with them a year or so later.
  
- DarkLinkXXXX 10 years ago
  
  How did you get that number?
  

sampsonetics 10 years ago

Reminds me of my favorite bug story from my own career. It was in my first year or two out of college. We were using a commercial C++ library for making HTTP calls out to another service. The initial symptom of the bug was that random requests would appear to come back with empty responses -- not just empty bodies, but the entire response was empty (not even any headers).

After a fair amount of testing, I was somehow able to determine that it wasn't actually random. The empty response occurred whenever the size in bytes of the entire request (headers and body together) was exactly 10 modulo 256, for example 266 bytes or 1034 bytes or 4106 bytes. Weird, right?

I went ahead and worked around the problem by putting in a heuristic when constructing the request: If the body size was such that the total request size would end up being close to 10 modulo 256, based on empirical knowledge of the typical size of our request headers, then add a dummy header to get out of the danger zone. That got us past the problem, but made me queasy.

At the time, I had looked at the code and noticed an uninitialized variable in the response parsing function, but it didn't really hit me until much later. The code was something like this:

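  void read_status_line(char *line) {
    char c;               /* BUG: never initialized, so it holds stale stack data */
    while (c != '\n') {   /* c is tested before the first byte is read */
      c = read_next_byte();
      *(line++) = c;
    }
  }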

Obviously this is wrong because it's checking c before reading it! But why the 10 modulo 256 condition? Of course, the ASCII code for newline is 10. Duh. So there must have been an earlier call stack where some other function had a local variable storing the length of the request, and this function's c variable landed smack-dab on the least-significant byte of that earlier value. Arrrrgh!

thoman23 10 years ago

That sounds shockingly like a bug I remember one of our best developers finding when I worked at Homestead in the late 90's (and I remember being in awe then of his ability to deduce the pattern out of the seeming randomness).

mpeg 10 years ago

The title reminds me of "the 500 mile email"

http://www.ibiblio.org/harris/500milemail.html

K-Wall 10 years ago
This is one story I always enjoy when it is brought up.
- ChristianBundy 10 years ago
  
  Me too. Are there any others you could recommend?
  
n72 10 years ago

Great story. Also, never knew about units.

icambron 10 years ago

The most interesting part of this story to me is actually that his wife noticed that the printer didn't work on Tuesdays. I'd have never, ever put that together, no matter how many times I saw it succeed or fail. I'd actually be more likely to figure it out by debugging the CUPS script than I would be observing my printer's behavior. Can a lot of people pick up on correlations like that? "Ever notice how it's always Tuesday when the printer won't work?"

Nacraile 10 years ago

I remember a time when I was a relatively computer-illiterate youngling and my favourite game was not particularly stable. I remember churning through many progressively more elaborate superstitions in search of some correlation I could exploit to prevent the dreaded crashes.
Of course, I never succeeded. But the point should stand: people are built around finding and exploiting correlations to their benefit, and we're actually quite good at it. It is not terribly surprising to me that somebody quickly noticed that the annoying bad thing that is consistently happening today also happened consistently for a day precisely one week ago. When the pattern repeats again for a third time...
linuxfan2718 10 years ago
Only someone who doesn't understand computers would notice it; programmers would say "that doesn't matter" :)
- beambot 10 years ago
  
  You'd figure it out eventually, especially if it was exceedingly annoying and/or your job depended on it.
  Example: what if your internet drops out for an hour every Sunday morning at 2am? You'd notice. (Mine does. Damn it Comcast!) If the guy's wife relied on the printer in the same way, it might be as acute as losing internet is for you.
- paublyrne 10 years ago
  
  You wouldn't notice if you printed things on average once a week. But if you printed multiple times a day you'd soon notice the pattern, I suspect.
bonniemuffin 10 years ago
Every Tuesday morning I have to re-authenticate the 2FA that lets me connect to various things at work, and it isn't documented anywhere, but you'd better believe I figured out the pattern real fast. Being annoyed at the same time of the work week is very noticeable.
- terinjokes 10 years ago
  
  I have a Google Hangout on Tuesday mornings. Once a month, reliably, I'd suddenly be kicked off my own Hangout because Google had decided it was time for me to type my password back in.
  Of course I do, reliably pushing the problem another month down the road.
aptwebapps 10 years ago

Not having preconceptions about likely causes surely helped, but it's possible that her use of the printer was not uniform across weekdays, which might make it stand out more.
tremon 10 years ago

Yes, I would have never put those together either. One of the duplicate bugs has a similar thread, where a user describes reinstalling his system three times and failing to print every time. But after running updates after the third install, it suddenly worked again.
A helpful response then pointed out that after the third reinstall, his system clock must have gone past midnight, which means it was no longer Tuesday...

mazda11 10 years ago

My most memorable bugfix was when I was temporarily on a team that did email encryption/decryption. They had one customer where some mails could not be decrypted; they had been fighting with this for a year, and no one could figure out what was going on. I told them to collect a dump of the good and bad emails for a week. After a week I was given the dump, looked at the count of bad vs. good, did some math in my head and said: "Hmm, it appears that about 1 in 256 mails is bad. That could indicate that the problem is related to a specific byte having a specific value in the random 256-bit AES key. If there is a specific value causing problems, it is probably 0x00, and I would guess the position is the last or first byte."

I checked by decoding all the S/MIME mails to readable text with OpenSSL, and sure enough, all the bad emails had 0x00 as the least significant byte. Then I looked at the ASN.1 spec and discovered it was a bit vague about whether the least significant byte had to be present if it was 0x00. I inserted one line into the custom IBM 4764 CCA driver, written in C and called via JNI. Then all the emails decrypted.

The team's jaws dropped: they had been fighting with it for a year, and I diagnosed the bug just by looking at the good/bad ratio :)

I might remember some details wrong, but the big picture is correct :)

pwang 10 years ago

Respect.
I think the singular power of an experienced programmer is being able to reason about high-level structure, while being able to deep-dive into any lower level detail, anywhere in the overall system, and therefore be able to carve out large swaths of problem space just by /thinking/. That's much faster than typing-and-executing-and-reading-logs.

alblue 10 years ago

The TL;DR is that the "file" utility was miscategorising files that had "Tue" in the first few bytes as Erlang JAM files, with knock-on effects for PostScript files generated with a header comment containing Tue in the date.

kaizensoze 10 years ago

The circumstances under which I discovered/reported the bug were totally coincidental too. I noticed a friend use the file command and thought to myself "Hmm I've never actually used that command before. Let me try it out." Ran it on a TODO list text file I had lying around, scratched my head over the "Erlang JAM file" output, and went from there.
wrs 10 years ago

Because it was trying to identify that type of file by looking for a particular full date string (bizarre in itself) and the wrong syntax was used to specify the string (unescaped spaces).

nilstycho 10 years ago

The weirdest case during my tenure as a neighborhood computer tech was a personal notebook computer that would not boot up at the customer's apartment. Of course we assumed user error, but further investigation revealed that if the computer was running as it approached the home, it would bluescreen about a block away.

We guessed it was due to some kind of RF interference from a transmitter on the apartment building. Removing the WiFi module and the optical drive had no effect, so we further guessed it was interference within the motherboard or display. Rather than investigate further, we replaced the notebook at that point.

jon_richards 10 years ago
My car radio goes from everything working fine to complete static on every station in like 10 feet as I drive into my local Safeway parking lot. I'm pretty sure they are jamming the signal, I just don't know why.
- acranox 10 years ago
  
  Some places with shopping carts have an "invisible fence" that locks up one of the wheels of the cart when you push the cart outside the property. These are usually marked with a yellow line on the ground. I presume they use some kind of RF field with an underground wire. Maybe that Safeway has some issues with theirs.

mark-r 10 years ago

I have an anecdote, which isn't mine but comes from someone I know personally. This guy was working as a service tech, and was called out to diagnose a problem with a computer that had been recently moved. It worked most of the time, but any attempt to use the tape drive failed within a certain number of seconds (this was long ago, when tape drives were still a thing). Everything had worked fine before the move, and diagnostics didn't show anything out of place. Then he happened to look out the window - this was a military installation, and there was a radar dish rotating nearby. The failures occurred exactly when the radar dish was pointed their direction. It turns out the computer had been moved up one floor, which strengthened the interference just enough to cause the failure.

kazinator 10 years ago

But "Tue" is not at the fourth byte in the example, which has:

   %%CreationDate: (Tue Mar 3 19:47:42 2009)

Something munged the data. Perhaps some step which removes all characters after %%, except those in parentheses?

   %%(Tue Mar 3 ...)

Now we're at the fourth byte. Another hypothesis is that the second incorrect match is kicking in. That is to say, some fields are added above %%CreationDate such that the Tue lands on position 79. The bug that was fixed in the magic database is this:

  -+4	string	Tue Jan 22 14:32:44 MET 1991	Erlang JAM file - version 4.2 
  -+79	string	Tue Jan 22 14:32:44 MET 1991	Erlang JAM file - version 4.2
  ++4	string	Tue\ Jan\ 22\ 14:32:44\ MET\ 1991	Erlang JAM file - version 4.2
  ++79	string	Tue\ Jan\ 22\ 14:32:44\ MET\ 1991	Erlang JAM file - version 4.2

(This is a patch of a patch: a fix to an incorrect patch.) There are two matches for this special date which identifies JAM files: one at offset 4, and a possible other one at offset 79 which will cause the same problem.

The real bug here is arguably the CUPS script. It should identify the file's type before munging it. And it shouldn't use a completely general, highly configurable utility whose data-driven file classification system is a moving target from release to release! This is a print script, so there is no reason to suspect that an input file is a Doom WAD file, or a Sun OS 4 MC68000 executable. The possibilities are quite limited, and can be handled with a bit of custom logic.

Did Brother people write this? If so, I'm not surprised.

Nobody should ever write code whose correct execution depends on the "file" utility classifying something. That is, not unless you write your own "magic" file and use only that file; then you're taking proper ownership of the classification logic, such that any bugs are likely to be your own.

The fact that file got something wrong here is a red herring; the file utility is wrong once in a while, as anyone knows who has been using various versions of it regularly for a few decades. Installations of the utility are only suitable for one-off interactive use: you got a mystery file out of the blue and need a clue as to what it is, so you run file on it to get an often useful opinion. It is only usable in an advisory role, not in an authoritative role.

Adaptive 10 years ago

I've noticed that printing is still one of the poorest UX aspects of *nix/OSS and regularly seems to suffer from errors so egregious that they can only be attributed to OSS devs not dogfooding these features. I'm assuming they just don't print much (I mean, we ALL print less than 20 years ago, but all the more reason to test these features which, when you need them to work you REALLY need them to work).

ultramancool 10 years ago
Perhaps you're thinking back to the days of manually configuring CUPS?
Any recent printer I've used has just been plug it in and hit print. A better experience than Windows in terms of included drivers and Bonjour support too.
- dbcurtis 10 years ago
  
  No, you've simply been lucky.
  
- Grishnakh 10 years ago
  
  I agree completely. I set up a relative with Linux Mint KDE 17.3 a couple weeks ago and even I was surprised at how easy it was to set up the two printers he wanted to use: one was an old 2003-vintage LaserJet 1012 personal-sized laser printer with USB, the other a newer (I'm guessing 4yo) HP color inkjet of some kind that was WiFi-connected. For the first, I just plugged in the USB cable and a print queue was immediately and automatically set up; I didn't have to do anything. For the latter, I just went into the printer configuration utility, let it search the network, it found the printer and told me its model name/number, then I selected the appropriate driver and printed a test page. No driver downloads, no problems.
  By contrast, I had a contract job a year or so ago at a large company where I was given a Win7 laptop and tried to connect to a big Ricoh laser printer. I spent hours messing around with driver downloads trying to get that to work. I finally had to call IT and they sent someone over, and he couldn't get it to work either; he finally found some crazy work-around which I've totally forgotten the details of now.
  The only real problem I see with printing on Linux now is that, sometimes, there's multiple CUPS drivers for the same printer (foomatic, hpijs, Postscript, etc.), so it won't automatically pick one and it's not clear which is the best so you might have to just try one and see if it works. Most likely, they all work, but some might have additional features. HP printers are probably the best, though, since they seem to explicitly support Linux (such as with their hpijs drivers). If all printer makers had this level of support, and they cleaned up the redundant/competing drivers, there wouldn't be complaints.
- Adaptive 10 years ago
  
  CUPS is better, no doubt. And from a sysadmin perspective CUPS is great, but there are still crazy gotchas.
  Part of this is the byzantine config of network discovery, CUPS, driver sources, etc. A big part of the problem is the pieces may be there but distros have done a mediocre job getting everything together on this in the best way for the user.
amluto 10 years ago
One of these days, someone may give me a credible explanation of why printing involves a systemwide daemon, but I kind of doubt it. I'd love to see a rearchitecting of the whole mess such that the whole print daemon runs in a sandbox with user privileges.
- icebraining 10 years ago
  
  How would you avoid having a common print queue, when all the documents can't fit in the printer memory?
  
dublinben 10 years ago
In my experience it is the exact opposite. My Linux computers are always able to quickly connect and print to any printer I point them towards. I have never had as many problems as I have had when using Windows or OS X.
- snuxoll 10 years ago
  
  I can't say it's been any "easier" to connect to my Brother MFC-J4420DW on Fedora than on Windows, but it's no harder. I download the installation script from Brother, run it, it asks me for the hostname of my printer, and I'm up and running.
  
guard-of-terra 10 years ago

In my experience printing in Linux has been pretty solid for at least the last 5 years or so. CUPS is CUPS, drivers are plentiful and one apt-get away, UIs get better.
Of course, I only print the occasional document or a few tickets.

carapace 10 years ago

Stuff like this is why I find "Synthetic Biology" so fucking scary.

Terr_ 10 years ago

Don't worry, the natural stuff is probably just as messy, or worse.
If it helps, remember that grey-goo already took over the world, and you were born part of it.

t0mek 10 years ago

During my studies I had a course called "Advanced Network Administration". I learnt about the OSPF routing protocol and its Quagga [1] implementation and I had to prepare a simple installation that consisted of 3 Linux machines. They were connected with cheap USB network adapters.

After everything was configured I started the Quagga daemons, and somehow they just didn't want to talk to each other. I opened tcpdump to see what was happening, and the OSPF packets were being exchanged properly. After a while the communication and routing were established. I thought that maybe the services just needed some time to discover the topology.

I restarted the system to see whether it would come up automatically, but the problem recurred: the daemons just didn't see each other. Again I launched tcpdump, tweaked some settings, and now it worked, until it didn't a few minutes later.

It took me a long time to find out that the diagnostic tool I was using had actually changed the infrastructure I was observing (like in the quantum world). tcpdump enables promiscuous mode on the network interfaces, and apparently this was required for Quagga to run on the cheap USB Ethernet adapters. I used ifconfig promisc, and after that OSPF worked reliably.

[1] http://www.nongnu.org/quagga/

pif 10 years ago

CERN: LEP data confirm train time tables http://cds.cern.ch/record/1726241

CERN: Is the moon full? Just ask the LHC operators http://www.quantumdiaries.org/2012/06/07/is-the-moon-full-ju...

BrandonM 10 years ago

Near the end of that post, the commenter suggested a fix that includes the most qualified Useless Use of Cat entry[0] that I've ever seen!

   cat | sed ... > $INPUT_TEMP

[0] http://porkmail.org/era/unix/award.html#cat

chris_wot 10 years ago

Wait till you see where they found the print server!

http://www.informationweek.com/server-54-where-are-you/d/d-i...?

gchadwick 10 years ago

Surely the real bug is the reliance on the 'file' utility in the first place? It attempts to quickly identify a file that could be literally anything, so it's not surprising (and indeed should be expected) that sometimes it gets it wrong.

I don't know the details of the CUPS script, but presumably it can only deal with a small number of different file types. Implementing its own detection to positively identify PS vs. whatever other formats it deals with vs. everything else would be far more robust.

krylon 10 years ago

One of our users complained that she could no longer print PDF documents. Everything else, Word, Excel, graphics, worked fine, but when she printed a PDF ... the printer did emit a page that - layout-wise - pretty much looked like it was supposed to, except all the text was complete and utter nonsense.

Or was it? I took one of the pages back to my desk, and later in the day I had an idle moment, and my eyes wandered across the page. The funny thing is, if I had not known what text was supposed to be on the page, I would not have noticed, but the text was not random at all. Instead, all the letters had been shifted by one place in the alphabet (i.e. "ABCD" became "BCDE").

I went back to the user and told her to check the little box that said "Print text as graphics" in the PDF viewer's print dialog, and voila - the page came out of the printer looking the way it was supposed to.

Printing that way did take longer than usual (a lot longer), but at least the results were correct.

To this day, I have no clue where the problem came from, and unfortunately, I did not have the time to investigate the issue further. I had never seen such a problem before or after.

In a way it's part of what I like about my job: These weird problems that seem to come out of nowhere for no apparent reason, and that just as often disappear back into the void before I really understand what is going on. It can be oh-so frustrating at times, but I cannot deny that I am totally into weird things, so some part of me really enjoyed the whole experience.

mark-r 10 years ago

I love the modification that pipes the output of cat into sed; doesn't he realize that cat is redundant at that point?

adzm 10 years ago

I'll admit to being a superfluous cat user. I just like typing cat.

kinai 10 years ago

I once had a case with a desktop system that, when you sat down and started typing, often did a hardware reset. It turned out Dell had left a piece of metal in the case, hanging in the few millimeters between the case and the motherboard, and stronger desk vibrations made it cause a short circuit.

gsylvie 10 years ago

Here's a great collection of classic bug reports (including the never-printing-on-tuesdays): https://news.ycombinator.com/item?id=10309401

DonHopkins 10 years ago

My 6502-based FORTH systems would sometimes crash for no apparent reason after I tweaked some code and recompiled it. Whenever it got into crashy mode, it would crash in a completely different way, on a randomly different word. I'd put some debugging code in to diagnose the problem, and it would either disappear or move to another word! It was an infuriating Heisenbug!

It turns out that the 6502 has a bug [1]: when you do an indirect JMP ($xxFF) through a two-byte address that straddles a page boundary, it wraps around to the first byte of the same page instead of incrementing the high half of the address to get the first byte of the next page.

And of course the way that an indirect threaded FORTH system works is that each word has a "code field address" that the FORTH inner loop jumps through indirectly. So if a word's CFA just happened to straddle a page boundary, that word would crash!

6502 FORTH systems typically implemented the NEXT indirect threaded code inner interpreter efficiently by using self modifying code that patched an indirect JMP instruction on page zero whose operand was the W code field pointer. [2]

JMP indirect is a relatively rare instruction, and it's quite rare that it's triggered by normal static code (since you can usually catch the problem during testing), but self modifying code has a 1/256 chance of triggering it!

The later 65C02 fixed that bug. It could manifest in either compiled FORTH code or the assembly kernel. The FIG FORTH compiler [3] worked around it at compile time by allocating an extra byte before defining a new word if its CFA would straddle a page boundary. I defined an assembler macro for compiling words in the kernel that automatically padded in that special case, but the original 6502 FIG FORTH kernel had to be "checked and altered on any alteration" manually.

[1] http://everything2.com/title/6502+indirect+JMP+bug

[2] http://forum.6502.org/viewtopic.php?t=1619

"I'm sure some of you noticed my code will break if the bytes of the word addressed by IP straddle a page boundary, but luckily that's a direct parallel to the NMOS 6502's buggy JMP-Indirect instruction. An effective solution can be found in Fig-Forth 6502, available in the "Monitors, Assemblers, and Interpreters" section here. (The issue is dealt with at compile time; there is no run-time cost. The word CREATE pre-pads the dictionary with an unused byte in the rare cases when the word about to be CREATEd would otherwise end up with a code-field straddling a page boundary.)"

[3] http://www.dwheeler.com/6502/FIG6502.ASM

    ;    The following offset adjusts all code fields to avoid an
    ;    address ending $XXFF. This must be checked and altered on
    ;    any alteration , for the indirect jump at W-1 to operate !
    ;
              .ORIGIN *+2

[...]

              .WORD DP       ;)
              .WORD CAT      ;| 6502 only. The code field
              .WORD CLIT     ;| must not straddle page
              .BYTE $FD      ;| boundaries
              .WORD EQUAL    ;|
              .WORD ALLOT    ;)
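In FORTH, the six words in that excerpt read as `DP C@ $FD = ALLOT`. A rough Python sketch of what they compute follows. It assumes, as the `$FD` constant implies, that a two-byte link field sits between the current dictionary pointer and the new word's code field; `pad_for_code_field` and `cfa_address` are hypothetical names. On the little-endian 6502, `DP C@` reads the low byte of the dictionary pointer, and FIG-FORTH's `=` leaves 1 for true, so exactly one byte is allotted in the bad case:

```python
def pad_for_code_field(dp: int) -> int:
    """Mimic DP C@ $FD = ALLOT: skip one byte if the low byte of DP is $FD.

    Assumption: a 2-byte link field follows DP, so the code field would
    otherwise start at dp + 2, which ends in $FF exactly when dp ends in $FD.
    """
    return dp + (1 if dp & 0xFF == 0xFD else 0)


def cfa_address(dp: int) -> int:
    # Code field address after the (possibly padded) 2-byte link field.
    return pad_for_code_field(dp) + 2


for dp in (0x12FC, 0x12FD, 0x12FE):
    print(hex(dp), "->", hex(cfa_address(dp)))
# 0x12fc -> 0x12fe
# 0x12fd -> 0x1300   (padded; unpadded it would have been 0x12ff)
# 0x12fe -> 0x1300
```

The check costs nothing at run time; it only runs when a word is being defined, which matches the forum quote about the padding being handled at compile time.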

nickpsecurity 10 years ago

I vote this as the best heisenbug I've read so far. That sounds like such a nightmare to debug. I might never have found it, because I'd have thrown the thing in the trash and bought a different machine. Forth is easy to port, after all. :)

buserror 10 years ago

Funny, I remember that bug very well; it was used on the earlier Apple IIs, sometimes on purpose (!), mostly by game copy protection. This was fixed on the 65C02.

rcthompson 10 years ago

I once found a bug in a weather applet that only occurred when the temperature exceeded 100 degrees. The 3-digit temperature caused a cascade of formatting issues that rendered part of the applet unreadable. I believe the author used Celsius, and so would never have encountered this bug themselves.
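The applet's code isn't shown, so this is only a hypothetical Python illustration (`render_cell` and its two-character width are assumptions) of how a layout sized for two digits goes wrong at 100, and why a Celsius user would never see it:

```python
def render_cell(temp: int, width: int = 2) -> str:
    """Hypothetical widget text that reserves `width` characters for the number."""
    # A format width is only a minimum: "100" is not truncated, it just makes
    # the string one character longer than the layout expected, shifting
    # everything that follows it.
    return f"[{temp:>{width}}°]"


def fahrenheit_to_celsius(temp_f: float) -> float:
    return (temp_f - 32) * 5 / 9


print(render_cell(99))                       # [99°]
print(render_cell(100))                      # [100°]
print(round(fahrenheit_to_celsius(100), 1))  # 37.8
```

100 °F is only about 37.8 °C, and air temperatures in Celsius never come close to three digits, so a two-character field would look perfectly safe to someone testing in Celsius.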

jpatokal 10 years ago

Relevant: https://gyrovague.com/2015/07/29/crashes-only-on-wednesdays/

GigabyteCoin 10 years ago

"tue" means "kill" in French... I wonder if a French programmer somewhere had something to do with this?

lifeisstillgood 10 years ago

And this is why we won't ever get AI. Humans seem to only manage to reach a certain level of complexity before it all gets too much.

There are supposedly people at Boeing who understand literally every part of a 747, down to the wiring and the funny holes in the windows. But there is probably no one who understands all parts of Windows 10.

We're doomed to keep leaping like dolphins to reach a fish held too high by a sadistic SeaWorld Orlando trainer.

21 10 years ago

People die all the time from "bugs" in their genes, or from acquired defects. This doesn't stop everyone else, or the species as a whole, from carrying on.
Same thing with AI. Some will crash and burn. Others will manage to continue.
Nothing is eternal. We only need stuff to work for a while for it to be useful.
dilemma 10 years ago
As I've said and been downvoted for before, there will never be Artificial Intelligence because a machine merely does what its program tells it. There will be plenty of Artificial Stupidity, however, as evidenced by this thread.
- 21 10 years ago
  
  And you do what the neurons in your brain make you do.
  Article from today - http://www.theatlantic.com/magazine/archive/2016/06/theres-n...
- sspiff 10 years ago
  
  Off topic, with no arguments to support your case: sounds like downvote material to me.

gregschlom 10 years ago

So what's the lesson here? What should we learn from that?

kaizensoze 10 years ago

Make sure you escape spaces? Or: https://news.ycombinator.com/item?id=11718716
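The "escape spaces" answer refers to the `file` utility's magic(5) database, where each entry's fields (offset, type, test value, description) are separated by whitespace, so a space inside a string test has to be written as `\ `. The sketch below is a deliberately simplified, hypothetical parser, and the entry text, offset 4, and sample bytes are made up for illustration rather than copied from the real database. It shows how an unescaped date string collapses into a test for just `Tue`:

```python
import re


def parse_magic_line(line: str) -> dict:
    """Very simplified magic(5)-style parser (illustrative only).

    Splits on whitespace that is not preceded by a backslash into: offset,
    type, test value, and the rest of the line as the description.
    """
    parts = re.split(r"(?<!\\)\s+", line.strip(), maxsplit=3)
    return {
        "offset": int(parts[0], 0),
        "type": parts[1],
        "test": parts[2].replace("\\ ", " "),  # unescape "\ " to a space
        "message": parts[3] if len(parts) > 3 else "",
    }


def matches(entry: dict, data: bytes) -> bool:
    # Simple "string" test: compare the bytes found at the given offset.
    needle = entry["test"].encode()
    start = entry["offset"]
    return data[start:start + len(needle)] == needle


# Hypothetical entries: the same pattern with and without escaped spaces.
unescaped = parse_magic_line("4 string Tue Jan 22 14:32:44 MET 1991 Some old format")
escaped = parse_magic_line(r"4 string Tue\ Jan\ 22\ 14:32:44\ MET\ 1991 Some old format")

tuesday = b"xxxxTue Mar  3 19:47:42 2009"
wednesday = b"xxxxWed Mar  4 19:47:42 2009"

print(repr(unescaped["test"]))       # 'Tue'
print(matches(unescaped, tuesday))   # True: any Tuesday date at that offset matches
print(matches(unescaped, wednesday)) # False
print(matches(escaped, tuesday))     # False: the full string must match
```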
anonymoose123 10 years ago

Lesson is: Not every post here should teach a lesson?

broodbucket 10 years ago

Is it just me or does this get posted every month?

jon-wood 10 years ago
To my recollection I've never seen it before, and I spend a possibly unhealthy amount of time checking up on the stories here, so my guess is no, it doesn't.
- broodbucket 10 years ago
  
  It was a rhetorical question that was incorrect. Every month is a little bit too frequent, but not by much...
  https://hn.algolia.com/?query=openoffice%20tuesday
  Probably because it gets mentioned often in addition to being reposted often.

sklogic 10 years ago

No, it is indeed a CUPS bug. `file` was never guaranteed to be precise in the first place, so it is not a good idea to rely on it.

meeper16 10 years ago

Yet another reason I don't let OpenOffice or any Linux UIs slow me down. It's all about the command line and always will be.

Spivak 10 years ago

> So it's not a problem w/ openoffice.org, cups, or the brother printer drivers. It is a bug in the `file` utility, and documented at https://bugs.launchpad.net/ubuntu/+source/file/+bug/248619.
This bug had nothing to do with OO. This was just a cute manifestation of the actual bug.

annebass 10 years ago

Fascinating bug :)