Microsoft Has Manually Patched Their Equation Editor Executable

9 years ago (0patch.blogspot.com)

164 comments

dielel

Notice the xchg, stosb and a loop instruction. This was definitely written by a skilled Asm programmer --- I've never seen even a compiler at -Os generate code like that.

This also compels me to "code-golf" the function even more:

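     push edi
     mov edi, [esp+8]      ; edi = destination buffer
     mov ecx, [esp+12]     ; ecx = buffer length
     jecxz label2          ; zero-length buffer: nothing to do
    label1:
     push ecx              ; preserve the counter across the call
     call sub_416352       ; returns the next character in al
     stosb                 ; [edi++] = al
     pop ecx
     test al, al
     loopnz label1         ; --ecx; repeat while ecx != 0 and al != 0
     jz label2             ; terminator already stored: done
     dec edi               ; buffer full: step back to the last byte
     salc                  ; al = 0, since test left CF clear
     stosb                 ; and terminate the string there
    label2:
     pop edi
     ret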

Original: 58 bytes; patched: 44; mine: 30.
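For readers who don't read x86, here is a rough C rendering of what the 30-byte routine above is meant to do. It is a sketch under assumptions: `next_char` is a hypothetical stand-in for `sub_416352` (assumed to return the next character each call), and when the buffer fills before a terminator arrives, the last byte is assumed to be overwritten with NUL.

```c
#include <stddef.h>

/* Hypothetical stand-in for sub_416352: yields characters from a
   fixed string, one per call, and keeps returning NUL at the end. */
static const char *src_pos;

static char next_char(void)
{
    char c = *src_pos;
    if (c != '\0')
        src_pos++;
    return c;
}

/* Copy characters into dst until a NUL has been stored or n bytes
   have been written. If the buffer fills first, the last byte is
   overwritten with NUL so the result is always terminated. */
static void copy_bounded(char *dst, size_t n)
{
    if (n == 0)
        return;                      /* jecxz label2 */
    size_t i = 0;
    char c;
    do {
        c = next_char();             /* call sub_416352 */
        dst[i++] = c;                /* stosb */
    } while (c != '\0' && i < n);    /* test al, al / loopnz */
    if (c != '\0')
        dst[n - 1] = '\0';           /* dec edi / salc / stosb */
}
```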

I've done plenty of patching like this, and indeed the relative "sparseness" of compiler output very often allows the more functional version to be smaller than the original. It's amazing how many instructions the original wastes --- notice how none of ebx, esi, or edi are used, yet they get needlessly pushed and popped; and despite saving those registers so they could be used locally, the compiler perplexingly decided to keep all the local variables on the stack instead. The "jump around a jump", with both of them being the "long" form (for destinations greater than 128 bytes away, not the case here) is equally horrible. This may actually be a case where today's compilers will generate smaller code for the same source.

Note that in 32-bit code, memcpy is typically implemented by first copying blocks of 4 bytes using the movsd (move double word) instruction, while any remaining bytes are then copied using movsb (move byte). This is efficient in terms of performance, but whoever was patching this noticed that some space can be freed by only using movsb, and perhaps sacrificing a nanosecond or two.

On older processors this was true, but since Ivy Bridge, REP MOVSB is essentially as fast as REP MOVSD while being smaller. Look up "enhanced REP MOVSB" (ERMSB) for more information.
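The dword-then-byte strategy described above can be sketched in portable C. This is an illustration, not the actual CRT memcpy: it copies `n / 4` four-byte blocks (the movsd part), then the `n % 4` leftover bytes (the movsb part). Each block goes through a `uint32_t` temporary via `memcpy` so the code stays well-defined for unaligned pointers. The name `copy_dwords_then_bytes` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of "copy dwords, then the byte tail". */
static void *copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t blocks = n / 4; blocks != 0; blocks--) {  /* rep movsd */
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 4;
        s += 4;
    }
    for (size_t rest = n % 4; rest != 0; rest--)          /* rep movsb */
        *d++ = *s++;
    return dst;
}
```

Dropping the first loop and letting the second handle all `n` bytes gives the movsb-only variant the patcher chose.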

tpolzer 9 years ago
Modern compilers aren't that much better at code golf.
I tried equivalent C code and gcc-7.2 gets me 47 bytes, while clang-6.0 only manages 49 bytes (both with -m32 f.c -Os -fomit-frame-pointer).
I have a feeling that size optimization just isn't really important to (at least open source) compiler writers these days. There are more important things, like actual performance, standards compliance and nice diagnostics.
- derefr 9 years ago
  
  It’s intriguing to me that this is so, given that most of the point of writing native code these days (rather than targeting some managed-native system like the CLR) is to optimize hot loops, and one of the best optimizations for hot loops is to get them to fit entirely into a cache line. Do compilers for C/FORTRAN/etc. not have any mode or pragma to indicate that you’re attempting to get this performance benefit from a given function-and-its-dependencies?
- earenndil 9 years ago
  
  With clang, try passing -Oz?
  
azag0 9 years ago
How does this go with the often quoted mantra that you can only beat compilers today if you're an extremely skilled asm programmer? Or is the problem you describe just about executable size rather than speed?
- simias 9 years ago
  
  Optimizing for size is easier because you only have exactly one metric to consider: how many bytes your instructions take.
  When optimizing for speed you have to consider many factors like the relative speed of each instruction, cache behavior (including size of the cachelines, associativity, number of layers, relative speed of the layers...), pipelining, branch prediction, prefetching, whether moving your data to SIMD registers could be worth it, what to inline and what not to inline, what to unroll and what not to unroll, constraint solving to optimize things that can be computed or asserted statically etc...
  
- pjc50 9 years ago
  
  Well, the code wasn't compiled by today's compiler, it was compiled in late 2000. Visual Studio 6 maybe?
  Even today compilers tend not to optimise the function prologue/epilogue away. I'm only half in agreement with the mantra: you probably can beat the compiler, but is it worth it?
  There are a few situations where it's genuinely a good idea to write in assembler to be explicit about predictable behaviour. Short security-critical constant-time functions are a good candidate.
- dragontamer 9 years ago
  
  There are a lot of assembly language instructions that do slightly different things than standard C++ or C, but if the programmer is aware of them they can "handle" the differences.
  For example, the xchg instruction has no direct C equivalent (although C++ has std::swap). The programmer may see:
  A ^= B; B ^= A; A ^= B
  After these three statements, A and B are swapped. A C compiler may be smart enough to recognize this as an xchg, or it might emit the three XORs. Hard to say, really.
  ---------------
  Most of the low-hanging fruit has been picked, for sure. Almost every "memcpy" turns into "rep movs", for example (which is the assembly-language equivalent of memcpy).
  A high-level programmer may not know that "memcpy" turns into "rep movs", however, and may write his own memory-copying for loop.
  At the very least, a good optimizing C / C++ programmer needs to know about these little things. They'll let the compiler turn "memcpy" into "rep movs" (for -Os) or AVX loads and stores, respectively, instead of writing their own less efficient loops.
- Retr0spectrum 9 years ago
  
  Optimising for size is a relatively "obvious" goal, although it still takes a lot of skill to do it well. Optimising for speed is much less obvious however, the x86 architecture is incredibly complex when it comes to working out what code will be faster.
- hrydgard 9 years ago
  
  Well, it should also be noted that the responsible compiler in this case is at least 17 years old.
- com2kid 9 years ago
  
  > How does this go with the often quoted mantra that you can only beat compilers today if you're an extremely skilled asm programmer? Or is the problem you describe just about executable size rather than speed?
  Word 2000, so a 17+ year old compiler. Compilers have gotten a lot better since then.
  Having worked on a compiler team back in the mid 2000s, even then I'd say it was easy for almost anyone to spot areas where a human could optimize more.
  Nowadays, much less so.
- xenadu02 9 years ago
  
  This is a case of knowing the rules so well that you know when you can break them.
  It's also a historical artifact from the days when many programmers wrote assembly yet compilers were starting to get good.
  There's also an element of avoiding premature optimization: don't assume the compiler will produce slower code, or that if it does, it will matter in your specific application.
  At the very least you should give the compiler a chance, profile, then hand-tune after you’ve fixed all the low-hanging fruit.
- bitexploder 9 years ago
  
  That mantra applies to "most" programmers.
  I think he was talking mostly about size.
  Odds are good most programmers tinkering in machine code won't beat the performance of the compiler. That takes experience. It is a good rule of thumb.
  I think it is easier to write smaller (size) code than a compiler, but when you measure performance it will beat you often until you get good. Alignment, x86 tricks... It takes a bit of knowledge to do well.
- acdha 9 years ago
  
  simias has most of it but note also that that file appears to have last been compiled in the early 2000s. Compilers of that era were far less advanced, especially since many large companies were pretty conservative about the optimizations enabled (fixing a bug meant mailing CDs for many customers).
  The general trend is that it's been getting harder and harder to do that easily, which means people want to be more focused — something like OpenSSL can still justify hand-tuned assembly for various processor families because it's a widespread hotspot but as compilers continue to improve the number of places where it's worth the maintenance cost is going to keep shrinking.
  In the early 2000s, the scientific HPC programmers I worked with were careful to maintain a portable C implementation which they could use as a check both for correctness and for an optimization baseline — it wasn't uncommon for a new compiler and/or processor to substantially close the gap relative to a lot of hard manual work.
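To make dragontamer's XOR-swap example concrete, here is a small C sketch of the trick next to the ordinary temporary-variable swap. The function names are illustrative. One caveat worth knowing: if both pointers refer to the same object, the first XOR zeroes it, so the XOR version needs an explicit guard.

```c
/* Classic three-XOR swap. If a and b point to the same int, the
   first XOR would zero it, so that case is skipped explicitly. */
static void xor_swap(int *a, int *b)
{
    if (a == b)
        return;
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Ordinary swap through a temporary, the same idea as std::swap. */
static void temp_swap(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}
```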
feelin_googley 9 years ago
"Note that in 32-bit code, memcpy is typically implemented by first copying blocks of 4 bytes using the movsd (move double word) instruction, while any remaining bytes are then copied using movsb (move byte)."
Some software authors do not use memcpy().
https://marc.info/?l=djbdns&m=96477313901746&w=2
http://cr.yp.to/lib/byte.html
    #include "byte.h"

    void byte_copy(to,n,from)
    register char *to;
    register unsigned int n;
    register char *from;
    {
      for (;;) {
        if (!n) return; *to++ = *from++; --n;
        if (!n) return; *to++ = *from++; --n;
        if (!n) return; *to++ = *from++; --n;
        if (!n) return; *to++ = *from++; --n;
      }
    }
yuhong 9 years ago

In fact, REP MOVSD is harder to handle in hardware, especially for unaligned locations, because you can only interrupt on 4-byte boundaries.

infinity0 9 years ago

Binary hacking FTW.

The semi-official Debian server, alioth.debian.org, where a lot of random developer stuff is hosted, is stuck on Debian wheezy for various reasons. Most users, including myself (a Debian Developer), don't have root access to upgrade the server or install new software.

The version of libapt-inst is too old to support Debian packages with control.tar.xz members (only control.tar.gz members). So we can't upload newer Debian packages to various custom APT repos that we host on that server.

I worked around this by looking at the libapt-inst source code, figuring out how to make it support control.tar.xz instead of control.tar.gz, and binary-patching libapt-inst.so to have this effect instead. It's actually fairly simple:

1. there is a check for control.tar.gz, the failure branch prints an error and then returns. I overwrite this with NOP so it goes into the "success" branch.

2. then later it extracts the control.tar.gz member and pipes it through gzip. Luckily, nowhere else in the program uses the exact string "control.tar.gz" or "gzip" so I simply patch that string "control.tar.gz" -> "control.tar.xz" in the binary and also change "gzip" -> "xz\0\0".

(Actually given the change in (2), (1) is not necessary. But without it you get a bunch of spurious error messages.)
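The string edits in step 2 boil down to a same-length, in-place byte replacement. Below is a minimal C sketch under stated assumptions: the binary has already been read into memory, the replacement is exactly as long as the original (padded with NUL bytes, as in "gzip" -> "xz\0\0"), and the patch is refused unless the pattern occurs exactly once, mirroring the "nowhere else uses this string" check. `patch_unique` is a hypothetical helper, not an existing tool.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite the single occurrence of `from` in
   buf[0..len) with `to`. Both patterns are `n` bytes long, so no
   offset in the binary moves. Returns 0 on success, -1 if the
   pattern is missing or appears more than once. */
static int patch_unique(unsigned char *buf, size_t len,
                        const void *from, const void *to, size_t n)
{
    unsigned char *hit = NULL;

    if (n == 0 || n > len)
        return -1;
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, from, n) == 0) {
            if (hit != NULL)
                return -1;          /* ambiguous: refuse to patch */
            hit = buf + i;
        }
    }
    if (hit == NULL)
        return -1;                  /* pattern not found */
    memcpy(hit, to, n);
    return 0;
}
```

Under these assumptions, `patch_unique(image, size, "control.tar.gz", "control.tar.xz", 14)` and `patch_unique(image, size, "gzip", "xz\0\0", 4)` would correspond to the two string edits; the NOP-over-branch edit in step 1 is a similar overwrite at a known offset and isn't shown.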

Applying this patch makes the resulting .so lose the ability of working with old control.tar.gz members (which is still needed of course). So my workaround does this:

LD_PRELOAD=libapt-inst.so.patched apt-ftparchive [..] && apt-ftparchive [..]

i.e. runs it once with the hack to pick up the new-style debs, and once again without the hack to pick up the old-style debs.

My motto is, "dirty solutions for dirty problems". :D :D :D

mschuster91 9 years ago
> The semi-official Debian server, alioth.debian.org, where a lot of random developer stuff is hosted, is stuck on Debian wheezy for various reasons
Jeez. How is security maintained? That actually scares me a bit.
- mort96 9 years ago
  
  Wheezy is supported until the end of May 2018, so it still gets security patches.
uyoakaoma 9 years ago

dirty solutions for dirty problems :):)

rogerhoward 9 years ago

I'm surprised no one has noted the copyright is to Design Science - this is a small company in my hometown who are still around. I've spoken with their CEO a few times and I wouldn't be at all surprised if the source code was lost, or somehow at least wasn't being made available to Microsoft (I doubt it ever was). It's a really old school shop who seems to have largely been coasting on the licensing of this one component for the past couple decades and I wouldn't at all be shocked to find they no longer are capable of maintaining it themselves.

extra88 9 years ago

Design Science still develops and sells MathType, the "pro" version of the Equation Editor licensed to Microsoft.
They also make other software meant to make math more accessible to people with various disabilities.
https://www.dessci.com/en/
rob74 9 years ago
I noted it - thanks for the background info on the company! I also assume that either they are not able to maintain the software themselves, or they have lost the source code, but it might also be that setting up the toolchain to compile such an old piece of software is more effort than just patching the binary.
- sjburt 9 years ago
  
  This is such an underappreciated aspect of code stewardship. There are powerful tools for source control and archiving, but being able to actually build a given state of the code at an arbitrary date in the future is far less assured.
- londons_explore 9 years ago
  
  I agree here.
  I would guess the build environment involves lots of dependencies, lots of special config, lots of stuff which has to be the exact correct version, and all that knowledge has been lost as people have left the team and it wasn't properly documented.
  Sure, you could spend a couple of weeks setting up a suitable environment again and relearning everything from scratch, but binary patching is probably easier.
yuhong 9 years ago

According to an Ars comment: "I've got an older version of Mac Office (2011 I think), and there's a version of Equation Editor in there with a 1990-2010 design science copyright on it, so they have some version of newer code they could swap the old office one for."

dzdt 9 years ago

I once worked at a place which lost part of the source code for their giant mission-defining application. They spent a decade linking in object code for which there was no corresponding source code.

The build team was very proud when they announced that the application would finally start being built from the source code in version control.

Stuff happens!

bartread 9 years ago
Stuff happens, indeed, and more often than most of us realise.
Getting on for a decade ago now I was working at Red Gate when they bought .NET Reflector - a decompiler for .NET code - from Lutz Roeder. After the acquisition we started asking people what they were using it for.
Turns out a significant minority of them were trying to recover lost source code, or source code they never had in the first place (e.g., where a supplier went out of business). I don't remember the exact figure but it might even have run into low double-digit percentages. Bear in mind this is a tool that was being downloaded tens of thousands of times every month by all manner of people working for all kinds of organisations of every size and you can see the scale of the problem.
There were a couple of Reflector add-ins that would allow you to take a .NET binary and generate a C# or VB.NET Visual Studio project with all source code from it. The source code was never perfect and wouldn't likely compile first time, but it was certainly better than starting from scratch. Not surprisingly these add-ins were among the most popular.
Granted, times have changed, and I think source control is probably the default for almost everyone these days - although I would have expected that even in 2008 - but, bottom line: I think this sort of thing happens a lot, for one reason or another.
- giancarlostoro 9 years ago
  
  Heh, I know some guys who have done this, and I have done it myself (with similar but open source tools). There's also the "what is this sketchy .NET app really doing?" moment, where you want to know it's not doing anything "funny" to your system, so you peek at the code.
- Shank 9 years ago
  
  I've done a significant amount of work with decompiling and rebuilding executables of crazy levels of complexity. It's definitely time consuming -- but not as bad as you'd think. Maybe 1-4 weeks with a dedicated team working through it and testing functionality. Definitely a viable solution if you've lost source code.
- TomMarius 9 years ago
  
  Yeah we used ILSpy for exactly this reason - a vendor went out of business and our client needed patches REAL QUICK.
skissane 9 years ago

A friend once told me a story about a software company that had offices in the World Trade Center in New York. Their offices were totally destroyed by 9/11; thankfully, all the staff got out alive, but it turned out they didn't have offsite backups of the source code repository, and it was lost completely. They found various bits of the source code floating around (e.g. some developers had bits of it on their home computers), but there were a few key components they could not locate any source for. Well, the customers still had the compiled binaries, so they got the binaries back from the customers, extracted the missing bits, ran them through a decompiler, and checked the result into the source code repository – since the application was written in Java, this actually worked quite well. Years later, new developers would find bits of obviously decompiled code still in the source repo (you can tell, it has a distinct look to it, e.g. variable names with numbers in them), and scratch their heads, and then get told the tale.
Fuxy 9 years ago
Stuff may happen, but I would put replacing a module whose source code I had lost pretty high on the list of things that need fixing.
There is no way I could rely on something like that.
Am I the only one that thinks relying on something that has no source code is just asking for trouble and headaches in the future?
- tragomaskhalos 9 years ago
  
  Then there is the situation where, one day, your source code control system tells you there are no files... because someone in the org you are working at figured no one was using the server and wiped it. This really happened on a job I was on. Luckily we still had source dotted around a dozen or so machines, at various levels of up-to-dateness, so it was possible (with a load of scripting code generating some helpful timeline pix) to forensically reconstruct not just the tip but a decent portion of the history too. Fun times.
- bartread 9 years ago
  
  No, you're not the only one. But, at the same time, it isn't always the absolute top priority. Right now I'm working on a couple of systems that have had a few mystery DLLs in them. Not necessarily anything lacking source code - it's around "somewhere" - but certainly things we've lacked the immediate capability to rebuild.
  But those DLLs aren't the only, or the biggest, problems with the systems. Hence finding the source and building them from it isn't necessarily the top priority, although we are progressively doing exactly this.
bearbearbear 9 years ago
> They spent a decade linking in object code for which there was no corresponding source code.
How would you go an entire decade without noticing this?
Wouldn't you have to use that missing source code for something within ten years?
- dzdt 9 years ago
  
  I only know details at the level of war story, secondhand. I may have overstated a bit. For sure there was a build process relinking in old object code for many years which no one knew how to reproduce. Possibly there was still associated source code, but the object bits had been declared "golden" and no one knew exactly what source version or build process had produced those "golden" bits.
  
- pjc50 9 years ago
  
  Oh, undoubtedly it was noticed, but an enterprise software company has a tremendous capacity to procrastinate on fixing things.
- gvb 9 years ago
  
  Likely the source was used to build an object library which then got linked in to form the executable. If the library Just Worked, there would be no reason to rebuild it.

nicktelford 9 years ago

There's only two reasons I can think of why they'd patch the binary directly: either they've lost the source-code, or they no longer have an environment they can build it in.

dtech 9 years ago
Another reason could be that it has dependencies that link to specific addresses in the exe. It's very peculiar that they made the effort to keep all the original addresses.
- nostoc 9 years ago
  
  It's not an effort, it's probably a side effect of simply patching the binary.
  If you can't rebuild it, you have to manually edit it, and when you do that you can't really change the addresses, not without a lot of headache.
- ajnin 9 years ago
  
  Keeping the addresses is what requires the least effort; changing them would have meant also changing every call and jump in the rest of the code that targets them, which could be a very large number of addresses.
- breakingcups 9 years ago
  
  That's a really good point.
- bearbearbear 9 years ago
  
  The most likely reason is there's a heavy handed government agency that relies on all the base addresses to be the same, who buys lots of Microsoft licenses.
  
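To see why moving code means fixing every call and jump: in 32-bit x86, a near `call` (opcode E8) or `jmp` (opcode E9) stores its target as a signed 32-bit displacement relative to the address of the next instruction, i.e. the instruction's own address plus 5. A minimal C sketch of encoding and decoding that displacement (the function names and example addresses are hypothetical):

```c
#include <stdint.h>

/* E8/E9 rel32: 1 opcode byte + 4 displacement bytes, and
   target = (address of this instruction + 5) + displacement. */
static int32_t rel32_for(uint32_t insn_addr, uint32_t target)
{
    /* Unsigned wraparound gives the two's-complement displacement. */
    return (int32_t)(target - (insn_addr + 5));
}

static uint32_t rel32_target(uint32_t insn_addr, int32_t disp)
{
    return insn_addr + 5 + (uint32_t)disp;
}
```

If a function shifts by even one byte, every displacement that points into it or across it has to be recomputed, which is why an in-place patch that leaves every address where it was is the low-effort option.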
leoc 9 years ago
For ages Alan Kay has been claiming to know that MS has lost part of the Word codebase.
- leeter 9 years ago
  
  In this case they may never have had the source: the copyright is held by Design Science, Inc., which makes MathType, a popular equation editor for Word. Depending on how this was licensed, this may even have been technically illegal (due to copyright issues), although I suspect MS has a license that allows it.
- noblethrasher 9 years ago
  
  I've never heard him claim that MS actually lost the source code to Word, only that they couldn't find the cause of a decades-old bug (related to selection and text-justification as I recall), even though it had thousands of bug reports, and that they spent years looking for it before giving up.
  The point was that the code is too big and complicated, even for some of the best engineers in the world.
  
- nimish 9 years ago
  
  Microsoft has lost tons of code over the years. Even with the source, refactoring Office, which has people using file formats that are binary dumps of memory, is not trivial.
  
- CamperBob2 9 years ago
  
  There'd be nothing extraordinary about that, either. I've never worked for a company that could build something today that they released twenty years ago.
  
- khstangherlin 9 years ago
  
  This does make a lot of sense. I don't think other apps link directly to calls inside the exe. If they did, they would just update the address in the calling app, which is much easier. Anyway, very nice piece of work. :)
ksk 9 years ago

One other reason might be that they licensed part of the code from a third party and are no longer allowed to re-distribute it. I suppose patching out a few bytes might get them out of legal jail.

dawnbreez 9 years ago

While this does suggest that they lost the source code for this program, it also shows an unbelievable amount of skill.

porfirium 9 years ago
HN, where writing assembly shows an unbelievable amount of skill.
- pjmlp 9 years ago
  
  Well, when you have people calling Electron based apps native.....
- dawnbreez 9 years ago
  
  Not just writing assembly, rewriting a compiled object file without letting any of the addresses change, without having the source to work with, and presumably with almost no documentation, to patch a program that has been left untouched for almost 20 years.
  
- pmelendez 9 years ago
  
  It is indeed a lost art. I can count on one hand the number of colleagues I know who are capable of doing this. Also, this is not assembly, it is object code.
  

lzybkr 9 years ago

I have no specific insight to this patch, but I do have personal experience binary patching a popular Microsoft product.

My patch was to the VC++ compiler nearly 20 years ago. We had source, and my fix was also applied to the source (which I'd imagine is still there today), but a binary patch also made sense in the short term.

The binary that I patched was used to build another important Microsoft product, and this bug was found late in the product cycle where any compiler change was risky.

We weren't 100% confident we had the exact sources used to build that version of the compiler (git would have been handy then); we only knew, plus or minus one day, what the sources were.

After carefully evaluating the binary patch versus the risk of building from uncertain source, the binary patch was taken to reduce risk.

I'm no reverse engineer, but this was a pretty interesting exercise in RE even though I had sources. I had no symbols, and the binary was optimized so that functions were not contiguous, cold paths were moved to the end of the binary. Just finding the code I needed to patch was not easy.

The code review was fun - a dozen or so compiler engineers reviewed the change on paper printouts - the most thorough review I've had in my career, and the only one that used paper.

To the best of my knowledge, this binary was never used to build anything other than that specific version of the product which I won't name - not that it matters really, the product is still in use, but that version is unlikely to be in use anywhere anymore.

dielel 9 years ago

Thanks for sharing this. I suspected that "not being sure if you have the exactly right source code" could be a real world reason to patch a binary, and now I know.

dtech 9 years ago

That is both pretty impressive and horrific.

I wonder if they patched this way because they wanted to maintain as much binary compatibility as possible, or if they don't have the original source/couldn't reproduce the build process.

gizmo 9 years ago
Horrific? This is what you do when you want to make sure you don't introduce any unintentional changes. Computers aren't magic, and there is nothing wrong about patching a binary.
Compiling the software with a modern compiler or linking to a modern runtime is very likely to bring obscure bugs in the codebase to the surface. It's pretty hard to replicate the entire build process that produced the original binary, even if they have the source code and everything else on hand.
- dtech 9 years ago
  
  > Horrific? This is what you do when you want to make sure you don't introduce any unintentional changes.
  Horrific, because the average programmer would consider patching the binary a worst-case scenario.
  > there is nothing wrong about patching a binary
  I would only trust a skilled assembly programmer to do this task without creating other problems, and most businesses don't have those on retainer.
  
- userbinator 9 years ago
  
  ...and sometimes, compiling and linking takes even longer than just opening the binary with a hex editor and changing the right bytes. I've done this a few times with things like string constants that were slightly off, although after testing the binary, I change the source too.
- phkahler 9 years ago
  
  >> It's pretty hard to replicate the entire build process that produced the original binary, even if they have the source code and everything else on hand.
  I seem to recall Microsoft rebuilds everything from source daily (or weekly?). This is a common practice in companies with a large code base. "Nightly Builds" even come from Mozilla. The reason is simple - if you wait until it's time to release, you're gonna have problems. Even if a developer tests his changes locally, they need to be integrated in with everything else and retested.
  If you're really serious about things you will have your tool chain under revision control. As a result you can reproduce binaries to the byte from source code, with the possible exception of embedded dates pulled in at compile time. This is actually a good fallback when someone finds a critical problem in old code - you start by finding the code and then verify by reproducing the affected binary from source. This is not technically hard, it requires discipline and best practices.
- cesarb 9 years ago
  
  Patching the binary manually means that now the source code and the binary are out of sync, and the changes will be lost if it is ever compiled from the source code again.
- andrewchambers 9 years ago
  
  It's funny how very few people do fully self-contained and reproducible builds. At least NixOS tries hard.
- jononor 9 years ago
  
  Your QA process, including automated tests, should ensure that no unacceptable changes were introduced. Otherwise your ability to create and ship fixes in a timely manner will be severely harmed by the fear of breaking things.
  
moonbug22 9 years ago

Building a new binary means running a full QA against it, which is probably not cost-effective for such an old component. In contrast, this patch has exactly known impact. I know it seems like magic to you lot, but it's a day's work if you've the right skills.
frik 9 years ago

They probably don't have the Office 97 or 2000 build pipeline around anymore, and back then, for Office XP or 2003, they copied the Equation Editor into the new repository in binary form.

be5invis 9 years ago

Keep in mind that Microsoft may not have the source code of the Equation Editor, since it is a simplified version of MathType.

Someone 9 years ago

This ‘old’ equation editor is a limited version of MathType (https://en.wikipedia.org/wiki/MathType#Microsoft_Equation_Ed...) that has been supplanted by a built-in equation editor.

Chances are that Microsoft doesn’t have a license for bug fixes from Design Science (makers of MathType) anymore and isn’t willing to pay for this fix.

Alternatively, Design Science may not be able to deliver a version that, for maximum backwards compatibility, has only this fix (to minimize risks, they would have to have kept an environment around that hosts the compiler used back then).

dmitriid 9 years ago

One reason for doing it this way is possibly this:

> Well, have you ever met a C/C++ compiler that would put all functions in a 500+ KB executable on exactly the same address in the module after rebuilding a modified source code, especially when these modifications changed the amount of code in several functions?

It's quite possible they are still contractually obligated to maintain some pretty old systems where changes to the .exe would produce unexpected behaviour. I had Access apps/databases crash on a system if they were built by a different version of Access.

magnat 9 years ago

Slightly off-topic: what program is used to produce disassembly graphs as those in article?

kristofferR 9 years ago
https://www.hex-rays.com/products/ida/
IDA is widely regarded as the best disassembler and debugger out there. It comes with a price to match too though.
- erikbye 9 years ago
  
  If anyone wants to give it a go they can use v5.0, which is free for non-commercial use.
  Otherwise: https://reverseengineering.stackexchange.com/questions/1817/...
- pjc50 9 years ago
  
  It is something of a rite of passage in the piracy community to crack your own copy of IDA Pro.
  It's also a rite of passage to distribute cracked and boobytrapped copies on filesharing sites...
  
sandos 9 years ago

x64dbg can also produce nice graphs and is open-source!
nostoc 9 years ago

radare2 is another open source alternative, but it comes with quite a learning curve.
http://rada.re/r/

_pmf_ 9 years ago

Ah, this brings up a lot of fond memories of me in high school preparing presentations using this fine piece of software[0] before replacing it with a 1GB open source equation editor called LaTeX.

[0] It was actually quite usable once you got to know its warts.

yoz-y 9 years ago

The article mentions that the timestamp of compilation gets embedded into the binary. When does this happen? I am used to having identical binaries when recompiling the same source code with same flags (and compiler and so on and so on)

svenfaw 9 years ago
What compiler do you use? Almost all of them embed a compilation timestamp (which is one of the reasons reproducible builds are often a challenge).
- yoz-y 9 years ago
  
  Mainly visual compiler and clang.
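On where that timestamp lives in a Windows executable: the PE format stores a 32-bit `TimeDateStamp` (seconds since 1970-01-01 UTC) in the COFF file header, 8 bytes after the `PE\0\0` signature, whose file offset is itself stored at offset 0x3C of the DOS header. Below is a minimal C sketch that reads it from an image already loaded into memory; `pe_timestamp` is a hypothetical helper with only basic bounds checks, decoding fields as little-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: store the COFF TimeDateStamp in *out and
   return 0, or return -1 if the buffer doesn't look like a PE image. */
static int pe_timestamp(const unsigned char *img, size_t len, uint32_t *out)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;                       /* no DOS header */
    uint32_t pe = le32(img + 0x3C);      /* e_lfanew */
    if (pe > len || len - pe < 12)
        return -1;                       /* need signature + 8 bytes */
    if (img[pe] != 'P' || img[pe + 1] != 'E' ||
        img[pe + 2] != 0 || img[pe + 3] != 0)
        return -1;                       /* bad PE signature */
    /* COFF header: Machine (2), NumberOfSections (2), TimeDateStamp (4) */
    *out = le32(img + pe + 8);
    return 0;
}
```

Two otherwise identical links made at different times differ in this field, which is one reason byte-for-byte reproducible builds take extra work.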

lunixbochs 9 years ago

Binary patching is a really common requirement in attack/defense CTF, and there are a few projects floating around to help with it.

Keypatch helps you do assembly overwrites in IDA Pro.

Binary Ninja lets you do assembly (and C shellcode!) overwrite patches, and even has undo.

I have my own project [1] for patching ELFs that relies on injecting additional segments and injecting a hook at any address, so as to not require in-place patches. It can also massage GCC/Clang output and inject that reliably into an existing binary.

[1] https://github.com/lunixbochs/patchkit

I have my own story about this as well. A few years ago I released a port of Uplink: Hacker Elite for the OpenPandora handheld with a few game engine patches, and some people were running into a bug: the game would enter the "new game" screen on every launch, even if you already had a save game to load.

I couldn't find the exact source I'd used to build it and didn't want to spend time making sure I got all of my bugfixes into the vanilla repository, so... I went digging with IDA, found the topmost branch to the "new game" wizard, and patched the address to go to the main menu function instead. At that point you could still click "new game" from the menu and it wouldn't go through the patched address (so "new game" still worked), but you could also load an existing game, thus fixing the bug!

I still have nothing on Notaz, who statically recompiled StarCraft and Diablo for that community :)
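Redirecting a branch like this mostly comes down to recomputing a relative displacement. The OpenPandora build would have been ARM code, where the branch encoding differs, so the sketch below instead assumes 32-bit x86 (the Equation Editor's architecture) and a 5-byte `call rel32`/`jmp rel32` (opcodes E8/E9), whose displacement is measured from the end of the instruction. The function names and addresses are hypothetical:

```python
import struct


def retarget_rel32(code: bytearray, insn_offset: int, insn_addr: int, new_target: int) -> None:
    """Rewrite an x86 E8 (call rel32) or E9 (jmp rel32) so it branches to new_target.

    insn_offset: where the instruction starts inside `code`.
    insn_addr:   the virtual address the instruction is loaded at.
    """
    opcode = code[insn_offset]
    if opcode not in (0xE8, 0xE9):
        raise ValueError(f"expected E8/E9 at offset {insn_offset:#x}, found {opcode:#04x}")
    # The displacement is relative to the address of the *next* instruction.
    rel = new_target - (insn_addr + 5)
    if not -2**31 <= rel < 2**31:
        raise ValueError("target out of rel32 range")
    struct.pack_into("<i", code, insn_offset + 1, rel)


def branch_target(code: bytes, insn_offset: int, insn_addr: int) -> int:
    """Decode the absolute target of an E8/E9 rel32 branch."""
    (rel,) = struct.unpack_from("<i", code, insn_offset + 1)
    return insn_addr + 5 + rel


if __name__ == "__main__":
    # Hypothetical layout: a call at 0x401000 to a "new game" routine at 0x402000,
    # redirected to a "main menu" routine at 0x401800.
    code = bytearray(b"\xE8" + struct.pack("<i", 0x402000 - 0x401005))
    retarget_rel32(code, 0, 0x401000, 0x401800)
    print(hex(branch_target(code, 0, 0x401000)))  # 0x401800
```

Only the four displacement bytes change, so the patch fits in place without shifting any other code.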

alexeiz 9 years ago

It's an old program whose source code may not compile with a modern C++ compiler, or may have been lost. Back in 2000, Microsoft was using Visual SourceSafe to manage its source code. I wouldn't be surprised if nobody can remember where the heck the VSS repository with that source code is located.

That leaves the binary monkey-patching as the only reasonable solution. I'm pretty sure Raymond Chen still works at Microsoft...

ajross 9 years ago

Binary patching is really only reasonable when the source code is indeed lost. If they had the code but simply needed a compiler that worked, they could have rebuilt it using the same toolchain and build environment it was built with to begin with. Old versions of Windows and MSVC are obviously still around.

jws 9 years ago

Just a historical note: Patching used to be much more common. Back in the VAX/VMS days, the image file format (executables, not pictures) had a section for patches.

From the ANALYZE/IMAGE command…

Patch information --- Indicates whether the image has been patched (changed without having been recompiled or reassembled and relinked). If a patch is present, the actual patch code can be displayed. (VAX and Alpha only.)

lima 9 years ago

And here I am, manually patching Docker containers...

atupis 9 years ago
Just why?
- dullgiulio 9 years ago
  
  Because they are immutable /s

foobarbecue 9 years ago

I suppose the fact that they have patched the binary means they can never again patch the source?

artursapek 9 years ago
They'd have to re-implement the patch in source before doing anything else to it. I wonder if they're simply no longer able to build from source... why else would they resort to this?
- foobarbecue 9 years ago
  
  As explained in the article and in other comments, it's possible that there are dependencies that rely on address continuity of contents or file size continuity.

anon1253 9 years ago

They probably just lost the ability to build it, or the source code can't be found. Happens quite often. 17 years is a /long/ time to maintain build systems and remember where you put the files.

nathan_f77 9 years ago
I'm glad that most of the software development community seems to have settled on git. I get the feeling that I'll still have all of the source code for my projects in 20 years.
Redundant backups are especially important for software companies. It's scary to think how many startups give all cofounders and developers admin access to everything. It helps that git is distributed, but it's not hard to imagine a scenario where a ticked off former employee wipes everyone's laptops and deletes the hosted source code.
Even if you don't update the mirrors regularly, it's good to know that you have some copies of data in Bitbucket/GitLab/Heroku/Google Drive.
- Merad 9 years ago
  
  I wouldn't hold your breath. 10 years ago I think most people hadn't even heard of git (it was ~2 years old), Google Code was the hot new thing, and GitHub was a year or so away from creation. At the time most people seemed pretty content with Subversion and hosting on SourceForge (before it turned evil) or Google Code, but in the next ~5 years everything changed. Granted, git, GitHub, etc. have far more momentum than anything that came before, but this is a field where it feels like the only constant is change.

alkonaut 9 years ago

If the thing is 17 years old and a replacement has existed forever, what purpose does this file have today? (Assuming I'm on a modern Windows and run either no MS Office or a modern Office version.)

whatthesmack 9 years ago
From the article:
> While Office has had a new Equation Editor integrated since at least version 2007, Microsoft can't simply remove EQNEDT32.EXE (the old Equation Editor) from Office as there are probably tons of old documents out there containing equations in this old format, which would then become un-editable.
- alkonaut 9 years ago
  
  Ah, missed that. But I'd obviously be very happy for this program to be patched by replacing it with this program:
  MessageBox.Show("This document contains an old equation and you don't have the editor. Do you want to download the old editor?");
  Because there comes a point when any equation like this you bump into is actually more likely to be a malicious one.
  Even better if they could at least render the old equation statically using the new office, but not edit it. Then it would be almost insanely rare that anyone needs the old editor.
  

sswaner 9 years ago

Reminds me of that time Mark Watney used a similar method to patch his rover’s comms to connect to an old radio system.

abainbridge 9 years ago

I wonder how they checked the fix in to the source control system?

Freak_NL 9 years ago
Probably:
git rm -r .
git add EQNEDT32.EXE
git commit -m "Fixed CVE-2017-11882"
git push
(Provided someone automatically updated the code repository to git or some other modern tool in the past 17 years.)
misterdata 9 years ago
'FCIB', apparently: https://blogs.msdn.microsoft.com/oldnewthing/20171114-00/?p=...
- Freak_NL 9 years ago
  
  > F-C-I-B or as a sort-of acronym eff-sib […] stands for "foreign checked-in binary" […] The term FCIB didn't originally mean "foreign checked-in binary". According to legend […] "Not another f—ing checked-in binary!"
emidln 9 years ago

I've done this in the past by checking in both a binary as well as a diff to the previous version. It's sometimes helpful to have both if your SCM doesn't handle binary diffs well.
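A byte-level diff of the kind emidln describes is easy to produce when the patch is done in place, so the file keeps its size. Below is a minimal sketch under that assumption; `diff_ranges`, `format_diff` and `apply_ranges` are hypothetical names, and the sample bytes are arbitrary:

```python
def diff_ranges(old: bytes, new: bytes) -> list[tuple[int, bytes, bytes]]:
    """List (offset, old_bytes, new_bytes) for each run of differing bytes.

    Assumes an in-place patch: both files must have the same length.
    """
    if len(old) != len(new):
        raise ValueError("files differ in size; this sketch only handles in-place patches")
    ranges = []
    i, n = 0, len(old)
    while i < n:
        if old[i] == new[i]:
            i += 1
            continue
        start = i
        while i < n and old[i] != new[i]:
            i += 1
        ranges.append((start, old[start:i], new[start:i]))
    return ranges


def format_diff(ranges: list[tuple[int, bytes, bytes]]) -> str:
    """Render one line per changed run: hex offset, old bytes -> new bytes."""
    return "\n".join(f"{off:08x}: {o.hex(' ')} -> {n.hex(' ')}" for off, o, n in ranges)


def apply_ranges(old: bytes, ranges: list[tuple[int, bytes, bytes]]) -> bytes:
    """Re-apply a diff, refusing to patch if the original bytes don't match."""
    buf = bytearray(old)
    for off, o, n in ranges:
        if buf[off:off + len(o)] != o:
            raise ValueError(f"unexpected bytes at {off:#x}; wrong base file?")
        buf[off:off + len(n)] = n
    return bytes(buf)


if __name__ == "__main__":
    old = bytes.fromhex("55 8b ec 83 ec 10 90 90")
    new = bytes.fromhex("55 8b ec 31 c0 10 90 c3")
    print(format_diff(diff_ranges(old, new)))
```

Checking the expected original bytes before writing is what makes the text diff useful next to the binary: it documents the change and refuses to apply to the wrong base file.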

tzahola 9 years ago

There are plenty of companies hooking into private APIs within Word and Excel with their “productivity tools”. Probably an important MSFT customer was using one of these tools as a crucial part of their operations, so they convinced them not to break it. Just like how Google had to put special cases in Android to keep compatibility with some hacks Facebook was using in their app.

dingo_bat 9 years ago

I salute and respect the guy who did this while hoping I never have to do anything like this.

wruza 9 years ago

I used to dream that I'd have to do something like that, until the dream faded under the weight of modern <script src> programming. It's like dancing the twist, rock and hardbass in the era of electronic arse-shaking.

yuhong 9 years ago

I noticed a 0F 1F NOP, which breaks older processors. This is in an update that goes back to Office 2007.
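For context, `0F 1F /0` is the multi-byte form of NOP, which Intel documents as available only on P6-family (Pentium Pro) and later processors, whereas the single-byte `90` runs on every x86. Below is a minimal sketch of padding a patched region either way, using the multi-byte sequences Intel's instruction reference recommends; `nop_padding` and `MULTI_BYTE_NOPS` are hypothetical names:

```python
# Recommended NOP encodings by length, as listed in Intel's SDM NOP entry.
MULTI_BYTE_NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("66 90"),
    3: bytes.fromhex("0f 1f 00"),
    4: bytes.fromhex("0f 1f 40 00"),
    5: bytes.fromhex("0f 1f 44 00 00"),
    6: bytes.fromhex("66 0f 1f 44 00 00"),
    7: bytes.fromhex("0f 1f 80 00 00 00 00"),
    8: bytes.fromhex("0f 1f 84 00 00 00 00 00"),
    9: bytes.fromhex("66 0f 1f 84 00 00 00 00 00"),
}


def nop_padding(length: int, allow_multibyte: bool = True) -> bytes:
    """Return `length` bytes of NOP instructions.

    With allow_multibyte=False only 0x90 is emitted, which every x86 CPU
    executes; the 0F 1F forms need a P6-class or newer processor.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if not allow_multibyte:
        return b"\x90" * length
    out = bytearray()
    while length > 0:
        chunk = min(length, 9)          # greedily use the longest encoding
        out += MULTI_BYTE_NOPS[chunk]
        length -= chunk
    return bytes(out)
```

The multi-byte forms decode as fewer instructions, which is why toolchains prefer them; the cost is exactly the compatibility issue raised above.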

porfirium 9 years ago

So Microsoft lost the source code? Or maybe the engineer couldn't be bothered to set up the old toolchain used to build this executable?

ceautery 9 years ago

Manual patching is more of a bother, I imagine, so this wouldn't be due to a smug engineer getting his prima donna on.
moonbug22 9 years ago
Probably bought it in from a third party. Why do you think Pinball went away?
- tjalfi 9 years ago
  
  Raymond Chen answered this question five years ago[0].
  [0] https://blogs.msdn.microsoft.com/oldnewthing/20121218-00/?p=...

mariusmg 9 years ago

Why is this so newsworthy? Crackers do it every day :)

grandpoobah 9 years ago

We prefer the term 'people of non-colour'.