Oooh, I LOVE this! Especially the "Overriding emulated code with C# code" ability. I had a similar idea years ago (https://gabrielgambetta.com/remakes.html), not in the context of a debugger or reverse engineering per se, but in the context of remakes and "special edition" games. Not entirely surprised that this is a byproduct of OpenRakis. Amazing work!
I tried doing something like this about 15 years ago but specifically for audio by routing NES NSF rom audio data (square, triangle, PWM, etc) to virtual midi cables attached to VSTs so you could play any old school Nintendo game with modern instrumentation. Was a pretty fun project.
The closest thing I can think of for graphical overhauls is probably shader-pack type stuff. Minecraft is a great example of this.
https://www.sonicether.com/seus
Reverse engineering old games is like digital archaeology—except instead of digging up fossils, you’re unearthing spaghetti code and DRM nightmares. Spice86 seems like an exciting new shovel for the job!
Forty years ago I had a Sinclair QL with an 8086 emulator. Because the Sinclair QL had preemptive multitasking, I could easily search memory for patterns, monitor locations, stop and start the emulation, or change memory programmatically and easily from the QDOS side. It was worlds easier than using a debugger, particularly since I didn't own an 8086 system.
I always thought it was a clever way to get insight into software while it was running, in a way that wasn't available to people with 8086 systems, and it's interesting to see this idea so many years later.
Bochs and MAME both have superb and widely-used debuggers, while Qemu is more limited but still has some debugging capabilities in its monitor, as well as a gdb integration. (Can’t say anything about PCem/86Box.) It seems that developers of emulators targeting good coverage of old stuff simply can’t not build a debugger, because it’s an integral part of their task to figure out what the hell the devs of the latest failing thing did to make it fail. Bochs is (was?) also quite popular in the OSDev scene as a debugging tool.
DOSBox can be configured to include a debugger. The feature is not enabled in the official binary but the enhanced derivative projects probably have it (DOSBox-X definitely does):
- https://www.vogons.org/viewtopic.php?t=3944
- https://github.com/joncampbell123/dosbox-x/wiki/DOSBox%E2%80...
A tutorial on how to reverse engineer a simple DOS game would be absolutely awesome!
From my brief experience, it seems that reversing old games is one of those disciplines where there is no good step-by-step course. One can start learning some theory (I did so by reading “The Art of Assembly Language Programming” – if I had more time, I'd try “Reverse Engineering for Beginners”) but then one has to get his hands dirty. Real-life games are typically not simple, but one usually just needs to reverse some small parts to produce new interesting modifications.
But reading tutorials is surely useful for learning techniques and tricks. I recommend this article to start: https://www.lodsb.com/reversing-lz91-from-commander-keen (not totally for beginners, but very friendly, and it can help you get used to the jargon). I am also publishing war stories on this topic on my blog (marnetto.net).
Tutorials are hard, but there are some great writeups like this one, which discuss some of the specific problems and the trial and error involved:
https://neuviemeporte.github.io/category/f15-se2
In this case one hard part was trying to get code to compile to identical byte for byte output. Which meant working out which compiler options were used, and which specific compiler too. That gives you a hint of what kinda things are involved.
Have you mentioned this before? I saw a similar context comment maybe 3 days ago on here, down to "byte for byte output [...W]hich compiler options were used"
https://cosmodoc.org/
Question from a reverse-engineering noob:
Why can't Ghidra (or any other reverse engineering tool) be used directly on the .exe? Why do you have to go through this emulator? Is it because the thing you want to debug only runs in x86 real mode?
You can, it's just harder, sometimes.
Very roughly speaking, the older the platform, the "weirder" the stuff that happens at runtime gets, in order to make the best of the measly hardware.
I'm looking at a decades-old C64 game right now, and the way I'm doing it is by first taking a memory snapshot while the game was in the state that I'm most interested in.
Starting with just the "binary" on disk would be much harder, since there's so much code that just loads, decompresses, initializes overlays etc. which isn't particularly interesting from an actual game logic point of view (though may be very interesting for other reasons), and only exists because the whole game just doesn't fit into all of memory.
There's also a bunch of loading and overlay magic at runtime of the game, but by looking at a snapshot of the game in the state that interests me, I don't have to dive into that (yet).
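As a rough sketch of how a snapshot like that can be searched (an assumption about workflow, not the commenter's actual tooling), the snippet below scans a raw memory dump for a byte signature with wildcards. The function name `find_signature` and the tiny synthetic snapshot are hypothetical; the signature used is the common 6502 idiom `LDA #imm / STA $D020` (set the C64 border color).

```python
import re


def find_signature(snapshot: bytes, signature: str) -> list[int]:
    """Return every offset where the signature matches.

    The signature is a string of hex bytes separated by spaces;
    the token '??' matches any single byte.
    """
    parts = []
    for token in signature.split():
        if token == "??":
            parts.append(b".")  # wildcard: any byte (DOTALL below)
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # A lookahead lets overlapping matches be reported too.
    pattern = re.compile(b"(?=" + b"".join(parts) + b")", re.DOTALL)
    return [match.start() for match in pattern.finditer(snapshot)]


# Synthetic stand-in for a real dump: NOP, LDA #$05, STA $D020, RTS.
snapshot = bytes([0xEA, 0xA9, 0x05, 0x8D, 0x20, 0xD0, 0x60])
hits = find_signature(snapshot, "A9 ?? 8D 20 D0")
```

With a real dump you would read the file into `snapshot` instead; the offsets in `hits` are then starting points for disassembly.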
There are so many reasons for that:
- There is sometimes not a single static exe (meaning all code is inside it) but overlays (DLL-like modules for DOS) or several other ways of loading code at runtime (for example sound/gfx drivers). DOS technically allows nearly everything, so everything is done in games :)
- Many game loaders combine code/data parts of a game in memory, to keep floppy releases smaller.
- Self-modifying code, which is also hard to disassemble statically with Ghidra/IDA.
- Good old segment/offset 16-bit real-mode games are a completely different beast compared to 32-bit linear DOS games (Ghidra isn't very good at this, IDA is much, much better).
Some examples:
The Stunts loader combines several (in themselves invalid) files in memory to create an exe (the individual files are packed, and the resulting exe in memory is also packed). It is not that easy to statically disassemble something like that.
Alpha Waves also has a loader and self-modifying code that is not easy to reverse statically.
It's good to have the best disassemblers available and the best (or, better, dedicated) debuggers around to keep your reversing project shorter than decades :)
Obfuscation and compression are two potential extra hoops to jump through. It's easier to let the executable run for a bit and start from there.
I believe part of the problem is that an .exe file is created by packaging multiple library files and/or graphics, arrays, etc., and there is no default order in which they land in the EXE file. There are some tools... hex editors come to mind. I seem to recall NOPing out a jump or two in my younger days. Edit: these days that probably wouldn't work due to CRC checks... but there was a time... Then again, that may be just the perfect place to start reverse engineering ;) I have some good memories of playing a Medal of Honor in which I changed all the door textures to transparent window textures and had to work around CRC protection... good times :)
Approximately how long does it take to collide a CRC naively? I'm guessing there's a trick that makes it faster, these days?
It takes my computer about 7 minutes on a single core to find a nonce that makes an arbitrary file's SHA-256 start with 4 or 5 zeros (like Bitcoin difficulty doubling). Obviously the heat death of the universe would occur before a single core managed to collide SHA-256. As for CRC: Gemini says it depends on the algorithm, and that CRC-32 should take about an hour to collide. It didn't specify whether that means "any" collision or a collision with an "arbitrary" given value, but it mentioned "any" right before that. So if the most probable sentence after "any collision" is a time estimate, then by the logic of LLMs the estimate refers to the easier case of any collision.
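On the "trick" question: CRC-32 is a linear checksum rather than a cryptographic hash, so no search is needed at all. Four chosen bytes are enough to steer the checksum to any value. The sketch below assumes the check is a plain zlib-style CRC-32 over the whole file and that four spare bytes can be appended; a real game may use a different polynomial or cover different bytes. The names `forge_suffix`, `original` and `patched` are made up for illustration.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial used by zlib, ZIP and PNG


def make_table() -> list[int]:
    """Build the standard 256-entry CRC-32 lookup table."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()
# Each table entry has a distinct top byte, so the top byte identifies it.
TOP_BYTE_TO_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}


def forge_suffix(data: bytes, target_crc: int) -> bytes:
    """Return 4 bytes such that zlib.crc32(data + suffix) == target_crc."""
    state = zlib.crc32(data) ^ 0xFFFFFFFF  # internal register after data
    wanted = target_crc ^ 0xFFFFFFFF       # register value to end on

    # Walk backwards from the wanted register to recover the table
    # indices that the last four update steps must have used.
    indices = []
    for _ in range(4):
        i = TOP_BYTE_TO_INDEX[wanted >> 24]
        indices.append(i)
        wanted = ((wanted ^ TABLE[i]) << 8) & 0xFFFFFFFF
    indices.reverse()

    # Walk forwards, picking each byte so the register selects that index.
    suffix = bytearray()
    for i in indices:
        suffix.append((state ^ i) & 0xFF)
        state = TABLE[i] ^ (state >> 8)
    return bytes(suffix)


original = b"door texture data"
patched = b"transparent window texture data"
suffix = forge_suffix(patched, zlib.crc32(original))
forged = patched + suffix
```

Here `forged` has the same CRC-32 as `original` despite different contents, and computing the suffix takes a handful of table lookups rather than minutes or hours.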
x86 segmentation makes it very hard to statically analyze anything. In real mode, any byte can be referenced in 4096 different ways. It is even messier in protected mode, since now every selector is an entry in a table, so its value itself is meaningless. So, without runtime analysis, there is no way to tell if 04:1234 is or is not the same byte as fa:1204
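To make the "4096 different ways" figure concrete, here is a small sketch of real-mode address translation (linear = segment * 16 + offset). The function names are made up for illustration, and address wraparound at 1 MiB is deliberately ignored. None of this applies to protected mode, where the selector is only an index into a descriptor table.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode translation: the segment is shifted left 4 bits, then added."""
    return (segment << 4) + offset


def aliases(linear: int) -> list[tuple[int, int]]:
    """Every segment:offset pair reaching `linear` (no wraparound modelled)."""
    pairs = []
    for segment in range(0x10000):
        offset = linear - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


# Two different-looking far pointers that name the same byte.
same_byte = linear_address(0x1234, 0x0005) == linear_address(0x1000, 0x2345)

# An address in the middle of the 1 MiB range has the full set of aliases;
# addresses very close to 0 have fewer.
alias_count = len(aliases(0x12345))
```

The arithmetic itself is trivial; what a static tool usually lacks is the runtime value of the segment registers, which is why a running emulator can resolve far pointers that a disassembler has to guess at.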
> It is even messier in protected mode, since now every selector is an entry in a table, so its value itself is meaningless.
Actually, my experience is that things are much easier in protected mode. Since selector values are chosen by the OS, that means you rely a lot more on internal relocations. And the use of segment selectors is a strong indicator that you have a pointer in the first place.
Unfortunately, Ghidra itself struggles to apply these techniques, especially in the decompiler, which seems completely unable to cope with the concept of far pointers.
Why are so many emulators written in C#?
I don’t think the language is necessarily chosen for the project. I think C# is just a mainstream language that a lot of people know.
My guess is portability, then obviously performance.
edit: actually there is a specific answer for this particular project - "We had to rewrite the project in C# to add automated code generation (java doesn't have the goto keyword, making automated ASM translation challenging)". There you are.
I mean, that's more or less the reason why it isn't Java, not why it's ultimately C#. My guess is that Java is just what they're most comfortable with, with C# being similar enough but avoiding specific limitations in that case.