Comment by rspeer

7 years ago

The pattern of bytes above 0x7F has nothing to do with the x86 architecture. It's UTF-8, the most common way to encode Unicode in bytes.

You'll have more success understanding what it's doing if you decode it from UTF-8 first.
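A minimal sketch of that decoding step, in Python. The sample bytes are made up for illustration (a plain "t" followed by two sequences led by 0xCC and 0xCD); they are not taken from the actual payload, and `describe` is just a hypothetical helper name.

```python
import unicodedata

# Hypothetical sample, NOT the real payload: "t" followed by two
# two-byte sequences whose lead bytes are 0xCC and 0xCD.
sample = b"t\xcc\x80\xcd\x9f"

def describe(data: bytes):
    """Decode UTF-8 and return (code point, general category, name) per character."""
    text = data.decode("utf-8")
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

for row in describe(sample):
    print(row)
```

If the 0xCC/0xCD sequences in the real data also come out with category "Mn" (nonspacing combining marks), that would fit the "embellished t" rendering rather than anything instruction-like.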

Yes, that's what I meant by "some unicode type of thing.... reminiscing to an old bug emerged two years ago" in my last post...

But on the other hand, note the strong pattern of "0xCC" and "0xCD" (which could be a coincidence with the x86 breakpoint/INT opcodes), pacing at a fixed distance throughout the code, as well as the numeric characteristics that the parameters (or simply garbage code) to "0xCC" and "0xCD" exhibit. I just feel that if there are certain numeric relations among those code chunks (for instance, all falling within a relatively narrow range, as all the parameters to "0xCD" are below 0xA0 but at or above 0x80, a width of 32 units), it probably tells us something (assuming it has been crafted that way, with a meaningful purpose) -- but of course, my chunking method might be wrong and each code point (or instruction) might be longer than 2 bytes...
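For what it's worth, the chunking question can be checked mechanically. In UTF-8, a lead byte in 0xC2..0xDF always starts a two-byte sequence, and the second byte is always in 0x80..0xBF, which would account for both the fixed pacing and the narrow range of the "parameters". A sketch of the arithmetic (`decode_two_byte` is a hypothetical helper, not part of any library):

```python
def decode_two_byte(lead: int, cont: int) -> int:
    """Return the code point encoded by a two-byte UTF-8 sequence."""
    if not 0xC2 <= lead <= 0xDF:
        raise ValueError("not a two-byte UTF-8 lead byte")
    if not 0x80 <= cont <= 0xBF:
        raise ValueError("not a UTF-8 continuation byte")
    # 110xxxxx 10yyyyyy -> xxxxxyyyyyy
    return ((lead & 0x1F) << 6) | (cont & 0x3F)

# Lead byte 0xCC covers U+0300..U+033F; 0xCD covers U+0340..U+037F.
print(hex(decode_two_byte(0xCC, 0x80)))
print(hex(decode_two_byte(0xCD, 0x9F)))

# Cross-check against Python's own decoder.
assert chr(decode_two_byte(0xCD, 0x9F)) == bytes([0xCD, 0x9F]).decode("utf-8")
```

Under this reading, 0xCD followed by a byte in 0x80..0x9F lands in U+0340..U+035F, inside the Combining Diacritical Marks block (U+0300..U+036F).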

Also note that a 12 MB buffer put together within 10 minutes looks like a brute-force approach -- so a buffer overrun is quite likely in this case; the out-of-bounds data then triggers something that nothing on the exception-handling stack can deal with (hence the app crash) -- in the simplest scenario, that would be a segfault. Of course, cheesecakeufo could explore this buffer overrun a bit further -- but I guess that normally takes more than 10 minutes, for normal people....

I stopped working on this bug -- it would be a luxury to play a whole afternoon with this puzzle...do you have any further findings? Well, if I had more time, I probably would change the code a little and open it on an extra device, and see if there are any different effects...

  • I saw a tweet about what it looks like (in some piece of software that at least manages to render something): https://twitter.com/BagusAlexandria/status/95347388267712921...

    The fact that the embellished t's form these big overlapping blocks makes me think that it's hitting the worst-case behavior of some text layout algorithm.

    I don't understand what all the hex digits and apostrophes are for, though.

    • I don't have a spare iOS device at hand right now to do more testing on this bug (if the bug is inside the library code and not at the API level, then an emulator usually cannot reproduce the problem the way a real device does)...

      hence I can only make guesses...if I had a spare one, I would try modifying the content of the code and see what happens (things such as reducing the code size -- most of the code is repetitive -- and seeing what happens if we retain only the core part; or only the first half of the code -- the 0xCC and 0xCD part; or only the second half, the displayable ASCII chars)...trim the code down to smaller units and test them individually (in the same sense as modular testing).

    • each displayed line of characters in that twitter image is exactly the same...only its position shifts a bit.

      it looks like the black shades over the character lines are the vestiges left by the non-ascii characters at the beginning of the code.

      the displayed part is not just ASCII hex digits and apostrophes; there are also punctuation characters there...My guess is that if they are displayed, then the system has already handled them successfully and whatever actions they triggered have been contained within defined system behavior...
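The trimming experiment proposed earlier in this subthread (test the non-ASCII part and the displayable ASCII part separately) could be prepared along these lines. This is only a sketch: `split_payload` is a hypothetical helper, the sample bytes are invented, and it assumes the payload is valid UTF-8 with the non-ASCII material up front.

```python
from typing import Tuple

def split_payload(data: bytes) -> Tuple[str, str]:
    """Split decoded text just after its last non-ASCII character.

    Returns (head, tail): head holds everything up to and including the
    last non-ASCII character, tail is the remaining pure-ASCII text.
    """
    text = data.decode("utf-8")
    cut = max((i + 1 for i, ch in enumerate(text) if ord(ch) > 0x7F), default=0)
    return text[:cut], text[cut:]

# Invented sample, NOT the real payload.
sample = "t\u0300\u035f0x'1F',".encode("utf-8")
head, tail = split_payload(sample)
print(len(head), repr(tail))
```

Each half can then be written to its own test page and opened separately, so a crash can be attributed to one part or the other.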