Comment by nonrandomstring

2 days ago

Elaborate memory management (paging) systems need caching of lookups for high performance. But they can go wrong. The post was made in a security/safety context, but did I miss something? It didn't seem to make clear what the dangers are.

I only know x86/64, but I assume most page table caching would be somewhat similar.

Basically, if you don't handle the TLB properly, the CPU will not know that page mappings and/or page permissions have changed. So if you had a page mapped RW, then changed the mapping to RO (e.g. when setting up COW) but failed to flush the TLB (or at least issue INVLPG for that entry), the CPU might use the stale permissions and grant write access to the page when it shouldn't. The same can happen when you remap a region of the VA space to a different physical page: the next bit of code to touch it would hit the old page (and who knows what state that page is in, or what it's now being used for).
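
To make that concrete, here's a rough sketch of the COW case. This is an illustration only, not real kernel code (the PTE_RW bit and the function name are made up, real kernels go through their own PTE accessors, and INVLPG is a privileged instruction, so this only makes sense in kernel context):

    #include <stdint.h>

    #define PTE_RW (1ULL << 1)   /* illustrative "writable" bit in a page-table entry */

    /* Demote a page to read-only (e.g. when setting up COW), then invalidate
     * its TLB entry for the current address space. */
    static void demote_to_readonly(volatile uint64_t *pte, void *vaddr)
    {
        *pte &= ~PTE_RW;   /* the page tables now say read-only... */

        /* ...but the TLB may still hold the old RW translation. Without this
         * INVLPG (or a broader flush), the CPU can keep honouring the stale
         * entry and allow writes the page tables no longer permit. */
        asm volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
    }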

The TLB is not super-complicated, but it has some quirks (it's been so long since I've done anything with it that the PCID handling rules were new to me; PCIDs weren't even supported back then).

The article (towards its end) discusses a serious bug in the INVLPG instruction of Intel's Gracemont cores (the E-cores used in Alder Lake, Raptor Lake, Raptor Lake Refresh, Alder Lake N, Amston Lake, and Twin Lake): in certain circumstances it fails to invalidate all the entries it should.

I'm no expert on TLB invalidation bugs, but generally they allow an attacker to read/write arbitrary memory.

https://googleprojectzero.blogspot.com/2019/01/taking-page-f...

  • I don't mean to be a pedant, so someone please correct me if I'm wrong, but I don't think TLB mishandling would result in arbitrary memory access (I suppose in the strictest sense arbitrary can just mean random, but generally I have understood it to imply that the address can be attacker controlled, which a stale TLB wouldn't allow).

    Unless you're like Microsoft (from your link) and accidentally leave the page tables writable from userspace for 2 months. But that's not really a TLB error, that's just L-O-L, wow!

    • Random access is arbitrary access, given enough time. You can try over and over again until you get lucky.

      Imagine I'm a user with local shell access trying to read a secret owned by root. Maybe I can't read the secret, but I can do something which makes another program read the secret. If I can make that program swap (perhaps by wasting a bunch of RAM to create memory pressure), and swapping has some probability of triggering a TLB invalidation bug that lets me see the old page, I win, although it might take a while.

  • Read-Write-eXecute (RWX) memory regions can be found in the JavaScript, Java, Dalvik (Android), and Python runtimes.

    • Modern JavaScript engines (namely V8) avoid RWX mappings, although last time I checked there had been some backsliding as part of the WASM implementation.

      CPython also no longer appears to create RWX mappings even for ctypes, although you can of course still mmap them manually.
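
      For what it's worth, "mmap them manually" just means asking for all three protections on one mapping. A minimal C sketch (Python's mmap module exposes the same PROT_* flags on Unix; hardened kernels may refuse the request outright):

          #define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
          #include <stdio.h>
          #include <sys/mman.h>

          int main(void)
          {
              /* One page that is readable, writable, and executable at once. */
              void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              if (p == MAP_FAILED) {
                  perror("mmap");
                  return 1;
              }
              printf("RWX mapping at %p\n", p);
              munmap(p, 4096);
              return 0;
          }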

It looks like the last 20 or so pages of the PDF contain two case studies. I read the first one, which led to (nondeterministic) kernel errors.

Perhaps “hacker” should be “crazy bug debugger”, but anybody who is working with TLB issues is a hacker in my book.

There is no “CVE” vulnerability in the slides, for sure.

I conclude that the title is wrong. Not every developer needs to know these things - only kernel developers need to know about TLB invalidation.

  • Every developer needs to know that cache invalidation is one of the two hard things in computer science - and that people further down in your stack occasionally get it wrong.