Comment by vjerancrnjak

7 months ago

I think it's the Wikipedia article.

https://en.wikipedia.org/wiki/Io_uring

Very easy to just quote that without any io_uring experience.

> In June 2023, Google's security team reported that 60% of the exploits submitted to their bug bounty program in 2022 were exploits of the Linux kernel's io_uring vulnerabilities. As a result, io_uring was disabled for apps in Android, and disabled entirely in ChromeOS as well as Google servers. Docker also consequently disabled io_uring from their default seccomp profile.

weitendorf 7 months ago

I don't work at Google anymore and don't have any special insight into the internal adoption of io_uring, but I think it stands to reason that Google would benefit tremendously from rolling out a higher-performing way to do IO across their fleet. I mean, having myself done some lowish-level performance/optimization work and knowing that the impact of these kinds of changes is measurable and the scale is almost fleetwide, I wouldn't be surprised if the benefits - after major internal libraries/tools are also updated to use io_uring - are O(Really Big Money).

Having talked to members of their prodkernel team about other subjects, I also think they are competent enough to know the difference between "not ready" and "acceptably flawed". And believe me, the incentives are such that O(Really Big Money) optimization projects get staffed unless there is something making them infeasible.

Not everybody has the same threat model and security stance as Google and that's ok. But personally I would take their internal adoption of io_uring very seriously as a measure of whether it's safe for me to adopt it, especially if I'm running untrusted or third party software (including certain kinds of libraries).

ciconia 7 months ago
> the incentives are such that O(Really Big Money) optimization projects get staffed unless there is something making them infeasible.
Switching to io_uring is not just moving from one API to another. It necessitates a serious rethinking of your concurrency model. I guess for big, established codebases this is a very substantial undertaking, security considerations notwithstanding.
- weitendorf 7 months ago
  
  On the library/internal workload side the impact would certainly not be something that fully lands overnight, but Google has a very centralized tech stack and special tooling for fleetwide code migrations. I have no insight into the particulars but I would guess there is a Pareto-like distribution of easy upgrades+big wins and a long tail of marginal/thorny upgrades.
  Google is big enough and invests enough in infrastructure projects that they staff projects like making their own internal concurrency primitives (side note, factors like this can improve/reduce or simplify/complexify migrations substantially): https://www.phoronix.com/news/Google-Fibers-Toward-Open
- junon 7 months ago
  
  Eh let's not be dramatic, if you're already using async runtimes of some sort it's not that much of an upset to switch.
  
delusional 7 months ago
Disabling it on Android and ChromeOS does not mean they don't use it internally. Android and ChromeOS are end-user devices; optimizing those platforms doesn't earn Google any money.
- weitendorf 7 months ago
  
  Can you find anywhere that states that they are using it internally? They have publicly stated at various points that they do not, such as at https://security.googleblog.com/2023/06/learnings-from-kctf-... and I have not seen anything yet stating that they are now using it. Also, you might want to reread my comment because I wasn't talking about Android/ChromeOS, it was exclusively about their "fleet" by which I meant "servers"
  By the way, here is a good + recent example of the types of CVEs that io_uring runs into that Google finds and discloses/fixes: https://project-zero.issues.chromium.org/issues/417522668. Here's another: https://project-zero.issues.chromium.org/issues/388499293
  Given that io_uring mostly seems to be the project of one guy at Meta, and has a regular stream of new and exciting use-after-free/out-of-bounds vulnerabilities, I think it makes sense for security-inclined users to disable it or at least only use it once soaked/stabilized.
- rahkiin 7 months ago
  
  GP: > > as well as Google servers
  
dathinab 7 months ago
Them disabling it is only about Android/Chrome, not about their servers.
I wouldn't be surprised if they do have servers with it enabled where it's very useful.
And Android's Linux kernels lag behind in their version.
- weitendorf 7 months ago
  
  No, it was about servers, and I worked there on similar stuff/with the same people involved in the serverside ("fleetwide") rollout. Public post describing the decision to disable it internally: https://security.googleblog.com/2023/06/learnings-from-kctf-...
  I'd love to see a post explaining a decision to consider it stable or that mentions that they've rolled it out on their fleet

flomo 7 months ago

Without going into the weeds, there has to be some vendor support, and that vendor is obviously not Google. How to convince people: Get it into RHEL.

stefanha 7 months ago

io_uring is available from RHEL 9.3 onward. The catch is that it's disabled by default and needs to be enabled at runtime via the "kernel.io_uring_disabled" sysctl.
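
For anyone who wants to check the state of that sysctl on their own machine, here is a minimal Python sketch. It assumes the value is exposed at /proc/sys/kernel/io_uring_disabled and that the 0/1/2 meanings match the upstream kernel sysctl documentation; whether a given distribution backport behaves identically is an assumption. The helper names `read_io_uring_disabled` and `describe` are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed procfs location of the "kernel.io_uring_disabled" sysctl.
SYSCTL_PATH = Path("/proc/sys/kernel/io_uring_disabled")

# Meanings as described in the upstream kernel sysctl documentation
# (assumed to apply to distribution kernels as well).
MEANINGS = {
    0: "enabled for all processes",
    1: "disabled for unprivileged processes outside io_uring_group",
    2: "disabled for all processes",
}


def read_io_uring_disabled(path: Path = SYSCTL_PATH) -> Optional[int]:
    """Return the sysctl value, or None if it cannot be read or parsed."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file (older kernel, non-Linux), no permission, or junk content.
        return None


def describe(value: Optional[int]) -> str:
    """Translate a sysctl value into a human-readable status (hypothetical helper)."""
    if value is None:
        return "sysctl not available (older kernel or non-Linux system)"
    return MEANINGS.get(value, f"unknown value {value}")


if __name__ == "__main__":
    print(describe(read_io_uring_disabled()))
```

Changing the value at runtime is typically done as root with `sysctl -w kernel.io_uring_disabled=0`; the script above only reads it.
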
rendaw 7 months ago

If that's the case, it's not indicated by the quote. The quote lays all the blame on io_uring. Is that incorrect?

znpy 7 months ago

Jens Axboe replies on the very first line of the thread:

> As I'm sure you know, this is all mostly centered around a) google using an old kernel on android

fulafel 7 months ago

But also
> My hope is that this reputation will go away eventually, as less issues are found in the code.
This has not happened yet, as this other comment shows: https://news.ycombinator.com/item?id=44632639

dathinab 7 months ago

Yes, but what this isn't telling you is that Android has a long history of running hopelessly outdated kernels, and it is very common for Linux-kernel-related Android CVEs in newish features to have already been fixed upstream by generic improvements to that feature's code.

yjftsjthsd-h 7 months ago

I like how someone helpfully added

> Although initial async offload design in io_uring could be problematic, later kernels changed the thread model. After such improvements, there were no known inherent problems with it and its development is very careful with new features. Considering that a performant async framework with a user facing API is complex, it was to be expected that issues would be found initially. After initial issues have been addressed, it is not any less secure than anything else in the kernel and io_uring acceptance quickly grew in production. Some of its criticism are also based on wrong or outdated assumptions.[14]

...but the only citation is a link to this GH thread, which doesn't support the claims made.