Comment by throwawaylinux

4 years ago

Right, so you don't really know what it's doing at all. That it does something different is expected.

> Indeed, thus it's highly likely this is a dumb firmware bug, like the FLUSH implementation being really naive and nobody having cared until now because it wasn't a problem on devices where nothing flushes anyway.

I don't think that's highly likely at all. I think it's highly unlikely.

> Yup, it's not rocket science, it's humans writing code. And humans write bad code. Apple engineers write bad code too, just take a look at some parts of XNU ;-)

I'm not some Apple apologist. I think their fsync() thing is stupid (although I'm very surprised you didn't know about it and that it took you so long to check the man page; it's an old and well-known issue, and I don't even use or program for OSX). The hardware is clearly not very good for the task of a non-battery PC (even on battery I think it's a questionable choice, unless they can flush data in case of an OS crash or low-battery shutdown). I also think their kernel is low-performing and a poor Frankenstein mishmash of useless microkernel bits. So you're not getting me on that one.

> Full flushes are rare on devices where the cache can be considered persistent anyway because there's a battery and the kernel is set up to flush on panics/emergency situations (which it is). Thus nobody ever ran into the performance problem, thus it never got fixed.

I never said the hardware was suitable for this type of operation.

> The dumbest cache implementation is a big fixed size hash table. That's easy to background flush incrementally on capacity, but then if you want to do a full flush you end up having to do a linear scan even if the cache is mostly empty.

I can think of dumber. A linked list you have to search.

This approach is really bad even if you don't have any syncs, because you still want to place LBAs linearly even on NAND; otherwise your read performance on large blocks suffers.
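To make concrete what that hash-table scenario would look like (a minimal sketch of my own, with invented names and sizes, not anything known about the actual firmware): background eviction on capacity pressure is easy, but a full FLUSH degenerates into a scan of the whole table:

```c
/* Hypothetical sketch of the "dumb cache" design described above: a big
 * fixed-size hash table of cached writes with no dirty index, so a full
 * FLUSH must walk every bucket even when almost nothing is dirty.
 * All names and sizes here are invented for illustration. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS (1u << 16)   /* fixed table size, mostly empty in practice */

struct cache_entry {
    uint64_t lba;             /* logical block address of the cached write */
    bool     dirty;           /* has unwritten data */
};

static struct cache_entry cache[NBUCKETS];

/* Full flush: with no list of dirty entries, this is O(NBUCKETS) of
 * memory traffic no matter how few pages are actually dirty. */
static void flush_all(void (*write_back)(uint64_t lba))
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        if (cache[i].dirty) {
            write_back(cache[i].lba);
            cache[i].dirty = false;
        }
    }
}

static void write_one(uint64_t lba) { (void)lba; /* pretend to hit NAND */ }

int main(void)
{
    cache[42].lba   = 1234;
    cache[42].dirty = true;
    flush_all(write_one);     /* walks all 65536 buckets to flush 1 page */
    return 0;
}
```

The scan cost scales with the table size, not with the amount of dirty data, which is exactly the pathology being hypothesized.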

The fact that you can come up with a stupid design that might explain it isn't a very good argument, IMO. Sure, that might be the case; I didn't say it was impossible, just that I didn't think it was likely. You're saying it's certainly the case. I don't think there's enough evidence for that, at best.

Look, it's just logic. There are a couple of pages in cache. It has to flush them. Finding them and doing that doesn't take 10MB/s of memory traffic and 20ms unless you're doing something stupid. If it were a hardware problem with the underlying storage, it wouldn't be eating DRAM bandwidth. The fact that it's doing that means it's doing something with the data in the DRAM carveout (cache) which is much larger/more complicated than what a good data structure would require to find the data to flush. The bandwidth should be 0.3MB/s plus a negligible bit of overhead for the data structure parts, which is the bandwidth of the data being written (and what you get if you do normal writes without flushing at the same rate). Anything above 1MB/s is suspicious, never mind 10MB/s.
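As a quick sanity check of that arithmetic (using the figures from this thread, not independent measurements):

```c
/* Back-of-envelope check of the numbers above: flushing a few dirty
 * pages should cost roughly the write bandwidth plus a little metadata
 * overhead, yet the observed DRAM traffic is far higher. */
#include <stdio.h>

int main(void)
{
    double data_rate_mb_s = 0.3;   /* dirty data actually being written */
    double dram_rate_mb_s = 10.0;  /* DRAM traffic observed during flushes */

    printf("overhead: %.0fx the flushed data\n",
           dram_rate_mb_s / data_rate_mb_s);
    /* ~33x -- consistent with each FLUSH touching far more of the cache
     * carveout than the handful of dirty pages it needs to find. */
    return 0;
}
```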

  • The logic is flawed, though. You don't have the evidence or the logic to say it's certainly a bug, or due to stupidity or oversight. I also don't know for certain that it's not, which I'll acknowledge.

    And if it was a strange forward map structure that takes a lot of time to flush but is fast or small or easy to implement, that actually supports my statement. That it was a deliberate design choice. Not a firmware bug. Gather delay was one example I gave, not an exhaustive list.
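    To illustrate (a purely hypothetical sketch with invented names, not a claim about Apple's actual controller): a forward map that gets checkpointed wholesale on FLUSH is small, fast in steady state, and trivial to implement, but every flush costs the whole map:

    ```c
    /* Hypothetical forward-map design: cheap in normal operation, but
     * flush cost scales with map size rather than with dirty data.
     * Everything here is invented for illustration. */
    #include <stddef.h>
    #include <stdint.h>

    #define MAP_ENTRIES (1u << 16)        /* LBA -> physical page map */

    static uint32_t forward_map[MAP_ENTRIES];

    /* Steady state is cheap: a remap is one array store. */
    static void remap(uint32_t lba, uint32_t phys)
    {
        forward_map[lba % MAP_ENTRIES] = phys;
    }

    /* FLUSH persists the *entire* map, even if one entry changed.
     * Easy to implement and crash-safe, but O(map size) every time. */
    static void flush(void (*persist)(const void *, size_t))
    {
        persist(forward_map, sizeof(forward_map));
    }

    static void persist_stub(const void *p, size_t n) { (void)p; (void)n; }

    int main(void)
    {
        remap(7, 1234);
        flush(persist_stub);  /* writes 256 KiB of map for a 4-byte change */
        return 0;
    }
    ```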

    • > And if it was a strange forward map structure that takes a lot of time to flush but is fast or small or easy to implement, that actually supports my statement. That it was a deliberate design choice.

      By that logic, every time a programmer uses an inefficient data structure and introduces pathological performance, it's not a bug, it's a "deliberate design choice".

      At this point we're arguing semantics. My point is it's slow when it shouldn't be, and it can be made faster. Whether it's a "bug" or not comes down to whether Apple fixes it or not. I consider it a bug in my book because you don't normally design things to be 10-100x slower than the competition. It's too blatant not to be an oversight.

      And I've seen Apple make many oversights in the past year and fix them in an update. There is plenty of evidence the platform was rushed and lots of things were full of jank in early macOS 11 that got fixed on the way to 12, and more are still being fixed. This would be one of many and completely in line with history so far. It's why we're requiring 12.1+ firmware as a baseline for Asahi going forward, because many things were fixed and I don't want to deal with the buggy versions.