Comment by Fiveplus
11 days ago
>The goal is, that xfwl4 will offer the same functionality and behavior as xfwm4 does...
I wonder how strictly they interpret behavior here given the architectural divergence?
As an example, focus-stealing prevention. In xfwm4 (and x11 generally), this requires complex heuristics and timestamp checks because x11 clients are powerful and can aggressively grab focus. In wayland, the compositor is the sole arbiter of focus, hence clients can't steal it, they can only request it via xdg-activation. Porting the legacy x11 logic involves the challenge of actually designing a new policy that feels like the old heuristic but operates on wayland's strict authority model.
This leads to my main curiosity regarding the raw responsiveness of xfce. On potato hardware, xfwm4 often feels snappy because it can run as a distinct stacking window manager with the compositor disabled. Wayland, by definition, forces compositing. While I am not concerned about rust vs C latency (since smithay compiles to machine code without a GC), I am curious about the mandatory compositing overhead. Can the compositor replicate the input-to-pixel latency of uncomposited x11 on low-end devices or is that a class of performance we just have to sacrifice for the frame-perfect rendering of wayland?
(xfwl4 author here.)
> I wonder how strictly they interpret behavior here given the architectural divergence?
It's right there in the rest of the sentence (that you didn't quote all of): "... or as much as possible considering the differences between X11 and Wayland."
I'll do my best. It won't be exactly the same, of course, but it will be as close as I can get it.
> As an example, focus-stealing prevention.
Focus-stealing prevention is a place where I think xfwl4 could be at an advantage over xfwm4. Xfwm4 does a great job at focus-stealing prevention, but it has to rely on a bunch of heuristics, and sometimes it just does the wrong thing, and there's not much we can do about it. Wayland's model plus xdg-activation should at least make the focus-or-don't-focus decision much more consistent.
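To sketch what I mean (hypothetical types and logic, not actual xfwl4 or Smithay code), the whole policy can collapse into one small decision function instead of a pile of heuristics:

    // Hypothetical sketch of vetting an xdg-activation request.
    struct ActivationToken {
        from_input_event: bool, // token was minted in response to real user input
        issued_ms: u32,         // compositor clock when the token was issued
    }

    struct FocusPolicy {
        last_user_input_ms: u32, // when the user last clicked or typed
    }

    impl FocusPolicy {
        fn should_focus(&self, token: &ActivationToken) -> bool {
            // A token tied to a real input event is the trustworthy signal
            // X11 never had; grant those unconditionally.
            if token.from_input_event {
                return true;
            }
            // Otherwise, only grant if the user hasn't interacted with
            // anything since the token was issued -- roughly xfwm4's
            // timestamp heuristic, but running on data the compositor
            // fully controls.
            token.issued_ms >= self.last_user_input_ms
        }
    }

    fn main() {
        let policy = FocusPolicy { last_user_input_ms: 5_000 };
        let stale = ActivationToken { from_input_event: false, issued_ms: 4_000 };
        assert!(!policy.should_focus(&stale)); // denied: would be focus stealing
    }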
> I am curious about the mandatory compositing overhead. Can the compositor replicate the input-to-pixel latency of uncomposited x11 on low-end devices or is that a class of performance we just have to sacrifice for the frame-perfect rendering of wayland?
I'm not sure yet, but I suspect your fears are well-founded here. On modern (and even not-so-modern) hardware, even low-end GPUs should be fine with all this (on my four-year-old laptop with Intel graphics, I can't tell the difference performance-wise with xfwm4's compositor on or off). But I know people run Xfce/X11 on very-not-modern hardware, and those people may unfortunately be left behind. But we'll see.
If xfwl4 plans to implement something like sway's output max_render_time, then input-to-pixel latency should be the same as X11's or even lower.
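For anyone unfamiliar: sway's output <name> max_render_time <msec> option makes the compositor sleep after a vblank and start compositing just early enough to finish before the next one, instead of rendering immediately and letting the finished frame sit around for most of the refresh period. A rough sketch of the scheduling idea (a made-up function, not sway's actual code):

    use std::time::{Duration, Instant};

    // Wake the compositor just early enough to render before the next
    // vblank, so the frame that gets scanned out carries the freshest
    // possible input.
    fn next_render_start(
        last_vblank: Instant,
        refresh_period: Duration,  // ~16.67 ms at 60 Hz
        max_render_time: Duration, // worst-case time to composite a frame
    ) -> Instant {
        last_vblank + refresh_period - max_render_time
    }

    fn main() {
        let refresh = Duration::from_micros(16_667); // 60 Hz
        let budget = Duration::from_millis(5);       // assumed render budget
        let vblank = Instant::now();
        let wake = next_render_start(vblank, refresh, budget);
        println!("render starts {:?} after vblank", wake - vblank);
    }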
At least they are honest about the reasons, rather than writing a wall of text to justify what boils down to "because I like it".
Naturally, these kinds of language islands create some friction around build tooling, integration with the existing ecosystem, and who is able to contribute to what.
So let's see how it evolves. Even with my C bashing, I was a much happier XFCE user than I ever was with GNOME and GJS all over the place.
You know that all the Wayland primitives, event handling and drawing in gnome-shell are handled in C/native code through Mutter, right? The JavaScript in gnome-shell is the cherry on top for scripting, similar to C#/Lua (or any GCed language) in game engines, elisp in Emacs, or even JS in QtQuick/QML.
It is not the performance bottleneck people seem to believe.
I can dig out the old GNOME tickets and related blog posts...
Implementation matters, including proper use of JIT/AOT toolchains.
It has been the case that stalls in GJS land can stall the compositor, though, especially if it happens during a GC cycle.
> ...or is that a class of performance we just have to sacrifice for the frame-perfect rendering of wayland?
I think I know what "frame perfect" means, and I'm pretty sure that you've been able to get that for ages on X11... at least with AMD/ATi hardware. Enable (or have your distro enable) the TearFree option, and there you go.
I read somewhere that TearFree is triple buffering, so, if true, it's my (perhaps mistaken) understanding that this adds a frame of latency.
> I read somewhere that TearFree is triple buffering, so, if true, it's my (perhaps mistaken) understanding that this adds a frame of latency.
True triple buffering doesn't add one frame of latency, but since it enforces only whole frames be sent to the display instead of tearing, it can cause partial frames of latency. (It's hard to come up with a well-defined measure of frame latency when tearing is allowed.)
But there have been many systems that abused the term "triple buffering" to refer to a three-frame queue, which always does add unnecessary latency, making it almost always the wrong choice for interactive systems.
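To make the distinction concrete, here is the back-of-the-envelope version (illustrative numbers and a deliberately crude model, not measurements):

    fn main() {
        let vb = 1000.0 / 60.0; // vblank period at 60 Hz, ~16.7 ms
        let render = 6.0;       // assumed render time per frame, ms

        // "True" triple buffering (mailbox): the renderer runs freely and
        // each vblank shows the newest completed frame, so the displayed
        // frame is at most one render time old. The third buffer only
        // prevents tearing and stalls; it adds no queueing delay.
        println!("mailbox: frame <= {render:.1} ms old at scanout");

        // Three-frame queue: a new frame can only start rendering once a
        // buffer frees up at a vblank, then waits behind the two frames
        // already queued, reaching the screen roughly three vblanks after
        // its input was sampled.
        println!("3-frame queue: ~{:.1} ms from input sample to scanout", 3.0 * vb);
    }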
Only on the primary display. Once you had more than one display, there were only workarounds.
I don't know what "workarounds" you're talking about, or what unwanted behavior that I presume you're talking about. Would you be more specific?
I ask because just a few minutes ago, I ran VRRTest [0] on my dual-monitor machine and saw no screen tearing on either monitor. Because VRR is disabled in multi-monitor setups, I saw juddering on both monitors when I told VRRTest to render at rates that weren't a multiple of the monitor's refresh rate, but I saw no tearing at all.
My setup:
* Both monitors hooked up via DisplayPort
* Radeon 9070 (non-XT)
* Gentoo Linux, running almost all ~amd64 packages.
* x11-base/xorg-server-21.1.20
* x11-drivers/xf86-video-amdgpu-25.0.0-r1
* x11-drivers/xf86-video-ati-22.0.0
* sys-kernel/gentoo-sources-6.18.5
* KDE and Plasma packages are either version 6.22.0 or 6.5.5. I CBA to get a complete list, as there are so many relevant packages.
[0] <https://github.com/Nixola/VRRTest>
One thing to keep in mind is that compositing does not mean you have to do it with vsync; you can just refresh the screen the moment a client tells you the window has new contents.
Compositor overhead even with cheapo Intel laptop graphics is basically a non-issue these days. The people still rocking their 20 year old thinkpads might want to choose something else, but besides that kind of user I don't think it's worth worrying too much about.
It isn't just raw overhead, but also jitter, additional delays, and other issues caused by the indirection. Most systems have a way to mostly override the compositor for fullscreen windows, and for games and other applications where visible jitter and delays are an issue, you want that even on modern hardware.
> Most systems have a way to mostly override the compositor for fullscreen windows and for games
No, they don't. I don't think Wayland ever supported exclusive fullscreen, macOS doesn't, and Windows killed it a while back as well (in a Windows 10 update, like 5-ish years ago?)
Jitter is a non-issue for things you want vsync'd (like every UI), and for games the modern solution is gsync/freesync which is significantly better than tearing.
That matches what I recall too, back when I ran a very cheap integrated intel (at least that's what I recall) card on my underpowered laptop. I posted a few days ago with screenshots of my 2009 setup with awesome+xcompmgr, and I remember it being very snappy (much more so than my tuned Windows XP install at the time). https://news.ycombinator.com/item?id=46717701
I ran xfwm's compositor back when it was first introduced on a 400 MHz Pentium II with a GeForce 2. It was fully fine.
The compositing tax is just waiting for vsync; unless your machine is, like, a Pentium Classic, compositing itself isn't a problem.
> Can the compositor replicate the input-to-pixel latency of uncomposited x11 on low-end devices or is that a class of performance we just have to sacrifice for the frame-perfect rendering of wayland?
I think this is ultimately correct. The compositor will have to render a frame at some point after the VBlank signal, and it will need to render the buffers that are on-screen as of that point, which will contain whatever was last rendered to them.
This can be somewhat alleviated, though. Both KDE and GNOME have been getting progressively more aggressive about "unredirecting" surfaces into hardware accelerated DRM planes in more circumstances. In this situation, the unredirected planes will not suffer compositing latency, as their buffers will be scanned out by the GPU at scanout time with the rest of the composited result. In modern Wayland, this is accomplished via both underlays and overlays.
There is also a slight penalty to the latency of mouse cursor movement that is imparted by using atomic DRM commits. Since using atomic DRM is very common in modern Wayland, it is normal for the cursor to have at least a fraction of a frame of added latency (depending on many factors.)
I'm of two minds about this. One, obviously it's sad. The old hardware worked perfectly and never had latency issues like this. Could it be possible to implement Wayland without full compositing? Maybe, actually. But I don't expect anyone to try, because let's face it, people have simply accepted that we now live with slightly more latency on the desktop. But then again, "old" hardware is now hardware that can, more often than not, handle high refresh rates pretty well on desktop. An on-average increase of half a frame of latency is pretty bad at 60 Hz: it's, what, 8.3ms? But half a frame at 144 Hz is much less, at somewhere around 3.5ms of added latency, which I think is more acceptable. Combined with aggressive underlay/overlay usage and dynamic triple buffering, I think this makes the compositing experience an acceptable tradeoff.
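For reference, the same arithmetic across common refresh rates (nothing fancy, just half of 1000/Hz):

    fn main() {
        // Average added latency if compositing costs half a frame.
        for hz in [60.0_f64, 90.0, 120.0, 144.0, 240.0] {
            let frame_ms = 1000.0 / hz;
            println!("{hz:>5} Hz: frame {frame_ms:5.2} ms, half-frame penalty {:.2} ms",
                     frame_ms / 2.0);
        }
    }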
What about computers that really can't handle something like 144 Hz or higher output? Well, tough call. I mean, I have some fairly old computers that can definitely handle at least 100 Hz very well on desktop. I'm talking Pentium 4 machines with old GeForce cards. Linux is certainly happy to go older (though the baseline has been inching up there; I think you need at least Pentium now?) but I do think there is a point where you cross a line where asking for things to work well is just too much. At that point, it's not a matter of asking developers to not waste resources for no reason, but asking them to optimize not just for reasonably recent machines but also to optimize for machines from 30 years ago. At a certain point it does feel like we have to let it go, not because the computers are necessarily completely obsolete, but because the range of machines to support is too wide.
Obviously, though, simply going for higher refresh rates can't fix everything. Plenty of laptops have screens that can't go above 60 Hz, and they are forever stuck with a few extra milliseconds of latency when using a compositor. It is not ideal, but what are you going to do? Compositors offer many advantages, and it seems straightforward to design for a future where they are always on.
Love your post. So, don’t take this as disagreement.
I’m always a little bewildered by frame rate discussions. Yes, I understand that more is better, but for non-gaming apps (e.g. “productivity” apps), do we really need much more than 60 Hz? Yes, you can get smoother fast scrolling with higher frame rate at 120 Hz or more, but how many people were complaining about that over the last decade?
I enjoy working on my computer more at 144Hz than 60Hz. Even on my phone, the switch from 60Hz to a higher frame rate is quite obvious. It makes the entire system feel more responsive and less glitchy. VRR also helps a lot in cases where the system is under load.
60Hz is actually a downgrade from what people were used to. Sure, games and such struggled to get that kind of performance, but CRT screens did 75Hz/85Hz/100Hz quite well (perhaps at lower resolutions, because full-res 1200p sometimes made text difficult to read on a 21 inch CRT, with little benefit from the added sharpness, as CRTs have a natural fuzzy edge around their straight lines anyway).
There's nothing about programming or word processing that requires more than maybe 5 or 6 fps (very few people type more than 300 characters per minute anyway) but I feel much better working on a 60 fps screen than I do a 30 fps one.
Everyone has different preferences, though. You can extend your laptop's battery life by quite a bit by reducing the refresh rate to 30Hz. If you're someone who doesn't really mind the frame rate of their computer, it may be worth trying!
I never complained about 60, then I went to 144 and 60 feels painful now. The latency is noticeable in every interaction, not just gaming. It's immediately evident: the computer just feels more responsive, like you're in complete control.
Even phones have moved in this direction, and it's immediately noticeable when using one for the first time.
I'm now on 240hz and the effect is very diminished, especially outside of gaming. But even then I notice it, although stepping down to 144 isn't the worst. 60, though, feels like ice on your teeth.
> how many people were complaining about that over the last decade?
Quite a few. These articles tend to make the rounds when it comes up: https://danluu.com/input-lag/ and https://lwn.net/Articles/751763/. Perception varies from person to person, but going from my 144hz monitor to my old 60hz work laptop is so noticeable to me that I switched it from a composited Wayland DE to an X11 WM.
If our mouse cursors are going to have half a frame of latency, I guess we will need 60Hz or 120Hz desktops, or whatever.
I dunno. It does seem a bit odd, because who was thinking about the framerates of, like, desktops running productivity software, for the last couple decades? I guess I assumed this would never be a problem.
I agree. Keyboard-action-to-result-on-screen latency is much more important, and we are typically way above 17 ms for that.
Essentially, the only reason to go over 60 Hz for desktop is for a better "feel" and for lower latency. Compositing latency is mainly centered around frames, so the most obvious and simplest way to lower that latency is to shorten how long a frame is, hence higher frame rates.
However, I do think that high refresh rates feel very nice to use even if they are not strictly necessary. I consider it a nice luxury.
I couldn't find ready stats on what percentage of displays are 60 Hz, but outside of gaming and high-end machines I suspect 60 Hz is still the majority of machines used by actual users, meaning we should evaluate the latency as it is observed by most users.
The point is that we can improve latency of even old machines by simply attaching a display output that supports a higher refresh rate, or perhaps even variable refresh rate. This can negate most of the unavoidable latency of a compositor, while other techniques can be used to avoid compositor latency in more specific scenarios and try to improve performance and frame pacing.
A new display is usually going to be cheaper than a new computer. Displays which can actually deliver 240 Hz refresh rates can be had for under $200 on the lower end, whereas you can find 180 Hz displays for under $100, brand new. It's cheap enough that I don't think it's even terribly common to buy/sell the lower end ones second-hand.
For laptops, well, there is no great solution there; older laptops with 60 Hz panels are stuck with worse latency when using a compositor.
> As an example, focus-stealing prevention. In xfwm4 (and x11 generally), this requires complex heuristics and timestamp checks because x11 clients are powerful and can aggressively grab focus. In wayland, the compositor is the sole arbiter of focus, hence clients can't steal it, they can only request it via xdg-activation. Porting the legacy x11 logic involves the challenge of actually designing a new policy that feels like the old heuristic but operates on wayland's strict authority model.
Not that that's necessarily the best way to do it, but nothing stops xfwl4 from simply granting every focus request and then applying their existing heuristics on the result of that.
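Something like this grant-then-veto flow, I mean (made-up names, purely a sketch of the idea):

    type WindowId = u32;

    struct Wm {
        focused: Option<WindowId>,
        last_input_ms: u32,
    }

    impl Wm {
        // Stand-in for xfwm4's existing timestamp checks.
        fn legacy_heuristic_ok(&self, request_time_ms: u32) -> bool {
            request_time_ms >= self.last_input_ms
        }

        fn on_activation(&mut self, win: WindowId, request_time_ms: u32) {
            let prev = self.focused;
            self.focused = Some(win); // grant unconditionally...
            if !self.legacy_heuristic_ok(request_time_ms) {
                // ...then let the old heuristic veto before the change is
                // ever rendered, so the user never sees the flicker.
                self.focused = prev;
            }
        }
    }

    fn main() {
        let mut wm = Wm { focused: Some(1), last_input_ms: 1_000 };
        wm.on_activation(2, 900); // stale request: focus stays on window 1
        assert_eq!(wm.focused, Some(1));
    }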
> Can the compositor replicate the input-to-pixel latency of uncomposited x11 on low-end devices or is that a class of performance we just have to sacrifice for the frame-perfect rendering of wayland?
Well, the answer is just no: Wayland has been consistently slower than X11, and nothing running on top of it can really get around that.
Can you cite any sources for that claim? I found this blog post that says wayland is pretty much on par with X11 except for XWayland, which should be considered a band-aid only anyways: https://davidjusto.com/articles/m2p-latency/
Here's one article: https://mort.coffee/home/wayland-input-latency/
It's specifically about cursor lag, but I think that's because it's more difficult to experimentally measure app rendering latency.
> wayland has been consistently slower than X11
Wayland is a specification; it is incapable of being "faster" than other options. That's like saying JSON is 5% slower than Word.
And as for the implementations being slower than X, that also doesn't reflect reality.
https://www.phoronix.com/review/ubuntu-2504-x11-gaming
There is no Wayland to run on top of, as it's a standard to implement rather than a server to talk to.
Xfce / xfwm4 doesn't offer focus stealing prevention.
Settings -> Window Manager Tweaks -> Focus -> Activate focus stealing prevention
https://gitlab.xfce.org/xfce/xfwm4/-/blob/master/settings-di...
The option is there, it just never works: opensnitch-ui will pop up and steal focus. Any GOG installer (run via wine) will steal focus when the install finishes, and so on and on and on.