Modern game engines buffer 3 or 4 frames, sometimes 5. It's not unusual to have 140ms of latency on a 60Hz screen between clicking mouse1 and seeing the muzzle flash. A few of the contributing factors:

* deferred vs forward rendering (deferred adds latency)
* multithreaded vs singlethreaded
* vsync (double buffering)

https://www.youtube.com/watch?v=8uYMPszn4Z8 -- check at 6:30 for the latency of 60fps vsync on a 60Hz display. It's not even close to 16ms (1/60); it's ~118ms (7.1/60).

It's a far cry from the simplified pure math people have in mind when they think of fps in games or refresh rates for office work and typing. Software has been very, very lazy lately, and most of the time these issues are fixed by throwing more hardware at them, not by fixing the code.
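That 7-ish frame figure is easy to sanity-check by counting the stages an input sample crosses before scanout. A toy back-of-the-envelope in C++ -- the per-stage costs here are illustrative guesses, not measurements from any particular engine:

    // Toy arithmetic: why "60 fps with vsync" does not mean 16ms of input lag.
    // Stage costs are rough guesses for a fully pipelined engine, in frames.
    #include <cstdio>

    int main() {
        const double refresh_hz = 60.0;
        const double frame_ms   = 1000.0 / refresh_hz;  // ~16.7ms per refresh

        const double stages[] = {
            0.5,  // input sampled mid-frame, waits for the next sim tick (average)
            1.0,  // simulation / game logic frame
            1.0,  // CPU render-submission frame
            1.0,  // GPU rendering frame
            2.0,  // frames queued ahead by the engine/driver
            1.0,  // wait for vsync + scanout
        };

        double depth = 0.0;
        for (double s : stages) depth += s;

        std::printf("%.1f frames deep -> %.0f ms at %.0f Hz\n",
                    depth, depth * frame_ms, refresh_hz);
        // Prints "6.5 frames deep -> 108 ms at 60 Hz" -- the same ballpark as
        // the ~7.1 frames / ~118ms measured in the video above.
    }

Every extra frame of buffering is another ~16.7ms, which is why the totals blow past the naive 1/60 figure so quickly.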
> Software has been very, very lazy lately, and most of the time these issues are fixed by throwing more hardware at them, not by fixing the code.
Some things cannot be 'fixed'. It's always a trade-off. You can't expect to have all the fancy effects that rely on multiple frames and also low latency.
If there were a simple software fix, GPU manufacturers would be all over it and pushing it to all engines. It's in their interest to have the lowest latency possible to attract the most hard-core gamers (who then influence others).
Just look at all the industry cooperation that had to happen to implement adaptive sync. That goes all the way from game developers and engines to GPUs and monitors. Sure, that sells more hardware (which brings other benefits), but a software-only approach would also let companies sell hardware, by virtue of their "optimized" drivers.
> * deferred vs forward rendering (deferred adds latency)
Wah? Deferred just refers to a screen-space shading technique, but it still happens once every frame -- see the sketch below.
> * multithreaded vs singlethreaded
Not sure what you're saying here.
And then of course, yes display buffering does have an impact.
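To the deferred point: both passes happen inside the same frame, so it costs bandwidth and fill rate, not frames of latency. A CPU-side cartoon of the structure (not real GPU code, just the two passes back to back):

    // CPU cartoon of deferred shading: a geometry pass fills a G-buffer, a
    // lighting pass reads it back -- both inside the same frame, so there is
    // no extra frame of lag, just extra memory traffic.
    #include <cstdio>

    struct GBufferTexel { float nx, ny, nz; float albedo; };

    int main() {
        const int W = 4, H = 2;
        GBufferTexel gbuf[H][W];

        for (int frame = 0; frame < 2; ++frame) {
            // Pass 1: "geometry pass" writes normals + albedo per pixel.
            for (int y = 0; y < H; ++y)
                for (int x = 0; x < W; ++x)
                    gbuf[y][x] = {0.f, 1.f, 0.f, 0.5f + 0.1f * x};

            // Pass 2: "lighting pass" shades from the G-buffer, same frame.
            float lit = 0.f;
            for (int y = 0; y < H; ++y)
                for (int x = 0; x < W; ++x)
                    lit += gbuf[y][x].albedo * gbuf[y][x].ny;  // N.L with L = +Y

            std::printf("frame %d presented, avg lit = %.2f\n", frame, lit / (W * H));
        }
    }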
Nope. If they're making any choice at all, games usually opt for deeper pipelining to help keep framerates higher. Mostly they just run at whatever rate they run at and don't really do "latency tuning." That's where products like AMD's Anti-Lag ( https://www.amd.com/en/technologies/radeon-software-anti-lag ) and Nvidia's Reflex ( https://www.nvidia.com/en-us/geforce/news/reflex-low-latency... ) enter the picture: they give games a library to help with latency instead.
Games that are aiming more for a "cinematic narrative experience" might be perfectly fine with a few 33ms frames of latency, and a total input latency far exceeding 100ms. Competitive twitchy games will tend to be more aggressive. And VR games too, of course.
In principle, you can push GPU pipelines to very low latencies. Continually uploading input and other state asynchronously and rendering from the most recent snapshot (with some interpolation or extrapolation as needed for smoothing out temporal jitter) can get you down to total application-induced latencies below 10ms. Even less with architectures that decouple shading and projection.
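The "render from the most recent snapshot" part can be as small as a latest-wins mailbox between an input thread and the render loop. A minimal C++ sketch of the shape of it -- names and rates are made up, and a real engine would push the snapshot into a GPU-visible buffer rather than printf it:

    #include <chrono>
    #include <cstdio>
    #include <mutex>
    #include <thread>

    // Latest-wins mailbox: an input thread keeps overwriting one snapshot, and
    // the render loop reads whatever is newest when a frame starts. No queue,
    // so a slow frame never forces later frames to render stale input.
    struct Snapshot { float yaw, pitch; unsigned long long id; };

    std::mutex mtx;
    Snapshot   latest{};
    bool       quit = false;

    int main() {
        std::thread input([] {                        // "continually uploading" state
            for (unsigned long long n = 1; ; ++n) {
                {
                    std::lock_guard<std::mutex> lock(mtx);
                    if (quit) return;
                    latest = {0.10f * n, 0.05f * n, n};   // overwrite, don't enqueue
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(1));  // ~1kHz
            }
        });

        for (int frame = 0; frame < 5; ++frame) {     // stand-in for the render loop
            std::this_thread::sleep_for(std::chrono::milliseconds(16));     // "GPU work"
            Snapshot s;
            {
                std::lock_guard<std::mutex> lock(mtx);
                s = latest;                           // newest state, not a queued event
            }
            std::printf("frame %d renders snapshot %llu (yaw %.2f)\n", frame, s.id, s.yaw);
        }

        { std::lock_guard<std::mutex> lock(mtx); quit = true; }
        input.join();
    }

The key property is overwriting rather than queueing: a long frame never causes the next frame to consume input that is several frames old.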
Doing this requires leaving the traditional 'CPU figures out what needs to be drawn and submits a bunch of draw calls' model, though. The GPU needs to have everything it needs to determine what to draw on its own. If using the usual graphics pipeline, that would mean all frustum/occlusion culling and draw command generation happens on the GPU, and the CPU simply submits indirect calls that tell the GPU "go draw whatever is in this other buffer that you put together".
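The "GPU decides what to draw" part boils down to a culling pass that appends draw commands into a buffer the CPU never reads. Below is a CPU mock of just the data flow; the struct mirrors Vulkan's VkDrawIndexedIndirectCommand, and in the real version the loop is a compute shader and the buffer lives in GPU memory:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Same field layout as Vulkan's VkDrawIndexedIndirectCommand. In the real
    // setup a culling compute shader appends these into a GPU buffer, and the
    // CPU just calls vkCmdDrawIndexedIndirectCount on that buffer without ever
    // knowing its contents.
    struct DrawCmd {
        uint32_t indexCount, instanceCount, firstIndex;
        int32_t  vertexOffset;
        uint32_t firstInstance;
    };

    struct Object { float x, z; uint32_t indexCount, firstIndex; };

    int main() {
        std::vector<Object> scene = {
            {0.f, 5.f, 36, 0}, {40.f, 5.f, 36, 36}, {1.f, 9.f, 36, 72},
        };

        // --- GPU side (faked on the CPU here): cull, then emit draw commands ---
        std::vector<DrawCmd> drawBuffer;                // the "indirect buffer"
        for (const Object& o : scene) {
            bool visible = o.x > -10.f && o.x < 10.f;   // stand-in frustum test
            if (visible)
                drawBuffer.push_back({o.indexCount, 1, o.firstIndex, 0,
                                      (uint32_t)drawBuffer.size()});
        }

        // --- CPU side: submit one indirect call; it never sees the draw list ---
        std::printf("CPU submits 1 indirect draw covering %zu commands\n",
                    drawBuffer.size());
    }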
This is something I'm working on at the moment, and the one downside is that other games that don't try to clamp down on latency now cause a subtle but continuous mild frustration.
Yeah, I'm far from an expert on rendering and latency, but presumably game developers put a ton of effort into ensuring that the pixels are pushed with as little input latency as possible. This may not have been a priority for Microsoft in their terminal.
The whole Terminal codebase is open source:
https://github.com/microsoft/terminal/blob/main/src/renderer...
The first comment in this function (DxEngine::StartPaint), for example:

    // If retro terminal effects are on, we must invalidate everything for them to draw correctly.
    // Yes, this will further impact the performance of retro terminal effects.
    // But we're talking about running the entire display pipeline through a shader for
    // cosmetic effect, so performance isn't likely the top concern with this feature.
What is with the formatting on that file? It seems like every other line of code has no indentation.
Nice find!
For games, consistent, smooth frame rates and vsync (no tearing) are more important than input lag, so oftentimes things will be buffered.
That said, the VR space has a much tighter tolerance for input lag, and there are hardware-based mitigations. Oculus has a lot of techniques, such as "Asynchronous Spacewarp", which calculates intermediate frames based on head movement (an input) and motion vectors storing the velocity of each pixel. They also have APIs to mark layers as head-locked or free-motion, etc.
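For a feel of the motion-vector part, here's a 1D toy of warping the last frame forward by per-pixel velocity -- nothing to do with Oculus's actual implementation, just the general reprojection idea:

    #include <cstdio>

    // 1D toy of motion-vector reprojection: synthesize an intermediate "frame"
    // by sampling the last rendered frame from where each pixel's content was
    // half a frame ago. The real thing also warps for head pose and fills holes.
    int main() {
        const int W = 8;
        float frame[W]    = {0, 0, 1, 1, 1, 0, 0, 0};  // a bright blob
        float velocity[W] = {0, 0, 2, 2, 2, 0, 0, 0};  // blob moving right, 2 px/frame
        float synth[W]    = {0};

        const float dt = 0.5f;                         // halfway to the next real frame
        for (int x = 0; x < W; ++x) {
            int src = x - (int)(velocity[x] * dt);     // backward warp by velocity
            if (src >= 0 && src < W) synth[x] = frame[src];
        }

        for (int x = 0; x < W; ++x) std::printf("%.0f ", synth[x]);
        std::printf("\n");
        // Prints "0 0 0 1 1 0 0 0": the blob has moved right without a new
        // render. The missing leading pixel is the kind of hole/artifact the
        // real techniques spend most of their effort hiding.
    }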