Comment by mdoms

5 years ago

I love the new Windows Terminal (wt.exe), but this is one area where they really messed up. The latency between a keypress and a character appearing on screen makes the whole thing feel like a cheap experience. I have no idea what's causing the latency.

I suspect there's some very big, fancy rendering pipeline at work, because when I open Windows Terminal I get the nVidia overlay popup that normally only comes up when I launch games, indicating the terminal is using a GPU-based rendering engine. Which I'm sure confers some interesting benefits, but it's a heavy price to pay when, at the end of the day, it's just a terminal.

I believe they made a post about how conhost (the actual backend when you run cmd.exe from Start) has much lower input latency than other Windows terminals.

conhost needs to do a lot of things (font rendering, layout, and so on) without the help of other services (it needs to work even when they aren't running; see recovery mode). So it effectively bypasses a lot of the interactions with other services that a normal program needs in order to put text on screen. It is basically the tty of Windows, so to speak.

But that also means it has to sacrifice a lot of things that aren't available in recovery mode. You get no fancy emoji support or whatever features you would expect in a modern terminal. And the text rendering looked quite bad until they recently remade conhost.

Aren't games normally known for extremely low latencies?

  • Modern game engines buffer 3 or 4 frames, sometimes 5. It's not unusual to have 140ms of latency on a 60Hz screen between clicking mouse1 and seeing the muzzle flash.

      * deferred vs forward rendering (deferred adds latency)
      * multithreaded vs singlethreaded
      * vsync (double buffering)
    

    https://www.youtube.com/watch?v=8uYMPszn4Z8 -- check at 6:30 for the latency of 60fps with vsync on a 60Hz screen. It's not even close to 16ms (1/60 s); it's ~118ms (7.1/60 s).

    It's a far cry from the simplified pure math people think of when they think of fps in games or refresh rates for office work and typing. Software has gotten very lazy lately, and most of the time these issues are fixed by throwing more hardware at the problem rather than fixing the code.
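
    A rough back-of-the-envelope model of where a figure like ~118ms can come from once pipeline depth is taken into account, rather than just the refresh interval (a minimal Python sketch; all numbers are illustrative, not measurements):

        # Toy latency model: input-to-photon time when a game queues several
        # frames between simulation and scanout at a fixed refresh rate.
        REFRESH_HZ = 60
        FRAME_MS = 1000 / REFRESH_HZ            # ~16.7 ms per refresh interval

        def pipeline_latency_ms(frames_in_flight, sampling_delay_frames=0.5):
            """Latency from an input event to visible pixels, in ms.

            frames_in_flight: frames queued between simulation and scanout
              (CPU prep, GPU render, swap chain, display processing, ...).
            sampling_delay_frames: on average an input event waits about half
              a frame before the next simulation tick picks it up.
            """
            return (sampling_delay_frames + frames_in_flight) * FRAME_MS

        print(pipeline_latency_ms(1))    # ~25 ms  - shallow pipeline
        print(pipeline_latency_ms(4))    # ~75 ms  - 3-4 queued frames
        print(pipeline_latency_ms(6.6))  # ~118 ms - roughly the 7.1/60 figure above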

    • > Software has gotten very lazy lately, and most of the time these issues are fixed by throwing more hardware at the problem rather than fixing the code.

      Some things cannot be 'fixed'. It's always a trade-off. You can't expect to have all the fancy effects that rely on multiple frames and also low latency.

      If there were a simple software fix, GPU manufacturers would be all over it and pushing it into every engine. It's in their interest to have the lowest latency possible, to attract the most hard-core gamers (who then influence everyone else).

      Just look at all the industry cooperation that had to happen to implement adaptive sync: it goes all the way from game developers to engines, GPUs, and monitors. Sure, that sells more hardware (which brings its own benefits), but a software-only approach would also let companies sell hardware, by virtue of their "optimized" drivers.

    • > * deferred vs forward rendering (deferred adds latency)

      Wah? Deferred just refers to a screen-space shading technique, but it still happens once every frame.

      > * multithreaded vs singlethreaded

      Not sure what you're saying here.

      And then of course, yes display buffering does have an impact.

  • Nope. Games usually opt for deeper pipelining to help keep framerates higher, if they are making any deliberate choice at all. They usually just run at whatever rate they run at and don't really do "latency tuning". That's where products like AMD's Anti-Lag ( https://www.amd.com/en/technologies/radeon-software-anti-lag ) and Nvidia's Reflex ( https://www.nvidia.com/en-us/geforce/news/reflex-low-latency... ) enter the picture, giving games a library to help with latency instead.

  • Games that are aiming more for a "cinematic narrative experience" might be perfectly fine with a few 33ms frames of latency, and a total input latency far exceeding 100ms. Competitive twitchy games will tend to be more aggressive. And VR games too, of course.

    In principle, you can push GPU pipelines to very low latencies. Continually uploading input and other state asynchronously and rendering from the most recent snapshot (with some interpolation or extrapolation as needed for smoothing out temporal jitter) can get you down to total application-induced latencies below 10ms. Even less with architectures that decouple shading and projection. (A toy model contrasting this with a deep pipeline is sketched below.)

    Doing this requires leaving the traditional 'CPU figures out what needs to be drawn and submits a bunch of draw calls' model, though. The GPU needs to have everything it needs to determine what to draw on its own. If using the usual graphics pipeline, that would mean all frustum/occlusion culling and draw command generation happens on the GPU, and the CPU simply submits indirect calls that tell the GPU "go draw whatever is in this other buffer that you put together".

    This is something I'm working on at the moment, and the one downside is that other games that don't try to clamp down on latency now cause a subtle but continuous mild frustration.
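
    To make the difference concrete, here is a minimal toy model contrasting input sampled at the head of a deep pipeline with input late-latched just before the GPU renders from the most recent snapshot (plain Python, not real GPU code; all numbers are illustrative):

        # Toy comparison of where input gets sampled -- not a real renderer.
        FRAME_MS = 1000 / 60      # 60 Hz refresh
        PIPELINE_DEPTH = 3        # frames queued between simulation and scanout

        def classic_latency_ms():
            # Input is consumed by a simulation tick, then rides through the
            # whole CPU -> GPU -> display pipeline before it reaches the screen.
            return (0.5 + PIPELINE_DEPTH) * FRAME_MS        # ~58 ms

        def late_latch_latency_ms(gpu_render_ms=4.0, scanout_ms=3.0):
            # Input is streamed into a GPU-visible buffer continuously; the GPU
            # reads the freshest snapshot when it starts the frame, so only the
            # render and scanout of that one frame remain in the path.
            return gpu_render_ms + scanout_ms               # ~7 ms

        print(round(classic_latency_ms()), "ms with a 3-frame-deep pipeline")
        print(round(late_latch_latency_ms()), "ms rendering from the latest snapshot")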

  • Yeah, I'm far from an expert on rendering and latency, but presumably game developers put a ton of effort into ensuring that the pixels are pushed with as little input latency as possible. This may not have been a priority for Microsoft in their terminal.

    • The whole Terminal code is open source

      https://github.com/microsoft/terminal/blob/main/src/renderer...

      The first comment in this function (DxEngine::StartPaint), for example:

          // If retro terminal effects are on, we must invalidate everything for them to draw correctly.
          // Yes, this will further impact the performance of retro terminal effects.
          // But we're talking about running the entire display pipeline through a shader for
          // cosmetic effect, so performance isn't likely the top concern with this feature.
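
      A hypothetical sketch of the difference between damage-tracked repaints and "invalidate everything" (illustrative Python, not the Terminal's actual renderer):

          # Illustrative only: damage tracking vs. forced full invalidation.
          ROWS, COLS = 30, 120                  # a typical terminal grid

          def cells_to_repaint(dirty_cells, full_invalidation):
              """How many cells the paint pass has to touch this frame."""
              return ROWS * COLS if full_invalidation else len(dirty_cells)

          typed_one_char = {(10, 42)}           # one cell changed by a keypress
          print(cells_to_repaint(typed_one_char, full_invalidation=False))  # 1
          print(cells_to_repaint(typed_one_char, full_invalidation=True))   # 3600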

  • For games, consistent, smooth frame rates and vsync (no tearing) are more important than input lag, so oftentimes things will be buffered.

    That said, the VR space has a much tighter tolerance for input lag, and there are hardware-based mitigations. Oculus has a lot of techniques, such as "Asynchronous Spacewarp", which calculates intermediate frames based on head movement (an input) and motion vectors storing the velocity of each pixel. They also have APIs to mark layers as head-locked or free-motion, etc.
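
    A toy, rotation-only sketch of the reprojection idea (plain Python; real spacewarp-style techniques work per pixel with depth and motion vectors, which this ignores):

        # Crude 1-D "timewarp": shift the last rendered scanline by however far
        # the head has yawed since it was rendered. Illustrative only.
        def reproject_scanline(scanline, yaw_delta_deg, fov_deg=90.0, fill=" "):
            """Small-angle rotational reprojection of a 1-D image.

            Turning the head right makes the scene appear to move left, so the
            old scanline is shifted left by a pixel count proportional to the
            yaw change since it was rendered.
            """
            px_per_deg = len(scanline) / fov_deg
            shift = round(yaw_delta_deg * px_per_deg)
            if shift >= 0:
                return scanline[shift:] + fill * shift       # head turned right
            return fill * -shift + scanline[:shift]          # head turned left

        last_frame = "........X........"    # an object roughly straight ahead
        # Head has yawed ~10 degrees to the right since this frame was rendered:
        print(reproject_scanline(last_frame, yaw_delta_deg=10.0))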

I have yet to find a well-behaved GPU-accelerated VTE.