When I saw this page a few years back I had an idea for a project. I want to create the lowest-latency typing terminal I possibly can, using an FPGA and an LED array. My initial results suggest that I can drive a 64x32 pixel LED array at 4.88kHz, for a roughly 0.2ms latency.
For the next step I want to make it capable of injecting artificial latency, and then do A/B testing to determine (1) the smallest amount of latency I can reliably perceive, and (2) the smallest amount of latency that actually bothers me.
This idea was also inspired by this work from Microsoft Research, where they do a similar experiment with touch screens: https://www.youtube.com/watch?v=vOvQCPLkPt4
I remember when I got the iPhone X I was so used to higher latency that it felt like the iPhone was typing before I was. It was a very strange sensation until I got used to how quick it was.
If you ever end up doing this project, I find that sometimes it's hard to quantify if something is better when going in the normal-to-better direction, but it's always much easier to tell when something is worse when going in the normal-to-worse direction. So spend a few days or weeks getting totally acclimated to fast response times and then test if you can notice the difference with slow response times.
I like the idea, but note that (1) and (2) can depend on what you’re used to. The fact that one doesn’t notice a handicap doesn’t mean that there isn’t room for improvement, given some conditioning.
I believe there are specific methods to negate these effects (something like going in a specific order with the values)
It would probably be interesting to randomize the latency for certain intervals, with some kind of feedback mechanism, to provide a blind study.
Sounds like a fun project. I wonder if you could even implement it in fully discrete logic and skip the FPGA.
Is there an appreciable practical lower bound in latency to that? I’ve never understood how and why electronic signals can propagate down a wire so gosh-darn quickly: the speed of sound is what I’d have intuitively expected, not 50-99% of the speed of light ( https://en.wikipedia.org/wiki/Speed_of_electricity )
An anecdote that will probably sway no one: I was in a family-friendly barcade and noticed, inexplicably, a gaggle of kids, all 8-14, gathered around the Pong machine. Sauntering up so I could overhear their conversation, it was all excited variants of "It's just a square! But it's real!", "You're touching it!", or "The knobs really move it."
If you wonder why we no longer have "twitch" games, this is why. Old school games had a tactile aesthetic lost in the blur of modern lag.
We still have twitch games. Celeste was released only a couple years ago
Celeste is not really the same sort of thing, and compared to e.g. IWBTG fangames it feels a bit like playing a platformer while covered in glue. IWBTG fangames themselves probably feel similar to people who are used to playing Super Smash Brothers Melee for the Nintendo GameCube, because those people play on CRTs, which eliminate a couple of frames of latency compared to LCDs.
Classic example of two steps forward, one step back. Though there are schools which are exceptions to the norm.
FWIW, a quick ballpark test shows <30 ms minimum keyboard latency on my M1 Max MacBook, which has a 120 Hz display.
Method: Record 240-fps slo-mo video. Press keyboard key. Count frames from key depress to first update on screen, inclusive. Repeat 3x for each app.
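A rough sketch of the arithmetic behind that method (the helper below is hypothetical, just to make the one-frame quantization error explicit):

    # Convert a count of 240 fps slow-motion frames between key press and first
    # screen update into a latency estimate, with the +/- one-frame error shown.
    def frames_to_latency_ms(frame_count: int, fps: float = 240.0) -> tuple[float, float, float]:
        frame_ms = 1000.0 / fps                 # one frame at 240 fps is ~4.2 ms
        estimate = frame_count * frame_ms
        return estimate - frame_ms, estimate, estimate + frame_ms

    # e.g. 7 frames at 240 fps is ~29 ms, consistent with "<30 ms minimum"
    low, mid, high = frames_to_latency_ms(7)
    print(f"~{mid:.1f} ms (somewhere between {low:.1f} and {high:.1f} ms)")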
How do you determine at what point the key switch is activated? Or is the travel time from start to fully depressed negligible compared to measured latency?
You wire an LED to a button (like a mouse left click) and with a 1000Hz camera you can count how many frames it takes for the screen to update after the LED lights up. Repeat many times to account for being in varied stages of the refresh cycle.
Well, that's how it was done 10 years ago.
There is a good app to help with this, "is it snappy".
With that method I would just double-check that the 240-fps slo-mo video is synced to real time. It may be applying an unnoticeable slow-motion effect (e.g. 90% speed playback) that would throw off the results.
So e.g. put a clock in the video
I wonder if a compositor, and possibly an entire compositing system designed around adaptive sync could perform substantially better than current compositors.
Currently, there is a whole pile of steps to update a UI. The input system processes an event, some decision is made as to when to rerender the application, then another decision is made as to when to composite the screen, and hopefully this all finishes before a frame is scanned out, but not too far before, because that would add latency. It’s heuristics all the way down.
With adaptive sync, there is still a heuristic decision as to whether to process an input event immediately or to wait to aggregate more events into the same frame. But once that is done, an application can update its state, redraw itself, and trigger an immediate compositor update. The compositor will render as quickly as possible, but it doesn’t need to worry about missing scanout — scanout can begin as soon as the compositor finishes.
(There are surely some constraints on the intervals between frames sent to the display, but this seems quite manageable while still scanning out a frame immediately after compositing it nearly 100% of the time.)
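To make that concrete, here is a minimal sketch of the loop I have in mind, assuming a display whose adaptive-sync range is 48-240 Hz; every function name here is hypothetical, not any existing compositor's API:

    import time

    MIN_FRAME_INTERVAL = 1 / 240   # assumed panel limits for this sketch
    MAX_FRAME_INTERVAL = 1 / 48

    def run_compositor(wait_for_event, redraw_dirty_clients, composite, scanout):
        last_scanout = time.monotonic()
        while True:
            # Block on input or client damage, but never let the panel go longer
            # than its maximum refresh interval without receiving a frame.
            timeout = MAX_FRAME_INTERVAL - (time.monotonic() - last_scanout)
            event = wait_for_event(timeout=max(timeout, 0.0))

            redraw_dirty_clients(event)   # clients update state and render now
            frame = composite()           # no scanout deadline to race against

            # Respect the panel's minimum frame-to-frame spacing, then scan out
            # immediately instead of waiting for a fixed vblank.
            elapsed = time.monotonic() - last_scanout
            if elapsed < MIN_FRAME_INTERVAL:
                time.sleep(MIN_FRAME_INTERVAL - elapsed)
            scanout(frame)
            last_scanout = time.monotonic()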
For fullscreen apps one can do something even better: skip compositing or buffering entirely. Instead cooperate with the GPU and raster directly into the output buffer ahead of the pixels sent to the display. In wayland that's called direct scanout.
But yeah, for non-fullscreen it helps. See https://github.com/swaywm/sway/pull/5063
Uhm... aren't you basically describing wayland?
This Xorg dude did exactly the tuning you want on wayland https://artemis.sh/2022/09/18/wayland-from-an-x-apologist.ht...
You mean the max_render_time? That’s exactly the kind of kludge I’m suggesting that adaptive sync can eliminate.
Adaptive sync can only delay drawing, never make it happen sooner. This means it can only harm average latency of response to unpredictable events, such as human interaction. (Individual events may have lower latency purely by luck, because latency depends on the position of the raster scan relative to the part of the screen that needs to be updated, and adaptive sync will perturb this, but this effect is just as likely to make things worse.) The lowest average latency is always achieved by running the monitor at maximum speed all the time and responding to events immediately.
Adaptive sync is beneficial for graphically intensive games where you can't always render fast enough, but IMO this should never be true for a GUI on modern hardware.
> Adaptive sync can only delay drawing, never make it happen sooner.
That’s a matter of perspective. If your goal is to crank out frames at exactly 60 Hz (or 120 Hz or whatever), then, sure, you can’t send frames early and you want to avoid being late. But this seems like a somewhat dubiously necessary goal in a continuously rendered game and a completely useless goal in a desktop UI. So instead the goal can be to be slightly late for every single frame, and then if you’re less late than intended, fine.
Alternatively, one could start compositing at the target time. If it takes 0.5ms, then the frame is 0.5ms late. If it goes over and takes 1ms, then the frame is 1ms late.
With text content, most frames are exactly the same. So what adaptive sync can do is delay a refresh until just after the content has been updated. At a minimum, it can delay a refresh when an update is currently being drawn, which would lower the max latency.
Global Ping Data - https://wondernetwork.com/pings
We've got servers in 200+ cities around the world, and ask them to ping each other every hour. Currently it takes our servers in Tokyo and London about 226ms to ping each other.
We've got some downloadable datasets here if you want to play with them: https://wonderproxy.com/blog/a-day-in-the-life-of-the-intern...
The fundamental physical limit to latency caused by the speed of light is gleefully ignored by many web "application" architects. Apps that feel super snappy when hosted in the same region run like molasses from places like Australia. Unless the back-end is deployed in every major region, a significant fraction of your userbase will always think of your app as sluggish, irrespective of how much optimisation work goes into it.
Some random example:
Azure Application Insights can be deployed to any Azure region, making it feel noticeably snappier than most cloud hosted competitors such as New Relic or logz.io.
ESRI ArcGIS has a cloud version that is "quick and easy" to use compared to the hosted version... and is terribly slow for anyone outside of the US.
Our timesheet app is hosted in the US and is barely usable. Our managers complain that engineers "don't like timesheets". Look... we don't mind timesheets, but having to... wait... seconds... for.... each... click... is just torture, especially at 4:55pm on a Friday afternoon.
As developers in Australia, we are painfully aware of this, haha. Plenty of web services that aren’t deployed here feel painfully slow due to the latency costs, regardless of having decent bandwidth today.
Because our product is global, our backends are replicated worldwide too. Otherwise we’d be forcing the pain we go through daily on our users too.
Your point is completely valid, but physically there is still some room to improve. Hollow-core fibers, for instance, let light travel at close to vacuum speed rather than the roughly two thirds of it you get in conventional glass fiber.
With a 40 Mm circumference and light travelling at 300 Mm/s in vacuum, the physical lower bound is just under 70 ms one-way between opposite points on the globe.
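The back-of-envelope version of that, for anyone who wants to check the figures (speeds rounded the same way as above):

    EARTH_CIRCUMFERENCE_KM = 40_000
    C_VACUUM_KM_S = 300_000        # ~vacuum / hollow-core fiber
    C_GLASS_KM_S = 200_000         # ~2/3 c in conventional solid-core fiber

    antipodal_km = EARTH_CIRCUMFERENCE_KM / 2          # 20,000 km along the surface
    print(antipodal_km / C_VACUUM_KM_S * 1000)         # ~66.7 ms one-way, best case
    print(antipodal_km / C_GLASS_KM_S * 1000)          # ~100 ms one-way in glass
    print(2 * antipodal_km / C_GLASS_KM_S * 1000)      # ~200 ms round trip in glass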
Even if you fix where the backend is and use something like edge workers around the world, you still run into the issue of where the database is hosted, making all that work useless. Any useful endpoint is going to change some state, like the timesheet app.
I've used your ping data before; it was useful for deciding where to place my servers, and how nice of you to publish a dump as well! If I can wish for more data: min/median/max client latencies for all those servers would be swell, though I can see that you might not want to publish that, maybe on a per-month basis? Just a couple of thousand packets every hour should be enough: tcpdump -w stats.pcap -c5000 "tcp[tcpflags] & (tcp-syn|tcp-ack) != 0"
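If it helps, something along these lines could turn that capture into the min/median/max numbers (a sketch assuming scapy and a server-side pcap; it pairs each outgoing SYN/ACK with the client's handshake ACK):

    from statistics import median
    from scapy.all import rdpcap, IP, TCP

    synack_times = {}   # (client ip, client port, server ip, server port) -> send time
    rtts_ms = []

    for pkt in rdpcap("stats.pcap"):
        if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
            continue
        flags = pkt[TCP].flags
        if flags == "SA":      # server -> client SYN/ACK
            key = (pkt[IP].dst, pkt[TCP].dport, pkt[IP].src, pkt[TCP].sport)
            synack_times.setdefault(key, float(pkt.time))
        elif flags == "A":     # bare ACK from the client completing the handshake
            key = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
            if key in synack_times:
                rtts_ms.append((float(pkt.time) - synack_times.pop(key)) * 1000)

    if rtts_ms:
        print(f"min {min(rtts_ms):.1f} ms, median {median(rtts_ms):.1f} ms, max {max(rtts_ms):.1f} ms")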
I'm curious if you have a theory or explanation as to why some pings appear to be asymmetric. For example, for the following cities, it seems West->East is often faster than East->West:

              Chicago      London       New York
    Chicago   —            105.73ms     21.273ms
    London    108.227ms    —            72.925ms
    New York  21.598ms     73.282ms     —
Seems like those numbers are likely all within a margin of error. If you hover over the times in the table, it also gives you min and max values, which are often +/- 2ms or so.
*Adds to mental list of cool resources that exist on the internet*
Looking at the map with the blue dots, a cool rainy-day project would be to show the pings and pongs flying back and forth :3
I recently had some free time and used it to finish fixing up an Amiga 3000 (recapping the motherboard, repairing some battery damage on the motherboard). I installed AmigaDOS 3.2.1 and started doing things with it like running a web browser and visiting modern web sites.
The usability is worlds better than what we have now, even comparing a 1990 computer with a 25 MHz m68030 and 16 megs of memory with a four core, eight thread Core i7 with 16 gigs of memory. Interestingly, the 1990 computer can have a datatype added which allows for webp processing, whereas the Mac laptop running the latest Safari available for it can't do webp.
We've lost something, and even when we're aware of it, that doesn't mean we can get it back.
Previous discussions:
https://news.ycombinator.com/item?id=16001407 (December 24, 2017 — 588 points, 161 comments)
This could use a (2017) at the end of the title (no, it's not obvious; that'd be based on assumption).
Going through the list of what happens on iOS:
> UIKit introduced 1-2 ms event processing overhead, CPU-bound
I wonder if this is correct, and what's happening there if so - a modern CPU (even a mobile one) can do a lot in 1-2 ms. That's 6 to 12% of the per-frame budget of a game running at 60 fps, which is pretty mind-boggling for just processing an event.
I guess you can waste any amount of time with "a few" layers of strictly unnecessary indirection.
Speaking of games: I had just the other day the realization that we should look into software design around games if we want proper architectures for GUI applications.
What we do today instead are "layers of madness". At least I would call it like this.
Games have the privilege of controlling everything from the input device to the GPU pipeline. Nothing on the desktop is going to be that vertically integrated easily.
Has anyone else used an IBM mainframe with a hardware 327x terminal?
They process all normal keystrokes locally, and only send back to the host when Enter and function keys are pressed. This means very low latency for typing and most keystrokes. But much longer latency when you press enter, or page up/down as the mainframe then processes all the on-screen changes and sends back the refreshed screen (yes, you are looking at a page at a time, there is no scrolling).
Of course, these days people use emulators instead of hardware terminals so you get the standard GUI delays and the worst of both worlds.
Using emacs on an SGI Iris in 1988 was … sublime.
Every computer system since then has been a head-shaking disappointment, latency-wise.
Something I recently observed is that cutting edge, current generation gaming-marketed x86-64 motherboards for single socket CPUs, both Intel and AMD, still come with a single PS/2 mouse port on the rear I/O plate.
I read something about this being intended for use with high end wired gaming mice, where the end to end latency between mouse and cursor movement is theoretically lower if the signal doesn't go through the USB bus on the motherboard, but rather through whatever legacy PS/2 interface is talking to the equivalent-of-northbridge chipset.
Some still have two, for a keyboard and mouse:
https://static.tweaktown.com/content/1/0/10071_10_asus-rog-m...
Latency is lower because it's interrupt-based and much simpler than the polled USB stack. IMHO if you're going to always have a keyboard and mouse connected to the computer, it makes perfect sense to keep them on the dedicated simpler interface instead of the general USB; especially when the dedicated interface will be more reliable. The industry may be partly moving away from the "legacy-free USB everything" trend that started in the 2000s, finally.
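For a sense of scale, the polling cost alone works out like this (typical polling rates assumed, not measured):

    # An event waits, on average, half a polling interval before the host asks
    # for it; an interrupt-driven PS/2 port avoids that component entirely.
    for name, poll_hz in [("USB HID default", 125), ("USB 'gaming' rate", 1000)]:
        interval_ms = 1000 / poll_hz
        print(f"{name}: avg +{interval_ms / 2:.1f} ms, worst case +{interval_ms:.1f} ms")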
AFAIK all SuperIOs support a pair of PS/2 ports, so from a BoM perspective it's not an extra cost to the manufacturer, but they still market it as a premium feature.
I'd like to see older MS-DOS and Windows on there for comparison; I remember dualbooting 98se and XP for a while in the early 2000s and the former was noticeably more responsive.
Another comparative anecdote I have is between Windows XP and OS X on the same hardware, wherein the latter was less responsive. After seeing what GUI apps on a Mac actually involve, I'm not too surprised: https://news.ycombinator.com/item?id=11638367
Bare C GUI applications with somewhat modern integration are always that 'verbose' when compared to older or framework-based ones.
I do wonder about MS-DOS on different machines; technically it would be the BIOS and VBIOS doing much of the heavy lifting, so vendor variations might have a real impact here. Same as a CSM on UEFI; DOS wouldn't know the difference, but traversing firmware that is as complex (or more complex) from DOS would cause a whole lot of extra latency.
Powershell isn't a terminal (it's a shell, obviously), so the windows results are most likely tested in conhost. If it's on windows 11 it might be windows terminal, which may be more likely since I think cmd is still default on windows 10.
It might still be a valid test, because PowerShell needs to have a bunch of code in the stack between the keypress event and the call into the console API that actually displays the character. Among other things, the entire command line is getting lexically parsed every time you press a key.
If you think "parsing the command line" should or does take appreciable time on a human timescale when executed by a modern superscalar processor, then your mental model of computer performance is "off" by at least 4 or 5 orders of magnitude. Not four or five times incorrect, but many thousands of times incorrect.
The test is valid for any combo of shell and terminal, it's just a matter of figuring out which methodology was used so it can be better understood.
But yeah, I agree with the other comment that powershell is likely adding less than 1ms.
I always thought that Apple ][ + was as good as it gets. It's been downhill from there, for Apple and for the rest of us.
Once I got good at typing on it my Acorn Electron (we couldn’t afford the whizzy bbc master!) was an extension of my brain.
Instant response. A full reboot was a control break away. Instant access to the interpreter. Easy assembly access.
I thought, it executed.
I remember our school moving from the networked BBCs to the PCs, and it was a huge downgrade for us as kids. Computer class became operating a word processor or learning win 3.11 rather than the exciting and sometimes adversarial (remote messaging other terminals, spoofing etc.) system that made us want to learn; just more drudgery.
I agree with all of this except for one point:
Having an ordinary key on the keyboard that would effectively kill -9 the current program and clear the screen was a crazy design decision, especially for a machine where saving data meant using a cassette tape!
I was shocked to see the TI-99/4a so high up. Just listing a BASIC program on a TI-99 is about as slow as a 300 baud modem.
Example: https://youtu.be/ls-PxqRQ35Q?t=178
iPads predict user input https://developer.apple.com/documentation/uikit/touches_pres... . Did they do this back when this article was written or is this a newer thing that lets them get to even lower user perceived latencies than 30ms?
In general, predicting user input to reduce latency is a great idea and we should do more of it, as long as you have a good system for rolling back mispredictions. Branch prediction is such a fundamental thing for CPUs that it's surprising to me that it doesn't exist at every level of computing. The "eager evaluation" in the Chrome DevTools console (V8), where it shows you the result of side-effect-free expressions before you execute them, is the kind of thing I'm thinking about: https://developer.chrome.com/blog/new-in-devtools-68/#eagere...
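A minimal sketch of the rollback discipline I mean (hypothetical names, nothing to do with Apple's actual API): render the prediction immediately, then reconcile it when the real event arrives.

    class SpeculativeTextField:
        def __init__(self, render):
            self.confirmed = ""    # text backed by delivered, authoritative events
            self.displayed = ""    # what's on screen right now (may be speculative)
            self.render = render

        def on_predicted_key(self, char: str) -> None:
            # Draw the prediction right away to hide pipeline latency.
            self.displayed = self.confirmed + char
            self.render(self.displayed)

        def on_confirmed_key(self, char: str) -> None:
            self.confirmed += char
            if self.displayed != self.confirmed:
                # Misprediction: roll back to the confirmed state and redraw.
                self.displayed = self.confirmed
                self.render(self.displayed)

    field = SpeculativeTextField(render=lambda text: print(f"render: {text!r}"))
    field.on_predicted_key("a")    # drawn immediately
    field.on_confirmed_key("a")    # prediction confirmed, nothing to undo
    field.on_predicted_key("x")    # wrong guess...
    field.on_confirmed_key("s")    # ...rolled back and corrected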
Yet more proof we should have just stopped with the SGI Indy
There is hardware input latency (keyboard and mouse) as well as output latency, such as display latency. Unfortunately the market, and the industry as a whole, doesn't care about latency at all.
While I am not a fan or proponent of AR/VR, one thing that will definitely be an issue there is latency. Hopefully that will be enough incentive for companies to look into it.
Isn't this experiment a bit bogus? Extrapolating a terminal emulator's behavior to represent a machine's latency /in general/... what if the terminal emulator just sucks? Dan Luu is of course aware of this but he's willing to swallow it as noise:
> Computer results were taken using the “default” terminal for the system (e.g., powershell on windows, lxterminal on lubuntu), which could easily cause 20 ms to 30 ms difference between a fast terminal and a slow terminal.
If that was the only source of noise in the measurements then ok, maybe, but compounded with other stuff? For example, I was thinking: the more time passes, the further we drift from the command-line being the primary interface through which we interact with our computer. So naturally older computers would take more care in optimizing their terminal emulator to work well, as it's the face of the computer, right? Somebody's anecdote about PowerShell performance in this thread makes me feel more comfortable assuming that maybe modern vendors don't care so much about terminal latency.
Using the "default browser" as the metric for mobile devices worries me even more...
I like Dan Luu and I SupportThismessage™ but I feel funny trying to take anything away from this post...
Should it be personal computer latency?
Wonder about that, as we just talked about the importance of sub-second response time in the 1990s (full-screen 3270 after hitting Enter; even with no IMS or DB2, how can it be done …). The terminal keyboard response is fine (on a 3270). The network (SNA) …
1977 still had mainframes and workstations.
On my state-of-the-art desktop PC, Visual Studio has very noticeable cursor and scrolling lag. My C64 had the latter as well, but I used to assume the cursor moved as fast as I could type / tap the arrow keys.
I really found this valuable, particularly the slider at the top that enables you to visualize low-level latency times (Jeff Dean numbers) over the years. tl;dr: not much has changed in the processor hardware numbers since 2012. So everything to the right of the processor is where the action is. And it sounds like people are starting to actually make progress.
https://colin-scott.github.io/personal_website/research/inte...
I didn’t quite catch why we have 2.5 frames of latency and not just up to one frame of latency.
So much added sluggishness and they still cannot bring themselves to show us a current dynamic keyboard mapping to this day.
What's a current dynamic keyboard mapping?
Things (a dialog/popup box?) that let you see what each key is mapped to, based on the current window and/or mouse position.
I wonder how this was all measured.
I didn't dig into the text blob to ferret that out.
Did anybody?
Because this doesn't pass the sniff test for data I want to trust
He used the 240 FPS camera on an iPhone to measure the time from start of key movement to screen update several times, and rounded to the nearest 10 ms. He also used a 1000 FPS Sony camera to recheck the results of 40 ms and under.
He does mention that he includes the keyboard latency; other latency test results he found exclude that step.
I find it fascinating that you think no one would bother to read details about a nerdy subject on Hacker News. Why else are we here?
I find it fascinating that a nerd wouldn't open with a description of their methods, which is what nerd readers want to know right up front. It's buried on the last page.
That is because latency on its own is an often useless metric.
Cynic comment ahead, beware!
---
Does this actually even matter today, when every click or key-press triggers dozens of fat network requests going around the globe on top of a maximally inefficient protocol?
Or to summarize what we see here: we've built layers of madness. Now we just have to deal with the fallout…
The result is in no way surprising given we haven't refactored our systems for over 50 years and have just put new things on top.
If you aren't familiar, check out Winning Run [1], a 3D arcade racing game from 1988, about the best possible with custom hardware at the time. Graphics quality is primitive by modern standards, but make sure to watch the video at 60 fps. If there are any hiccups, it's your device playing the video. Smooth, continuous 60 frames per second rendering, with only some tens of milliseconds of delay in responding to game inputs. It's still very hard to pull that off today, yet it's fundamental to that type of game's overall quality.
[1] https://youtu.be/NBiD-v-YGIA?t=85
WipEout HD on the PS3 managed to get super stable 60FPS at 1080p. It dynamically scales the horizontal rendering resolution for every frame and then scales it to 1920 pixels using hardware. So the resolution might vary a bit but at that framerate and such speeds in races it's not noticeable. The controls were super smooth at any speed, only achievement popups caused the whole game to freeze for half a second.
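That kind of dynamic resolution is basically a feedback loop on the previous frame's GPU time; a sketch of the sort of controller involved (all the constants here are made up):

    FRAME_BUDGET_MS = 1000 / 60          # 16.7 ms at 60 Hz
    MIN_SCALE, MAX_SCALE = 0.6, 1.0      # fraction of the 1920-pixel output width

    def next_render_scale(current_scale: float, last_gpu_ms: float) -> float:
        target_ms = FRAME_BUDGET_MS * 0.9    # aim under budget to absorb spikes
        # GPU cost scales roughly with pixel count, which here is linear in the
        # horizontal resolution, so propose a scale from the time ratio...
        proposed = current_scale * (target_ms / max(last_gpu_ms, 0.1))
        # ...and move toward it gradually to avoid visible resolution pumping.
        damped = current_scale + 0.25 * (proposed - current_scale)
        return min(MAX_SCALE, max(MIN_SCALE, damped))

    print(next_render_scale(1.0, 19.0))   # over budget -> scale down slightly
    print(next_render_scale(0.8, 12.0))   # comfortably under -> creep back up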
I guess keyboard latency is also the biggest problem if you play old games in emulators. I feel it is often very difficult to play old action games because you can't hit the buttons precisely enough.
This video is not a steady 60FPS. Lots of frames are duplicated or torn. Maybe this was originally 60FPS and got mangled by the recording process.
That inefficient network has better latency than your computer when trying to show you a pixel: <http://newstmobilephone.blogspot.com/2012/05/john-carmack-ex...>
Only that such a network call can't replace the pixel output; it just adds to the overall latency.
Also, the real latency of web pages is measured in seconds these days. People are happy when they're able to serve a request in under 0.2 sec.