Comment by irskep

2 days ago

I agree with most of the other comments here, and it sounds like Shopify made sound tradeoffs for their business. I'm sure the people who use Shopify's apps are able to accomplish the tasks they need to.

But as a user of computers and occasional native mobile app developer, hearing "<500ms screen load times" stated as a win is very disappointing. Having your app burn battery for half a second doing absolutely nothing is bad UX. That kind of latency does have a meaningful effect on productivity for a heavy user.

Besides that, having done a serious evaluation of whether to migrate a pair of native apps supported by multi-person engineering teams to RN, I think this is a very level-headed take on how to make such a migration work in practice. If you're going to take this path, this is the way to do it. I just hope that people choose targets closer to 100ms.

I would read the <500ms screen loads as follows:

When the user clicks a button, the app starts a server round-trip, fetches the data, and does client-side parsing, layout, formatting and rendering. Less than 500ms later, the user can see the result on their screen.

With a worst-case ping of 200ms for a round-trip, that leaves about 200ms for DB queries and then 100ms for the GUI rendering, which is roughly what you'd expect.
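
To make that split concrete, here is a rough sketch of the budget in Python. The numbers are the ones from my estimate above; the stage names are just labels I picked for illustration, not anything Shopify has published.

```python
# Rough latency budget for one screen load; stage names are illustrative only.
BUDGET_MS = 500

budget = {
    "network_round_trip": 200,  # assumed worst-case ping
    "db_queries": 200,          # server-side queries
    "gui_rendering": 100,       # client-side parsing, layout, rendering
}

total_ms = sum(budget.values())
headroom_ms = BUDGET_MS - total_ms

for stage, ms in budget.items():
    print(f"{stage:>20}: {ms:4d} ms ({ms / BUDGET_MS:.0%} of budget)")
print(f"{'total':>20}: {total_ms:4d} ms, headroom {headroom_ms} ms")
```

With these numbers there is no headroom left, so any stage that runs over pushes the load past 500ms.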

  • Since the post is about the benefits of React, I'm sure if requests were involved they would mention it.

    Also, even if it was involved, 200ms for round-trip and DB queries is complete bonkers. Most round-trips don't take more than 100ms, and if you're taking 200ms for a DB query on an app with millions of users, you're screwed. Most queries should take max 20-30ms, with some outliers in places where optimization is hard taking up to 80ms.

    • > Most queries should take max 20-30ms

      Most queries are 20-30ms. But a worst case of 200ms for large payloads or edge cases or just general degradations isn't crazy. Without knowing if 500ms is a p50 or p99 it's kind of a meaningless metric but assuming it's a p99, I think it's not as bad as the original commenter stated.

    • I have a 160ms ping to news.ycombinator.com. Loading your comment took 1.427s of wall clock time. <s>Clearly, HN is so bad, it's complete bonkers ;)</s>

      time curl -o tmp.del "https://news.ycombinator.com/item?id=42730748"

      real 0m1.427s

      "if you're taking 200ms for a DB query on an app with millions of users, you're screwed"

      My calculation was 200ms for the DB queries and the time it takes your server-side framework ORM system to parse the results and transform them into JSON. But even in general, I disagree. For high-throughput systems it typically makes sense to make the servers stateless (which adds additional DB queries) in exchange for the ability to just start 20 servers in parallel. And especially for PostgreSQL index scans where all the IO is cached in RAM anyway, single-core CPU performance quickly becomes a bottleneck. But a 100+ core EPYC machine can still reach 1000+ TPS for index scans that take 100ms each. And, BTW, the basic Shopify plan only allows 1 visitor per 17 seconds to your shop. That means a single EPYC server could still host 17,000 customers on the basic plan even if each visit causes 100ms of DB queries.

    • I do not understand this thinking at all. Parsing a response into whatever rendering engine, even an extremely fast one, is going to be a large percentage of this 500ms page load. Diminishing it with magical thinking about pure database queries under load, with no understanding of the complexity of Shopify, is quite frankly ridiculous. Next up you’ll be telling everyone to roll their own file sharing with rsync or something…

  • If you are good, those numbers are an order of magnitude off. In truth it is probably mostly auth or something. If you simply avoid JSON you can attack these things radically and fast.

    RTT to the nearest major metro DC should be up to 20ms (where I am it is less than half that), your DB calls should not be anything like 200ms (and in the event they are, you need to show something else first), and 10-20ms is what you should assume for the rendering budget of something very big. 60Hz means 16ms per frame after all.

    • What percentile? Topics like these don't talk about the 5G-connected iPhone 16 Pro Max, but have to include low-end phones with old OS versions and bad connectivity (e.g. try the same network connectivity in the London metro, where often there is no reception whatsoever).

      As you reach for higher percentiles, RTT and such start growing very fast.

      Edit: other commenter mentioned 75% as percentile.

    • > RTT to nearest major metro DC should be up to 20ms (where I am it is less than half that)

      Over a mobile network? My best RTT to Azure or AWS over T-Mobile or Verizon is 113ms vs 13ms over my fiber connection.

  • > 200ms for DB queries

    No. Just no. There’s an entire generation of devs at this point who are convinced that a DB is something you throw JSON into, use UUIDs for everything, add indices when things are slower than you expected, and then upsize the DB when that doesn’t fix it.

    RAM access on modern hardware has a latency of something like 10 nanoseconds. NVMe reads vary based on queue depth and block size, but sub-msec is easily attainable. Even if your disks are actually a SAN, you should still see 1-2 msec. The rest is up to the DB.

    All that to say, a small point query on a well-designed schema should easily execute in sub-msec times if the pages are in the DB’s buffer pool. Even one with a small number of joins shouldn’t take more than 1-2 msec. If this is not the case for you, your schema, query, or DB parameters are sub-optimal, or you’re doing some kind of large aggregation query.

    I took a query from 70 to 40 msec today just by rewriting it. Zero additional indexing or tuning, just unrolling several unnecessary nested subqueries, and adding a more selective predicate. I have no doubt that it could get into the single digits if better indexing was applied.

    I beg of devs, please take the time to learn SQL, to read EXPLAIN plans, and to measure performance. Don’t accept 200 msec queries as “good enough” because you’re meeting your SLOs. They can be so much faster.

    • Beg all you want. They're still going to dump JSON strings (not even jsonb) and UUIDs in them anyway, because, "Move fast and break things."

      I lament along with you.

    • >RAM access on modern hardware has a latency of something like 10 nanoseconds

      What modern hardware are you using that this is true? That's faster than L3 cache on many processors.

    • "All that to say, a small point query on a well-designed schema should easily execute in sub-msec times if the pages are in the DB’s buffer pool"

      Shopify is hosting a large number of webshops with billions of product descriptions, but each store only has a low visitor count. So we are talking about a very large and, hence, uncacheable dataset with sparse access. That means almost every DB query to fetch a product description will hit the disk. I'd even assume a RAID of spinning HDDs for price reasons.

    • > just unrolling several unnecessary nested subqueries, and adding a more selective predicate

      And state of the art query optimizers can even do all this automatically!

    • I think 500ms P75 is good for an app that hits network in a hot path (mobile networks are no joke), but I agree that 200ms is very very bad for hitting the DB on the backend. I've managed apps with tables in the many, many billions of rows in MySQL and would typically expect single digit millisecond responses. If you use EXPLAIN you can quickly learn to index appropriately and adjust queries when necessary.

  • The 500ms number is p75 - not worst case at all.

    200ms round trip is like 10x more than what's reasonably possible.

    Same with your other numbers.

  • People have gotten used to that, but UI research going back to the 1960s has shown clearly that for many of these operations you get tens of ms before people notice and their attention wanes. The web often doesn't allow for response times as fast as humans need, which is a good reason to write real apps, not web apps. That is also why I use tabs: load a bunch in the background so when I'm ready I can just switch tabs and it is there.

500ms is the 75th percentile speed, so 75% of users are having load times faster than that. For context, Google's synthetic p75 loads emulate a crappy old Android phone on a bad network.

A linked post[0] says their p75 was 1400ms before 2023, yowza.

[0] https://shopify.engineering/improving-shopify-app-s-performa...
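
For anyone unsure what a p75 figure says in practice, here is a minimal sketch. It assumes the percentile is taken over individual screen loads rather than per user, which the post does not spell out, and it treats loads as independent, which is a strong simplification since slow loads tend to cluster on slow devices and networks. The sample timings are made up, not Shopify data, and the function names are mine.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def chance_of_slow_load(loads, pct=75):
    """Probability that at least one of `loads` independent loads exceeds the pct-th percentile."""
    return 1 - (pct / 100) ** loads

# Made-up screen load times in milliseconds (hypothetical, for illustration only).
load_times_ms = [90, 120, 180, 210, 250, 300, 340, 410, 480, 900, 1300, 2100]

p75 = percentile(load_times_ms, 75)
print(f"p75 = {p75} ms")
for n in (1, 4, 10):
    print(f"{n:2d} loads: {chance_of_slow_load(n):.0%} chance of at least one load slower than p75")
```

Under those assumptions a quarter of loads are slower than the p75 value, so a user who opens a handful of screens is more likely than not to see at least one slow load.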

  • > so 75% of users are having load times faster

    No. It is on a request basis, meaning that one in four clicks a user makes takes more than half a second to complete. Slow times at percentiles as low as the 75th mean users hit the bad cases very often in practice.

  • 2 seconds to wait for a webpage to load isn’t even that bad. If you take an average user on facebook it is horrendously slow - to someone who knows how fast something can be - but no typical user cares/notices. They just accept it.

    Nike’s website is phenomenally quick. But again. Ask anyone if that is what they care about. Nope. It’s the shoes.

> Having your app burn battery for half a second doing absolutely nothing is bad UX. That kind of latency does have a meaningful effect on productivity for a heavy user.

The implication is that React Native is to blame for this and I'm not sure that's true. What would the ms delay be with pure native? I have plenty of native apps that also have delays from time to time.

  • It all depends on whether the number includes network roundtrip or not, which they don't state. I read it as not including a network request, i.e. all CPU and local I/O.

    • The article they link to about how they optimized talks about caching network calls as part of their strategy to get below 500ms, so I would assume network calls are included in the number.

Replying to myself for clarification: I did not read their 500ms number as including waiting for a network. It sounded like that's how long it was taking React Native to load local data and draw the screen. If that's not the case, it's a very different story.

From another comment by seemack (https://news.ycombinator.com/item?id=42730348):

> For example, I just recorded myself tapping on a product in the Product list screen and the delay between the pressed state appearing and the first frame of the screen transition animation is more than half a second. The animation itself then takes 300ms which is a generally accepted timeframe for screen animations. But that half second where I'm waiting for the app to respond after I've tapped a given element is painful.

  • author here - the stated screen load time includes server round-trip, parsing, layout, formatting, and rendering.

    • In that case, I apologize for misunderstanding, and would edit my original comment if I could.

Indeed. The games industry uses immediate mode GUIs and people get upset if they achieve less than 60fps. Having everything be this slow is just a huge failure of coordination on behalf of the industry.

(next mini question: why is it seemingly impossible to make an Android app smaller than 60MB? I'm sure it is possible, but almost all the ones I have from the app store are that size)

  • Can't speak for every app, but I've worked on several through the years, and a sizeable chunk of every app I've worked on was assets. It's possible to hide a lot of it from the app store size if you really wanted to, but you'd end up downloading all the assets at some point anyway, so there's really no point in putting the extra engineering effort in just to make your app store number look smaller.

    This obviously isn't the case for every app and most of the ones I've worked on had a lot of bloat/crap in them as well.

Name a single example of an app that takes less than that from when I tap it to when it opens.

I've just tried whatsapp, notes, gallery, settings and discord out of curiosity, none did and I have a very fast phone.

  • It sounds like you're referring to app-launch time, which is different from screen-load time. Very different things!

Check out Avalonia [0]

It's a cross-platform spiritual successor of WPF and it kicks ass! You get proper separation of models and views, you can separate what controls there are from how they look (themes/styles), you can build the entire thing into a native compiled application with very reasonable speed and memory use.

[0] https://avaloniaui.net

> I just hope that people choose targets closer to 100ms.

Why? If it's about the phone burning battery for 500ms, it probably isn't doing that - it's just waiting for data to arrive. And even when it's rendering, it's probably not burning battery like say Uber (with which you can feel the battery melt in your hands).

But that's not why I am commenting. I am writing because so many commenters are saying that 500ms is bad. Why is 500ms bad, as long as the UI is not freezing or blanking out?

Why not lower expectations, and wait for half a second? Of course, there are apps for which 500ms is unacceptable - but this doesn't seem to be one of them.

Subjectively, I find the Shop app to be quite nice and speedy. It works well enough that I’d never have guessed it is using any kind of cross platform framework.

It’s easy to get caught up on numbers, but at the end of the day the user experience is all that matters. And I very much doubt that performance is a concern for their users.

  • Exactly. Tech people almost always go to the "performance wormhole" arguing about ms and how it could be improved 10x - myself included. But working at a startup the past couple of years, I came to the conclusion that it does not matter to the end users at all. If an app is "nice" and "speedy" as you say, that’s enough. Shopify made a good decision and tradeoffs; it works for them, and I would argue it would work for 90% of other companies as well. You don't really need a native app for most purposes; React Native and Flutter are good enough.

Assuming the 500ms is mostly delay for fetching data over a socket, unless the code is really broken that should not really be burning battery. <500ms for display of non-trivial network-fetched data is great regardless of whether it's rendered by React Native or is a fully native app. They would both be I/O-bound on the network primarily, with a small but insignificant compute overhead for RN. If the data needs lots of transformation (though not compute-intensive transformation like calculating hashes or something) upon returning, that could make a difference, though again I'd be surprised if CPU for RN vs native was all that different.
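
A quick way to see the difference between waiting and working is to compare wall-clock time with CPU time. The sketch below uses `time.sleep` as a stand-in for blocking on a socket and a spin loop as a stand-in for CPU-bound work; both helpers are hypothetical, and CPU time is only a proxy for battery use, since it says nothing about what the radio draws while a request is in flight.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

def wait_for_data():
    # Stand-in for blocking on a socket: the process sleeps and uses almost no CPU.
    time.sleep(0.5)

def busy_work():
    # Stand-in for CPU-bound work: spin until 0.5 s of wall time has elapsed.
    deadline = time.perf_counter() + 0.5
    while time.perf_counter() < deadline:
        pass

wait_wall, wait_cpu = measure(wait_for_data)
busy_wall, busy_cpu = measure(busy_work)

print(f"waiting: wall {wait_wall:.3f} s, cpu {wait_cpu:.3f} s")
print(f"working: wall {busy_wall:.3f} s, cpu {busy_cpu:.3f} s")
```

Both calls take about half a second of wall time, but only the spin loop should show CPU time anywhere near that.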

As an Elixir dev who aims for and routinely achieves <10ms response times, (and sometimes < 1 ms for frequent endpoints that I can hand optimize into a single efficient SQL query, which Ecto makes easy I might add!) I find the response time to be the more egregious part :-D

> Having your app burn battery for half a second doing absolutely nothing is bad UX.

Why are you assuming the app is either burning much battery or even doing more than waiting on current data from the server? For an app that I would assume isn't much use without up-to-date data from the server?