Comment by leeoniya
19 hours ago
uPlot maintainer here. this looks interesting, i'll do a deeper dive soon :)
some notes from a very brief look at the 1M demo:
- sampling has a risk of eliminating important peaks; uPlot does not do it, so for an apples-to-apples perf comparison you have to turn that off. see https://github.com/leeoniya/uPlot/pull/1025 for more details on the drawbacks of LTTB
- when doing nothing / idle, there is significant cpu being used, while canvas-based solutions will use zero cpu when the chart is not actively being updated (with new data or scale limits). i think this can probably be resolved in the WebGPU case with some additional code that pauses the updates.
- creating multiple charts on the same page with GL (e.g. dashboard) has historically been limited by the fact that Chrome is capped at 16 active GL contexts that can be acquired simultaneously. Plotly finally worked around this by using https://github.com/greggman/virtual-webgl
> data: [[0, 1], [1, 3], [2, 2]]
this data format, unfortunately, necessitates the allocation of millions of tiny arrays. i would suggest switching to a columnar data layout.
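roughly what i mean, for illustration only (the field names and typed-array choice here are placeholders, not a proposed API):

```js
// row-oriented: one tiny array allocated per point
const rows = [[0, 1], [1, 3], [2, 2]];

// columnar: one flat typed array per axis - a single allocation per series,
// no per-point garbage, and much friendlier to the CPU cache
const columns = {
  x: new Float64Array([0, 1, 2]),
  y: new Float64Array([1, 3, 2]),
};
```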
uPlot has a 10M datapoint demo here, if interested: https://leeoniya.github.io/uPlot/bench/uPlot-10M.html
Really appreciate you taking the time to look, Leon - uPlot has been a huge inspiration for proving that browser charts don't have to be slow.
Both points are fair:
1. LTTB peak elimination - you're right, and that PR is a great reference. For the 1M demo specifically, sampling is on by default to show the "it doesn't choke" story. Users can set sampling: 'none' for apples-to-apples comparison. I should probably add a toggle in the demo UI to make that clearer.
2. Idle CPU - good catch. Right now the render loop is probably ticking even when static. That's fixable - should be straightforward to only render on data change or interaction. Will look into it.
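Roughly the shape of the fix I have in mind - a sketch only, with hypothetical names (`render` stands in for whatever encodes and submits the GPU work; this isn't the current internals):

```js
// Sketch of render-on-demand: instead of an unconditional requestAnimationFrame
// loop, keep a flag and only schedule a frame when something actually changed.
let frameRequested = false;

function invalidate() {
  if (frameRequested) return;
  frameRequested = true;
  requestAnimationFrame(() => {
    frameRequested = false;
    render(); // assumed: existing draw/submit function
  });
}

// setData(), zoom/pan handlers, and resize observers would call invalidate();
// when nothing changes, no frames are scheduled and CPU/GPU stay idle.
```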
Would love your deeper dive feedback when you get to it. Always more to learn from someone who's thought about this problem as much as you have.
Original Flot maintainer here.
I once had to deal with many million data points for an application. I ended up mip-mapping them client-side.
But regarding sampling, if it's a line chart, you can sample adaptively by checking whether the next point makes a meaningfully visible difference measured in pixels compared to its neighbours. When you tune it correctly, you can drop most points without the difference being noticeable.
I didn't find anyone else doing that at the time, and some people seemed to have trouble accepting it as a viable solution, but if you think about it, it doesn't actually make sense to plot, say, 1 million points in a line chart 1000 pixels wide. On average that would make 1000 points per pixel.
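The core of it is small. A rough sketch of the idea (simplified - here each candidate is compared against the last point that was kept, in pixel space; `xToPx`/`yToPx` stand in for the chart's scale-to-pixel mappings):

```js
// Keep a point only if it moves the line by more than ~epsilon pixels
// compared to the last point we kept; always keep the endpoints.
function decimateByPixelDelta(xs, ys, xToPx, yToPx, epsilon = 0.5) {
  if (xs.length === 0) return [[], []];
  const outX = [xs[0]], outY = [ys[0]];
  let lastPx = xToPx(xs[0]), lastPy = yToPx(ys[0]);
  for (let i = 1; i < xs.length - 1; i++) {
    const px = xToPx(xs[i]), py = yToPx(ys[i]);
    if (Math.abs(px - lastPx) > epsilon || Math.abs(py - lastPy) > epsilon) {
      outX.push(xs[i]);
      outY.push(ys[i]);
      lastPx = px;
      lastPy = py;
    }
  }
  if (xs.length > 1) {
    outX.push(xs[xs.length - 1]);
    outY.push(ys[ys.length - 1]);
  }
  return [outX, outY];
}
```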
We routinely face this in the audio world when drawing waveforms. You typically have on the order of 10-100k samples per second, durations of 10s-1000s of seconds, and pixel widths of on the order of 1-10k pixels.
Bresenham's is one algorithm historically used to downsample the data, but a lot of contemporary audio software doesn't use that. In Ardour (a cross-platform, libre, open source DAW), we actually compute and store min/max-per-N-samples and use that for plotting (and as the basis for further downsampling).
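The aggregation itself is tiny - a simplified sketch of the idea, not Ardour's actual code:

```js
// Reduce each bucket of n samples to its min and max; drawing a vertical
// line from min to max per bucket reproduces the waveform envelope, and
// coarser zoom levels can be built by reducing these buckets again.
function minMaxPerN(samples, n) {
  const buckets = Math.ceil(samples.length / n);
  const mins = new Float32Array(buckets);
  const maxs = new Float32Array(buckets);
  for (let b = 0; b < buckets; b++) {
    let lo = Infinity, hi = -Infinity;
    const end = Math.min((b + 1) * n, samples.length);
    for (let i = b * n; i < end; i++) {
      if (samples[i] < lo) lo = samples[i];
      if (samples[i] > hi) hi = samples[i];
    }
    mins[b] = lo;
    maxs[b] = hi;
  }
  return { mins, maxs };
}
```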
> In Ardour (a cross-platform, libre, open source DAW), we actually compute and store min/max-per-N-samples and use that for plotting (and as the basis for further downsampling).
this is, effectively, what uPlot does, too: https://github.com/leeoniya/uPlot/issues/1119
> Original Flot maintainer here.
I discovered flot during my academic research career circa 2008 and it saved my ass more times than I can count. I just wanted to say thank you for that. I wouldn't be where I am today without your help :)
hey!
> But regarding sampling, if it's a line chart, you can sample adaptively by checking whether the next point makes a meaningfully visible difference measured in pixels compared to its neighbours.
uPlot basically does this (see sibling comment), so hopefully that's some validation for you :)
Are there any techniques using wavelet decomposition to decimate the high-frequency components while retaining peaks? I feel like that's a more principled approach than sampling, but I haven't seen any literature describing the specific techniques (unless the idea is fundamentally unsound, which is not obvious to me).
Interesting idea - I haven't explored wavelet-based approaches but the intuition makes sense: decompose into frequency bands, keep the low-frequency trend, and selectively preserve high-frequency peaks that exceed some threshold.
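If I did experiment, the first cut would probably be a single Haar analysis level with peak restoration - purely a sketch of that intuition, untested, with an arbitrary threshold:

```js
// One Haar-style analysis level: each pair of samples splits into an average
// (low-frequency trend) and a detail (high-frequency content). Smooth pairs
// collapse to their average; pairs whose detail exceeds the threshold keep
// both original samples so the peak survives.
function haarDecimate(xs, ys, threshold) {
  const outX = [], outY = [];
  for (let i = 0; i + 1 < ys.length; i += 2) {
    const avg = (ys[i] + ys[i + 1]) / 2;
    const detail = (ys[i] - ys[i + 1]) / 2;
    if (Math.abs(detail) > threshold) {
      outX.push(xs[i], xs[i + 1]);
      outY.push(ys[i], ys[i + 1]);
    } else {
      outX.push((xs[i] + xs[i + 1]) / 2);
      outY.push(avg);
    }
  }
  if (ys.length % 2 === 1) {
    // odd length: carry the final sample through unchanged
    outX.push(xs[ys.length - 1]);
    outY.push(ys[ys.length - 1]);
  }
  return [outX, outY];
}
```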
My concern would be computational cost for real-time/streaming use cases. LTTB is O(n) and pretty cache-friendly. Wavelet transforms are more expensive, though maybe a GPU compute shader could make it viable.
The other question is whether it's "visually correct" for charting specifically. LTTB optimizes for preserving the visual shape of the line at a given resolution. Wavelet decomposition optimizes for signal reconstruction - not quite the same goal.
That said, I'd be curious to experiment. Do you have any papers or implementations in mind? Would make for an interesting alternative sampling mode.
I don't. I just remember watching a presentation on it and it always struck me that wavelets are an incredibly powerful and underutilized technique for data reduction while preserving quality in a quantifiable and mathematically justifiable way.
I don't have any papers in mind, but I do think the critique around visual shape vs signal reconstruction may not be accurate, given that wavelets are starting to see a lot of adoption in the visual space (JPEG2000, at least, is the leading edge in that field). Might be interesting to try DCT as well. I think these will perform better than LTTB (the compute cost is higher, of course, but there's also HW acceleration for some of these, or will be over time).
> creating multiple charts on the same page with GL (e.g. dashboard) has historically been limited by the fact that Chrome is capped at 16 active GL contexts that can be acquired simultaneously. Plotly finally worked around this by using https://github.com/greggman/virtual-webgl
Sometimes I like to ponder on the immense amount of engineering effort expended on working around browser limitations.
One small thing I noticed: when you zoom in or out (or change the time span), the y-axis stays the same instead of adapting to the visible data.
Not much to add, but as a very happy uPlot user here - just wanted to say thank you for such an amazing library!!
yw!
What I did in a few projects to plot aggregated (resampled) data without losing peaks was to plot it over an area chart representing the min-max values before aggregating (resampling). It worked pretty well.
I wouldn't spend too much of your time deep diving - it's an AI slop project.