Comment by xrd
3 months ago
Fascinating stuff.
For some reason, this reminds me of the way video encoders compress video:
https://en.wikipedia.org/wiki/Video_compression_picture_type...
It makes me wonder if you could use a similar technique (I-frames, P-frames, or B-frames) to get the diff against a "normal" WSI and then train pattern recognition on those diffs.
These frame types are used to reduce network transmission costs, but it feels similar to the context window problem if you squint at it as a throughput problem rather than a context window size problem.
It feels like there would be a lot of tools and codecs you could leverage here.
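To make the frame-type idea concrete, here is a rough sketch of keyframe-plus-delta encoding. It is a toy, not how a real codec works: real encoders add motion compensation, transforms, and entropy coding, and B-frames (which also reference later frames) are left out. The names `encode`, `decode`, and `keyframe_interval` are made up for illustration, and each "frame" is just a flat list of pixel values standing in for a tile.

```python
from typing import List, Tuple

Frame = List[int]  # flattened grayscale pixel values; a stand-in for a real tile


def encode(frames: List[Frame], keyframe_interval: int = 4) -> List[Tuple[str, Frame]]:
    """Store every keyframe_interval-th frame whole ("I"), the rest as deltas ("P")."""
    encoded = []
    for i, frame in enumerate(frames):
        if i % keyframe_interval == 0:
            # I-frame: self-contained, decodable without any other frame
            encoded.append(("I", list(frame)))
        else:
            # P-frame: only the per-pixel difference from the previous frame
            prev = frames[i - 1]
            encoded.append(("P", [a - b for a, b in zip(frame, prev)]))
    return encoded


def decode(encoded: List[Tuple[str, Frame]]) -> List[Frame]:
    """Rebuild the original frames by applying each delta to the frame before it."""
    frames: List[Frame] = []
    for kind, payload in encoded:
        if kind == "I":
            frames.append(list(payload))
        else:
            prev = frames[-1]
            frames.append([p + d for p, d in zip(prev, payload)])
    return frames
```

When neighbouring frames are similar, the deltas are mostly zeros, which is the part that might carry over: the model would only need to look at what changed relative to a reference.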
I've been thinking a bit more about better ways to build the tooling around it. To be fully transparent, I don't know much about video compression, but I will read up on it.
I have been running into some problems with memory management here, as each later frame needs some context from the previous frames. Currently I just do something simple: pass the previous frame and the first reference frame into context. Maybe I can look into video compression and see if there is any inspiration there.
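Roughly, the "first reference frame plus previous frame" selection looks like the sketch below. The function name `context_indices` and the `window` parameter are hypothetical, added only for illustration; `window=1` matches what is described above, and larger values would keep more of the recent frames at the cost of more context.

```python
from typing import List


def context_indices(i: int, window: int = 1) -> List[int]:
    """Indices of the frames to put in context when processing frame i.

    Always includes frame 0 (the reference frame), plus up to `window`
    frames immediately before frame i.
    """
    if i <= 0:
        return []  # the first frame has nothing before it
    start = max(1, i - window)  # start at 1 so frame 0 is not listed twice
    return [0] + list(range(start, i))
```

For example, `context_indices(5)` gives `[0, 4]` and `context_indices(5, window=2)` gives `[0, 3, 4]`, so the context size stays bounded no matter how many frames have been processed.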