Comment by aesthesia
4 hours ago
I notice the experiments are all run with Gaussian token embeddings and weight matrices, which is a very different scenario than you would get in a real model. It shouldn't be much more difficult to try this with an actual model and data and get a much better sense of how well it compresses.
I completely agree. Right now this is all on a synthetic setup to isolate the behavior and understand the reconstruction vs. memory tradeoff. Real models will definitely behave differently.
I’ve started trying this out with actual models, but I’m currently running everything on CPU, so it’s pretty slow. Ideally I’d run this properly on a GPU, but that gets expensive quickly.
So yeah, still very much a research prototype, but validating this on real models/data is definitely the next step.