Comment by nyrikki
1 year ago
Littlestone and Warmuth made the connection to compression in 1986, which was later shown to be equivalent to finite VC dimension, i.e., PAC learnability.
Look into DBSCAN and OPTICS for a far closer lens on how clustering works in modern commercial ML; KNN is not the only form of clustering (see the sketch below).
But it is still in-context, additional compression that depends on a decider function, or equivalently a composition of linearized set-shattering parts.
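
For anyone unfamiliar with the density-based methods mentioned above, here is a minimal sketch using scikit-learn's DBSCAN on synthetic data (the dataset and parameters are purely illustrative, not tied to anything discussed in the thread):

    # Minimal DBSCAN sketch on synthetic data (illustrative only).
    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.cluster import DBSCAN

    X, _ = make_moons(n_samples=500, noise=0.08, random_state=0)

    # Density-based clustering: no k to choose up front; points in
    # low-density regions get label -1 (noise) instead of being forced
    # into a cluster, unlike centroid- or neighbor-count-based methods.
    labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print("clusters found:", n_clusters, "noise points:", int(np.sum(labels == -1)))

OPTICS (sklearn.cluster.OPTICS) follows the same density-based idea but avoids fixing a single eps, producing an ordering from which clusters at multiple density levels can be extracted.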
I am very familiar with these and other clustering methods in modern ML, and have been involved in inventing and publishing some such methods myself in various scientific contexts. The paper I cited above only used 3 nearest neighbors as one baseline, IIRC; that is why I mentioned KNN. However, even boosted trees failed to reduce the loss as much as the algorithm the decoder transformer learned from the data.
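
To make the baseline comparison concrete, here is a rough sketch of what a 3-nearest-neighbor baseline and a boosted-tree baseline look like side by side in scikit-learn. This is synthetic regression data and is not the cited paper's setup; it only illustrates the kind of baselines being compared against the transformer:

    # Illustrative baselines only; not the experimental setup from the cited paper.
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_squared_error

    X, y = make_regression(n_samples=2000, n_features=20, noise=5.0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

    for name, model in [("3-NN", KNeighborsRegressor(n_neighbors=3)),
                        ("boosted trees", GradientBoostingRegressor(random_state=0))]:
        model.fit(X_tr, y_tr)
        print(name, "test MSE:", mean_squared_error(y_te, model.predict(X_te)))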