Comment by ted_dunning

6 days ago

It isn't directly what you are asking for, but there is a similar relationship at work with respect to L_1 versus L_2 regularization. The number of samples required to train a model grows roughly logarithmically in d for L_1 but linearly in d for L_2, where d is the number of (mostly irrelevant) features [1]. This is related to the standard random-projection results (Johnson-Lindenstrauss style) about how high-dimensional vectors can be approximated in a space of roughly log(d) dimensions with small error, with high probability.
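
To make that concrete, here is a minimal sketch (assuming numpy and scikit-learn; the feature count, noise level, and sample sizes are made up for illustration, not taken from [1]) that fits L_1- and L_2-regularized logistic regression on a synthetic problem where only a handful of the d features matter, and prints test accuracy as the training set grows:

    # Illustration of the O(log d) vs O(d) sample-complexity intuition.
    # Only the first k of d features carry signal; the rest are noise.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    d, k = 1000, 5
    w_true = np.zeros(d)
    w_true[:k] = 1.0

    def test_accuracy(penalty, n_train, n_test=2000):
        # Fresh synthetic data each call; labels come from a sparse linear rule.
        X = rng.normal(size=(n_train + n_test, d))
        y = (X @ w_true + 0.1 * rng.normal(size=n_train + n_test)) > 0
        clf = LogisticRegression(penalty=penalty, C=1.0,
                                 solver="liblinear", max_iter=1000)
        clf.fit(X[:n_train], y[:n_train])
        return clf.score(X[n_train:], y[n_train:])

    for n in (50, 100, 200, 400):
        print(f"n={n:4d}  L1 acc={test_accuracy('l1', n):.3f}  "
              f"L2 acc={test_accuracy('l2', n):.3f}")

With only a few relevant features, the L_1 model typically reaches good accuracy at much smaller n than the L_2 model, which is the log(d)-versus-d gap in miniature.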

At a very hand-wavy level, it seems reasonable that moving from L_1 to L_0 would show a similar jump in learning complexity, but I don't think that has ever been addressed formally.

[1] A. Ng, "Feature selection, L1 vs. L2 regularization, and rotational invariance": https://www.andrewng.org/publications/feature-selection-l1-v...