Comment by math_dandy

6 days ago

Could you give more details about what precisely you mean by interpolation and generalization? The commonplace use of “generalization” in the machine learning textbooks I’ve been studying is model performance (whatever metric is deemed relevant) on new data from the training distribution. In particular, it’s meaningful when you’re modeling p(y|x) and not the generative distribution p(x,y).

It's important to be aware that ML textbooks are conditioning every term on ML being the domain of study, and, like computer science generally, are largely unconcerned with borrowed words retaining their original meaning.

Generalisation in the popular sense (science, stats, philosophy of science, popsci) is about reliability and validity: validity = does the model track the target properties of the system we expect it to; reliability = does it continue to do so in environments where those properties are present but irrelevant permutations are made.

Interpolation is "curve fitting", which covers almost all of ML/AI. The goal of curve fitting is to replace a general model with a summary of the measurement data. This is useful when you have no way of obtaining a model of the data-generating process.
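A minimal sketch of that sense of "curve fitting", using made-up data: we summarise noisy measurements with a polynomial, with no model whatsoever of the process that generated them. The polynomial is just a compressed restatement of the sample.

```python
import numpy as np

# Hypothetical measurements: some unknown process plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.size)

# The "model" is just a summary of these 50 points, nothing more.
coeffs = np.polyfit(x, y, deg=7)
fit = np.polyval(coeffs, x)
rmse = float(np.sqrt(np.mean((fit - y) ** 2)))
print(f"in-sample RMSE: {rmse:.3f}")
```

The fit tracks the measurements closely, but nothing in it tells you whether the sine-like shape is a property of the world or an artefact of how these 50 points were collected.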

What people in ML assume is that there is some true distribution of measurements, and "generalisation" means interpolating the data so that you capture the measurement distribution.

I think it's highly likely there's a profound conceptual mistake in assuming measurements themselves have a true distribution, so even the sense of generalisation meaning "have we interpolated correctly" is, in most cases, meaningless.

Part of the problem is that ML textbooks frame all ML problems with the same set of assumptions (eg., that there exists an f: X->Y, that X has a "true distribution" Dx, so that finding f* implies learning Dx). For many datasets, these assumptions are false. Compare running a linear regression on photos of the sky, through star positions, to predict star signs, vs. running it on V=IR electric-circuit data to estimate `R`.

In the former case, there is no f_star_sign to find; there is no "true distribution" of star sign measurements; and so on. Any model of star signs cannot be a model even of measurements of star signs. ML textbooks do not treat "data" as having these kinds of constraints, or relationships to reality, which breeds pseudoscientific and credulous misunderstandings (such as, indeed, the Othello paper).
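The contrast above can be sketched with simulated data (all numbers hypothetical): in the circuit case the fitted slope recovers a real physical quantity, because Ohm's law is the data-generating process; in the star-sign case the labels have no functional relationship to the measurements, so any fit is interpolation of noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Case 1: V = I*R. There IS a data-generating process behind the
# measurements, so the fitted slope tracks a real property (R).
R_true = 4.7                                   # hypothetical resistance (ohms)
I = rng.uniform(0.1, 2.0, size=n)              # measured currents (A)
V = R_true * I + rng.normal(0, 0.05, size=n)   # Ohm's law + measurement noise
R_hat = np.polyfit(I, V, deg=1)[0]             # slope of fitted line ≈ R

# Case 2: "star signs" from sky photos. No f_star_sign exists, so the
# regression has nothing real to track.
brightness = rng.uniform(0, 1, size=n)         # stand-in for pixel statistics
star_sign = rng.integers(0, 12, size=n)        # 12 arbitrary labels
corr = np.corrcoef(brightness, star_sign)[0, 1]

print(f"R_hat = {R_hat:.3f} (true value {R_true})")
print(f"corr(brightness, star_sign) = {corr:.3f}")
```

Both regressions run without complaint and both produce coefficients; nothing in the curve-fitting machinery itself distinguishes the case where the coefficient means something from the case where it does not.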