Comment by mjburgess

3 years ago

NNs are decision trees anyway -- take any classification alg and rewind from its decision points into a disjunction of conditions.

Or, maybe more clearly: imagine taking any classification algorithm and drawing the graph of all of its predictions across its domain. Then just construct a decision tree which "draws splits" along the original algorithm's decision edges.
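A minimal sketch of that construction (mine, not from the thread): a tiny fixed-weight ReLU network, and a decision tree hand-built by splitting along the network's own activation boundaries. Within each region the network is linear, so thresholding that linear output reproduces it exactly.

```python
def relu(z):
    return max(0.0, z)

def net(x1, x2):
    # One hidden layer, two ReLU units, weights chosen by hand for illustration.
    h1 = relu(1.0 * x1 - 1.0 * x2)         # active when x1 > x2
    h2 = relu(-1.0 * x1 - 1.0 * x2 + 1.0)  # active when x1 + x2 < 1
    return 1 if (h1 + h2 - 0.5) > 0 else 0

def tree(x1, x2):
    # Split along the network's "decision edges" (the ReLU activation
    # boundaries), then threshold the linear output within each region.
    if x1 > x2:                   # h1 active
        if x1 + x2 < 1.0:         # h2 also active
            return 1 if (x1 - x2) + (-x1 - x2 + 1.0) > 0.5 else 0
        return 1 if (x1 - x2) > 0.5 else 0
    if x1 + x2 < 1.0:             # only h2 active
        return 1 if (-x1 - x2 + 1.0) > 0.5 else 0
    return 0                      # neither unit active: output is -0.5

# The two models agree at every point on a grid over the domain.
grid = [(i / 10.0, j / 10.0) for i in range(-10, 11) for j in range(-10, 11)]
assert all(net(a, b) == tree(a, b) for a, b in grid)
```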

Likewise, all ML is equivalent to a KNN parameterised on an averaging operation.

Everything here is equivalent to everything else. ML is just computing an expectation over a training dataset, weighted by the model parameters.

The "value" comes from the (copyright laundering/) data. The only question is: can you find useful weights by which to control the expectation you're taking?

Various ML approaches weight the training data differently. The most successful of the latest round of AI manages to compute weights across everything ever written --- hence more useful than a naive KNN, which wouldn't terminate on 1 PB of text.

> NNs are decision trees anyway -- take any classification alg and rewind from its decision points into a disjunction of conditions

By that argument, every computation can be reduced to a lookup table. Take every possible input, memorize the correct output and store it in a database of sorts.

If decision trees were truly equivalent to NNs, you would be able to solve any problem currently addressed with NNs using only decision trees, without learning from the output of the NN: same input datasets, same output quality metrics.

Not really feasible, is it?

Likewise with all the other equivalences you made here.

  • Sure, every computation is equivalent to a lookup table over "predetermined answers". It isn't equivalent if we don't have those answers.

    Eg., "what's the US President's telephone number in 2000?" had no answer in 1900.

    > If decision trees were truly equivalent to NNs

    They are equivalent. And you don't need to precompute answers you don't have. You can take the weights of an NN and encode them as a DT, just as you can transform an NN into k-nearest-neighbors.

    The reason we don't do that is prediction efficiency.

    Also, of course, such functions are basically impossible to train as a practical matter. That bears little on their equivalence.

    All ML models are expressible as k-nearest-neighbors -- this is useful information because it demystifies the process. Countless papers end with "and we don't know why!" -- where the "why" is obvious if you reformulate the model.

    ML is just ranking a historical dataset of size N, by similarity to some X, selecting up to N examples from it, weighting each by W and then taking an average.
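That recipe can be sketched directly as a kernel-weighted nearest-neighbor predictor (the exponential weighting and 1-D data below are my illustrative assumptions, not from the comment):

```python
import math

def predict(x, data, k=3):
    # data: list of (example, label) pairs.
    # 1. Rank the stored dataset by similarity to the query x.
    ranked = sorted(data, key=lambda p: abs(p[0] - x))[:k]
    # 2. Weight each selected example (here: an exponential kernel W).
    weights = [math.exp(-abs(xi - x)) for xi, _ in ranked]
    # 3. Return the weighted average of the selected labels.
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, ranked)) / total

data = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]
print(predict(1.5, data))  # interpolates between the nearby labels
```

With k = N and uniform weights this degenerates to a plain average over the dataset; all the modeling effort lives in the choice of similarity and weights.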

  • > By that argument, every computation can be reduced to a lookup table. Take every possible input, memorize the correct output and store it in a database of sorts

    You're playing into his argument. You are right. All computation we know of is equivalent to a lookup table, since none of our computers are actual Turing machines.

    And this highlights the difference between the software engineer's way of thinking and the mathematician's.