Comment by hodgehog11
5 hours ago
Gradient boosting handles tabular data better than neural networks, largely because the structure is simpler and coping with noise becomes the dominant issue. Do a like-for-like comparison on unstructured data like images, audio, video, or text, and a well-designed NN will mop the floor with gradient boosting. That's because handling that sort of data requires encoding some form of inductive bias about the patterns you expect in it (e.g., spatial locality), or you won't get anywhere. Both CNNs and transformers do this.
Would you agree/disagree with the following:
- It's not gradient boosting per se that's good on tabular data, it's trees. Other fitting methods that use trees as the base model (e.g., random forests) are also usually superior to NNs on tabular data.
- Trees are better on tabular data because they encode a useful inductive bias that NNs currently do not. Just like CNNs or ViTs are better on images because they encode spatial locality as an inductive bias.
Absolutely agree on both counts. Gradient boosting is the best-known and most successful variant, but the decision tree is the underlying architecture there. Decision trees don't exhibit the same "implicit training bias" phenomenon that neural networks get from their optimizers, though, so all of this is just model bias in the classical statistical sense.