Comment by edge17
2 days ago
Is there something I can read to get a better sense of what types of models are most suitable for which problems? All I hear about are transformers nowadays, but what are the types of problems for which transformers are the right architecture choice?
Just do some basic searches on e.g. Google Scholar for your task (e.g. "medical image segmentation", "point cloud segmentation", "graph neural networks", "time series classification", "forecasting") or for a task modification (e.g. "'rotation invariant' architecture"). Sort by year, prioritize papers with a large number of citations, and start reading. You will start to get a feel for the domains or specific areas where transformers are, and are not, clearly the best models. Or just ask e.g. ChatGPT Thinking with search enabled about these kinds of things (and then verify the answer by going to the actual papers).
Also check HuggingFace and other model hubs and filter by task to see if any of these models are available in an easy-to-use format. But most research models will only be available on GitHub somewhere, and for vision tasks you are generally just deciding between a vision transformer and the latest convolutional model (usually a ConvNeXt vX for some X).
In practice, if you need to work with the kind of data that is found online, and don't have a highly specialized type of data or problem, then you do, today, almost always just want some pre-trained transformer.
But if you actually have to (pre)train a model from scratch on specialized data, in many cases you will not have enough data or resources to get the most out of a transformer, and often some kind of older / simpler convolutional model is going to give better performance at less cost. Sometimes in these cases you don't even want deep learning at all, and classic ML or plain algorithms are far superior. A good example is time series forecasting, where embarrassingly simple linear models blow overly complicated and hugely expensive transformer models right out of the water (https://arxiv.org/abs/2205.13504).
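To make the time series point concrete, here is a rough sketch of what an "embarrassingly simple linear model" can look like: one least-squares linear map from a lookback window straight to the forecast horizon. This is not the exact method from the linked paper, just a plain NumPy baseline in the same spirit. The function names, the synthetic series, and the window sizes are all made up for illustration.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (lookback -> horizon) input/target pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def fit_linear_forecaster(series, lookback, horizon):
    """Fit one linear map (plus bias) from the window to the whole horizon."""
    X, Y = make_windows(series, lookback, horizon)
    X = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # ordinary least squares
    return W

def predict(W, X):
    """Forecast the next `horizon` steps for one window or a batch of windows."""
    X = np.atleast_2d(X)
    return np.hstack([X, np.ones((len(X), 1))]) @ W

# Synthetic data (an assumption for this demo): seasonality + trend + noise.
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 24) + 0.005 * t + 0.1 * rng.standard_normal(t.size)

lookback, horizon = 48, 12
train, test = series[:480], series[480:]
W = fit_linear_forecaster(train, lookback, horizon)

# Evaluate on held-out windows against a "repeat the last value" baseline.
X_test, Y_test = make_windows(test, lookback, horizon)
pred = predict(W, X_test)
naive = np.repeat(X_test[:, -1:], horizon, axis=1)
linear_mse = float(np.mean((pred - Y_test) ** 2))
naive_mse = float(np.mean((naive - Y_test) ** 2))
print(f"linear MSE: {linear_mse:.4f}, naive MSE: {naive_mse:.4f}")
```

Something like this takes milliseconds to fit, so it is worth running as a sanity baseline before reaching for anything heavier. If a big model can't beat it on your held-out data, that tells you a lot.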
Oh, right, and unless TabPFNv2 (https://www.nature.com/articles/s41586-024-08328-6) makes sense for your use-case, you are still better off using boosted decision trees (e.g. XGBoost, LightGBM, or CatBoost) for tabular data.