Comment by flask_manager

17 days ago

The distinction that i think is important to make when talking about "the bitter lesson" is that improving the compute and training infrastructure and tricks in the abstract wins over intelligent model and system design.

Its more about the information about the specific problem you are solving having less impact than techniques that target the compute. So in this case, breaking down how to parse a PDF in stages for your domain is involving specific expert knowledge of the domain, but training with attention is about efficient use of compute in general; with no domain expertise.