Comment by ttoinou
1 day ago
For me, that could only be solved by changing the architecture and/or introducing more insider tooling (like calling a program to do the computation). It doesn't make any sense to fine-tune a fuzzy-input, fuzzy-output natural-language-processing algorithm to add and multiply all combinations of six-digit numbers.
This feels like a philosophical fault line in the industry.
For people whose purpose is to produce reliably working systems, yeah, training a model that calls out to deterministic logic for things like math makes total sense. It will pretty much always be more reliable than training a text-generation model to produce correct arithmetic.
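As a rough sketch of what "calling out to deterministic logic" can look like in practice (all the names here are hypothetical, not any particular vendor's API): the model emits a structured tool call instead of generating digits, and the host program dispatches it to exact arithmetic:

    import ast
    import json
    import operator

    # Deterministic arithmetic "tool" the model invokes instead of
    # producing the answer's digits token by token.
    OPS = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
    }

    def safe_eval(expr):
        """Exactly evaluate a pure arithmetic expression; reject anything else."""
        def walk(node):
            if isinstance(node, ast.Expression):
                return walk(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
                return -walk(node.operand)
            raise ValueError("disallowed expression: %r" % expr)
        return walk(ast.parse(expr, mode="eval"))

    def handle_model_output(output):
        """If the model emitted a JSON tool call like
        {"tool": "calculator", "expression": "123456 * 654321"},
        compute it deterministically; otherwise pass the text through."""
        try:
            call = json.loads(output)
        except json.JSONDecodeError:
            return output  # plain text, no tool call
        if isinstance(call, dict) and call.get("tool") == "calculator":
            return str(safe_eval(call["expression"]))
        return output

    print(handle_model_output('{"tool": "calculator", "expression": "123456 * 654321"}'))
    # -> 80779853376

With this split, fine-tuning only has to teach the model when to emit the tool call, which is a much easier target than memorizing every six-digit product.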
But it feels like there's another side of the industry that's more concerned with... I dunno, metaphysical aspects of these models? Where the idea that the model is a stochastic ball that isn't conscious, isn't thinking, and does poorly at various tasks is anathema. So the effort continues to try and train and fine-tune these models until... something.
It reminds me of the great Tesla-vs-everyone-else self-driving debates that raged over the past several years. Lots of people were unhappy that the best-functioning systems fused many sensor types and a mixture of heuristic and machine-learned components in a complex architecture. These folks insisted that the "best" architecture was an end-to-end machine-learned system based entirely on visible-light cameras, because it's "most human" or some other such nonsense. As far as I can tell there was never any merit to this position beyond some abstract notion of architectural purity.
Same thing here I suppose.