Comment by meroes
1 day ago
> arithmetic (why would use LLM for that ?)
Because people ask LLMs all of these things, including arithmetic. People said the same about counting the r's in strawberry: why would you ask an LLM that?! But the big AI companies want LLMs to be better at these questions, probably because people keep asking them. The big AI companies clearly want this; there's no other explanation for the money poured into RLHF'ing these types of problems.
For me, that could only be solved by changing the architecture and/or introducing more insider tooling (like calling out to a program to do the computation). It doesn't make any sense to fine-tune a fuzzy-input, fuzzy-output natural language processing algorithm to add and multiply all combinations of six-digit numbers.
This feels like a philosophical fault line in the industry.
For people whose purpose is to produce reliably working systems, yeah, training a model that calls out to deterministic logic for things like math makes total sense. It will pretty much always be more reliable than training a text-generation model to produce correct arithmetic.
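To make the "call out to deterministic logic" point concrete, here's a minimal sketch of the pattern. The `calculator` tool name and the tool-call dict shape are made up for illustration (not any particular vendor's tool-calling API); the point is just that the model emits a structured request and ordinary code does the arithmetic deterministically instead of the model generating digits token by token.

```python
import ast
import operator

# Deterministic calculator the model can delegate arithmetic to.
# Only basic arithmetic operators are allowed; no eval().
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculate(expression: str):
    """Safely evaluate an arithmetic expression by walking its AST."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    return _eval(ast.parse(expression, mode="eval").body)

def dispatch(tool_call: dict):
    """Route a (hypothetical) model-emitted tool call to deterministic code."""
    if tool_call.get("name") == "calculator":
        return calculate(tool_call["arguments"]["expression"])
    raise KeyError(f"unknown tool: {tool_call.get('name')}")

# Pretend the model emitted this instead of multiplying in its head:
print(dispatch({"name": "calculator",
                "arguments": {"expression": "123456 * 654321"}}))  # 80779853376
```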
But it feels like there's another side of the industry that's more concerned with... I dunno, metaphysical aspects of these models? Where the idea that the model is a stochastic ball that isn't conscious, isn't thinking, and does poorly at various tasks is anathema. So the effort continues to try and train and fine-tune these models until... something.
It reminds me of the great Tesla-vs-everyone-else self-driving debates that raged over the past several years. Lots of people were unhappy that the best-functioning systems fused multiple sensor types and a mixture of heuristic and machine-learned components in a complex architecture. These folks insisted that the "best" architecture was an end-to-end machine-learned system based entirely on visible-light cameras, because it's "most human" or some other such nonsense. As far as I can tell there was never any merit to this position beyond some abstract notion of architectural purity.
Same thing here I suppose.