Comment by vlovich123
19 hours ago
That’s the argument the article makes, but the reasoning is a little questionable on a few fronts:
- It uses f16 for the data format, whereas quantization can reduce the memory burden without a meaningful drop in accuracy, especially when compared with traditional NLP techniques (rough numbers after this list).
- LLMs typically outperform OpenCV + NER on quality.
- You can choose to replace just part of the pipeline instead of using the LLM for everything (e.g. using text-only 3B or 1B models to replace the NER model while keeping OpenCV)
- The LLM compute needed per unit of quality per watt is constantly decreasing. So even if LLMs are too expensive today, the traditional system you’ve spent time building, tuning, and maintaining is quickly becoming obsolete.
- Talking with new grads from NLP programs, basically all the focus is on LLMs.
- The capability + quality per unit of model size keeps increasing. That means your existing RAM & performance budget keeps absorbing problems that previously seemed out of reach.
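To put rough numbers on the quantization point (illustrative figures of my own, not from the article: a hypothetical 3B-parameter model at 16-bit vs 4-bit weights):

    # Sketch of the weight-memory math; the parameter count and bytes-per-weight
    # figures are assumptions for illustration, not measurements.
    params = 3e9                    # hypothetical 3B-parameter model
    f16_gb = params * 2.0 / 1e9     # 16-bit weights: 2 bytes each -> ~6 GB
    int4_gb = params * 0.5 / 1e9    # 4-bit weights: 0.5 bytes each -> ~1.5 GB
    print(f"f16 ~{f16_gb:.1f} GB vs int4 ~{int4_gb:.1f} GB of weights")

On those assumptions the quantized model needs roughly a quarter of the memory, which is what makes the f16 framing in the article look pessimistic.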
Now of course traditional techniques are valuable because they can be an important tool for bringing down costs (think fixed-function accelerator vs general-purpose compute), but I think they’re going to become more niche and specialized as most tasks transition to LLMs.
Rich Sutton’s “Bitter Lesson” essay is really relevant to these kinds of discussions.
The author isn’t an independent player, so obviously it’s important to be critical of pieces like this [1], but it claims a ~10x drop in LLM inference cost every year. That lines up with the technical papers I’m seeing that keep improving performance, plus the related HW improvements.
That’s obviously not sustainable indefinitely, but these kinds of exponentials are precisely why people often draw incorrect conclusions about how long change will take. Just a reminder: CPUs doubled in performance every 18 months and kept upending, for 20 years, the software companies that weren’t in tune with that cycle (i.e. the ones focusing on performance instead of features).

For example, even if you’re spending $10k/month on the LLM vs $100/month to process the 10M items, it can still be more beneficial to go the LLM route, because the expertise to put together an LLM pipeline is cheaper to buy than NLP expertise, which can make up the ~$120k/year difference in running costs (assuming the performance otherwise works, and that’s before the improved quality and robustness of the LLM solution brings in any extra revenue to offset the cost).
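To make that break-even concrete, here’s a rough sketch using the figures above plus the ~10x/year cost decline; the $120k/year staffing delta is a hypothetical stand-in for the “cheaper expertise” point, not a number from the article:

    # Back-of-the-envelope: LLM route vs traditional NLP route, per year.
    # All inputs are illustrative assumptions, not measurements.
    llm_monthly = 10_000.0          # LLM inference bill today
    nlp_monthly = 100.0             # traditional OpenCV + NER pipeline
    nlp_extra_staffing = 120_000.0  # hypothetical extra yearly cost of NLP expertise

    for year in range(4):
        llm_yearly = llm_monthly * 12 / (10 ** year)        # ~10x cheaper each year
        nlp_yearly = nlp_monthly * 12 + nlp_extra_staffing  # staffing cost dominates
        print(f"year {year}: LLM ~${llm_yearly:,.0f} vs NLP ~${nlp_yearly:,.0f}")

Even at today’s prices the staffing delta roughly cancels the inference bill on these assumptions, and a year or two of the cost decline makes the LLM route clearly cheaper.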
[1] https://a16z.com/llmflation-llm-inference-cost/