Comment by axegon_
15 hours ago
FFS... "Lots of writers, few readers". Read again and do the math: 2 seconds, multiplied by 10 million records that contain this, plus "alarm installation in two locations" and a whole bunch of other crap with little to no repetition (<2%). Where does that get you? 2 * 10,000,000 = 20,000,000 SECONDS! A day has 86,400 seconds (24 * 3600 = 86,400), so that's roughly 231 days, and the data pipeline needs to finish in <24 hours. Everyone needs to get this into their heads somehow: LLMs are not a silver bullet. They will not cure cancer anytime soon, nor will they be effective or cheap enough to run at massive scale. And I don't mean cheap as in "oh, just get an OpenAI subscription hurr durr". Throwing money mindlessly into something is never an effective way to solve a problem.
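A back-of-the-envelope check of the sequential-time arithmetic above, using only the thread's own numbers (2 s/record, 10M records); parallelism, which comes up later in the thread, is deliberately not factored in here:

```python
# Sequential LLM classification time for the thread's numbers.
SECONDS_PER_RECORD = 2       # latency claimed upthread
NUM_RECORDS = 10_000_000     # size of the dataset
SECONDS_PER_DAY = 24 * 3600  # 86,400

total_seconds = SECONDS_PER_RECORD * NUM_RECORDS
print(f"total: {total_seconds:,} s = {total_seconds / SECONDS_PER_DAY:.0f} days")
# total: 20,000,000 s = 231 days -- versus a <24 hour pipeline budget
```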
Assuming the 10M records amount to ~2,000M input tokens + 200M output tokens, this would cost about $300 to classify using llama-3.3-70b [1] (a back-of-the-envelope sketch follows below). If using llama lets you do this in, say, one day instead of two days for a traditional NLP pipeline, it's worthwhile.
[1]: https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
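A sketch of that cost estimate. The per-token rates here are assumptions, picked to be consistent with the ~$300 figure; check the OpenRouter link above for current pricing:

```python
# Rough cost estimate for classifying 10M records with llama-3.3-70b.
INPUT_TOKENS = 2_000_000_000  # ~200 input tokens per record (assumed)
OUTPUT_TOKENS = 200_000_000   # ~20 output tokens per record (assumed)
USD_PER_M_INPUT = 0.12        # assumed $/million input tokens
USD_PER_M_OUTPUT = 0.30       # assumed $/million output tokens

cost = (INPUT_TOKENS / 1e6) * USD_PER_M_INPUT + (OUTPUT_TOKENS / 1e6) * USD_PER_M_OUTPUT
print(f"~${cost:.0f}")  # ~$300
```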
> ...two days for a traditional NLP pipeline
Why 2 days? Machine learning took over the NLP space 10-15 years ago, so the comparison is between small, performant, task-specific models and LLMs. There is no reason to believe "traditional" NLP pipelines are inherently slower than large language models, and they aren't.
Why are you using 2 seconds? The commenter you are responding to hypothesized being able to do 250/s based on "100 parallel inference at 5 at a time". I'm not speaking to the validity of that, but I find it strange that you ran with the 2-second number after seemingly having stopped reading at that line, while yourself lamenting that people don't read and telling them to "read again".
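For completeness, the same back-of-the-envelope arithmetic applied to the 250/s figure, taking the claimed throughput at face value:

```python
# If the claimed 250 records/s parallel throughput holds, the picture changes entirely.
NUM_RECORDS = 10_000_000
RECORDS_PER_SECOND = 250  # the parallel-inference figure claimed upthread

total_seconds = NUM_RECORDS / RECORDS_PER_SECOND
print(f"{total_seconds:,.0f} s = {total_seconds / 3600:.1f} hours")
# 40,000 s = 11.1 hours -- inside the <24 hour window
```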
Ok, let me dumb it down for you: you have a cockroach in your bathroom and you want to kill it. You have an RPG and you have a slipper. Are you going to use the RPG or the slipper? Even if your bathroom is somehow fine after getting shot with an RPG, isn't that overkill? If you can code up and train a binary classifier in 2 hours (something like the sketch below) that uses nearly zero resources and gives you good enough results (in my case, way above my targets), why burn a ton of resources, libraries, RAG, hardware and, hell, even electricity? I mean, how hard is this to comprehend, really?
https://deviq.com/antipatterns/shiny-toy
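A minimal sketch of the kind of lightweight binary classifier described above, assuming labeled text records are available. TF-IDF plus logistic regression is one common cheap choice, not necessarily what the commenter actually trained:

```python
# Minimal sketch: a cheap task-specific binary text classifier.
# Assumes a labeled dataset of (text, 0/1) records; the training data
# below is a placeholder, and the model choice is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["alarm installation in two locations", "monthly invoice for services"]
labels = [1, 0]

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # sparse bag-of-ngrams features
    LogisticRegression(max_iter=1000),
)
pipeline.fit(texts, labels)

# Inference runs in microseconds per record on a CPU -- no GPU, no API calls.
print(pipeline.predict(["alarm installation at the new site"]))
```

Batched `pipeline.predict` over millions of short records is a throughput problem measured in minutes on a single machine, which is the point being made about model size versus task difficulty.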
Sure, but this doesn't answer my question or tie into your last comment at all. It's Saturday evening in much of the world; are you sober?
OP said 2 seconds as if that wasn't an eternity...
But then they said 250/second when running inference in parallel. Again, I don't know whether their assertions about parallel inference are correct, but why focus on the wrong number instead of addressing the actual claim?