Comment by alexwebb2
18 hours ago
I think your intuition on this might be lagging a fair bit behind the current state of LLMs.
System message: answer with just "service" or "product"
User message (variable): 20 bottles of ferric chloride
Response: product
Model: OpenAI GPT-4o-mini
$0.075/1Mt batch input * 27 input tokens * 10M jobs = $20.25
$0.300/1Mt batch output * 1 output token * 10M jobs = $3.00
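The arithmetic above can be sanity-checked in a few lines (prices and token counts are the ones quoted in the comment):

```python
# Back-of-envelope check of the GPT-4o-mini batch cost above.
# Batch prices in USD per 1M tokens, as quoted in the comment.
INPUT_PRICE_PER_M = 0.075
OUTPUT_PRICE_PER_M = 0.300

jobs = 10_000_000
input_tokens_per_job = 27
output_tokens_per_job = 1

input_cost = INPUT_PRICE_PER_M * (jobs * input_tokens_per_job) / 1_000_000
output_cost = OUTPUT_PRICE_PER_M * (jobs * output_tokens_per_job) / 1_000_000

print(input_cost)                # 20.25
print(output_cost)               # 3.0
print(input_cost + output_cost)  # 23.25
```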
It's a sub-$25 job.
You'd need to be doing 20 times that volume every single day to even start to justify hiring an NLP engineer instead.
You might be able to use an even cheaper model. Google Gemini 1.5 Flash 8B is Input: $0.04 / Output: $0.15 per 1M tokens.
17 input tokens and 2 output tokens * 10 million jobs = 170,000,000 input tokens and 20,000,000 output tokens... which at those rates costs $6.80 + $3.00, roughly $9.80 total. https://tools.simonwillison.net/llm-prices
As for rate limits, https://ai.google.dev/pricing#1_5flash-8B says 4,000 requests per minute and 4 million tokens per minute - so you could run those 10 million jobs in about 2500 minutes or 42 hours. I imagine you could pull a trick like sending 10 items in a single prompt to help speed that up, but you'd have to test carefully to check the accuracy effects of doing that.
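The runtime estimate can be checked the same way; it also shows that the request-per-minute limit, not the token-per-minute limit, is the binding constraint (which is why packing multiple items per prompt would help):

```python
# Rough runtime under the quoted Gemini 1.5 Flash 8B limits:
# 4,000 requests/minute and 4M tokens/minute.
jobs = 10_000_000
rpm_limit = 4_000
tpm_limit = 4_000_000
tokens_per_job = 17 + 2  # input + output, per the comment

minutes_by_requests = jobs / rpm_limit                  # 2500.0
minutes_by_tokens = jobs * tokens_per_job / tpm_limit   # 47.5
minutes = max(minutes_by_requests, minutes_by_tokens)

print(minutes)       # 2500.0
print(minutes / 60)  # ~41.7 hours
```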
The question is not average cost but the marginal cost of quality - same as voice recognition, which had relatively low uptake even at ~2-4% error rates because of the context-switching cost of error correction.
So you'd have to account for the work of catching the residual 2-8%+ error from LLMs. The premise with traditional NLP is that reducing error is incremental work; with LLMs it could be effectively impossible (i.e., the cost per next percentage point of correction explodes), because the models aren't easily controllable or even understandable.
But the most rational move in business is to focus on the easy majority at lower cost, and ignore the hard parts that don't lead to a dramatically larger TAM.
I am absolutely not an expert in NLP, but I wouldn't be surprised if, for many kinds of problems, LLMs had a far lower error rate than any traditional NLP software.
Like, lemmatization is pretty damn dumb in classical NLP, while a better LLM will be orders of magnitude more accurate.
This assumes you don’t care about our rapidly depleting carbon budget.
No matter how much energy you save personally, running your jobs on Sam A's earth-killing ten-thousand-GPU cluster is literally against your own self-interest in delaying climate disasters.
LLMs have huge negative externalities; there is a moral argument to use them only when other tools won't work.
How do you validate these classifications?
The same way you validate it if you didn't use an LLM.
Isn't it easier and cheaper to validate than to classify (which requires expensive engineers)? I mean, the validation skill is not as expensive - many companies do this at scale.
The same way you check performance for any problem like this: by creating one or more manually-labeled test datasets, randomly sampled from the target data and looking at the resulting precision, recall, f-scores etc. LLMs change pretty much nothing about evaluation for most NLP tasks.
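A minimal sketch of that evaluation loop, with made-up labels for illustration (the metric formulas are standard; the data is not from the thread):

```python
# Compare LLM labels against a hand-labeled sample and report
# precision/recall/F1 for the "product" class. Data is invented.
def prf(gold, pred, positive="product"):
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = ["product", "service", "product", "product", "service"]
pred = ["product", "service", "service", "product", "product"]
print(prf(gold, pred))  # each metric ≈ 0.667 on this toy sample
```

Nothing here is LLM-specific: swap in any classifier's predictions and the evaluation is identical.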
You need a domain expert either way. I mentioned in another reply that one of my niches is implementing call centers with Amazon Connect and Amazon Lex (the NLP engine).
https://news.ycombinator.com/item?id=42748189
I don’t know beforehand the domain they are working in; I do validation testing with them.
Yeah... Let's talk about the time needed for 10M prompts and how that fits into a daily pipeline. Enlighten us, please.
Run them all in parallel with a cloud function in less than a minute?
Yes, how did I not think of throwing more money at cloud providers on top of feeding OpenAI, when I could have just coded a simple binary classifier and run everything on something as insignificant as an 8th-gen quad-core i5...
Obviously all the LLM API providers have rate limits. Not a fan of GP's sarcastic tone, but I suppose many of us would like to know roughly what those limits would be for a small business using such APIs.
Also can’t you just combine multiple classification requests into a single prompt?
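You can; the mechanics are mostly on the client side. A sketch of packing several items into one prompt and mapping the model's numbered reply back to per-item labels - the prompt wording and reply format here are assumptions, not a tested recipe, and as noted upthread you'd want to measure the accuracy impact:

```python
# Pack N items into one classification prompt and parse the reply.
def build_prompt(items):
    lines = [f"{i + 1}. {item}" for i, item in enumerate(items)]
    return ('For each numbered item, answer with just "service" or '
            '"product", one per line:\n' + "\n".join(lines))

def parse_reply(reply, n_items):
    labels = [ln.strip().lower() for ln in reply.splitlines() if ln.strip()]
    if len(labels) != n_items or any(l not in ("service", "product")
                                     for l in labels):
        # Malformed batch reply: fall back to one item per request.
        raise ValueError("malformed reply; re-send items individually")
    return labels

items = ["20 bottles of ferric chloride", "quarterly HVAC maintenance"]
print(build_prompt(items))
print(parse_reply("product\nservice", len(items)))  # ['product', 'service']
```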
>You'd need to be doing 20 times that volume every single day to even start to justify hiring an NLP engineer instead.
How much for the “prompt engineer”? Who is going to be doing the work and validating the output?
You do not need a prompt engineer to create: “answer with just "service" or "product"”
Most classification prompts can be extremely easy and intuitive. The idea that you have to hire a completely different prompt engineer is kind of funny. In fact, you might be able to get the LLM itself to help revise the prompt.
All software engineers are (or can be) prompt engineers, at least to the level of trivial jobs like this. It's just an API call and a one-liner instruction. Odds are very good at most companies that they have someone on staff who can knock this out in short order. No specialized hiring required.
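To make "just an API call and a one-liner instruction" concrete, here is the entire request such a classifier would send. The field names follow OpenAI's chat completions API; this only builds the payload - wiring it to the actual endpoint and key is the remaining (trivial) step:

```python
import json

# Build the chat-completions request body for one classification job.
def build_request(item, model="gpt-4o-mini"):
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": 'answer with just "service" or "product"'},
            {"role": "user", "content": item},
        ],
        "max_tokens": 1,  # one-word answer; caps output cost
    }

payload = build_request("20 bottles of ferric chloride")
print(json.dumps(payload, indent=2))
```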
> ..and validating the output?
You glossed over the meat of the question.
Prompt engineering is less and less of an issue the simpler the job is and the more powerful the model is. You also don't need someone with deep NLP knowledge to measure and understand the output.
>less and less of an issue the simpler the job
Correct, everything is easy and simple if you make it simple and easy…