
Comment by singron

10 days ago

We definitely learned the exact same lesson. Especially if your LLM responses need to be fast and cheap, then you need short prompts and small non-reasoning models. A lot of information out there assumes you are willing to wait 30 seconds for huge models to burn cash, but if you are building an interactive product at a reasonable price point, you are going to use less capable models.

I think the unfortunate next conclusion is that this isn't a great primary UI for a lot of applications. Users don't like typing full sentences and guessing at a product's capabilities when they can just click a button instead, and the LLM no longer has an opportunity to add value beyond translating. You are probably better served by a traditional UI that constructs the underlying request, with an optional LLM input layered on top that can construct the same request or fill in the UI.
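
As a minimal sketch of that layered approach (assuming an OpenAI-compatible chat API; the filter schema, model name, and prompt wording are placeholders, not recommendations), the LLM only translates free text into the same structured request the buttons would produce:

```python
# Sketch of the "LLM fills in the UI" pattern, assuming an
# OpenAI-compatible chat API. The filter schema, model name, and
# prompt wording are all placeholders.
import json
from openai import OpenAI

client = OpenAI()

# The same structured request the buttons and dropdowns would build.
SYSTEM_PROMPT = (
    "Translate the user's request into JSON with exactly these keys: "
    '"category" (string), "max_price" (number), "in_stock" (boolean). '
    "Output JSON only."
)

def parse_query(user_text: str) -> dict:
    # Short prompt + small model keeps latency and cost down; the LLM
    # only translates, and normal app code validates and executes.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# Feeds the exact same code path as the button-based UI.
filters = parse_query("cheap laptops under $800 that ship today")
```

The point of layering it this way is that the LLM output goes through the same validation and execution path as the clicks, so a bad translation degrades into a wrong filter rather than a wrong action.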

> Especially if your LLM responses need to be fast and cheap, then you need short prompts

IME, to get short answers you have to system-prompt an LLM to shut up and stay focused, and it takes a couple of paragraphs, no less. (Agreed with the rest.)
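
For illustration, the kind of prompt that ends up being necessary, plus a hard output cap; the wording and the max_tokens value are assumptions and usually need per-model tuning:

```python
# An illustration of the "shut up" system prompt; the wording and the
# max_tokens cap are assumptions, not a recipe.
from openai import OpenAI

client = OpenAI()

TERSE_SYSTEM_PROMPT = (
    "Answer in at most two sentences. Do not restate the question, do not "
    "add caveats or disclaimers, and do not suggest follow-ups. If the "
    "answer is a single value, output only that value."
)

# max_tokens hard-caps the output even if the model ignores the prompt;
# a truncated answer beats a slow, rambling one in an interactive product.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": TERSE_SYSTEM_PROMPT},
        {"role": "user", "content": "Why is my build cache not hitting?"},
    ],
    max_tokens=120,
)
print(resp.choices[0].message.content)
```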