
Comment by osigurdson

15 days ago

It does seem that companies are able to get reliability in narrow problem domains via prompts, evals, and fine-tuning.

> It does seem

And therein lies the whole problem. The verification required for serious work is likely orders of magnitude more than anybody is willing to spend.

For example, professional OCR companies have large teams of reviewers who double- or triple-review everything, and that is after the software itself flags recognitions with varying degrees of certainty. In virtually all larger-scale use cases, I don't think companies are approaching LLMs as tools that require that level of dedication and resources.
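The review pipeline described above can be sketched in a few lines: route each recognized field by its confidence score, auto-accepting only high-confidence results and escalating the rest to one or two human reviewers. This is a minimal illustration; the function name, field names, and thresholds are all made up, not from any real OCR product.

```python
# Hypothetical sketch of confidence-based review routing in an OCR
# pipeline. Thresholds and field names are illustrative only.

def route_for_review(field: str, text: str, confidence: float) -> str:
    """Return the review tier for one recognized field."""
    if confidence >= 0.98:
        return "auto_accept"      # ship without human eyes
    if confidence >= 0.90:
        return "single_review"    # one reviewer double-checks
    return "double_review"        # two independent reviewers

results = [
    ("invoice_total", "1,240.00", 0.99),
    ("vendor_name", "Acme Corp", 0.93),
    ("po_number", "PO-7i812", 0.71),  # low confidence: likely misread
]

for field, text, conf in results:
    print(field, "->", route_for_review(field, text, conf))
```

The point of the sketch is the cost structure: everything below the top tier consumes paid human attention, which is exactly the spend most LLM deployments are not budgeting for.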

  • This seems to be exactly the business model of myriad recent YC startups. It seemingly did work for Casetext, as an example.

In some cases this is true, but then why choose an expensive world model over a small net or random forest you trained specifically for the task at hand?