Comment by jjcm

2 days ago

Speaking as someone who's bootstrapping here, I'm often envious of engineers at these larger companies, but I also worry that the incentives are screwed up.

If I were an engineer at Uber, why wouldn't I select gpt 5.5 pro @ very high thinking + fast mode for a prompt? There's no incentive not to use the most powerful (and thus most expensive) model for even the smallest of changes.

I tried one of these prompts for some tests I'm doing for image->html conversion, and a single prompt cost me $40. As someone paying for that myself, I'd pretty much never use this configuration. At a large company where someone else is footing the bill, I'd spin these up regularly (the output was significantly better, fwiw). Engineers are rated on what they deliver, not the expenditure to get there.

There are ways to do this cheaply, but there are no incentives for engineers to do so.

SWEs are expensive; median salary is $133k (not counting health insurance, payroll taxes, etc), which is about $66.50 an hour at 2,000 hours a year. If you can shave off an hour of dev time with $40 in LLM credits, that's $26.50 cheaper than having them do it without.

I'm not entirely convinced it works out that way so far, but that's the theory.
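For anyone who wants to plug in their own numbers, here is a rough sketch of that arithmetic. The 2,000 hours per year figure is my assumption (it is what makes $133k come out to $66.50 an hour), and the function names are made up for illustration.

```python
# Assumption: 2,000 paid hours per year (50 weeks x 40 hours). This is not
# stated in the comment; it is the figure that yields $66.50/hour from $133k.
ANNUAL_SALARY = 133_000
HOURS_PER_YEAR = 2_000


def hourly_rate(annual_salary: float = ANNUAL_SALARY,
                hours_per_year: float = HOURS_PER_YEAR) -> float:
    """Salary-only hourly cost (ignores insurance, payroll taxes, etc)."""
    return annual_salary / hours_per_year


def net_saving(hours_saved: float, llm_cost: float) -> float:
    """Positive when the LLM spend is cheaper than the dev time it replaces."""
    return hours_saved * hourly_rate() - llm_cost


def break_even_hours(llm_cost: float) -> float:
    """Dev time the LLM spend has to save just to pay for itself."""
    return llm_cost / hourly_rate()


print(net_saving(1, 40))             # 26.5
print(break_even_hours(40) * 60)     # minutes of dev time a $40 prompt must save
```

On these assumptions a $40 prompt has to save a bit over half an hour of salary time before it pays for itself, and that only holds if the saved time is actually put to use.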

Trying to bring down LLM costs is sort of a double-edged sword, because the dev needs to be cutting LLM costs by more than what you're paying them. If it takes them a day to bring costs down by $1 an invocation, then it takes almost 2 years to recoup the salary costs. It's worse because LLMs currently change so much that I wouldn't be confident their solution won't be broken before the 2-year period is up. Will we still be tool calling in 2 years, or will that be something new? Will thinking still be a thing, or will it be superseded by something else? I don't think anyone knows, even the frontier providers.
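A quick sketch of that payback calculation, since the answer swings a lot with how often the prompt runs. The comment doesn't give an invocation rate, so the rate, the 8-hour day, the 2,000-hour year, and the 250 working days below are all my assumptions, and the function names are hypothetical.

```python
import math

ANNUAL_SALARY = 133_000
HOURS_PER_YEAR = 2_000      # assumption: 50 weeks x 40 hours
HOURS_PER_DAY = 8           # assumption
WORKING_DAYS_PER_YEAR = 250 # assumption


def invocations_to_recoup(dev_days: float, saving_per_invocation: float) -> int:
    """Invocations needed before the savings cover the dev's salary time."""
    salary_cost = dev_days * HOURS_PER_DAY * ANNUAL_SALARY / HOURS_PER_YEAR
    return math.ceil(salary_cost / saving_per_invocation)


def years_to_recoup(dev_days: float, saving_per_invocation: float,
                    invocations_per_day: float) -> float:
    """Payback time in years at a given number of invocations per working day."""
    needed = invocations_to_recoup(dev_days, saving_per_invocation)
    return needed / (invocations_per_day * WORKING_DAYS_PER_YEAR)


print(invocations_to_recoup(1, 1.0))   # 532
print(years_to_recoup(1, 1.0, 1))      # payback at one run per working day
print(years_to_recoup(1, 1.0, 100))    # payback at a hundred runs per working day
```

At one run per working day the payback lands around two years; at a hundred runs a day it is a matter of days, so the volume of invocations matters more than the per-call saving.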

  • > If you can shave off an hour of dev time with $40 in LLM credits, that's $26.50 cheaper than having them do it without.

    This assumes that the hour shaved off was used productively elsewhere, which is not the case.

    • Yeah, that's part of why I said I'm not entirely convinced. 1 hour vs 2 hours is an unrealistic example. I do still think it can make sense, but the extra actual productivity is probably more in the vein of "getting an extra hour on a 10 or 20 hour project".

We ended up using a service like yakpdf for HTML to PDF generation.

It handled most of the rendering issues out of the box compared to headless browser setups.

Companies may first want to see how fast you can scale work and then trim it back down for efficiency.

  • How could they implement it? Test a bunch of models (closed and open source) and see which one gives the best returns for its cost? And then how do they check whether it's being used properly? I have read of people just throwing their token budgets on the fire so that they show high usage for KPIs. While the most obvious cases of "X does this very wasteful thing" will be culled quickly (hopefully), I don't see how non-technical management can see through even the thinnest layer of malicious compliance.

image->html is a pretty involved task though. That’s basically a frontend dev’s job. $40 wouldn’t cover an hour of their time.