
Comment by nospice

10 hours ago

I'm not sure I like this method of accounting for it. The critics of LLMs tend to conflate the costs of training LLMs with the cost of generation. But this makes the opposite error: it pretends that training isn't happening as a consequence of consumer demand. There are enormous resources poured into it on an ongoing basis, so it feels like it needs to be amortized on top of the per-token generation costs.

At some point, we might end up in a steady state where the models are as good as they can be and the training arms race is over, but we're not there yet.

That's not really an error; it's a fundamental feature of unit economics.

Fixed costs can't be rolled into the unit economics because the divisor is continually growing. The marginal costs of each incremental token/query don't depend on the training cost.
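A toy calculation makes the divisor point concrete (all numbers here are invented for illustration, not real figures for any model): the amortized share of a fixed training cost per token shrinks as cumulative generation grows, while the marginal cost stays flat.

```python
# Hypothetical, purely illustrative numbers.
TRAINING_COST = 100_000_000   # assumed one-time training spend, USD
MARGINAL_COST = 2e-6          # assumed marginal cost per generated token, USD

def amortized_cost_per_token(total_tokens: int) -> float:
    """Fixed training cost spread over all tokens served so far,
    plus the constant marginal cost per token."""
    return TRAINING_COST / total_tokens + MARGINAL_COST

for tokens in (10**12, 10**13, 10**14):
    print(f"{tokens:.0e} tokens -> ${amortized_cost_per_token(tokens):.2e}/token")
```

With these assumed figures the amortized cost falls from about $1.02e-4 to $3e-6 per token as volume grows a hundredfold, converging toward the marginal cost alone, which is why per-unit figures that fold in a fixed cost depend entirely on where you stop counting.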

It would be really hard to properly account for the training, since that won't scale with more generation.

The training is already done when you make a generative query. No matter how many consumers there are, the cost for training is fixed.

  • My point is that it isn't, not really. Usage begets more training, and this will likely continue for many years. So it's not a vanishing fixed cost, but pretty much just an ongoing expenditure associated with LLMs.

    • No one doing this for money intends to train models that will never be amortized. Some will fail and some are niche, but the big ones must eventually pay for themselves or none of this works.

      The economy will destroy inefficient actors in due course. The environmental and economic incentives are not entirely misaligned here.


The challenge with no longer developing new models is keeping your model up to date, which as of today requires an entire training run. Maybe they can do that less often, or they’ll come up with a way to update a model after it’s trained. Maybe we’ll move on to something other than LLMs.

The training cost is a sunk cost for the current LLM, and unknown for the next-generation LLM. It seems like useful information, but it doesn't belong in the per-token figure.

The AI training data sets are also expensive... The cost is especially hard to estimate for data sets that are internal to businesses like Google. Especially if the model needs to be refreshed to deal with recent data.

I presume historical internal datasets remain high-value, since they might be cleaner (no slop) or no longer obtainable (copyright takedowns), and companies are getting better at hiding their data from spidering.