Comment by sandworm101

5 hours ago

Running the model isnt the cost. Watts per token is the math they show investors. You also have to be constantly training new models, which currently needs more compute than servicing the customer base. You have to biuld datacenters, and possibly powerplants to feed them. You have to carry debts. And you will need to buy new GPUs/ram every few years to remain competative. The total business is vastly different than simple gpu math.

1 comment

sandworm101

paulddraper 3 hours ago

You are in violent agreement.

> inference is indeed profitable