Comment by dmazzoni

1 day ago

I remember at Google around 2007-2009, as Google was massively expanding its data centers, there was a lot of unused capacity, especially during off-hours. Any engineer could run as many jobs as they wanted at zero priority, which meant the job would be first in line to be killed if a more important task needed the resources.
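To make the "priority 0 gets killed first" rule concrete, here is a toy sketch of that kind of preemption. It is not Google's actual scheduler; the names (`Job`, `Cluster`, `slots`, `submit`) and the numbers are made up for illustration, and it assumes a single pool of interchangeable slots.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    priority: int  # 0 = lowest priority, first in line to be evicted
    slots: int     # hypothetical unit of capacity the job occupies


@dataclass
class Cluster:
    capacity: int
    running: list = field(default_factory=list)

    def used(self) -> int:
        return sum(j.slots for j in self.running)

    def submit(self, job: Job):
        """Try to start `job`. Returns (started, evicted_jobs)."""
        free = self.capacity - self.used()
        # Only strictly lower-priority jobs may be evicted, lowest first.
        victims = sorted(
            (j for j in self.running if j.priority < job.priority),
            key=lambda j: j.priority,
        )
        evicted = []
        for victim in victims:
            if free >= job.slots:
                break
            evicted.append(victim)
            free += victim.slots
        if free < job.slots:
            # Not enough room even after evictions: reject, kill nothing.
            return False, []
        for victim in evicted:
            self.running.remove(victim)
        self.running.append(job)
        return True, evicted


cluster = Cluster(capacity=10)
first = cluster.submit(Job("overnight-mapreduce", priority=0, slots=8))
second = cluster.submit(Job("prod-serving", priority=100, slots=6))
third = cluster.submit(Job("another-experiment", priority=0, slots=6))
```

In this sketch the overnight job starts on idle capacity, gets evicted as soon as the higher-priority job needs the room, and a second priority-0 job is simply refused once the cluster is busy. That is roughly the "free until someone more important shows up" behaviour described above.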

I did so many interesting experiments with MapReduces that would run overnight.

For a while, I would even build internal services that were basically "free" because I'd just run them all at priority 0.

Over time those services got less and less reliable as overall usage started to increase, so I was forced to either justify the resources or scale back - but that was a good thing.

I feel like something similar would be a good model for AI token use: big tech companies ought to have their own self-hosted LLM data centers to power their own needs, then let employees use off-hours capacity to experiment.

Outside of experimentation, we should be encouraging token efficiency for everyday tasks. Rather than being given a certain number of tokens, engineers should be evaluated based on how much they actually get done.

Using a lot of tokens to automate a process that used to require hours of human labor every week? Good use of tokens, should be encouraged.

Using a lot of tokens to debug an easy frontend bug that could have been fixed by hand, and still took you 4 hours to complete? Waste of tokens, should be discouraged.

Don’t think we’ll see similarly logical behaviour from LLM users tbh. A sizeable portion of the user base seems to insist on throwing Opus at every trivial task.

Most AI front ends seem to be designed for interactive jobs, so they make it hard to define a job that should be done eventually with zero priority. It makes much more sense to do that with spec-driven development (have work done with the human on the loop rather than in the loop), but as far as I know that just isn’t well supported by any front end yet (would be happy to be proven wrong, my experience is with Google front ends).

  • There are a lot of places where an LLM can improve a data pipeline. Like if Claude Sonnet was free I’d do a lot of data enrichment.

"Using a lot of tokens to debug an easy frontend bug that could have been fixed by hand, and still took you 4 hours to complete? Waste of tokens, should be discouraged."

Hahahah good luck with that!

For many of us, what is happening now was super obvious. Telling a newly formed crack addict (who you wanted to become addicted) to be more thoughtful about their consumption of crack... yeah not gonna work is it.

You’re right of course, but wouldn’t it be more likely that everyone will embrace 10x cheaper Chinese models?

  • my money is on: eventually frontier model dev and training becomes basic research funded by governments, and LLM operators become essentially private utilities a la ISPs, competing mostly on data center operational costs and occasionally new chip tech to run models cheaper

    and governments will keep running massive data centers with classified frontier models for intelligence and propaganda purposes