Comment by 2001zhaozhao

2 days ago

Cerebras is serving GLM 4.6 at 1000 tokens/s right now. They're likely to upgrade to this model.

I really wonder whether GLM 4.7, or models a few generations from now, will be able to function effectively in simulated software-dev org environments, and in particular whether they can self-correct their errors well enough to build up useful code over time in such a simulated org, rather than a growing pile of technical debt. Possibly they would be managed by "boss" agents running on the latest frontier models like Opus 4.5 or Gemini 3. I'm thinking along the lines of this article: https://www.anthropic.com/engineering/effective-harnesses-fo...

If the open-source models get good enough, then the ability to run them at 1k tokens per second on Cerebras would be a massive advantage over any other model for running such a SWE org quickly.

It is awesome! What I usually do is have Opus make a detailed plan, including writing tests for the new functionality, then give it to GLM 4.6 on Cerebras to implement. If unsure, I give the result back to Opus for review.
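
A rough sketch of that plan -> implement -> review loop, assuming the Anthropic Python SDK and an OpenAI-compatible Cerebras endpoint (the base URL and model IDs below are placeholders and may not match the real ones):

    # Sketch only, not a definitive setup: frontier model plans and reviews,
    # fast model implements.
    import os

    import anthropic
    from openai import OpenAI

    planner = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    implementer = OpenAI(
        base_url="https://api.cerebras.ai/v1",   # assumed OpenAI-compatible endpoint
        api_key=os.environ["CEREBRAS_API_KEY"],
    )

    def ask_opus(prompt: str) -> str:
        msg = planner.messages.create(
            model="claude-opus-4-5",             # placeholder model ID
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text

    def ask_glm(prompt: str) -> str:
        resp = implementer.chat.completions.create(
            model="glm-4.6",                     # placeholder model ID
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # 1. Frontier model writes a detailed plan, including tests.
    plan = ask_opus("Write a detailed implementation plan, with tests, for: <task>")
    # 2. Fast model implements the plan at high throughput.
    code = ask_glm("Implement this plan exactly, including the tests:\n\n" + plan)
    # 3. If unsure, hand the result back to the frontier model for review.
    print(ask_opus("Review this implementation against the plan:\n\n" + plan + "\n\n" + code))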

This is where I believe we are headed as well. Frontier models "curate" and provide guardrails, while very fast and competent agents do the work at incredibly high throughput. Once frontier models crack the "taste" barrier and context windows are wide enough, even this level of delivery + intelligence will be sufficient to implement the work.

  • Taste is why I switched from GLM-4.6 to Sonnet. I found myself constantly asking it to make the code more elegant, and after the fourth time of doing that I laughed at the absurdity and just switched models.

    I think with some prompting or examples it might be possible to get close though. At any rate 1k TPS is hard to beat!

How cheap is GLM on Cerebras? I can't imagine why they couldn't tune the token rate lower but drastically reduce the power draw, and thus the cost of the API.

  • They're running on custom ASICs, as far as I understand; it may not be possible to run them efficiently at lower clock speeds. That, and/or the market for it doesn't exist in the volume required to be profitable. OpenAI has been aggressively slashing its token costs, not to mention all the free inference offerings you can take advantage of.

How easy is it to become their (Cerebras) paying customer? Last time I looked, they seemed to be in closed beta or something.

  • I signed up and got access within a few days. They even gave me free credits for a while

    • That's gone now. They do drops from time to time, but their compute platform is saturated.

A lot of people swear by Cerebras; it seems to really speed up their work. I would love to experience that, but at the moment I have an overabundance of AI at my disposal, and signing up for another service would be too much :)

But yeah, it seems that Cerebras is a secret to success for many.