Comment by alexc05
2 days ago
If I could have one of these cards in my own computer, do you think it would be possible to replace Claude Code?
1. Assume it's running a better model, even a dedicated coding model: high-scoring, but obviously not Opus 4.5.
2. Instead of the standard send-receive paradigm, we set up a pipeline of agents, each of which parses the output of the previous.
At 17k tokens/sec running locally, you could effectively spin up tasks like "you are an agent who adds semicolons to the ends of lines in JavaScript", and with some sort of dedicated software in the style of Claude Code you could load an array of 20 agents, each with a role to play in improving outputs.
Take user input and gather context from the codebase -> rewrite what you think the human asked for as an LLM-optimized instructional prompt -> examine the prompt for uncertainties and gaps in your understanding or ability to execute -> <assume more steps as relevant> -> execute the work.
Could you effectively set up something that is configurable to the individual developer - a folder of system prompts that every request loops through?
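Mechanically, I'm imagining something like the sketch below: every request gets piped through each system prompt in a folder, in filename order, with each stage rewriting the previous stage's output. All the names here are hypothetical (the agents/ folder, the endpoint, the model name), and it assumes an OpenAI-compatible local server (llama.cpp, vLLM, etc.) fronting the card:

    import pathlib
    import requests

    # Hypothetical local endpoint for the card, speaking the
    # OpenAI-compatible chat completions API.
    LOCAL_URL = "http://localhost:8080/v1/chat/completions"

    def run_stage(system_prompt: str, text: str) -> str:
        """One agent in the pipeline: apply its system prompt to the running text."""
        resp = requests.post(LOCAL_URL, json={
            "model": "local-coding-model",  # whatever the card is serving
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text},
            ],
        })
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    def pipeline(user_request: str) -> str:
        text = user_request
        # e.g. agents/01-rewrite-prompt.txt, agents/02-find-gaps.txt, ...
        for prompt_file in sorted(pathlib.Path("agents").glob("*.txt")):
            text = run_stage(prompt_file.read_text(), text)
        return text

    print(pipeline("add input validation to signup.js"))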
Do you really need the best model if you can pass your responses through a medium-tier model that engages in rapid self-improvement 30 times in a row, before your Claude server has returned its first-shot response?
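Concretely, the "30 times in a row" part would just be a critique-and-revise loop around the run_stage() helper sketched above; the two prompts here are made up for illustration:

    CRITIQUE = "List concrete defects in the following answer. Be terse."
    REVISE = "Rewrite the answer to fix the listed defects. Output only the answer."

    def refine(draft: str, passes: int = 30) -> str:
        # Each pass criticizes the current draft, then revises it against
        # that critique. At 17k tokens/sec, 30 short passes could plausibly
        # finish before one cloud round-trip returns.
        for _ in range(passes):
            critique = run_stage(CRITIQUE, draft)
            draft = run_stage(REVISE, draft + "\n\nDefects:\n" + critique)
        return draft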
Models can't improve themselves with their own (model) input; they need to be grounded in truth and reality.
I think so. The last few months have shown us that it isn't necessarily the models themselves that provide good results, but the tooling/harness around them. Codex, Opus, GLM 5, Kimi 2.5, etc. each have their quirks. Use a harness like opencode and give the model the right amount of context, and they'll all perform well and you'll get a correct answer every time.
So in my opinion, in a scenario like this where token output is near-instant but you're running a lower-tier model, good tooling can close the gap with a frontier cloud model.
It's 2.5 kW, so it likely won't sit in your computer (well beyond what a desktop could supply to a single card in power alone, let alone cool). It's an 8.5 cm^2 die, which is a beast of a single die.
Basically, logistically, it's going to need to be in a data centre.
It's ideal for small-context, high-throughput work: perhaps parsing huge text piles, like if you had the entire Epstein files as text.
I think Claude Code benefits from larger context, to keep your entire project in view, and from deep reasoning.
What this would certainly replace is the cases where Claude dispatches to Haiku for menial NLP tasks.
> It's 2.5 kW, so it likely won't sit in your computer (well beyond what a desktop could supply to a single card in power alone, let alone cool). It's an 8.5 cm^2 die, which is a beast of a single die.
I wonder how you cool a 3x3 cm die that outputs 2.5 kW of heat. In the article they mention that the traditional setup requires water cooling, but surely this does as well, right?
Can't imagine what else could manage that: nearly 2.8 W/mm^2.
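Back-of-envelope, taking the 3x3 cm guess above:

    2,500 W / (30 mm x 30 mm) = 2,500 W / 900 mm^2 ≈ 2.78 W/mm^2

For comparison, a ~700 W datacenter GPU on an ~800 mm^2 die is under 0.9 W/mm^2, and those already need serious cooling.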
It does make you wonder: if the copy is misleading about something so simple, how much else could be puffery?
Maybe they mean that a standard liquid cooling system will work?