Comment by onchainintel

10 hours ago

How does it compare to Opus 4.7? I've been immersed in 4.7 all week participating in the Anthropic Opus 4.7 hackathon and it's pretty impressive even if it's ravenous from a token perspective compared to 4.6


greenknight 10 hours ago

The thing is, it doesn't need to beat 4.7. It just needs to do somewhat well against it.

This is free... as in you can download it, run it on your systems and finetune it to be the way you want it to be.

libraryofbabel 8 hours ago
> you can download it, run it on your systems
In theory, sure, but as others have pointed out you need to spend half a million on GPUs just to get enough VRAM to fit a single instance of the model. And you’d better make sure your use case makes full 24/7 use of all that rapidly-depreciating hardware you just spent all your money on, otherwise your actual cost per token will be much higher than you think.
In practice you will get better value from just buying tokens from a third party whose business is hosting open weight models as efficiently as possible and who make full use of their hardware. Even with the small margin they charge on top you will still come out ahead.
- oceanplexian 8 hours ago
  
  There are a lot of companies who would gladly drop half a million on a GPU to have private inference that Anthropic or OpenAI can’t use to steal their data.
  And that GPU wouldn’t run just one instance; the models are highly parallelizable. It would likely support 10-15 users at once, and if a company oversubscribed 10:1, that GPU supports ~100 seats. Amortized over a couple of years, the costs are competitive.
  
- hsbauauvhabzb 8 hours ago
  
  Sure, but that’s an incredibly short-term viewpoint.
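The amortization argument in the replies above can be sketched in a few lines. This is only a back-of-the-envelope illustration: the $500k spend, the two-year horizon, the 10 concurrent users, and the 10:1 oversubscription are the figures quoted in the thread, the function names are made up for this example, and power, hosting, and staff costs are ignored.

```python
# Rough cost model for self-hosted inference hardware.
# All inputs are assumptions taken from the thread, not measurements.

HOURS_PER_YEAR = 24 * 365


def cost_per_seat_per_month(capex_usd, years, concurrent_users, oversubscription):
    """Amortized hardware cost per seat per month (hardware only)."""
    seats = concurrent_users * oversubscription
    months = years * 12
    return capex_usd / months / seats


def cost_per_busy_hour(capex_usd, years, utilization):
    """Hardware cost per hour of actual use; idle hours raise the price of busy ones."""
    busy_hours = HOURS_PER_YEAR * years * utilization
    return capex_usd / busy_hours


if __name__ == "__main__":
    capex = 500_000  # "half a million on GPUs"
    per_seat = cost_per_seat_per_month(
        capex, years=2, concurrent_users=10, oversubscription=10
    )
    print(f"per seat per month: ${per_seat:,.0f}")
    for utilization in (1.0, 0.5, 0.1):
        hourly = cost_per_busy_hour(capex, years=2, utilization=utilization)
        print(f"utilization {utilization:.0%}: ${hourly:,.2f} per busy hour")
```

With those inputs the hardware works out to roughly $208 per seat per month. The second function illustrates the utilization point: halving utilization doubles the cost of every hour the hardware is actually used.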
p1esk 10 hours ago
Do you think a lot of people have “systems” to run a 1.6T model?
- CJefferson 9 hours ago
  
  To me, the important thing isn't that I can run it, it's that I can pay someone else to run it. I'm finding Opus 4.7 seems to be weirdly broken compared to 4.6: it just doesn't understand my code and breaks it whenever I ask it to do anything.
  Now, at the moment, I can still use 4.6, but eventually Anthropic are going to remove it, and when it's gone it will be gone forever. I'm planning on trying Deepseek v4, because even if it's not quite as good, I know that it will be available forever; I'll always be able to find someone to run it.
  
- applfanboysbgon 9 hours ago
  
  No, but businesses do. Being able to run quality LLMs without your business, or business's private information, being held at the mercy of another corp has a lot of value.
  
onchainintel 10 hours ago

Completely agree, not suggesting it needs to, just genuinely curious. Love that it can be run locally though. Open source LLMs are punching back pretty hard against proprietary ones in the cloud lately in terms of performance.
kelseyfrog 10 hours ago
What's the hardware cost of running it?
- redox99 10 hours ago
  
  Probably like 100 USD/hour
- bbor 9 hours ago
  
  I was curious, and some [intrepid soul](https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...) did an analysis. Assuming you do everything perfectly and take full advantage of the model's MoE sparsity, it would take:
  - To run at full precision: "16–24 H100s", giving us ~$400-600k upfront, or $8-12/h from [us-east-1](https://intuitionlabs.ai/articles/h100-rental-prices-cloud-c...).
  - To run with "heavy quantization" (16 bits -> 8): "8xH100", giving us $200K upfront and $4/h.
  - To run truly "locally"--i.e. in a house instead of a data center--you'd need four 4090s, one of the most powerful consumer GPUs available. Even that would clock in around $15k for the cards alone and ~$0.22/h for the electricity (in the US).
  Truly an insane industry. This is a good reminder of why datacenter capex since 2023 has eclipsed the Manhattan Project, the Apollo program, and the US interstate system combined...
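The buy-versus-rent figures in the comment above imply a break-even point, which can be sketched as below. This is a rough illustration only: the upfront and hourly prices are the ones quoted in the comment, the function and scenario names are invented for this example, and electricity, cooling, resale value, and price changes are all ignored.

```python
# Break-even between buying GPUs and renting them by the hour.
# Prices are the thread's quoted figures, treated here as assumptions.

HOURS_PER_YEAR = 24 * 365


def break_even_hours(upfront_usd, rent_usd_per_hour):
    """Hours of rental that cost as much as buying the hardware outright."""
    return upfront_usd / rent_usd_per_hour


SCENARIOS = {
    "full precision, low end": (400_000, 8.0),
    "full precision, high end": (600_000, 12.0),
    "8-bit on 8xH100": (200_000, 4.0),
}

if __name__ == "__main__":
    for name, (upfront, hourly) in SCENARIOS.items():
        hours = break_even_hours(upfront, hourly)
        years = hours / HOURS_PER_YEAR
        print(f"{name}: {hours:,.0f} hours (~{years:.1f} years of 24/7 use)")
```

With the quoted numbers, all three scenarios break even at 50,000 rental hours, or about 5.7 years of continuous use.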
  
- slashdave 10 hours ago
  
  "if you have to ask..."
johnmaguire 10 hours ago
... if you have 800 GB of VRAM free.
- inventor7777 10 hours ago
  
  I remember reading that some new frameworks have been coming out that allow Macs to stream the weights of huge models live from fast SSDs and produce quality output, albeit slowly. Apart from that...good luck finding that much available VRAM haha
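For a sense of where a figure like 800 GB might come from, here is a weights-only memory estimate. It assumes the 1.6T parameter count mentioned earlier in the thread and counts only the bytes needed to store the weights; KV cache, activations, and runtime overhead are left out, so real requirements would be higher. The function name is made up for this example.

```python
# Weights-only memory estimate for a large model at different precisions.
# The 1.6T parameter count is the figure quoted in the thread (an assumption here).


def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 1.6e12

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(PARAMS, bits):,.0f} GB")
```

Under this estimate, 16-bit weights need about 3,200 GB, 8-bit about 1,600 GB, and 4-bit about 800 GB. The 800 GB figure would therefore match roughly 4 bits per parameter, though the thread does not say which precision was meant.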

spaceman_2020 7 hours ago

Tbh I was more productive with 4.6 than ever before and if AI progress locks in permanently at 4.6 tier, I’d be pretty happy

rvz 10 hours ago

It is more than good enough and has effectively caught up with Opus 4.6 and GPT 5.4 according to the benchmarks.

It's about 2 months behind GPT 5.5 and Opus 4.7.

As long as it is cheap to run for the hosting providers and it is frontier level, it is a very competitive model and impressive against the others. I give it 2 years maximum for consumer hardware to run models that are 500B - 800B quantized on their machines.

It should be obvious now why Anthropic really doesn't want you to run local models on your machine.

deaux 9 hours ago

Vibes > Benchmarks. And it's all so task-specific. Gemini 3 has scored very well in benchmarks for a long time but is poor at agentic use cases. A lot of people prefer Opus 4.6 to 4.7 for coding despite the benchmarks, much more than I've seen before (4.5->4.6, 4->4.5).
Doesn't mean Deepseek v4 isn't great, just benchmarks alone aren't enough to tell.
snovv_crash 9 hours ago

With the ability of the Qwen3.6 27B, I think in 2 years consumers will be running models of this capability on current hardware.
colordrops 9 hours ago
What's going to change in 2 years that would allow users to run 500B-800B parameter models on consumer hardware?
- DiscourseFan 9 hours ago
  
  I think it's just an estimate
  