Comment by fnordpiglet

5 hours ago

For coding, quality at the margin is often crucial, even at a premium. It's not the same as cranking out spam emails or HN posts at scale. This is why the marginal difference in comp between your median engineer and your P99 engineer is substantial, while the marginal comp difference between your median pick-and-packer and your P99 pick-and-packer isn't.

I'd also say this keeps the frontier shops competitive: while it costs them R&D in the present, it benefits them by forcing them to make a better and better product, especially in the value-add space.

Finally, and particularly for Anthropic: they are going for being the more trustworthy shop. Even Alibaba is hosting paid frontier models for service revenue, but if you're not a Chinese shop, would you really host your production code development workload on a Chinese-hosted provider? OpenAI is sketchy enough, but even there I have marginal confidence that they aren't just wholesale mining data for trade secrets, even if they are using it for model training. Anthropic I slightly trust more. Hence the premium. No one really believes at face value that a Chinese-hosted firm isn't mass-trawling every competitive advantage possible and handing it back to the government and other cross-competitive firms; even if they aren't, the historical precedent is so well established and known that everyone prices it in.

I just assume any of those companies would steal my work and wouldn't care about it.

Everything they have done so far indicates this.

Running your own is the only option unless you really trust them or unless you have the option to sue them like some big companies can.

Or if you don't really care, then you can use the Chinese one, since it is cheaper.

What makes you trust Anthropic more than Alibaba?

  • There's a difference between stealing for model training and direct monitoring of actionable trade secrets and corporate espionage. Anthropic and OpenAI wouldn't do this, simply because they would be litigated out of existence and criminally investigated if they did. In China it's an expected part of the corporate and legal structure, with virtually no recourse for a foreign firm, and, when it's in the state's interest, for a domestic one either. I'm surprised you don't realize the US has fairly strong civil, criminal, and regulatory protections in place for theft of actionable material and reuse of corporate and trade secrets, let alone copyrighted materials. I assure you their ToS also do not allow them to do this, and that in itself is a contractual obligation you can enforce and win in court.

> For coding often quality at the margin is crucial even at a premium

That's a cryptic way to say "Only for vibe-coding does quality at the margin matter". Obviously, quality is determined first and foremost by the skills of the human operating the LLM.

> No one really believes at face value a Chinese-hosted firm isn't mass-trawling every competitive advantage possible

That's much easier to believe than the same but applied to a huge global corp that operates in your own market and has both the power and the desire to eat your market share for breakfast, before the markets open, so "growth" can be reported the same day.

Besides, open models are hosted by many small providers in the US too, you don't have to use foreign providers per se.

  • 1) model provider choices don’t obviate the need to make other good choices

    2) I think there is a special case for Chinese providers due to the philosophical differences in what constitutes fair markets. The regulatory and civil legal structure outside China generally makes such things existentially dangerous, so while it might happen, it is extraordinarily ill advised; in China, it is implicitly the way things work. However, my point is that Alibaba has their own hosted versions of Qwen models operating on the frontier that are, at minimum, hosted exclusively before being released. There's no reason to believe they won't at some point exclusively host some frontier or fine-tuned variants for commercial reasons. This is part of why they had recent turnover.

> For coding often quality at the margin is crucial even at a premium

For some problems, sure, and when you are stuck, throwing tokens at Opus is worthwhile.

On the other hand, a $10/month minimax 2.7 coding subscription that literally never runs out of tokens will happily perform most day-to-day coding tasks.

Most code is not P99 though.

Also, have you considered that your trust in Anthropic and distrust in China may not be shared by many outside the US? There's a reason why Huawei is the largest supplier of 5G hardware globally.

  • You're right, but perspective is important, and that's because China and the US are engaged in economic warfare (even before the current US regime), vying for the dubious title of "superpower".

  • I find it hard to believe anyone who has ever done business inside China doesn't know that the structure of Chinese business is built around massive IP theft and repurposing on a state-wide, systematic level. It's not a nationalism point; it's an objective and easily verified truth.

    Most code is not P99, but companies pay a premium to produce code that is. That’s my point.

Given the very limited experience I have where I've been trying out a few different models, the quality of the context I can build seems to be much more of an issue than the model itself.

If I build a super high quality context for something I'm really good at, I can get great results. If I'm trying to learn something new and have it help me, it's very hit and miss. I can see where the frontier models would be useful for the latter, but they don't seem to make as much difference for the former, at least in my experience.

The biggest issue I have is that if I don't know a topic, my inquiries seem to poison the context. For some reason, my questions are treated like fact. I've also seen the same behavior with Claude getting information from the web. Specifically, I had it take a question about a possible workaround from a bug report and present it as a de facto solution to my problem. I'm talking disconnect-a-remote-site-from-the-internet levels of wrong.

From what I've seen, I think the future value is in context engineering. I think the value is going to come from systems and tools that let experts "train" a context, which is really just a search problem IMO, and a marketplace or standard for sharing that context building knowledge.

The cynic in me thinks that things like cornering the RAM market are more about depriving everyone else than needing the resources. Whoever usurps the most high quality context from those P99 engineers is going to have a better product because they have better inputs. They don't want to let anyone catch up because the whole thing has properties similar to network effects. The "best" model, even if it's really just the best tooling and context engineering, is going to attract the best users which will improve the model.

It makes me wonder if the self-reinforced learning is really just context theft.

> This is why the marginal difference in comp between your median engineer and your P99 engineer is substantial, while the marginal comp difference between your median pick-and-packer vs your P99 pick-and-packer isn't.

That's an interesting analogy.

Not sure how your last point matters if a 27B model can run on consumer hardware, besides being hosted by any company the user could certainly trust more than Anthropic.

OpenAI & Anthropic are just lying to everyone right now because if they can't raise enough money they are dead. Intelligence is a commodity, the semiconductor supply chain is not.

  • The challenge is token speed. I did some local coding yesterday with qwen3.6 35b, and getting 10-40 tokens per second means the wall time is much longer. 20 tokens per second is a bit over a thousand tokens per minute, which is slower than the experience you get with Claude Code or the Opus models.

    Slower and worse is still useful, but not as good in two important dimensions.

    • Also benchmark measures are not empirical experience measures and are well gamed. As other commenters have said the actual observed behavior is inferior, so it’s not just speed.

      It's ludicrous to believe a small-parameter-count model will outperform a well-made high-parameter-count model. That's just magical thinking. We've not empirically observed any flattening of the scaling laws, and there's no reason to believe the scrappy and smart Qwen team has discovered P=NP, FTL, or a magical non-linear parameter-count scaling model.
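The decode-rate arithmetic in the token-speed comment above can be sketched as follows; the 10,000-token session size and the 80 tok/s hosted rate are illustrative assumptions, not measurements:

```python
# Rough wall-time comparison for emitting a fixed number of output tokens
# at different steady decode rates. Session size and the hosted rate are
# illustrative assumptions.

def wall_time_minutes(total_tokens: int, tokens_per_second: float) -> float:
    """Minutes needed to emit `total_tokens` at a steady decode rate."""
    return total_tokens / tokens_per_second / 60

# A hypothetical 10k-token coding session:
local = wall_time_minutes(10_000, 20)   # 20 tok/s locally -> ~8.3 minutes
hosted = wall_time_minutes(10_000, 80)  # assumed 80 tok/s hosted -> ~2.1 minutes
```

At 20 tok/s, that is 1,200 tokens per minute, matching the "a bit over a thousand tokens per minute" figure in the comment.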

> but if you’re not a Chinese shop, would you really host your production code development workload on a Chinese hosted provider?

As opposed to a US shop? Yup, sure, why not? It's the same ballpark.

> For coding often quality at the margin is crucial even at a premium.

For coding, quality is not measurable and is based entirely on feels (er, sorry, "vibes").

Employers paying for SOTA models is nothing but a lifestyle status perk for employees, like ping-pong tables or fancy lunch snacks.

  • I'm building my own company, and I consider model choice crucial to my marginal ability to produce a higher-quality product I don't regret having built. Every higher-end dev shop I've worked at over the last few years perceives things the same way. There are measurable outcomes from software built well and software built poorly, even if the code itself isn't easily measurable. I would rather pay a few thousand more per year for a better overall outcome with less developer struggle against bad model decisions than end up with an inferior end product and have expensive developers spinning their wheels containing a dumb-as-a-brick model. But everyone's career experiences are different, and I'd feel sad to work at a place where SOTA is a lifestyle choice rather than a rational engineering and business choice.

  • "based entirely on feels"

    Now there's a word I haven't heard in a long, long time.