Comment by skybrian

4 hours ago

No, I mean that dodgy Chinese firms are cheating their customers:

> Because users’ inputs and model outputs are mediated through a proxy, users cannot verify which model their request was actually routed to. A user selects Opus 4.7, but the proxy can silently route to Sonnet, Haiku, or, in the worst case, GLM or Qwen, and fraudulently relabel the output. In a recent paper from Germany’s CISPA Helmholtz Center for Information Security (which cited my article last year on [the] grey market!), researchers audited 17 API proxies and found widespread model swapping: API proxy access to “Gemini-2.5” achieved only 37.00% on a medical benchmark, a staggering drop from the 83.82% performance of the official API. On the user end, the tell only comes on complex tasks, when the output feels off (often referred to as 降智, or “dumbed-down”), but there is no clean way to prove it. Numerous public records highlight concerns that certain API proxies have noticeably compromised model performance. These proxies are suspected of “diluting” (掺水) services by substituting premium frontier models with inferior tiers.

> Besides model swapping, overconsumption of tokens also makes the price per token cheaper, though at the expense of driving up the total cost. Some of it is structural, as proxies that rotate accounts frequently destroy cache continuity as a side effect, forcing users to burn full-price tokens on context that would otherwise be nearly free. Some of it may be deliberate as the proxy providers try to milk more usage. The line between the two is difficult to draw from the outside.
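
To make the arithmetic in that last quoted paragraph concrete, here's a rough sketch of how a cheaper per-token price can still produce a bigger bill once the cache is lost. Everything in it is my own assumption, not from the article: the function name, the prices, the cache discount, and the session shape are all made up for illustration. It also simplifies by treating the context as a fixed size that is already cached, and it ignores output tokens and cache-write charges.

```python
def session_input_cost(turns, context_tokens, new_tokens,
                       price_per_mtok, cache_read_multiplier=None):
    """Dollar cost of input tokens over a multi-turn session.

    cache_read_multiplier is the fraction of the list price charged for
    context served from the prompt cache; None means every turn is a
    cache miss and the context is billed at full price.
    """
    multiplier = 1.0 if cache_read_multiplier is None else cache_read_multiplier
    per_turn = (context_tokens * multiplier + new_tokens) * price_per_mtok
    return turns * per_turn / 1_000_000


# All figures below are hypothetical, chosen only to show the effect.
official = session_input_cost(turns=20, context_tokens=50_000, new_tokens=1_000,
                              price_per_mtok=10.0, cache_read_multiplier=0.1)
proxy = session_input_cost(turns=20, context_tokens=50_000, new_tokens=1_000,
                           price_per_mtok=5.0)  # half price, but no cache hits

print(f"official API: ${official:.2f}")
print(f"proxy:        ${proxy:.2f}")
```

With these invented numbers the proxy session comes out to about $5.10 against about $1.20 on the official API, even though the proxy's per-token price is half. The real ratio depends entirely on actual pricing and how much context gets reused.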