Comment by TeMPOraL

9 hours ago

That's why the harness is moving server-side: because generating tokens is not the actual point of the model, not for the users. Especially with tool calling giving us agents that can act, most of the tokens generated are not, themselves, critical to the end users. Specifically, a lot of tokens go into orchestrating actual tool calls, and most "thinking tokens" are relevant to users only insofar as they help them keep track of and verify what the LLM is doing. So all those tokens can be hidden or replaced by partial summaries, and all of that can happen server-side, and then there's very little to distill from.

I haven't heard of this happening, do you have links to any explainers on this?

  • Claude on the Web (which also includes at least the Android and Desktop apps) and the ChatGPT web app are two examples - they keep gaining agentic capabilities.

    Perhaps the most striking example for me - I've been using Claude Code a lot in the past month, mostly through the web, Desktop (app) or phone interface, running the actual harness "remotely" (somewhere on Anthropic-controlled infra).

    One way of looking at it: web surfaces are slowly catching up with (a fraction of the power of) agentic coding tools. But another way is that the major players are building up SaaS harnesses that start to compete with (their own) local ones. The reason may be ease of use, but the practical side effect is making it much harder to use their models to train competitors, as these SaaS harnesses create an abstraction layer on top of the LLMs that resides entirely in the vendor's cloud and therefore cannot be worked around.
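
To make the "abstraction layer" point concrete, here is a toy sketch of what a server-side harness could expose to the client. Everything in it is hypothetical (the `Event` and `ClientView` types, `summarize`, `server_side_view`); it is not how any vendor actually implements this, just an illustration of the idea that the raw thinking and tool-call tokens stay on the server and only the final answer plus a coarse summary leave it.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    # Hypothetical trace entry produced inside the harness.
    kind: str  # "thinking", "tool_call", "tool_result", or "answer"
    text: str


@dataclass
class ClientView:
    # Hypothetical payload that actually leaves the vendor's cloud.
    answer: str
    summary: list = field(default_factory=list)


def summarize(events):
    """Stand-in for a real summarizer: reports counts, never the raw text."""
    n_think = sum(1 for e in events if e.kind == "thinking")
    n_calls = sum(1 for e in events if e.kind == "tool_call")
    return [f"{n_think} reasoning step(s), {n_calls} tool call(s)"]


def server_side_view(events):
    """Keep the last answer, replace everything else with a summary."""
    answer = next((e.text for e in reversed(events) if e.kind == "answer"), "")
    return ClientView(answer=answer, summary=summarize(events))


# A made-up agent trace, as it might exist only on the server.
trace = [
    Event("thinking", "Need the failing test name first."),
    Event("tool_call", "run_tests()"),
    Event("tool_result", "test_parse failed"),
    Event("thinking", "Off-by-one in the parser loop."),
    Event("answer", "Fixed the off-by-one in the parser loop."),
]

view = server_side_view(trace)
print(view.answer)
print(view.summary)
```

With a local harness the client sees the full `trace`; in this sketch it sees only `view`, which is the "very little to distill from" situation described above.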