Code Mode: give agents an API in 1k tokens

8 days ago (blog.cloudflare.com)

The composable executable tool pattern is great, but hiding all details behind "search" is no panacea. Without any capability description in the context, LLMs won't necessarily know they SHOULD search the tool API for something, so it's a tradeoff. The real challenge is figuring out how to give them context-aware tool documentation. Maybe that gets baked into a retrained model, or the model already knows broadly enough about Cloudflare (or whatever SDK is provided).

Code Mode is a workaround for poor tool packaging.

Instead of improving the MCP ecosystem’s capability discovery model, they reduced context pressure by collapsing the tool ecosystem into a programmable backend.

It’s a vendor-specific, host-opaque, low-observability MCP-exposed programmable API gateway with weak MCP-layer governance semantics.

This is a brilliant formalization of a pattern we arrived at a while ago while building Tako, an open-source Okta AI agent.

Even just exposing ~107 read-only GET endpoints was blowing out our context windows. We realized we couldn't load the spec; we had to let the agent query it.

We built a two-step discovery pattern: the agent requests a lightweight index of operation names (system_log.list_events, user.list, etc.), then requests the full spec only for the operations it actually needs. The full spec never hits the context window.
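A minimal sketch of that two-step pattern, assuming an in-memory spec. The operation names follow the examples above; everything else (the schema shape, the function names) is illustrative, not the actual Tako implementation:

```python
# Step 1 returns a tiny index; step 2 returns one full entry on demand.
# The full spec dict never enters the agent's context as a whole.

SPEC = {
    "user.list": {
        "method": "GET",
        "path": "/api/v1/users",
        "summary": "List users in the org",
        "params": {"q": "string", "filter": "string", "search": "string"},
        "notes": "Usage rules and gotchas go here, per endpoint.",
    },
    "system_log.list_events": {
        "method": "GET",
        "path": "/api/v1/logs",
        "summary": "Query system log events",
        "params": {"since": "string", "until": "string"},
        "notes": "...",
    },
}

def list_operations():
    """Step 1: lightweight index -- names and one-line summaries only."""
    return [{"name": name, "summary": op["summary"]}
            for name, op in SPEC.items()]

def get_operation(name):
    """Step 2: the full spec for a single operation, fetched on demand."""
    return SPEC[name]
```

The agent scans the index, picks the operations relevant to its task, and only those full entries are ever loaded.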

The key difference from querying a raw OpenAPI spec is the notes field. Each endpoint in our JSON carries a hand-written notes field with usage rules, gotchas, and agent planning hints. For example, the list-users notes tell the agent: "CRITICAL: Use search for SCIM filtering, use filter for limited system properties, use q for simple name/email matching — mixing search and filter parameters causes errors."
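Concretely, a single entry in such a spec might look like this. Only the notes string is taken from the example above; the field names and layout are a hypothetical shape, not the actual Tako JSON format:

```python
# Hypothetical per-endpoint entry; the "notes" field carries the
# hand-written rules the agent reads at query time.
list_users_entry = {
    "operation": "user.list",
    "method": "GET",
    "path": "/api/v1/users",
    "params": ["q", "filter", "search", "limit", "after"],
    "notes": (
        "CRITICAL: Use search for SCIM filtering, use filter for limited "
        "system properties, use q for simple name/email matching -- "
        "mixing search and filter parameters causes errors."
    ),
}
```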

An OpenAPI spec will never tell you that. It gives you parameter names and types, but not that mixing two valid parameters silently breaks, or that q excludes deprovisioned users, or that you need a follow-up call to get-user because an endpoint only returns IDs. Those are the things that make agents fail in production, and they exist only in tribal knowledge or lie buried in docs. We put them directly in the JSON, per endpoint, where the agent reads them at query time.

The tradeoff: it's not auto-generated, so it takes effort to maintain. But the payoff is that end-users can edit the notes for their own tenant. Custom profile attribute? Add it to the notes. Want the agent to always filter by your department structure? Edit the JSON. No code changes.

To prove this pattern translates to MCP, we built an Okta MCP server with two modes — Standard Mode (every endpoint as a separate tool, immediately crushes context) and Discovery Mode (this progressive pattern): https://fctr.io/okta-mcp-server.html

The raw JSON spec: https://github.com/fctr-id/okta-ai-agent/blob/main/src/data/...

Cloudflare's server-side V8 isolate is a big improvement for security over our client-side execution. Great write-up.