Comment by xienze

15 hours ago

I don't want to be a jerk, but 31 t/s prefill is basically unusable in an agentic situation. With a mere 10k tokens of context you're sitting there for 5+ minutes (10,000 / 31 ≈ 323 s) before the first token is generated.
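For anyone who wants to plug in their own numbers, here is a rough sketch of that arithmetic. It assumes prefill throughput stays constant across the whole prompt, which is a simplification, and the function name is just made up for illustration.

```python
def prefill_seconds(prompt_tokens: int, prefill_tokens_per_second: float) -> float:
    """Estimate the wait before the first output token.

    Assumes a constant prefill rate (a simplification; real throughput
    can change as the context grows).
    """
    if prefill_tokens_per_second <= 0:
        raise ValueError("prefill rate must be positive")
    return prompt_tokens / prefill_tokens_per_second


# The numbers from the comment: 10k tokens of context at 31 t/s prefill.
seconds = prefill_seconds(10_000, 31)
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")
```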

Hah, that's because the prompt itself was only about 30 tokens. We need a much bigger prompt to properly test PP (prompt processing).

If it's just the coding agent system prompt and tools, you can cache that.

  • Yeah, the problem is that's just the start of the context. There are also all the tool call results, file reads, and so on.
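To put a rough number on that point: caching only skips the shared prefix, so everything after it still has to be prefilled. A back-of-the-envelope sketch, where the 3,000-token cached prefix is a purely hypothetical figure (nobody in the thread gave one) and the prefill rate is assumed constant:

```python
def uncached_prefill_seconds(
    total_tokens: int,
    cached_prefix_tokens: int,
    prefill_tokens_per_second: float,
) -> float:
    """Estimate prefill time when a leading chunk of the context is cached.

    Only tokens after the cached prefix are counted. Assumes a constant
    prefill rate and a perfect cache hit on the prefix.
    """
    if prefill_tokens_per_second <= 0:
        raise ValueError("prefill rate must be positive")
    uncached_tokens = max(total_tokens - cached_prefix_tokens, 0)
    return uncached_tokens / prefill_tokens_per_second


# Hypothetical split: 3,000 tokens of system prompt + tools are cached,
# the remaining 7,000 tokens are tool call results and file reads.
seconds = uncached_prefill_seconds(10_000, 3_000, 31)
print(f"{seconds / 60:.1f} min still spent on prefill")
```

Under those assumed numbers the wait drops, but it is still measured in minutes rather than seconds.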