Comment by segmondy

7 months ago

I run DeepSeek at 5 tk/s at home and I'm happy with it. I don't need to do agent stuff to gain from it. I was saving to eventually build out enough to run it at 10 tk/s, but with Kimi K2 the plan has changed, and the savings continue with a goal of running it at 5 tk/s at home.

I agree: 5 tokens per second is plenty fast for casual use.

  • Also works perfectly fine in fire-and-forget, non-interactive agentic workflows. My dream scenario is that I create a bunch of kanban tickets and assign them to one or more AI personas[1], and wake up to some pull requests the next morning. I'd be more concerned about tickets per day than tk/s, as I have no interest in watching the inner workings of the model.

    1. Some more creative than others, with slightly different injected prompts or perhaps even different models entirely.
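
    A minimal sketch of what those personas could look like, assuming something like the `llm` command-line tool; the model aliases, system prompts, and file paths here are all made up:

    ```bash
    #!/usr/bin/env bash
    # Hypothetical personas: the same ticket goes to two different
    # model + system-prompt combinations.
    set -euo pipefail

    ticket="tickets/PROJ-42.md"
    mkdir -p out

    # Cautious persona.
    llm -m deepseek-r1 \
        -s "You are a careful senior engineer. Prefer small, well-tested diffs." \
        < "$ticket" > out/PROJ-42.careful.md

    # Creative persona: different model, looser instructions.
    llm -m kimi-k2 \
        -s "You are an inventive engineer. Bold refactors are welcome." \
        < "$ticket" > out/PROJ-42.creative.md
    ```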

    • > I create a bunch of kanban tickets and assign them to one or more AI personas[1],

      Yeah, that. Why can't we just `find ./tasks/ | grep '\.md$' | xargs llm`? Can't we just write up a government-proposal-style document and have an LLM recurse down into sub-sub-projects and back up, until the original proposal can be translated into a completion report? Constantly correcting one humongous LLM with an infinite context window that keeps everything in its head doesn't feel like the right approach.
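
      A rough sketch of that loop, assuming the `llm` CLI in the pipeline above reads its prompt from stdin; the prompt wording and file layout are illustrative, not a real setup:

      ```bash
      #!/usr/bin/env bash
      # Fire-and-forget sketch: hand every markdown task to the model
      # and collect the answers next to the task files.
      set -euo pipefail

      find ./tasks/ -type f -name '*.md' | while read -r task; do
        # One shot per task; the output lands beside the ticket as *.out.md.
        llm -s "Complete this task. If it is too big, break it into sub-tasks." \
            < "$task" > "${task%.md}.out.md"
      done
      ```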

  • Co-sign for chat; that's my bar for usable on a mobile phone (and it correlates well with average reading speed).

  • It was; last year, 5 tk/s was reasonable if you wanted to proofread a paragraph or rewrite some bullet points into a PowerPoint slide.

    Now, with agentic coding, thinking models, and “chat with my PDF” (or whatever artifacts are being called now), no, I don’t think 5 tk/s is enough.