Comment by segmondy

7 months ago

I run DeepSeek at 5 tk/s at home and I'm happy with it. I don't need to do agent stuff to gain from it. I was saving to eventually build out enough to run it at 10 tk/s, but with Kimi K2 the plan has changed, and the savings continue with a goal of running it at 5 tk/s at home.

I agree: 5 tokens per second is plenty fast for casual use.

  • Also works perfectly fine in fire-and-forget, non-interactive agentic workflows. My dream scenario is that I create a bunch of kanban tickets and assign them to one or more AI personas[1], and wake up to some pull requests the next morning. I'd be more concerned about tickets per day than tk/s, as I have no interest in watching the inner workings of the model.

    1. Some more creative than others, with slightly different injected prompts or perhaps even different models entirely.
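
    A minimal sketch of what those personas could look like, assuming something like the `llm` command-line tool; the model aliases, system prompts, and file paths here are all made up:

    ```bash
    #!/usr/bin/env bash
    # Hypothetical personas: the same ticket goes to two different
    # model + system-prompt combinations.
    set -euo pipefail

    ticket="tickets/PROJ-42.md"
    mkdir -p out

    # Cautious persona.
    llm -m deepseek-r1 \
        -s "You are a careful senior engineer. Prefer small, well-tested diffs." \
        < "$ticket" > out/PROJ-42.careful.md

    # Creative persona: different model, looser instructions.
    llm -m kimi-k2 \
        -s "You are an inventive engineer. Bold refactors are welcome." \
        < "$ticket" > out/PROJ-42.creative.md
    ```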

    • > I create a bunch of kanban tickets and assign them to one or more AI personas[1],

      Yeah, that. Why can't we just `find ./tasks/ | grep '\.md$' | xargs llm`? Can't we just write up a government-proposal-style document and have an LLM recurse down into sub-sub-projects and back up, until the original proposal can be translated into a completion report? Constantly correcting one humongous LLM with an infinite context window that keeps everything in its head doesn't feel like the right approach.
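
      A rough sketch of that loop, assuming the `llm` CLI in the pipeline above reads its prompt from stdin; the prompt wording and file layout are illustrative, not a real setup:

      ```bash
      #!/usr/bin/env bash
      # Fire-and-forget sketch: hand every markdown task to the model
      # and collect the answers next to the task files.
      set -euo pipefail

      find ./tasks/ -type f -name '*.md' | while read -r task; do
        # One shot per task; the output lands beside the ticket as *.out.md.
        llm -s "Complete this task. If it is too big, break it into sub-tasks." \
            < "$task" > "${task%.md}.out.md"
      done
      ```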

  • Co-sign for chat; that's my bar for usable on a mobile phone (and it correlates well with average reading speed).

  • It was; last year, 5 tk/s was reasonable if you wanted to proofread a paragraph or rewrite some bullet points into a PowerPoint slide.

    Now, with agentic coding, thinking models, and “chat with my PDF” (or whatever artifacts are being called now), no, I don’t think 5 tk/s is enough.