Comment by justinram11
9 hours ago
That's about what we've seen as well (even directly from DeepSeek themselves).
We've been using it for async "heartbeat" processing and SMS replies, but it's just too slow for live chat replies (which is a shame, as I'd really love to use it there).
Very capable model, but also very slow.
That isn't what the charts on OpenRouter appear to show, though they only seem to go back one week (unless I missed something). Time to first token should be under 2 seconds, with throughput anywhere from 15 to 50 tps depending on the provider. Admittedly 15 is a bit slow, but most look to be closer to 30 or 40, which I personally think is fine.
https://openrouter.ai/deepseek/deepseek-v4-pro/performance
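For a rough sense of what those numbers mean for a chat reply, total wait is roughly time to first token plus output length divided by throughput. A quick sketch, assuming a 150-token reply (my own illustrative figure, not something from the charts) and ignoring network overhead and reasoning tokens:

```python
def response_seconds(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Approximate wall-clock time for a full reply.

    ttft_s: time to first token, in seconds
    tokens_per_s: streaming throughput (tps)
    output_tokens: length of the reply (assumed value, pick your own)
    """
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    # 2 s to first token, across the 15-50 tps range quoted above.
    for tps in (15, 30, 40, 50):
        total = response_seconds(2.0, tps, 150)
        print(f"{tps} tps: {total:.1f} s for a 150-token reply")
```

With streaming, the user starts reading after the first-token delay, so the perceived latency is closer to the TTFT than to the total.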
Have you tried their flash model? Pro was too slow for me too, but I've found flash to be more than capable, and it's faster than GPT-5.5 at medium.
It's actually on my list this week to put together an MVP of an intelligence escalation flow. My initial assumption is that flash is good for 60-80% of my users' workflows, with only the tricky questions needing a more capable model. Whether I can put together a proper detection system remains to be seen.
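A minimal sketch of what such an escalation flow could look like. Everything here is hypothetical: the model names "flash" and "pro" are placeholders, `call_model` stands in for whatever client you already use, and the keyword-and-length heuristic is just a stand-in for a real detection system.

```python
from typing import Callable, Tuple

# Hypothetical hints that a message may need the more capable model.
TRICKY_HINTS = ("why", "compare", "debug", "explain", "step by step")


def looks_tricky(message: str, max_words: int = 60) -> bool:
    """Crude detector: long messages or ones containing a hint phrase."""
    text = message.lower()
    too_long = len(text.split()) > max_words
    has_hint = any(hint in text for hint in TRICKY_HINTS)
    return too_long or has_hint


def route(
    message: str,
    call_model: Callable[[str, str], str],
    fast: str = "flash",   # placeholder name for the fast model
    strong: str = "pro",   # placeholder name for the capable model
) -> Tuple[str, str]:
    """Pick a model for the message and return (model_used, reply)."""
    model = strong if looks_tricky(message) else fast
    return model, call_model(model, message)


if __name__ == "__main__":
    # Stub client so the sketch runs without any API access.
    def fake_client(model: str, message: str) -> str:
        return f"[{model}] reply to: {message}"

    print(route("What time do you open?", fake_client))
    print(route("Can you explain why my invoice total differs?", fake_client))
```

Logging which model handled each message would also let you check the 60-80% assumption against real traffic before investing in a better detector.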