Comment by zozbot234
9 hours ago
Have you tried it? It would be slow for sure, but AIUI the main limitation would actually be storing the context (the KV cache) in RAM. Models like Kimi and GLM have high memory demands there, which limits how large a batch you can run and therefore how much aggregate throughput you can get.
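
To put rough numbers on the context-memory point, here is a back-of-the-envelope sketch. It assumes a plain grouped-query attention layout with a 16-bit cache, which may not match how a given model actually stores its context. Every figure in it (layer count, KV heads, head size, RAM, weight size) is a made-up placeholder, not the configuration of Kimi, GLM, or any other real model, and the function names are hypothetical.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of cached context per token, assuming standard K/V caching."""
    # Keys and values are both stored for every layer, hence the factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_batch_size(ram_bytes, weight_bytes, context_len, per_token_bytes):
    """How many full-length sequences fit in the RAM left over after weights."""
    free = ram_bytes - weight_bytes
    if free <= 0:
        return 0
    return free // (context_len * per_token_bytes)


GIB = 2**30

# Placeholder numbers for illustration only (not a real model).
per_token = kv_cache_bytes_per_token(n_layers=60, n_kv_heads=8, head_dim=128)
batch = max_batch_size(
    ram_bytes=512 * GIB,
    weight_bytes=400 * GIB,
    context_len=32768,
    per_token_bytes=per_token,
)

print(f"{per_token} bytes of cache per token")
print(f"{per_token * 32768 / GIB:.1f} GiB per 32k-token sequence")
print(f"at most {batch} concurrent sequences")
```

With these placeholder values each 32k-token sequence needs about 7.5 GiB of cache, so only 14 fit next to the weights. That is the sense in which context storage, rather than raw compute, caps the batch size.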