Comment by jdeng
5 hours ago
Glad to see open source models catching up and treating vision as a first-class citizen (a.k.a. native multimodal agentic models). GLM and Qwen models take a different approach, with a base model and a separate vision variant (glm-4.6 vs glm-4.6v).
I guess after Kimi K2.5, other vendors are going to go the same route?
Can't wait to see how this model performs on computer automation use cases like VITA AI Coworker.