Comment by toddmorey
6 hours ago
I’m actually amazed at the output since GLM doesn’t have eyes. If GLM 5.2 costs 1/5 as much, it seems like it could be set up to reach out to a multimodal model for vision tasks when required. That would get it closer to parity while probably still being significantly cheaper.
I'm also very impressed at the output given the lack of image support.
They picked a task that heavily favors a model with multi-modal image support, and GLM still came within striking distance.
What I'm hearing from this article is that the next generation of open models, with better multi-modal support, will basically be a no-brainer for adoption.
Seems like a HUGE win for Z.ai and open models in general here.