Comment by thot_experiment
3 months ago
all the qwen flavors have a VL version, and it's a separate tensor stack, so it only costs a bit of extra VRAM if you want to keep it resident. vision-based queries take longer to process context, but generation is still fast asf
i think the model itself is actually "smarter" too, because they split the thinking and instruct variants, so each modality gets better in its respective model
i use it almost exclusively to OCR handwritten todo lists into my todo app, and i don't think it's missed an item yet; it does a great job of tool-calling everything
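for anyone curious what that workflow roughly looks like, here's a minimal sketch of the request you'd send to a locally hosted VL model behind an OpenAI-compatible chat endpoint. the model name, the endpoint URL, and the `add_todo` tool are all placeholder assumptions, not details from the comment:

```python
import base64
import json

# Placeholder image bytes standing in for a photo of a handwritten list;
# in practice you'd do: image_bytes = open("todo_photo.jpg", "rb").read()
image_bytes = b"\xff\xd8\xff\xe0fake-jpeg"
image_b64 = base64.b64encode(image_bytes).decode("ascii")

# A tool schema the model can call once per item it reads off the page.
# "add_todo" is a hypothetical stand-in for a real todo-app API.
tools = [{
    "type": "function",
    "function": {
        "name": "add_todo",
        "description": "Add one todo item to the list",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}]

# OpenAI-compatible chat payload: one user turn mixing an image and a prompt.
payload = {
    "model": "qwen2.5-vl-7b-instruct",  # assumed model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text",
             "text": "OCR this handwritten todo list and call add_todo "
                     "once per item."},
        ],
    }],
    "tools": tools,
}

# You'd POST this to your local server, e.g.:
#   requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(json.dumps(payload)[:60])
```

the server then streams back either plain text or `tool_calls` entries, and your app executes each `add_todo` call against the real todo backend.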