Comment by fpgaminer
14 days ago
Alright, played with it a little bit on the API (Maverick). Vision is much better than Llama 3's vision, so they've done good work there. However its vision is not as SOTA as the benchmarks would indicate. Worse than Qwen, maybe floating around Gemini Flash 2.0?
It seems to be less censored than Llama 3, and can describe NSFW images and interact with them. It did refuse me once, but complied after reminding it of its system prompt. Accuracy of visual NSFW content is not particularly good; much worse than GPT 4o.
More "sensitive" requests, like asking it to guess the political affiliation of a person from an image, required a _lot_ of coaxing in the system prompt. Otherwise it tends to refuse. Even with their suggested prompt that seemingly would have allowed that.
More extreme prompts, like asking it to write derogatory things about pictures of real people, took some coaxing as well but was quite straight-forward.
So yes, I'd say this iteration is less censored. Vision is better, but OpenAI and Qwen still lead the pack.
No comments yet
Contribute on Hacker News ↗