Comment by fpgaminer

14 days ago

https://www.llama.com/ https://www.llama.com/docs/model-cards-and-prompt-formats/ll...

Very exciting. Benchmarks look good, and most importantly it looks like they did a lot of work improving vision performance (based on benchmarks).

The new suggested system prompt makes it seem like the model is less censored, which would be great. The phrasing of the system prompt is ... a little disconcerting in context (Meta's kowtowing to Nazis), but in general I'm a proponent of LLMs doing what users ask them to do.

Once it's on an API I can start throwing my dataset at it to see how it performs in that regard.

Alright, played with it a little bit on the API (Maverick). Vision is much better than Llama 3's vision, so they've done good work there. However its vision is not as SOTA as the benchmarks would indicate. Worse than Qwen, maybe floating around Gemini Flash 2.0?

It seems to be less censored than Llama 3, and can describe NSFW images and interact with them. It did refuse me once, but complied after reminding it of its system prompt. Accuracy of visual NSFW content is not particularly good; much worse than GPT 4o.

More "sensitive" requests, like asking it to guess the political affiliation of a person from an image, required a _lot_ of coaxing in the system prompt. Otherwise it tends to refuse. Even with their suggested prompt that seemingly would have allowed that.

More extreme prompts, like asking it to write derogatory things about pictures of real people, took some coaxing as well but was quite straight-forward.

So yes, I'd say this iteration is less censored. Vision is better, but OpenAI and Qwen still lead the pack.