Comment by simonw
3 days ago
I wonder if Gemini 3 Pro would do better at this particular test? They're very proud of its spatial awareness and vision abilities.
>They're very proud of its spatial awareness and vision abilities.
Suuuuuuuuure they are.
I haven't found a single multimodal model, vision LLM, or any model at all that can segment and extract music charts/infographics.
Can Gemini 3 Pro, in one shot, turn charts like these into lists of "artist - album" without choking on the visuals?
https://reddit.com/r/citypop/comments/10fu1t5/city_pop_album...
https://reddit.com/r/indieheads/comments/173o33z/the_new_ind...
Might work if you set media resolution to high: https://ai.google.dev/gemini-api/docs/media-resolution
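For anyone trying this outside AI Studio, here is a rough sketch of what a request body with the resolution bumped up might look like. The `generationConfig.mediaResolution` field and the `MEDIA_RESOLUTION_HIGH` value are my reading of the linked docs, so treat them as assumptions and check the page before relying on them. `build_request` is a hypothetical helper, and nothing here actually calls the API.

```python
import base64
import json


def build_request(image_bytes: bytes, prompt: str, mime_type: str = "image/png") -> dict:
    """Build a generateContent-style JSON body (hypothetical helper, no network call)."""
    return {
        "contents": [
            {
                "parts": [
                    # The image is sent inline as base64 text.
                    {
                        "inlineData": {
                            "mimeType": mime_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                    {"text": prompt},
                ]
            }
        ],
        # Assumed field name and enum value; verify against the media-resolution docs.
        "generationConfig": {"mediaResolution": "MEDIA_RESOLUTION_HIGH"},
    }


if __name__ == "__main__":
    body = build_request(b"<image bytes here>", 'List every album as "artist - album".')
    print(json.dumps(body, indent=2)[:300])
```

Building the body as a plain dict keeps the sketch independent of any particular SDK version; you would still need to POST it to the model endpoint with your own API key.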
I tried it on one of the images. How did it do? https://aistudio.google.com/app/prompts?state=%257B%2522ids%...
I don't trust the AI Studio "share" links so here's the image I used: https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
And the prompt:
I ran that against Gemini 3 Pro Preview with media resolution set to "high".
Here's the result: