Comment by janalsncm

12 hours ago

This model says it accepts video inputs. I asked it to transcribe a 5-second video of a digital water curtain that spelled “Boo Happy Halloween”, and it came back with “Happy”, which wasn’t from the first frame and is also incomplete.

This kind of test is good because it requires stitching together info from the whole video.

2 comments

aabhay 12 hours ago

It reads videos at 1 fps by default. You have to set the video resolution to high in AI Studio.
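To make the 1 fps point concrete, here's a rough sketch of which moments of a clip would get looked at. It assumes uniform sampling starting at t=0, which is my assumption and not something confirmed about how the model ingests video. `sampled_timestamps` is a hypothetical helper for illustration, not part of any Gemini API.

```python
def sampled_timestamps(duration_s: float, sample_fps: float = 1.0) -> list[float]:
    """Timestamps (in seconds) a uniform sampler would grab, starting at t=0.

    Assumption: one frame every 1/sample_fps seconds, none at the very end.
    """
    if duration_s <= 0 or sample_fps <= 0:
        return []
    step = 1.0 / sample_fps
    count = int(duration_s * sample_fps)  # number of frames before the clip ends
    return [round(i * step, 3) for i in range(count)]


# A 5 second clip at the default rate vs. a denser (hypothetical) rate.
print(sampled_timestamps(5))            # 5 frames, one per second
print(len(sampled_timestamps(5, 4.0)))  # 20 frames at 4 fps
```

Under that assumption a 5 second clip yields only 5 frames, so anything the water curtain spells out between samples would never be seen.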

janalsncm 8 hours ago

This is inside the Gemini app.