Comment by mt_

9 months ago

You can just dump the youtube link video in Google AI studio and ask it to transcribe the video with speaker labels and even ask it it to add useful visual clues, because the model is multimodal for video too.