Comment by lawrencechen
3 months ago
I wonder if navigation plays a significant role in performance. If you just randomly select 15 frames (presumably with interesting pixels), will the model perform similarly well?
3 months ago
Thought about this too. I think there are two broad LLM capabilities here that are kind of currently tangled up in this eval:
1. Can an LLM navigate a slide effectively (i.e., find all relevant regions of interest)?
2. Given a region of interest, can an LLM make the correct assessment?
I need to come up with a better test here in general, but yep, I'm thinking about this.
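One way to separate the two capabilities would be a random-frame baseline: skip navigation, pick 15 "interesting" frames at random, and compare the model's assessment against the navigated run. A minimal sketch of the sampling step follows. Everything in it is an assumption for illustration: the function name `sample_random_frames`, the `(row, col)` tile grid, the per-tile interest score (e.g. tissue fraction), and the 0.5 threshold are all hypothetical and not taken from the eval itself.

```python
import random


def sample_random_frames(tile_scores, k=15, min_score=0.5, seed=0):
    """Pick up to k tile coordinates uniformly at random among 'interesting' tiles.

    tile_scores: dict mapping (row, col) -> hypothetical interest score in [0, 1].
    min_score:   assumed cutoff for what counts as an interesting tile.
    """
    # Keep only tiles above the cutoff; sort so the result is reproducible.
    candidates = sorted(c for c, s in tile_scores.items() if s >= min_score)
    rng = random.Random(seed)
    # Sample without replacement; return fewer than k if not enough candidates.
    return rng.sample(candidates, min(k, len(candidates)))


# Toy 10x10 grid with made-up scores, standing in for a real slide.
grid = {(r, c): ((r * 7 + c * 3) % 10) / 10 for r in range(10) for c in range(10)}
frames = sample_random_frames(grid)
```

If accuracy on the randomly sampled frames is close to accuracy with navigation, that would suggest navigation is not the main driver; a large gap would point the other way.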