
Comment by dchu17

3 months ago

Thought about this too. I think there are two broad LLM capabilities that are currently tangled up in this eval:

1. Can an LLM navigate a slide effectively (i.e., find all relevant regions of interest)?
2. Given a region of interest, can an LLM make the correct assessment?
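One hypothetical way to untangle the two is to score them separately. Navigation could be scored as recall over ground-truth regions: a region counts as found if some predicted box overlaps it with IoU at or above a threshold (0.5 here is an assumption). Assessment could be scored by handing the model the ground-truth regions directly and checking its labels. The names `Box`, `iou`, `navigation_recall`, and `assessment_accuracy`, and the axis-aligned box representation, are all illustrative assumptions, not part of any existing eval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned region of interest on a slide."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 if they do not overlap)."""
    ix = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    iy = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def navigation_recall(predicted: list[Box], ground_truth: list[Box],
                      threshold: float = 0.5) -> float:
    """Capability 1: fraction of ground-truth regions the model found.

    The 0.5 IoU threshold is an assumed default, not a standard for this eval.
    """
    if not ground_truth:
        return 1.0
    found = sum(
        1 for gt in ground_truth
        if any(iou(gt, p) >= threshold for p in predicted)
    )
    return found / len(ground_truth)


def assessment_accuracy(predictions: dict[str, str],
                        labels: dict[str, str]) -> float:
    """Capability 2: accuracy when the model is given the true regions.

    `predictions` maps a region id to the model's label for that region;
    a missing prediction counts as wrong.
    """
    if not labels:
        return 1.0
    correct = sum(1 for rid, label in labels.items()
                  if predictions.get(rid) == label)
    return correct / len(labels)


if __name__ == "__main__":
    gt = [Box(0, 0, 10, 10), Box(20, 20, 30, 30)]
    pred = [Box(1, 1, 10, 10)]  # finds the first region, misses the second
    print(navigation_recall(pred, gt))  # 0.5
    print(assessment_accuracy({"r1": "positive", "r2": "positive"},
                              {"r1": "positive", "r2": "negative"}))  # 0.5
```

Keeping the two scores separate means a model that labels regions well but misses half of them shows up as low recall with high accuracy, rather than as one blended number.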

I need to come up with a better test here in general, but yep, I'm thinking about this.