Comment by dchu17
3 months ago
Thought about this too. I think there are two broad LLM capabilities here that are kind of currently tangled up in this eval:
1. Can an LLM navigate a slide effectively (i.e., find all relevant regions of interest)?
2. Given a region of interest, can an LLM make the correct assessment?
I need to come up with a better test here in general, but yep, I'm thinking about this.
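One way the two capabilities might be untangled is to score them separately: navigation as recall over ground-truth regions of interest, and assessment as accuracy when the model is handed those regions directly. This is only a rough sketch, not the eval's actual method. The `Region` type, the IoU threshold of 0.5, and the string labels are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical axis-aligned box on a slide, in pixel coordinates."""
    x: int
    y: int
    w: int
    h: int


def iou(a: Region, b: Region) -> float:
    """Intersection over union of two boxes (0.0 if they do not overlap)."""
    ix = max(0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union else 0.0


def navigation_recall(found, truth, iou_threshold=0.5):
    """Capability 1: fraction of ground-truth regions the model visited.

    A ground-truth region counts as found if any visited region overlaps
    it with IoU at or above the (assumed) threshold.
    """
    if not truth:
        raise ValueError("need at least one ground-truth region")
    hits = sum(
        1 for t in truth if any(iou(t, f) >= iou_threshold for f in found)
    )
    return hits / len(truth)


def assessment_accuracy(predicted_labels, true_labels):
    """Capability 2: accuracy when the model is given each true region.

    Navigation is taken out of the picture, so errors here reflect
    only the per-region judgment.
    """
    if not true_labels or len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must be non-empty and the same length")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)
```

Reporting the two numbers side by side would show whether a low end-to-end score comes from missed regions or from wrong calls on regions that were found.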