Comment by traverseda
2 days ago
"The first time you've seen these objects" is a weird thing to say. One presumes that this is already in their training set, and that these models aren't storing a huge amount of data in their context, so what does that even mean?
It probably gives them confidence that they can accurately see a thing even though they don't know what that thing is.
I could also imagine a lot of safety around leaving things outside of the current task alone so you might have to bend over backwards to get new objects worked on.
There is no such thing as "thing" here.
These models are trained such that the given conditions (the visual input and the text prompt) will be continued with a desirable continuation (motor function over time).
The only dimension accuracy can apply to is desirability.
You don't think there's any segmentation going on?
1 reply →
So from what I understand it actually means that they were for example never trained on a video of an apple. Maybe only on a video of bread, pineapple, chocolate.
However, as it was trained using generic text data similarly to a normal LLM, it knows how an apple is supposed to look like.
Similar than a kid that never saw a banana, but his parent described it to him.
It's normal to have a training set and a validation set and I interpreted that to mean that these items weren't in the training set.