Comment by traverseda

4 months ago

"The first time you've seen these objects" is a weird thing to say. One presumes that this is already in their training set, and that these models aren't storing a huge amount of data in their context, so what does that even mean?

6 comments

traverseda

jayd16 4 months ago

It probably gives them confidence that they can accurately see a thing even though they don't know what that thing is.

I could also imagine a lot of safety around leaving things outside of the current task alone so you might have to bend over backwards to get new objects worked on.

thomastjeffery 4 months ago
There is no such thing as "thing" here.
These models are trained such that the given conditions (the visual input and the text prompt) will be continued with a desirable continuation (motor function over time).
The only dimension accuracy can apply to is desirability.
- jayd16 4 months ago
  
  You don't think there's any segmentation going on?
  
  1 reply →

ygouzerh 4 months ago

So from what I understand it actually means that they were for example never trained on a video of an apple. Maybe only on a video of bread, pineapple, chocolate.

However, as it was trained using generic text data similarly to a normal LLM, it knows how an apple is supposed to look like.

Similar than a kid that never saw a banana, but his parent described it to him.

Symmetry 4 months ago

It's normal to have a training set and a validation set and I interpreted that to mean that these items weren't in the training set.