Comment by brianjking
3 years ago
There is an API for multimodal computer vision and visual reasoning/VQA, and it's available, just not for normies. It's exclusively for their test group and then the Be My Eyes project at https://www.bemyeyes.com/.
I was wondering when someone would point this out. The API is called “rainbow” and it does not only recognition / reasoning but also generation.
It’s a very limited model for a select few.
> it does not only recognition / reasoning but also generation.
Hmm, this is new. Source for the image generation piece?
I assume they will release this API publicly at some point?
It's amazing the extreme levels of advantage that groups have depending on funding and connections.
The multimodal vision support? Yes. It's just temporarily available only to Be My Eyes.
For now I'm using models like Salesforce/blip2 and OVF and Meta's Segment Anything for visual question answering.
> It's amazing the extreme levels of advantage that groups have depending on funding and connections.
It's actually not.