Comment by brianjking

3 years ago

There is an API for multimodal computer vision and visual reasoning/VQA, and it's available, just not for normies. It's exclusive to their test group and the Be My Eyes project at https://www.bemyeyes.com/.

5 comments

ericlewis 3 years ago

I was wondering when someone would point this out. The API is called “rainbow” and it does not only recognition / reasoning but also generation.

It’s a very limited model for a select few.

swyx 3 years ago

> it does not only recognition / reasoning but also generation.
hmm this is new. source for the image generation piece?

ilaksh 3 years ago

I assume they will release this API publicly at some point?

It's amazing the extreme levels of advantage that groups have depending on funding and connections.

brianjking 3 years ago

The multi-modal vision support? Yes. It's just temporarily available only to Be My Eyes.
For now I'm using models like Salesforce/blip2 and OVF and Meta's Segment Anything for visual questioning.

hn_throwaway_99 3 years ago

> It's amazing the extreme levels of advantage that groups have depending on funding and connections.
It's actually not.