Comment by brianjking

3 years ago

There is an API for multimodal computer vision and visual reasoning/VQA, and it's available, just not for normies. It's exclusively for their test group and the Be My Eyes project at https://www.bemyeyes.com/.

I was wondering when someone would point this out. The API is called “rainbow” and it does not only recognition / reasoning but also generation.

It’s a very limited model for a select few.

  • > it does not only recognition / reasoning but also generation.

    Hmm, this is new. Source for the image generation piece?

I assume they will release this API publicly at some point?

It's amazing the extreme levels of advantage that groups have depending on funding and connections.

  • The multi-modal vision support? Yes. It's just temporarily available only to BeMyEyes.

    For now I'm using models like Salesforce/blip2, OVF, and Meta's Segment Anything for visual question answering.

  • > It's amazing the extreme levels of advantage that groups have depending on funding and connections.

    It's actually not.