Comment by ilaksh
3 years ago
Actually, what he has said is that the biggest performance gains came from reinforcement learning from human feedback (RLHF).
There are also all of the quantization and other tricks out there.
Also, they have demonstrated that the model already understands images; they just haven't completed the API for it.
So they use quantization to increase speed by a factor of three while slightly increasing the parameter count, maybe find a way to make the network sparser and more efficient so that, combined with the quantization, the model actually ends up using significantly less memory, and continue the RLHF, focusing on even harder tasks and ones that incorporate visual data (rough sketch of the quantization idea at the end of this comment).
Then instead of calling it GPT-5 they just call it GPT-4.5. Twice as fast as GPT-4, IQ goes from 130 to 155. And the API now allows images to be passed in and analyzed.
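To be clear, I have no idea what OpenAI actually does internally; this is just a toy sketch of the kind of post-training quantization trick I mean, using PyTorch's built-in dynamic int8 quantization on an arbitrary small model:

```python
import torch
import torch.nn as nn

# A toy fp32 model standing in for "some big network"; sizes are arbitrary.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))
model.eval()

# Dynamic int8 quantization: weights of the listed module types are stored
# in int8 and activations are quantized on the fly at inference time (CPU only).
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 4096)
with torch.no_grad():
    out_fp32 = model(x)
    out_int8 = quantized(x)

# The accuracy cost is usually small relative to the memory/speed win.
print("max abs diff:", (out_fp32 - out_int8).abs().max().item())
```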
There is an API for multimodal computer vision and visual reasoning/VQA, and it's available, just not for normies. It's exclusively for their test group and then the Be My Eyes project at https://www.bemyeyes.com/.
I was wondering when someone would point this out. The API is called "rainbow", and it handles not only recognition/reasoning but also generation.
It’s a very limited model for a select few.
> it handles not only recognition/reasoning but also generation.
hmm this is new. source for the image generation piece?
I assume they will release this API publicly at some point?
It's amazing the extreme levels of advantage that groups have depending on funding and connections.
The multi-modal vision support? Yes. It's just temporarily limited to Be My Eyes.
For now I'm using models like Salesforce/blip2, OVF, and Meta's Segment Anything for visual question answering.
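For reference, here's roughly what the BLIP-2 VQA path looks like through Hugging Face transformers. The checkpoint name and the "Question: ... Answer:" prompt format come from the Salesforce/blip2-opt-2.7b model card; the image URL below is just a placeholder.

```python
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Load the BLIP-2 processor and model (OPT-2.7B language backbone).
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

# Placeholder URL -- point this at a real image.
url = "https://example.com/photo.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# "Question: ... Answer:" is the VQA prompt format from the model card.
prompt = "Question: how many dogs are in the picture? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, dtype)

out = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(out, skip_special_tokens=True)[0].strip())
```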
> It's amazing the extreme levels of advantage that groups have depending on funding and connections.
It's actually not.