Comment by bbminner

6 months ago

Looking at the Molmo description:

"Our key innovation is a new collection of datasets called PixMo that includes a novel highly-detailed image caption dataset collected entirely from human annotators using speech-based descriptions, and a diverse mixture of fine-tuning datasets that enable new capabilities. Notably, PixMo includes innovative 2D pointing data that enables Molmo to answer questions not just using natural language but also using non verbal cues. We believe this opens up important future directions for VLMs enabling agents to interact in virtual and physical worlds. The success of our approach relies on careful choices for the model architecture details, a well-tuned training pipeline, and most critically the quality of our newly collected datasets, all of which we have released."

This is a solid engineering project with a research component - they collected some data that ended up being quite useful when combined with pre-existing tech. But this is not rocket science, and it is not a unique insight. I don't want to devalue the importance of solid engineering work, but you normally don't get paid as much for non-unique engineering expertise, and this by no means sounds unique to me. This seems like a good senior-staff research eng project at a big tech company these days. You don't get paid 250M for that kind of work. I know very talented people who do this kind of work in big tech, and from what I can tell, many of them have much more fundamental insight and experience and have led larger teams of engineers, yet their comp does not surpass 1-2M tops (taking a very generous upper bound).