Comment by bobxmax
2 days ago
I thought Alex Wang was a very curious choice. There are so many foundational AI labs with interesting CEOs... I get that Wang is remarkable in his own right, but he basically just built MTurk and timed the bubble.
Doesn't really scream CEO of AGI to me.
A lot of people also don't know that many of the well-known papers are just variations on small-time papers with a fuck ton more compute thrown at the problem. Probably the feature that correlates most strongly with being a successful researcher is compute. Many have taken this to claim that the GPU-poor can't contribute, but that ignores so many other valid explanations... and we wonder why innovation has slowed... It's also weird because if compute were all you needed, there'd be a much cheaper option than what Zuck paid. But he's paying for fame.
> A lot of people also don't know that many of the well known papers are just variations on small time papers with a fuck ton more compute thrown at the problem.
I worked for a small, research-heavy AI startup for a bit, and it was heartbreaking how many people I met in that space who had worked hard and passionately on research, only to be beaten to the punch by a famous lab that could rush the paper out faster and at a larger scale.
There were also more than a few instances of high-probability plagiarism. My team had a paper that had existed for years basically rewritten without citation by a major lab. After some complaining they added a footnote. But it doesn't really matter, because no big lab is ever going to have to defend itself publicly against some small startup, and their job at the big labs is to churn out papers.
This added at least a year to my PhD... Reviewers kept rejecting my work with comments like "add more datasets." That's nice and all, but on the few datasets I did use I beat out top labs with a tenth of the compute. I'd love to add more datasets, but even at a tenth of the compute I blew my entire compute budget. Apparently state-of-the-art results, a smaller model, higher throughput, and third-party validation were not enough (I used an unpopular model architecture).
I always felt like my work was being evaluated as an engineering product, not as research.
I was reviewing a paper once and actually couldn't tell whether the researchers knew they had ripped me off or not. They compared against my method, citing it and showing figures using it, but then dropped the performance metrics from the table. So I asked. When I got the numbers back, I saw there was no difference... So I dove in and worked out that they were just doing 99% my method with additional complexity (computational overhead). I was pretty upset.
I was also upset because otherwise the paper was good. The results were nice, and they even tested our work in a domain we hadn't. Had they just been upfront, I would have gladly accepted the work, though I'm pretty confident the other reviewers wouldn't have, due to "lack of novelty."
It's a really weird system that we've constructed. We're our own worst enemies.
I'd modify this slightly. Their job is to get citations. Churning out papers really helps with that, but so does all the tweeting and evangelizing of their work. It's an unfortunate truth that as researchers we have to sell our work, and not just on the scientific merit it holds. People have to read it, after all. But we should also note that it is easier for some groups to get noticed than others. Prestige doesn't make a paper good, but it sure acts as a multiplier on all the metrics we use to decide whether it is good.
It’s funny.
I learnt the hard way that communications/image/signal processing research basically doesn't care about Computer Architecture at the nuts-and-bolts level of compiler optimization and implementation.
When they encounter a problem whose normal solution requires excessive amounts of computation, they reduce complexity algorithmically using mathematical techniques, and quantify the effects.
They don't quibble about a 10x speedup; they reduce the "big O()" complexity. They couldn't care less whether it was implemented in interpreted Python or hand-optimized assembly code (see the sketch below).
On one hand, I know there’s a lot of talent in AI today. But throwing hardware at the problem is the dumbest way forward.
WiFi adapters would be wheeled luggage if we had the same mentality during their development.
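(For concreteness, here's a minimal Python sketch of that mindset, using the textbook convolution example: the direct method is O(n*m), while the FFT route via the convolution theorem is O(n log n). The function names and sizes are made up for illustration, not taken from any particular codebase.)

    import numpy as np

    def conv_naive(x, h):
        # Direct convolution: every output sample touches every filter tap -> O(n*m).
        n, m = len(x), len(h)
        y = np.zeros(n + m - 1)
        for i in range(n):
            for j in range(m):
                y[i + j] += x[i] * h[j]
        return y

    def conv_fft(x, h):
        # FFT-based convolution: multiply spectra, then invert -> O(n log n).
        size = len(x) + len(h) - 1
        return np.fft.irfft(np.fft.rfft(x, size) * np.fft.rfft(h, size), size)

    x, h = np.random.randn(4096), np.random.randn(512)
    assert np.allclose(conv_naive(x, h), conv_fft(x, h))

Same output, a completely different scaling curve, and none of it depends on whether the inner loop is interpreted Python or hand-tuned assembly.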
At some point it becomes difficult to improve the O() complexity. How do you do better than the O(n^2) of the Transformer, with acceptable tradeoffs? Many big brains in all the big labs are very aware of the importance of algorithmic advances. There is no low-hanging fruit, but they're doing their best.
Then, in parallel to that, people look at compiler optimizations and other higher-level algorithmic innovations such as Flash Attention (a classic at this point), which had a drastic impact on performance through cache awareness without changing the O() complexity.
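(Toy illustration, not the real Flash Attention kernel: both functions below do the same O(n^2 * d) work, but the blocked one never materializes the full n x n score matrix, which is roughly the memory/cache point being made. Shapes and block size are arbitrary.)

    import numpy as np

    def softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)

    def attention_naive(Q, K, V):
        # Materializes all n x n attention scores at once.
        return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

    def attention_blocked(Q, K, V, block=128):
        # Same FLOPs, but only a (block, n) slice of scores is live at a time.
        out = np.empty_like(Q)
        for s in range(0, Q.shape[0], block):
            q = Q[s:s + block]
            out[s:s + block] = softmax(q @ K.T / np.sqrt(Q.shape[-1])) @ V
        return out

    n, d = 1024, 64
    Q, K, V = (np.random.randn(n, d) for _ in range(3))
    assert np.allclose(attention_naive(Q, K, V), attention_blocked(Q, K, V))

The actual Flash Attention goes further, tiling over keys/values with an online softmax so the working set fits in on-chip SRAM, but the FLOP count stays quadratic.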
I can at least say that's not all of us. But you're probably right that this is dominating. I find it so weird, since everyone stresses empirics yet also seems not to care about them. It took me my entire PhD to figure out what was really going on. I've written too many long-winded rants on this site already, though.
You make it sound like reducing the big O complexity is a dumb thing to do in research, but this is really the only way to make lasting progress in computer science. Computer architectures become obsolete as hardware changes, but any theoretical advances in the problem space will remain true forever.
Frankly, this is why I'm not convinced the current LLM movement will yield anything close to the dream.
The right people to deliver immense progress don't exist right now.
I wouldn't go this far. But I would say that we're not giving them a good shot.
The people are always there, you just need to find them and enable them.
The reporting at the time said he was Mark's fifth choice or so. It is fairly clear he would have preferred Ilya, Murati, Mark Chen, and perhaps others, but they said no, and Alex Wang was the first one to say yes.
The self-supervised mesa-optimizer strikes again
Why in the world would he want Murati? She has absolutely no technical chops and was not functionally CTO of OpenAI.
> was not functionally CTO of OpenAI.
Why do you say that?
Because she was CTO of OpenAI.
What technical chops does Sam Altman have?
Alexandr Wang is not interesting, and he's a few steps short of a fraud whom Mark had to bail out because he was so co-invested.
Shareholders would be livid if they knew a single thing about what was going on.
Tell me more
Scale promised cutting-edge data pipelines and model-training infra but mostly sold outsourced labeling with a tech veneer. Great margins, weak moat — classic Valley overclaim, not outright fraud.