Comment by ramraj07

2 years ago

Not sure if you should be reminded of how AlphaFold started: it started by winning a competition thought unwinnable by academics. Top labs working in protein structure prediction have fundamentally changed direction after AlphaFold and are working to do the same even better.

This is not the first (or even tenth) time I’m seeing an academic trying to undermine genuine progress almost to the level of gaslighting. Comparing alphafold to conventional homology modeling is disingenuous at its most charitable interpretation.

Not sure what else to say. Structural biology has always been the weirdest field I’ve seen, the way students are abused (crystallize and publish in nature or go bust), and how every nature issue will have three structure papers as if that cures cancer every day. I suppose it warps one’s perception of outsiders after being in such a bubble?

signed, someone with a PhD in biomedical engineering, did a ton of bio work.

> Not sure if you should be reminded of how AlphaFold started: it started by winning a competition thought unwinnable by academics. Top labs working in protein structure prediction have fundamentally changed direction after AlphaFold and are working to do the same even better.

Not sure what part of "it does homology modeling 2x better" you didn't see in my comment? AlphaFold scored something like 85% in CASP 2020; in CASP 2016, I-TASSER had, I think, 42%. So it's ~2x as good as I-TASSER, which is exactly what I said in my comment.

>This is not the first (or even tenth) time I’m seeing an academic trying to undermine genuine progress almost to the level of gaslighting. Comparing alphafold to conventional homology modeling is disingenuous at its most charitable interpretation.

It literally is homology modeling. The deep learning aspect boosts otherwise-unnoticed signal that most homology modeling software couldn't tease out. Also, I don't think I'm gaslighting, but maybe I'm wrong? If anything, I felt gaslit by the language around AlphaFold.

>Not sure what else to say. Structural biology has always been the weirdest field I’ve seen, the way students are abused (crystallize and publish in nature or go bust), and how every nature issue will have three structure papers as if that cures cancer every day. I suppose it warps one’s perception of outsiders after being in such a bubble?

What on earth are you even talking about? The vast, VAST majority of structures go unpublished ENTIRELY, let alone published in Nature. There are almost 200,000 structures on deposit in the PDB.

  • What ramraj is talking about: if you go into a competitive grad program to get a PhD in structural biology, your advisor will probably expect that in 3-4 years you will: crystallize a protein of interest, collect enough data to make a model, and publish that model in a major journal. Many people in my program could not graduate until they had a Nature or Science paper (my advisor was not an asshole; I graduated with just a paper in Biochemistry).

    In a sense both of you are right: DeepMind is massively overplaying the value of what they did, trying to expand its impact far beyond what they actually achieved (this is common in competitive biology), but what they did was such an improvement over the state of the art that it's considered a major accomplishment. It also achieved the target of CASP, which was to make predictions whose scores are indistinguishable from experimentally determined structures.

    I don't think academics thought CASP was unwinnable, but most groups were very surprised that an industrial player using 5-year-old tech did so well.

  • > Not sure what part of "it does homology modeling 2x better" you didn't see in my comment? AlphaFold scored something like 85% in CASP 2020; in CASP 2016, I-TASSER had, I think, 42%. So it's ~2x as good as I-TASSER, which is exactly what I said in my comment.

    Wait, stop, I don't know anything about proteins, but 85% success is not ~2x better than 42%.

    It doesn't really make sense to talk about "2x better" in terms of success percentages, but if you want a feel, I would measure 1/error instead (a 99% correct system is 10 times better than a 90% correct system), making AlphaFold around 3.9 times better.

    • I think odds ratio ( p/(1-p) ) is the thing I'd use here. It gives the right limiting behavior (at p ~= 0, doubling p is twice as good, and at p~=1, halving 1-p is twice as good) and it's the natural way to express Bayes rule, meaning you can say "I'm twice as sure (in odds ratio terms) based on this evidence" and have that be solely a property of the update, not the prior.


  • > AlphaFold scored something like 85% in CASP 2020; in CASP 2016, I-TASSER had, I think, 42%. So it's ~2x as good as I-TASSER

    As someone who doesn't know proteins, but is decent at math, I would not describe it this way. You are assuming a linear relationship between effort and value, but more often than not, effort has diminishing returns. 80 dB is not 2x as loud as 40 dB. An 8K image doesn't have 2x the fidelity of a 4K image. If Toyota unveiled a new engine that was 60% efficient tomorrow, no one in their right mind would say "eh, it's just 2x better". If we came out with a CPU that could clock up to 10 GHz, we wouldn't say "meh, that's just 2x what we had".

    Without being able to define the relationship here, I could just as well say that 85% is 1000x better than 42%. There's just no way to put a number on it. What we can say is that we completely blew all projections out of the water.

    Again, I'm not someone working with proteins, but to me it sounds as revolutionary as a 60%+ efficient engine or a 10 GHz CPU. No one saw it coming or thought it feasible with current technology.

  • I think the debate between "does amazing on metric X" versus "doesn't really understand the problem" reappears in many places and doesn't have any direct way to be resolved.

    That's more or less because "really understands the problem" generally winds up being a placeholder for things the system can't do. Which isn't to say it's not important. One thing often included in "understanding" is the system knowing the limits of its approach: current AI systems have a harder time giving a certainty value than giving a prediction. But you could have a system that satisfied a metric for this, and other things would pop up; for example, what kind of certainty or uncertainty are we talking about (crucial for decision making under uncertainty).
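As an aside, the competing "times better" scales floated in the subthread above are easy to compare side by side. Here is a minimal Python sketch; treating the quoted CASP numbers (85% and 42%) as plain success probabilities is a simplification made only for illustration:

```python
# Three ways to turn "85% correct vs 42% correct" into one "times better" number.
# Treating the CASP scores as simple success probabilities is an assumption.

def linear_ratio(p_new: float, p_old: float) -> float:
    """Naive ratio of success rates: the "~2x" reading."""
    return p_new / p_old

def inverse_error_ratio(p_new: float, p_old: float) -> float:
    """1/error scale: a 99%-correct system is 10x a 90%-correct one."""
    return (1 - p_old) / (1 - p_new)

def odds_ratio(p_new: float, p_old: float) -> float:
    """Odds p/(1-p): behaves sensibly near both p = 0 and p = 1."""
    return (p_new / (1 - p_new)) / (p_old / (1 - p_old))

p_af, p_itasser = 0.85, 0.42

print(f"linear ratio:  {linear_ratio(p_af, p_itasser):.2f}x")         # 2.02x
print(f"1/error ratio: {inverse_error_ratio(p_af, p_itasser):.2f}x")  # 3.87x
print(f"odds ratio:    {odds_ratio(p_af, p_itasser):.2f}x")           # 7.83x
```

The same pair of scores reads as roughly 2x, 3.9x, or 7.8x depending on which scale you pick, which is exactly the objection being made: "times better" is underdefined for success percentages.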

> Comparing alphafold to conventional homology modeling is disingenuous at its most charitable interpretation.

It's really not. Have you played around with AF at all? Made mutations to protein structures and asked it to model them? Go look up the crystal structures for important proteins like FOXA1 [1], AR [2], EWSR1 [3], etc. (i.e. pretty much any protein target we really care about and haven't previously solved) and tell me with a straight face that AF has "solved" protein folding. It's just a fancy language model that's pattern matching to things it's already seen solved before.

signed, someone with a PhD in biochemistry.

[1] https://alphafold.ebi.ac.uk/entry/P55317
[2] https://alphafold.ebi.ac.uk/entry/P10275
[3] https://alphafold.ebi.ac.uk/entry/Q01844

  • I can see the loops in these structures. I don't see the problem. It still added a structure to every EMBL page, and people are free to judge the predictions themselves. For all I care (ostensibly as the end customer of these structures), I don't mind having a low-confidence structure for any arbitrary protein at all. It's only marginally less useful to actual biology than full on X-ray structures anyway.

    • > It’s only marginally less useful to actual biology than full on X-ray structures anyway.

      I'm not sure what you're implying here. Are you saying both types of structures are useful, but not as useful as the hype suggests? Or that an X-ray crystal (XRC) structure and a low-confidence structure are both very useful, with the XRC being marginally more so?

      An XRC structure is great, but it's a very (very) long way from getting me to a drug. Observe the long history of fully crystallized proteins still lacking a good drug. Or this piece on the general failure of purely structure-guided efforts in drug discovery for COVID (https://www.science.org/content/blog-post/virtual-screening-...). I think this tech will certainly be helpful, but for most problems I don't see it being better than a slightly-more-than-marginal gain in our ability to find medicines.

      Edit: To clarify, if the current state of the field is "given a well understood structure, I often still can't find a good medicine without doing a ton of screening experiments" then it's hard to see how much this helps us. I can also see several ways in which a less than accurate structure could be very misleading.

      FWIW I can see a few ways in which it could be very useful for hypothesis generation too, but we're still talking pretty early stage basic science work with lots of caveats.

      Source: PhD Biochemist and CEO of a biotech.

This isn’t a good use of the term gaslighting. Accusing someone of gaslighting takes what we used to call a ‘difference of opinion’ and mutates it into deliberate and wicked psychological warfare.

Incidentally, accusing someone of gaslighting is itself a form of gaslighting.

  • Well, it can be gaslighting but not always. A knowingly false accusation, repeated often enough and in a way to make the accused question their own perception of reality, would be gaslighting.

Not only is CASP not "unwinnable," it's not even a contest. The criteria involved are rated as "moderately difficult." AlphaFold is a significant achievement, but it sure as hell hasn't "revealed the structure of the protein universe," whatever that means.

Which top labs have changed direction? Because AlphaFold can't predict folds, just identify ones it's seen.