comments (10)

  • None of this seems particularly surprising to someone who has an undergraduate-level knowledge of biochemistry. Thirty years ago the professor in my Proteins class made a few relevant points in his lectures:

    1) Only a handful of amino acids in an enzyme's structure were highly conserved. (Out of hundreds, generally fewer than ten.)

    2) Those were generally in the reaction center.

    3) Almost all single-residue replacements had no measurable effect on protein structure and function.

    4) Across species the "same" protein can diverge in sequence by up to 40%, while keeping the same structure. Sometimes this goes as far as 80%.

    Given these basic facts, the findings in the paper aren't really surprising to anyone who studies proteins.

    [Note: As with everything in biology, you can find counterexamples. The histone proteins involved in DNA packing have an incredibly conserved sequence.]
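
    As a rough illustration of point 4, divergence between two already-aligned sequences can be taken as the fraction of compared positions that differ. The sketch below is in Python; the function name `percent_divergence` and the two short sequences are made up for illustration and are not real proteins. It assumes the alignment has already been done and that gaps are written as "-".

```python
def percent_divergence(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, non-gap positions where the two residues differ.

    Assumes both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(aligned_a, aligned_b):
        # Skip columns where either sequence has a gap.
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


# Hypothetical 10-residue fragments differing at 3 positions.
print(percent_divergence("MKTAYIAKQR", "MKSAYLAKHR"))  # 30.0
```

    Real comparisons would start from a proper alignment and often use a substitution matrix rather than raw mismatches, so treat this only as a way to make the percentages concrete.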

    jyounker

  • Evolution discovered a bunch of structural patterns at different layers (fragments, folds, ...) that are energetically favorable, versatile, easily foldable, and robust to mutations, and then kept reusing them. As a result it sampled more and more in these parts of the space. That's why the fold space is uneven.

    Are there any folds and patterns that evolution has not discovered that are also useful? I think Baker Group created a bunch of new folds. I'm not sure if they are as useful as the ones discovered by Evolution. After all, Evolution had more compute power than us.

    resiros

  • This approach is pretty much like the TED approach from a few years back. As far as I remember there wasn’t a ridiculous amount of fold diversity there either. It turns out evolution isn’t averse to a bit of liberal protein plagiarism.

    https://www.science.org/doi/10.1126/science.adq4946

    hirenj

  • This does reveal the weakness of AlphaFold approaches for answering questions like “what is possible in the protein folding space if you use the 20 canonical amino acids” since the data used to train AlphaFold is limited to existing experimentally determined protein structures.

    We don’t even know whether this is like body plans (four legs for mammals, why not six?). Is it about physical limitations of the folding space, meaning evolution explored most of the space and held onto the most useful folds, or is the common set of folds one of those accident-of-history results? Then there’s the issue that folding takes place as the protein chain exits the ribosomal tunnel, so that’s a whole other constraint on what kinds of folds might be selected. For that matter, why not other genetically determined complex amino acids instead of just the canonical set?

    Also, a common evolutionary process in eukaryotes is duplication of protein sequences and shuffling of code blocks which might represent folding domains, which might tend to lock in the existing collection of folds rather than generating novel folds. That’s not so clear.

    This weakness of AlphaFold has some modern practical relevance, since non-canonical amino acids and modified proteins are increasingly used medically, and their structures mostly seem to be determined using direct experimental methods, e.g.:

    https://pmc.ncbi.nlm.nih.gov/articles/PMC10296201/

    “Non-Canonical Amino Acids as Building Blocks for Peptidomimetics: Structure, Function, and Applications” (2023)

    photochemsyn

  • cool post! it's funny how many things in this world are naturally graphs. i think it's neat how, especially in biology, a lot of high-dimensional objects, like protein sequences, converge onto lower-dimensional representations, like protein structures.

    i did neuroscience for grad school, and i was always amazed by how often complex neural activity could be well represented by lower dimensional representations--clean manifolds, attractor dynamics, etc. i think, in general, biology (evolution) doesn't penalize redundancy too hard (hence things like genetic drift, neutral theory of evolution, etc.).

    anyway, super cool stuff. agree with you that probs more useful to explore the search space via 'less natural' structures, given how forgiving evolution is to redundancy. probs where the most information can be found

    h_a_n_k

  • Proteins are truly amazing. I've studied them for decades and they still manage to surprise; for example, I worked on protein structure prediction for years and assumed that structure was necessary for function, but some proteins remain mostly unfolded and still carry out complex mechanistic tasks.

    dekhn

  • My PhD thesis addressed a similar question. I did a survey of sub-domain sized fragments shared between different protein folds. It turns out that there are plenty, even among folds considered evolutionarily distant.

    flobosg

  • I worked with a foodie who was also a protein scientist (https://scienceandfooducla.wordpress.com/2016/02/23/kent-kir...) and he once pointed out: nearly everything you need to know about protein folding, you can learn from an egg.

    dekhn

  • No real clue what this stuff is about, way over my head, but kudos on an article where it's all there on the page instead of needing scripts to pull text and images from different places!

    ifh-hn

  • This crashed my browser. Use reader mode.

    throwaway81523