DeepMind stunned the biology world late last year when its AlphaFold2 AI model predicted the structure of proteins (a common and very difficult problem) so accurately that many declared the decades-old problem “solved.” Now researchers claim to have leapfrogged DeepMind the way DeepMind leapfrogged the rest of the world, with RoseTTAFold, a system that does nearly the same thing at a fraction of the computational cost. (Oh, and it’s free to use.)
AlphaFold2 has been the talk of the industry since November, when it blew away the competition at CASP14, a virtual competition between algorithms built to predict the physical structure of a protein given the sequence of amino acids that makes it up. The model from DeepMind was so far ahead of the others, so highly and reliably accurate, that many in the field have talked (half-seriously and in good humor) about moving on to a new field.
But one aspect that seemed to satisfy no one was DeepMind’s plans for the system. The method was not exhaustively and openly described, and some worried that the company (which is owned by Alphabet/Google) was planning on more or less keeping the secret sauce to itself, which would be its prerogative but also somewhat against the ethos of mutual aid in the scientific world.
That concern seems to have been at least partly mooted by work from University of Washington researchers led by David Baker and Minkyung Baek, published in the latest issue of the journal Science. Baker, you may remember, recently won a Breakthrough Prize for his team’s work combating COVID-19 with engineered proteins.
The team’s new model, RoseTTAFold, makes predictions at similar accuracy levels using methods that Baker, responding to questions via email, candidly admitted were inspired by those used by AlphaFold2.
“The AlphaFold2 group presented several new high level concepts at the CASP14 meeting. Starting from these ideas, and with a lot of collective brainstorming with colleagues in the group, Minkyung has been able to make amazing progress in very little time,” he said. (“She is amazing!” he added.)
Baker’s group more or less placed second at CASP14, no mean feat, but hearing DeepMind’s methods described, even in general terms, set them on course to close the gap. They developed a “three-track” neural network that simultaneously considers the amino acid sequence (one dimension), distances between residues (two dimensions), and coordinates in space (three dimensions). The implementation is beyond complex and far outside the scope of this article, but the result is a model that achieves almost the same accuracy levels as AlphaFold2. Those levels, it bears repeating, were completely unprecedented less than a year ago.
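For a rough sense of what those three kinds of information look like as data, here is a minimal Python sketch. It is not RoseTTAFold's code and contains no neural network; the four-residue sequence, the coordinates, and the `distance_map` helper are all made up for illustration. It only shows how a sequence (1D), a residue-to-residue distance map (2D), and a set of spatial coordinates (3D) describe the same toy protein.

```python
import math

# Hypothetical toy protein: four residues with invented coordinates (angstroms).
sequence = "MKVL"                      # 1D: one letter per residue
coords = [                             # 3D: one (x, y, z) point per residue
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (3.8, 3.8, 3.8),
]

def distance_map(points):
    """Return the L x L matrix of Euclidean distances between residues (2D)."""
    return [[math.dist(p, q) for q in points] for p in points]

dmap = distance_map(coords)

# All three views describe the same L residues.
print(len(sequence), len(dmap), len(coords))
```

The distance map here is computed from known coordinates purely to show the relationship between the views; a prediction model has to work in the other direction, starting from the sequence alone.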
What’s more, RoseTTAFold accomplishes this level of accuracy far more quickly, which is to say with far less computing power. As the paper puts it:
DeepMind reported using several GPUs for days to make individual predictions, whereas our predictions are made in a single pass through the network in the same manner that would be used for a server…the end-to-end version of RoseTTAFold requires ~10 min on an RTX2080 GPU to generate backbone coordinates for proteins with less than 400 residues.
Hear that? It’s the sound of thousands of microbiologists sighing in relief and discarding drafts of emails asking for supercomputer time. It may not be easy to lay one’s hands on a 2080 these days, but the point is any high-end desktop GPU can perform this task in minutes, instead of requiring a high-end cluster running for days.
The modest requirements make RoseTTAFold suitable for public hosting and distribution as well, something that might never have been in the cards for AlphaFold2.
“We have a public server that anyone can submit protein sequences to and have the structures predicted,” Baker said. “There have been over 4500 submissions since we put the server up a few weeks ago. We have also made the source code freely available.”
This may seem very niche, and it is, but protein folding has historically been one of the toughest problems in biology and one to which countless hours of high-performance computing have been dedicated. You may recall Folding@home, the popular distributed computing app that lets people donate their computing cycles to simulating how proteins fold. The kind of problem that might have taken a thousand computers days or weeks to do (essentially by brute-forcing solutions and checking for fit) now can be done in minutes on a single desktop.
The physical structure of proteins is of utmost importance in biology, as it is proteins that do the vast majority of tasks in our bodies, and proteins that must be modified, suppressed, enhanced, and so on for therapeutic reasons; first, however, they need to be understood, and until November that understanding could not be reliably achieved computationally. At CASP14 it was proven to be possible, and now it has been made widely available.
It is not, by a long shot, a “solution” to the problem of protein folding, though the sentiment has been expressed. Most proteins at rest in neutral conditions can now have their structure predicted, and that has huge repercussions in multiple domains, but proteins are seldom found “at rest in neutral conditions.” They twist and contort to grab or release other molecules, to block or slip through gates and other proteins, and generally to do everything they do. These interactions are far more numerous, complex, and difficult to predict, and neither AlphaFold2 nor RoseTTAFold can predict them.
“There are many exciting chapters ahead… the story is just beginning,” said Baker.
If you’re curious about the science and the potential repercussions, consider reading this much more detailed and technical account of the methods and possible next steps written in the wake of AlphaFold2’s CASP14 performance.