Protein folding: AI’s new frontier
Over the last few years, we’ve seen Artificial Intelligence (AI) move forward in leaps and bounds, from self-driving cars to systems that some claim are nearing sentience. This year we have a possible field frontrunner with a solution to a prominent, complex and life-defining problem: why and how does a protein actually fold?
In the 1950s, a pioneer of biochemistry, Dr Christian Anfinsen, carried out research that led to our current understanding of proteins. He investigated how the amino acid sequence – the order of the protein building blocks encoded in our DNA – is responsible for how a protein folds and what it folds into. However, the details of the process are far harder to discover, and require investigation of a vast number of possible 3D shapes and configurations. It’s been suggested that if we fully understood it, we would understand biological life itself.
Some researchers have devoted their entire careers to solving this problem, and strides have been made. One area of knowledge which has developed is how a protein’s shape determines its function. However, there are thought to be more than 200 million proteins, and we have only confirmed the structure of about 170,000 – less than 0.1% of all known proteins. These are laid out in the Protein Data Bank (PDB).
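The proportion of proteins with a confirmed structure can be checked with a quick calculation. This is only a back-of-the-envelope sketch: both figures are the rough estimates quoted above, and the variable names are purely illustrative.

```python
# Rough estimates quoted in the article, not exact counts.
known_proteins = 200_000_000
solved_structures = 170_000

# Share of known proteins whose structure has been confirmed.
percent_solved = solved_structures / known_proteins * 100
one_in = round(known_proteins / solved_structures)

print(f"{percent_solved:.3f}% of known proteins have a confirmed structure")
print(f"That is roughly 1 in {one_in:,}")
```

In other words, for every protein whose shape has been pinned down in the lab, well over a thousand are still waiting.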
Computational chemistry, which uses computer simulations to solve chemical problems, is nothing new. It helps us to map data, determine structure and even model the universe. Applying it to proteins was a logical next step, which is where DeepMind came in.
DeepMind is a company that a lot of our tech-savvy readers (and some conspiracy theorists) will be familiar with. The company is well known for its ‘man vs. machine’ programmes that enable AI to defeat champion Go players, top chess engines and old-school Atari wizards. Alongside the fun, the ‘Deep-Learning program’ has been using gaming and strategy to get ready to tackle tangible, complex problems.
The AI system ‘AlphaFold’ was trained using the PDB, running its sequences and shapes over a number of weeks to find correlations between the sequences and structures. This allows predictions to be made about the structure of unknown proteins based on all the available data. It does this by using ‘neural networks’ to compare the sequence being analysed against the databases. This in turn creates predictions about the distances and angles between the amino acids, which are then scored before a structure is proposed.
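To make the idea of ‘distances’ concrete, here is a minimal sketch of the kind of table involved: the distance between every pair of amino acids in a chain. It is not AlphaFold’s code. AlphaFold has to predict such distances from the sequence alone, whereas this toy example simply measures them from made-up coordinates. The function name `distance_map` and the four positions are hypothetical.

```python
import math

def distance_map(coords):
    """Return a square table of distances between every pair of points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

# Hypothetical 3D positions (in angstroms) of four amino acids in a chain.
toy_chain = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]

# Print the table row by row; the diagonal is zero by definition.
for row in distance_map(toy_chain):
    print(" ".join(f"{d:5.2f}" for d in row))
```

A table like this pins down a shape without saying where it sits in space, which is one reason distances are a convenient thing to predict.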
‘AlphaFold’ has well and truly shaken up the science world. The ‘Critical Assessment of Protein Structure Prediction’ (CASP) seeks to “help advance the methods of identifying protein structure from sequence” by providing “an objective testing of these methods”. The organisers set a challenge of solving around 100 amino acid chains. A score of about 90 out of 100 is considered on a par with experimental methods, and AlphaFold completely dominated the competition, gaining a median score of 92.4. When dealing with the hardest, massively more complex proteins this fell to 87, yet it still bested all current models and programmes. This extraordinary outcome has led to ‘AlphaFold’ being used to tackle all sorts of decades-old problems, including those of developmental biology and Alzheimer’s treatment.
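The scores out of 100 come from CASP’s main accuracy measure, the Global Distance Test (GDT), which asks how many amino acids in a predicted structure sit close to their experimentally determined positions. The sketch below is a simplified version under stated assumptions: it takes the two structures as already superimposed, whereas the real measure searches over many superpositions. The function name `gdt_ts` and the coordinates are made up for illustration.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT score from 0 to 100.

    For each cutoff (in angstroms), find the fraction of positions whose
    predicted location lies within that distance of the experimental one,
    then average the fractions. Assumes the structures are pre-aligned.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and the same length")
    errors = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [sum(err <= c for err in errors) / len(errors) for c in cutoffs]
    return 100 * sum(fractions) / len(cutoffs)

# Hypothetical four-residue example: errors of 0.5, 1.5, 3.0 and 10.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 0.0, 10.0)]

print(gdt_ts(predicted, experimental))  # 56.25
```

A perfect prediction scores 100, so a median above 90 means the vast majority of amino acids landed within a few angstroms of where experiments put them.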
For some in the science community, this is a cause for concern. They argue that whilst this AI has advanced the field by decades, there is still a lot we don’t know. In addition, to improve and confirm the accuracy of the software, we need practical experimentation, yet the AI’s success simultaneously suggests we don’t need to experiment as much.
Whilst we can make reasonable predictions based on data, those predictions don’t actually tell us why things happen. Asking ‘why?’ is arguably a summary of science itself.
However, there is no doubt that this software has thrust forward the entire field of biochemistry, and this means of prediction is bound to give us insight into cause and effect. With the advancement of AI, we are even closer to understanding the fundamental fabric of life itself. What a time to be a scientist!