AI tools shed light on millions of proteins

Screenshot from the interactive web resource “Protein Universe Atlas”. Credit: University of Basel, Biozentrum

Editor’s note: Life on Earth uses a large number of proteins. But millions more could exist, far more than terrestrial biology actually uses. Which proteins will drive the metabolism of life forms on other worlds? Will they be very similar to ours, somewhat alike, or completely “weird”?


A research team from the University of Basel and the Swiss Institute of Bioinformatics (SIB) has uncovered a treasure trove of uncharacterized proteins. Building on the recent deep learning revolution, the team discovered hundreds of new protein families and even a new predicted protein fold. The study has now been published in the journal Nature.

In recent years, AlphaFold has revolutionized protein science. This artificial intelligence (AI) tool was trained on protein data collected by life scientists over more than 50 years and can predict the 3D shape of proteins with high accuracy. Its success has led to the modeling of an astonishing 215 million proteins in the past year, providing insight into the shapes of nearly every known protein. This is particularly valuable for proteins that have not been studied experimentally, because experimental characterization is complex and time-consuming.

“There are now many sources of protein information, which contain valuable insights into how proteins evolve and function,” says study leader Joana Pereira. However, researchers have long faced a jungle of data. The research team led by Professor Torsten Schwede, group leader at the Biozentrum of the University of Basel and at the Swiss Institute of Bioinformatics (SIB), has now succeeded in decoding some of the hidden information.

A, Starting from the clusters in UniRef50, we collected all functional annotations for all included UniProtKB and UniParc entries, including domain (D), coiled coil (CC), and intrinsically disordered predictions (IDP), and excluded all those whose names mark them as uncharacterized, such as names containing DUF. Cx corresponds to the annotation coverage, and Ci corresponds to the functional brightness across the entire sequence. We chose the protein with the highest full-length annotation coverage (i.e., brightness, Ci) as the functional representative of each cluster. B, From the collected UniRef50 clusters, we selected those with a structural representative with pLDDT greater than 90 in AFDB v.4, and built a large-scale sequence similarity network through all-against-all MMseqs2 searches, representing the sequence landscape of more than 6 million UniRef50 clusters. Credit: Nature
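The two selection steps in this caption can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not the authors' pipeline: the `Entry` class, the helper names, the accessions, and the pLDDT values are all hypothetical, and coverage is simplified here to the fraction of residues touched by at least one annotation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entry:
    accession: str                    # hypothetical ID, not a real UniProt accession
    length: int                       # sequence length in residues
    annotated: List[Tuple[int, int]]  # annotated ranges, 1-based and inclusive


def coverage(entry: Entry) -> float:
    """Fraction of residues covered by at least one annotation (simplified)."""
    covered = set()
    for start, end in entry.annotated:
        covered.update(range(start, end + 1))
    return len(covered) / entry.length


def functional_representative(cluster: List[Entry]) -> Entry:
    """Pick the member with the highest full-length annotation coverage."""
    return max(cluster, key=coverage)


def keep_high_confidence(
    clusters: Dict[str, List[Entry]],
    plddt: Dict[str, float],
    threshold: float = 90.0,
) -> Dict[str, List[Entry]]:
    """Keep clusters whose structural representative has pLDDT above the threshold."""
    return {
        cid: members
        for cid, members in clusters.items()
        if plddt.get(cid, 0.0) > threshold
    }


# Made-up toy data: two clusters, one with a confident predicted structure.
clusters = {
    "cluster_A": [
        Entry("P_A1", 100, [(1, 40)]),
        Entry("P_A2", 100, [(1, 40), (30, 80)]),
    ],
    "cluster_B": [Entry("P_B1", 200, [])],
}
plddt = {"cluster_A": 93.5, "cluster_B": 71.0}

selected = keep_high_confidence(clusters, plddt)
representatives = {
    cid: functional_representative(members).accession
    for cid, members in selected.items()
}
print(representatives)
```

In this toy data, `cluster_B` is dropped because its assumed pLDDT is below 90, and `P_A2` represents `cluster_A` because its annotations cover more of the sequence.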

A bird’s-eye view reveals new protein families and folds

The researchers built an interactive network of 53 million proteins with high-quality AlphaFold structures. “This network serves as a valuable resource for computationally predicting unknown protein families and their functions on a large scale,” emphasizes Dr. Janani Durairaj, the study's first author. The team was able to identify 290 new protein families and one new protein fold shaped like a flower.

Building on the Schwede Group’s experience in developing and maintaining the pioneering software SWISS-MODEL, they have made the network available as an interactive web resource, called the “Protein Universe Atlas.”

Artificial intelligence as a valuable tool in research

The team used deep learning-based tools to find novelties in this network, paving the way for innovations in the life sciences, from basic to applied research. “Understanding the structure and function of proteins is usually one of the first steps in developing a new drug or in modifying protein functions through protein engineering, for example,” says Pereira. This work was supported by a “kickstarter” grant from SIB to encourage the adoption of AI in life science resources. It underscores the transformative potential of deep learning and intelligent algorithms in research.

Using the Protein Universe Atlas, scientists can now learn more about proteins relevant to their research. “We hope that this resource will help not only researchers and biocurators, but also students and teachers by providing a new platform for learning about protein diversity, from structure, to function, to evolution,” says Janani Durairaj.

Protein Universe Atlas:

Uncovering new families and folds in the natural protein universe, Nature (Open Access)

