Pergamos - Library and Information Center of National and Kapodistrian University of Athens

Unit:

Specialization: Bioinformatics - Biomedical Data Science
Informatics

Deposit date:

2023-05-05

Year:

2023

Author:

Paranou Dimitra

Supervisors info:

Dr. Zoe Cournia, Senior Researcher, Center for Translational Research, Biomedical Research Foundation of the Academy of Athens (BRFAA)
Dr. Theodore Dalamagas, Research Director, Information Management Systems Institute, ATHENA Research Center
Dr. Harris Papageorgiou, Research Director, Institute for Language and Speech Processing, ATHENA Research Center

Original Title:

Using deep learning and natural language processing to predict protein-membrane interactions of peripheral membrane proteins

Languages:

English

Translated title:

Using deep learning and natural language processing to predict protein-membrane interactions of peripheral membrane proteins

Summary:

Characterizing interactions at the protein-membrane interface is crucial, as abnormal attachment of peripheral proteins to the membrane is involved in the onset of many diseases. However, a limiting factor in studying and understanding protein-membrane interactions is that the membrane-binding domains of peripheral membrane proteins are typically unknown. Applying Artificial Intelligence (AI) techniques from Natural Language Processing (NLP) can significantly improve the accuracy and reduce the prediction time of protein-membrane interface analysis compared to existing methods. In this thesis, we describe a machine learning methodology for predicting membrane-penetrating amino acids using NLP and protein language models (pLMs). We use available experimental data from verified sources and generate protein features from two pLMs to train classifier models. After optimization, the best neural network classifier achieves an F1 score of 0.691 with a Matthews correlation coefficient (MCC) of 0.652 using features from the first pLM, and an F1 score of 0.622 with an MCC of 0.577 using features from the second. The resulting multilayer perceptron (MLP) models give highly promising results, but certain limitations preclude generalization, namely the inability to make correct predictions for proteins outside the protein families used in training. Overall, the results demonstrate the potential of deep learning and pLMs to predict protein-membrane interactions faster than, and with accuracy similar to, existing tools.
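The summary reports classifier quality as an F1 score and an MCC. As a point of reference only, the sketch below shows the standard definitions of both metrics for binary per-residue labels, assuming 1 marks a membrane-penetrating residue and 0 a non-penetrating one. The function names (confusion_counts, f1, mcc) are hypothetical and this is not the thesis's evaluation code. F1 ignores true negatives, whereas MCC uses all four cells of the confusion matrix.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels (assumed: 1 = membrane-penetrating residue)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 if the denominator is zero."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for eight residues (illustrative only, not thesis data).
    truth = [0, 0, 1, 1, 0, 1, 0, 0]
    preds = [0, 1, 1, 1, 0, 0, 0, 0]
    print(confusion_counts(truth, preds))  # (2, 1, 1, 4)
    print(round(f1(truth, preds), 3))      # 0.667
    print(round(mcc(truth, preds), 3))     # 0.467
```

Returning 0.0 when the MCC denominator vanishes (e.g. a predictor that outputs only one class) is a common convention, not the only one.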

Main subject category:

Technology - Computer science

Keywords:

machine learning, deep learning, neural networks, natural language processing, protein language models, embeddings, attention maps, peripheral membrane proteins, protein-membrane interactions

Index:

Yes

Number of index pages:

Contains images:

Yes

Number of references:

111

Number of pages:

File:

https://pergamos.lib.uoa.gr/uoa/dl/object/3325240/file.pdf

Persistent URL:

https://pergamos.lib.uoa.gr/uoa/dl/object/3325240