Using deep learning and natural language processing to predict protein-membrane interactions of peripheral membrane proteins

Postgraduate Thesis uoadl:3325240

Unit:
Bioinformatics - Biomedical Data Science Track
Informatics
Deposit date:
2023-05-05
Year:
2023
Author:
Paranou Dimitra
Supervisors info:
Dr. Zoe Cournia, Senior Researcher, Center for Translational Research, Biomedical Research Foundation of the Academy of Athens (BRFAA)
Dr. Theodore Dalamagas, Research Director, Information Management Systems Institute, ATHENA Research Center
Dr. Harris Papageorgiou, Research Director, Institute for Language and Speech Processing, ATHENA Research Center
Original Title:
Using deep learning and natural language processing to predict protein-membrane interactions of peripheral membrane proteins
Languages:
English
Translated title:
Using deep learning and natural language processing to predict protein-membrane interactions of peripheral membrane proteins
Summary:
Characterizing interactions at the protein-membrane interface is crucial, as abnormal attachment of peripheral proteins to the membrane is involved in the onset of many diseases. However, a limiting factor in studying and understanding protein-membrane interactions is that the membrane-binding domains of peripheral membrane proteins are typically unknown. By applying Artificial Intelligence (AI) techniques from Natural Language Processing (NLP), the accuracy and speed of protein-membrane interface prediction can be significantly improved compared to existing methods. In this thesis, we describe a machine learning methodology for predicting membrane-penetrating amino acids using NLP and protein language models (pLMs). We use experimental data from verified sources and generate protein features from two pLMs to train classifier models. After optimization, the best neural network classifiers achieve an F1 score of 0.691 with a Matthews correlation coefficient (MCC) of 0.652, and an F1 score of 0.622 with an MCC of 0.577, for the features from the first and second pLM, respectively. The resulting multilayer perceptron (MLP) models give highly promising results, but they have limitations that preclude generalization, namely the inability to make correct predictions for proteins outside the protein families seen during training. Overall, the results demonstrate the potential of deep learning and pLMs to predict protein-membrane interactions faster than existing tools and with similar accuracy.
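The abstract reports F1 together with MCC. This pairing matters for per-residue prediction, where membrane-penetrating residues are a small minority of all residues: F1 ignores true negatives, while MCC uses all four cells of the confusion matrix. The following minimal sketch uses the standard textbook definitions of both metrics. It is not the thesis's evaluation code. The function name `f1_and_mcc` and the 10-residue toy labels are hypothetical, for illustration only.

```python
import math


def f1_and_mcc(y_true, y_pred):
    """Return (F1, MCC) for binary per-residue labels.

    Hypothetical helper: 1 = membrane-penetrating residue, 0 = not penetrating.
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts over all residues
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # F1 is the harmonic mean of precision and recall (true negatives unused)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # MCC uses all four counts; defined as 0 when any marginal total is zero
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


if __name__ == "__main__":
    # Hypothetical labels for a 10-residue fragment (not data from the thesis)
    y_true = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    f1, mcc = f1_and_mcc(y_true, y_pred)
    print(round(f1, 3), round(mcc, 3))  # prints: 0.75 0.583
```

In this toy case, precision and recall are both 0.75, so F1 is 0.75. MCC is lower (14/24 ≈ 0.583) because it also penalizes the false positive relative to the five true negatives. This mirrors how the reported MCC values sit below the corresponding F1 scores.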
Main subject category:
Technology - Computer science
Keywords:
machine learning, deep learning, neural networks, natural language processing, protein language models, embeddings, attention maps, peripheral membrane proteins, protein-membrane interactions
Index:
Yes
Number of index pages:
5
Contains images:
Yes
Number of references:
111
Number of pages:
75
Master_Thesis_Dimitra_Paranou_up.pdf (6 MB)