Unit:
Specialization in Bioinformatics-Computational Biology
Library of the School of Science
Author:
Vergoulidis Nikolaos
Supervisors info:
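Georgios Pavlopoulos, Researcher A, Biomedical Sciences Research Center (B.S.R.C.) "Alexander Fleming"
Zoi Litou, Laboratory Teaching Staff (E.DI.P.), Department of Biology, National and Kapodistrian University of Athens (NKUA)
Nikolaos Papandreou, Laboratory Teaching Staff (E.DI.P.), Department of Biology, National and Kapodistrian University of Athens (NKUA)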
Original Title:
«Ανακάλυψη ενζύμων βιοτεχνολογικού ενδιαφέροντος μέσω μεταγονιδιωματικής ανάλυσης αλληλουχιών»
Translated title:
«Enzyme discovery through metagenomic sequence analysis»
Summary:
A metagenome is the total genetic material recovered from an environmental sample. Metagenomic analysis is of paramount importance for identifying and understanding the complex mechanisms that govern the sequence, structure and function of de novo proteins, as well as for recruiting new enzymes for use in biotechnology. Shotgun metagenome sequencing has become the method of choice for studying and classifying microorganisms from various biomes.
The goal of this project is to use bioinformatics methods for enzyme discovery and for the engineering of protein sequences from metagenomic datasets. The project will focus on bacterial enzymes with biomedical and biotechnological applications. State-of-the-art search and clustering methods will be employed to recruit and cluster enzyme sequences from up-to-date curated data repositories such as IMG/M, MGnify and UniParc. The resulting clusters will be used to generate 3D protein models of the enzymes using artificial intelligence (AI), and the derived models will serve as the basis for studying the mechanisms of enzyme activity and for designing novel drugs and inhibitors.
By the end of this project, the UniParc database (the version available at the start of the project contained ~544 million entries) will have been analyzed to detect protein domains of four superfamilies: nattokinases, feruloyl esterases, cocaine esterases and PETases (PET hydrolases). Protein profile data will be derived from InterPro member databases such as Pfam, PRINTS, PANTHER, PROSITE patterns, TIGRFAMs and SUPERFAMILY. Distinct bioinformatics methods will be used for the analysis, and scripts in a variety of programming languages will be used to handle the large data volumes. The overall workflow integrating these tools will be developed as a multi-layered pipeline using Nextflow and will be made available in public repositories such as GitHub. The search hits, together with their metadata and the generated 3D models, will be deposited in a web application database (Meta-4) with a graphical user interface, so that they are open and easily accessible for research and scientific purposes.
Main subject category:
Science
Keywords:
Metagenomics, Biodiversity, Enzyme discovery, Metagenomic Sequence Analysis, Sequence Searching and Clustering, Protein Domains