The creation of a pangenome database of Proteus mirabilis and definition of the core genome as a tool for phylogenetic analysis

Postgraduate Thesis uoadl:2942734

Unit:
Specialization in Bioinformatics-Computational Biology
Library of the School of Science
Deposit date:
2021-04-18
Year:
2021
Author:
Skoulakis Anargyros
Supervisors info:
Bagos Pantelis, Professor, Department of Computer Science and Biomedical Informatics, University of Thessaly
Original Title:
Δημιουργία βάσης δεδομένων πανγονιδιώματος (pangenome) Proteus mirabilis και ορισμός του core genome ως εργαλείου φυλογενετικής ανάλυσης
Languages:
Greek
Translated title:
The creation of a pangenome database of Proteus mirabilis and definition of the core genome as a tool for phylogenetic analysis
Summary:
Proteus mirabilis is a Gram-negative bacterium of the family Morganellaceae that, in humans, is mainly responsible for urinary tract infections. The pangenome of P. mirabilis is the set of all genes observed across P. mirabilis strains. It comprises the core genome, i.e. the genes shared by all P. mirabilis strains; the dispensable genome, i.e. the genes found in more than one strain but not in all; and the unique genome, i.e. the strain-specific genes found in only one strain. The emergence and spread of antibiotic-resistant strains of P. mirabilis, and the need for their epidemiological surveillance, require the development of new, more sensitive phylogenetic techniques and tools. The purpose of this thesis is therefore to characterize the core genome and the pangenome of P. mirabilis, in order to investigate the possible use of the core genome as a tool for phylogenetic studies.
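The three-way split of the pangenome described above can be illustrated with a minimal Python sketch. It is not the thesis code: the function name `partition_pangenome`, the cluster identifiers and the strain labels are hypothetical, and the input is assumed to be a mapping from each gene cluster to the set of strains in which it occurs.

```python
def partition_pangenome(clusters, n_genomes):
    """Split gene clusters into core, dispensable and unique genome.

    clusters  : dict mapping a cluster id to the set of strain ids that
                contain at least one gene of that cluster (assumed input).
    n_genomes : total number of strains in the analysis.
    """
    core, dispensable, unique = [], [], []
    for cluster_id, strains in clusters.items():
        n_present = len(strains)
        if n_present == n_genomes:
            core.append(cluster_id)         # present in every strain
        elif n_present == 1:
            unique.append(cluster_id)       # strain specific
        else:
            dispensable.append(cluster_id)  # more than one strain, not all
    return core, dispensable, unique


# Toy data: three hypothetical strains and four hypothetical clusters.
toy_clusters = {
    "cl1": {"s1", "s2", "s3"},
    "cl2": {"s1", "s2"},
    "cl3": {"s3"},
    "cl4": {"s1", "s2", "s3"},
}
core, dispensable, unique = partition_pangenome(toy_clusters, n_genomes=3)
```

In this toy example, `cl1` and `cl4` fall in the core genome, `cl2` in the dispensable genome and `cl3` in the unique genome.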
To determine the core genome and the pangenome, only the most reliable of the assembled genomes stored in the GenBank database were used for analysis. The assemblies were annotated with the tool Prokka, and their proteins were grouped with the tool CD-HIT into clusters of orthologous proteins based on similarity and coverage. Clusters that contain members from all the genomes used in the analysis are considered core genome clusters, and the core genome consists of their representative genes. Within each core genome cluster, the different genes belonging to that cluster were aligned. From the aligned data, a pseudogenome was created for every assembly: the aligned sequences of all the genes of that assembly contained in the core genome clusters, joined in series. By comparing the pseudogenomes, the phylogenetic tree of P. mirabilis was constructed with the RAxML tool. In addition, the core genome Multilocus Sequence Types (cgMLST types) of the different P. mirabilis strains were determined and compared with each other.
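The pseudogenome and cgMLST steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the tools developed in the thesis: the function names, the toy alignments and the allele-numbering scheme (alleles numbered per cluster in order of first appearance) are all hypothetical, and each core cluster is assumed to hold exactly one aligned gene per assembly.

```python
def build_pseudogenomes(aligned_clusters):
    """Concatenate, per assembly, its aligned core genes in a fixed cluster order.

    aligned_clusters: dict cluster id -> dict assembly id -> aligned sequence
    (assumed: one sequence per assembly, equal length within a cluster).
    """
    parts = {}
    for cluster_id in sorted(aligned_clusters):
        alignment = aligned_clusters[cluster_id]
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"unequal alignment lengths in {cluster_id}")
        for assembly, seq in alignment.items():
            parts.setdefault(assembly, []).append(seq)
    return {assembly: "".join(seqs) for assembly, seqs in parts.items()}


def cgmlst_profiles(aligned_clusters):
    """Assign an allele number per core cluster; the tuple of numbers is the profile.

    Numbering is an illustrative assumption: within a cluster, each distinct
    ungapped sequence gets the next integer in order of first appearance.
    """
    profiles = {}
    for cluster_id in sorted(aligned_clusters):
        alignment = aligned_clusters[cluster_id]
        allele_ids = {}
        for assembly in sorted(alignment):
            allele_seq = alignment[assembly].replace("-", "")  # drop gaps
            allele = allele_ids.setdefault(allele_seq, len(allele_ids) + 1)
            profiles.setdefault(assembly, []).append(allele)
    return {assembly: tuple(p) for assembly, p in profiles.items()}


def profile_distance(p, q):
    """Number of core loci at which two cgMLST profiles differ."""
    return sum(a != b for a, b in zip(p, q))


# Toy data: two hypothetical core clusters aligned across three assemblies.
toy_alignments = {
    "c1": {"A": "ATG-CA", "B": "ATGGCA", "C": "ATG-CA"},
    "c2": {"A": "TTGA", "B": "TTGA", "C": "TTCA"},
}
pseudogenomes = build_pseudogenomes(toy_alignments)
profiles = cgmlst_profiles(toy_alignments)
```

Because every pseudogenome is built from the same clusters in the same order, all pseudogenomes have equal length and can be written out as one multiple alignment for a tree-building tool such as RAxML. Profiles can be compared pairwise with `profile_distance`.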
The results of this work are the characterization of the core genome and the pangenome of P. mirabilis, the classification of the various strains into cgMLST types, and the construction of the phylogenetic tree of all P. mirabilis genomes stored in the GenBank database. Our analysis also indicates that phylogenetic analyses based on the core genome are reliable and highly accurate, and can be used for epidemiological surveillance of epidemic strains. Finally, automated tools for pangenomic analyses were developed as part of this thesis.
Main subject category:
Science
Other subject categories:
Health Sciences
Keywords:
Pangenome, core genome, Phylogenetic analysis, Phylogenetic tree, Proteus mirabilis
Index:
No
Number of index pages:
0
Contains images:
Yes
Number of references:
85
Number of pages:
77
Διπλωματική_ΑΣκουλάκης(1).pdf (3 MB)