Rapid and accurate identification of organisms at strain level in complex Microbiomic, Metagenomic, and Metatranscriptomic NGS samples with quasi-mapping.

Postgraduate Thesis uoadl:1506220

Unit:
Specialization in Bioinformatics
Informatics
Deposit date:
2017-05-04
Year:
2017
Author:
Skoufos Georgios
Supervisors info:
Prof. Artemis Hatzigeorgiou, Researcher, Department of Computer, Telecommunications and Network Engineering, University of Thessaly, Head of DIANA-Lab.
Dr. Ioannis Vlachos, Research Associate, Broad Institute of MIT and Harvard, Brigham and Women’s Hospital.
Dr. Maria Paraskevopoulou, Researcher/Research Associate, DIANA-Lab, Department of Computer, Telecommunications and Network Engineering, University of Thessaly.
Original Title:
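Ανάλυση δεδομένων μεταγονιδιωματικής (metagenomics) και μικροβιώματος (microbiome) από πειράματα NGS με την τεχνική του quasi-mapping (English: Analysis of metagenomics and microbiome data from NGS experiments using the quasi-mapping technique)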
Languages:
Greek
Translated title:
Rapid and accurate identification of organisms at strain level in complex Microbiomic, Metagenomic, and Metatranscriptomic NGS samples with quasi-mapping.
Summary:
The development of high-throughput sequencing technologies has transformed our capacity to investigate the composition and dynamics of the microbial communities that populate terrestrial and aquatic ecosystems as well as the human skin, gut and oral cavity. Sequenced metagenomic samples usually comprise reads from a large number of different bacterial and viral communities, and hence tend to result in vast file sizes.
The purpose of this study was to design and implement a computational pipeline that identifies and quantifies organisms at strain level in complex microbiomic, metagenomic, and metatranscriptomic Next Generation Sequencing (NGS) samples. The pipeline can serve as a metagenome classifier for 16S rRNA sequencing and shotgun metagenomic sequencing datasets, and it can also analyze mixed tissue-specific DNA/RNA NGS samples that contain both the host (human, mouse, or other mammalian species) and its microbiome.
The main functions of the pipeline are the identification and quantification of the microbiome in NGS samples, abundance estimation at the family, genus, species, and subspecies taxonomic ranks, and filtering of the estimated results based on user-specified criteria.
This study presents the results of applying the pipeline to simulated NGS datasets, both microbiome-only and mixed host-microbiome, as well as to real tissue-specific Mus musculus RNA datasets obtained from NCBI’s GEO. In comparisons with state-of-the-art metagenomic classification tools, the pipeline produced more accurate abundance estimates in every case and was also faster in many cases.
metaHost is a rapid and accurate pipeline that identifies and quantifies microbiome organisms at strain level in complex metagenomic and metatranscriptomic NGS samples. It is based on a fully automated workflow that is easily adaptable to the needs of its users.
Main subject category:
Technology - Computer science
Keywords:
metagenomics, microbiome, metatranscriptomics, Next Generation Sequencing, reads mapping, NGS, microbiome identification, quantification, quasi-mapping
Index:
Yes
Number of index pages:
6
Contains images:
Yes
Number of references:
48
Number of pages:
85
Skoufos Giorgos - Diplomatiki ergasia.pdf (5 MB)