Unit:
Bioinformatics Specialization
Library of the School of Science
Author:
Markopoulou Kalliopi
Supervisors info:
Professor Constantinos Vorgias (Supervisor), Section of Biochemistry & Molecular Biology, Department of Biology, University of Athens
Associate Professor Ioannis Trougakos, Section of Cell Biology & Biophysics, Department of Biology, University of Athens
Researcher B Aristotelis Chatziioannou, Metabolic Engineering and Bioinformatics, Institute of Biology, Medicinal Chemistry & Biotechnology, National Hellenic Research Foundation
Original Title:
Βιοπληροφορικές προσεγγίσεις περί της αποσαφήνισης της καρκινογένεσης με την διαμεσολάβηση των long non-coding RNAs
Translated title:
Long non-coding RNA-mediated epigenomic regulation in carcinogenesis
Summary:
The emerging revolution in genomics and the explosion in high-quality data generation have opened exciting new research and therapeutic avenues for cancer research. At the same time, the massive data volumes produced pose new challenges regarding data management and, most importantly, rational data mining. The newly generated data streams can help identify novel, personalized gene signatures and groups of patients that share similar molecular characteristics. These signatures allow researchers to make more informed decisions on the prognosis, prediction, and personalized treatment of cancer.
The proper visualization of ultrahigh-dimensional genomic data is essential for cancer genomic analysis, as it enables advanced data exploration and hypothesis generation, and provides input regarding informative system variables for subsequent dimensionality reduction. In the present work, we apply a new approach for visualizing data with the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm. t-SNE, a dimensionality reduction technique, clusters and visualizes high-dimensional data by giving each data point a location in a two- or three-dimensional map while reducing the tendency to crowd points together in the center of the map. We applied t-SNE to an integrated gene expression dataset of 3102 samples across five cancer types, derived from The Cancer Genome Atlas (TCGA) consortium. We performed several computational experiments to fine-tune t-SNE with different gene sets (protein-coding genes, lincRNAs). We find that applying t-SNE to lincRNAs improves the overall clustering among the five tested cancer datasets, highlighting the potential value of these molecules as prognostic, diagnostic and therapeutic targets.
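The reduced crowding mentioned in the summary comes from the heavy-tailed Student-t kernel that t-SNE uses to measure similarity between points in the low-dimensional map. The following is a minimal pure-Python sketch of that kernel only, not the pipeline used in the thesis; the function names and the toy map coordinates are hypothetical and chosen purely for illustration.

```python
import math


def student_t_affinities(points):
    """Return q[i][j], the normalized t-SNE similarities between map points."""
    n = len(points)
    # Unnormalized heavy-tailed kernel: 1 / (1 + squared Euclidean distance).
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = 1.0 / (1.0 + d2)
    # Normalize over all ordered pairs so the similarities sum to 1.
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def kernel_tail_ratio(distance):
    """Ratio of the Student-t kernel to a Gaussian kernel at a map distance."""
    student_t = 1.0 / (1.0 + distance ** 2)
    gaussian = math.exp(-(distance ** 2))
    return student_t / gaussian


# Hypothetical 2-D map positions for four samples (illustrative values only).
toy_map = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.0, 4.0)]
q = student_t_affinities(toy_map)
```

Because the ratio returned by `kernel_tail_ratio` grows with distance, moderately dissimilar samples can sit far apart in the map without a large penalty, which leaves room for distinct groups instead of a crowded center.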
Main subject category:
Science
Keywords:
Next generation sequencing, gene expression, cancer
File:
File access is restricted only to the intranet of UoA.
Μαρκοπουλου_Καλλιοπη_Διπλωματικη_final.pdf
5 MB