Specialization: Bioinformatics
Library of the School of Science
Konstantinos E. Vorgias, Professor, Section of Biochemistry and Molecular Biology,
Department of Biology, National and Kapodistrian University of Athens
Utilizing omics data to study and interpret different representations of integrated networks
Studying and interpreting different representations of multi-omics integrated networks
One of the greatest current challenges in Systems Biology and
Bioinformatics is the development of personalized computational approaches
for complex diseases that can extract, combine and represent multi-omics
and clinical data from patient samples in an optimal way.
The ultimate goal of this work was to highlight the potentially improved
performance of a prediction model that employs higher-complexity representations,
such as the topology vector of a multi-omics integrated network, compared to
lower-complexity representations such as a feature vector.
Specifically, in the present study, we mined data from the
Genomic Data Commons database for 412 patients with muscle-invasive
bladder urothelial carcinoma. The final dataset comprises gene
and miRNA (microRNA) expression data as well as DNA methylation data,
extracted from both primary tumor and normal tissue samples.
In this context, we succeeded in constructing an optimized algorithm for
mining and processing the desired biological information from a total of
1250 files and 435 samples, as well as constructing multiple representations
of the data in the form of a feature vector, network and topology vector.
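As a rough illustration of how the three representations relate, the sketch below builds a small undirected network from an edge list and derives a topology vector from it. The abstract does not state which topological measures were used or how edges were defined, so the choice of degree and local clustering coefficient, the function names, and the toy feature names are all assumptions made for this example only.

```python
from collections import defaultdict


def build_network(edges):
    """Return an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = sorted(adj.get(node, ()))
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def topology_vector(adj, nodes):
    """Concatenate (degree, clustering) per node, in a fixed node order."""
    vec = []
    for n in nodes:
        vec.append(float(len(adj.get(n, ()))))
        vec.append(clustering(adj, n))
    return vec


# Hypothetical toy data: one patient's values for three omics features.
feature_vector = {"GENE_A": 2.1, "MIR_B": -1.3, "CPG_C": 0.4}
# Hypothetical edges of that patient's network (assumed, not from the thesis).
edges = [("GENE_A", "MIR_B"), ("GENE_A", "CPG_C"), ("MIR_B", "CPG_C")]

nodes = sorted(feature_vector)        # fixed order so vectors are comparable
adj = build_network(edges)
topo = topology_vector(adj, nodes)    # two entries per node
```

Keeping the node order fixed across patients is what makes the resulting topology vectors comparable as inputs to a classifier.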
In addition, the integrated network for bladder urothelial carcinoma was
reconstructed, selecting only the most differentiated features across all
patients. We then designed and built a personalized network
representation for each patient. Exploratory analyses and in silico classification
experiments were performed to investigate the performance of each
representation. In conclusion, the feature vector representation
gave the best results for the sample class classification problem with
both the k-nearest neighbor and the decision tree algorithms. The topological
representation outperformed the feature vector in the tumor
stage classification task using the k-nearest neighbor algorithm.
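The comparison between representations can be pictured as running the same classifier on each set of vectors and comparing the scores. The following is a minimal sketch of k-nearest neighbor classification with leave-one-out evaluation. The evaluation protocol, the value of k, the Euclidean distance, and the toy samples are assumptions for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy: classify each sample using all the others."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k) == y[i]
    return hits / len(X)


# Hypothetical toy samples; each row stands in for one patient's vector
# (either a feature vector or a topology vector).
X = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
]
y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

score = loo_accuracy(X, y, k=3)
```

Calling `loo_accuracy` once with feature vectors and once with topology vectors for the same patients would give two scores that can be compared per task, which is the kind of comparison the abstract reports.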
Main subject category:
Bioinformatics, Personalized Medicine, Machine Learning, Systems Biology, Representations, Data mining, Omics
File access is restricted until 2022-02-01.