«Utilization of Machine Learning methods in data analysis related to Cancer»

Postgraduate Thesis uoadl:2885984

Unit:
Bioinformatics Specialization
Library of the School of Science
Deposit date:
2019-11-20
Year:
2019
Author:
Spyrou Nikolaos
Supervisors info:
Iraklis Varlamis, Associate Professor, Department of Informatics and Telematics, Harokopio University
Original Title:
«Aξιοποίηση μηχανικής μάθησης για τη μελέτη ιατροβιολογικών δεδομένων σχετικών με τον καρκίνο»
Languages:
Greek
Translated title:
«Utilization of Machine Learning methods in data analysis related to Cancer»
Summary:
We live in an era in which scientific information is produced in great volume, which makes its critical evaluation all the more necessary. For this reason, building tools that provide reliable, automated answers to biomedical questions is one of the greatest modern challenges of Biomedical Informatics.
The main aim of this work is to analyze and classify questions of medical and clinical content, with emphasis on the field of Medical Oncology. The questions used as initial data in this Thesis were processed with various Natural Language Processing and Machine Learning methods, with the overarching goal of matching each question to the source best suited to answering it, according to the rules of Evidence Based Medicine.
The dataset of this Thesis was drawn from the question database of the BioASQ Challenge, an international biomedical text analysis competition. The questions included in our dataset were further analyzed and classified by two biomedical experts.
The analysis of the dataset was divided into three stages, each focusing on a different ontological aspect of the questions. First, we attempted to automatically classify the questions into background questions, i.e. questions that have a clear, well-defined answer that can be sought in textbooks, and foreground questions, i.e. questions that concern specific cases of patients, therapies or diseases, whose answers are a subject of controversy at the edge of current knowledge and should be sought in research repositories (e.g. PubMed). The second stage concerned the automated identification of PICO elements (patient, intervention, comparison, outcome) in the subset of foreground questions. The last stage concerned the classification of foreground questions into questions of therapy, diagnosis, prognosis and outcome. The main aim of these stages was to identify elements that indicate which type of clinical research evidence would be the best source for answering each question.
In this context, we successfully classified the dataset of clinical questions using a combination of Machine Learning methods. Specifically, in each of the three stages we constructed models that not only classified questions better than a random classifier but also, in many cases, approached the performance of an ideal classifier. Our ambition, stemming from this Thesis, is the construction of an automated clinical question answering system that makes adherence to the rules of Evidence Based Medicine its top priority.
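As a rough illustration of the first stage described in the summary (separating background from foreground questions), the sketch below trains a small bag-of-words Naive Bayes classifier. This is only a sketch under stated assumptions: the summary does not say which models the Thesis used, so Naive Bayes is a stand-in chosen for brevity, and the example questions, the `TRAIN` list, and the `NaiveBayes` class are hypothetical and not taken from the BioASQ data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy questions for illustration only; NOT from BioASQ or the Thesis.
TRAIN = [
    ("What is the function of the BRCA1 gene?", "background"),
    ("What is the definition of metastasis?", "background"),
    ("Which cells are affected in leukemia?", "background"),
    ("In patients with breast cancer, does tamoxifen improve survival "
     "compared with placebo?", "foreground"),
    ("Does adjuvant chemotherapy reduce recurrence in patients with "
     "colon cancer?", "foreground"),
    ("In elderly patients with lung cancer, is radiotherapy more "
     "effective than surgery?", "foreground"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of questions
        self.vocab = set()
        for text, label in examples:
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.label_counts[label] / total)
            n_tokens = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN)
    print(model.predict("In patients with melanoma, does immunotherapy improve survival?"))
    print(model.predict("What is the definition of apoptosis?"))
```

A real system would need far more labelled questions and a held-out evaluation against a random baseline, as the summary describes; the toy data here only shows the shape of the task.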
Main subject category:
Technology - Computer science
Keywords:
bioinformatics, natural language processing, clinical questions, bioasq challenge, bioasq, text mining, evidence based medicine, medicine, biomedical informatics
Index:
Yes
Number of index pages:
3
Contains images:
No
Number of references:
50
Number of pages:
127
Thesis_Spyrou_Bioinformatics.pdf (1 MB)