Kernel Support Vector Machine learning of imbalanced classes with application to reproductive medicine on the UK population

Postgraduate Thesis uoadl:3232068

Unit:
Biostatistics Specialization
Library of the School of Health Sciences
Deposit date:
2022-09-16
Year:
2022
Author:
Dimitriou Evangelos
Supervisors info:
Apostolos Burnetas, Professor, Department of Mathematics, NKUA
Fotios Siannis, Assistant Professor, Department of Mathematics, NKUA
Orestis Tsonis, MD MSc PhD, Assisted Conception Unit, Guy's and St Thomas' NHS Foundation Trust
Original Title:
Kernel Support Vector Machine learning of imbalanced classes with application to reproductive medicine on the UK population
Languages:
English
Translated title:
Kernel Support Vector Machine learning of imbalanced classes with application to reproductive medicine on the UK population
Summary:
This project studies Kernel Support Vector Machines (SVM), a family of Machine Learning algorithms, for the classification of data that belong to two classes. The method is applied to data from the field of Reproductive Medicine, with the objective of predicting the Live Birth Occurrence of patients who undergo IVF treatment.
First, the Kernel SVM algorithms and the problems they face when the two classes are imbalanced are investigated. Next, various solutions from the literature to the class imbalance problem are discussed, together with their advantages and disadvantages. A new under-sampling method, called Cosine similarity Under-Sampling (CUS), which is based on the cosine similarity metric computed on the data, is also proposed.
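As background for the similarity metric named above: the cosine similarity of two feature vectors is their dot product divided by the product of their Euclidean norms. A minimal Python sketch follows. The function name is an illustrative assumption, and the abstract does not state how CUS uses the metric to select which samples to drop, so no selection rule is shown.

```python
import math


def cosine_similarity(x, y):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # The metric is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

Values lie in [-1, 1]: parallel vectors score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of vector magnitude.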
Second, these methods are applied to data from the field of Reproductive Medicine in order to predict the Live Birth Occurrence of UK patients who undergo IVF treatment. Specifically, the data are collected by the Human Fertilisation and Embryology Authority (HFEA) and relate to the IVF cycles that take place every year in the UK. The data have been collected since 1991 and, provided that they are anonymized, are available for research purposes. In addition to the machine learning algorithms, classic classification methods, such as logistic regression, are also applied.
The methods mentioned above are also applied to simulated data, both continuous and mixed-type (continuous and discrete), in order to verify their performance.
The algorithms are compared using appropriate evaluation metrics, leading to the conclusion that the proposed method (CUS) outperforms the other approaches when classifying the mixed-type data and the IVF data, while it ranks second when classifying the continuous simulated data.
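To illustrate why the choice of evaluation metric matters for imbalanced classes, the sketch below compares plain accuracy with two metrics commonly used in this setting, balanced accuracy and the geometric mean (G-mean) of sensitivity and specificity. The abstract does not name the metrics used in the thesis, so these are assumptions chosen for illustration, as are the function names.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, balanced accuracy and G-mean as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
    }
```

For example, with 10 positive and 90 negative cases, a classifier that always predicts the negative class reaches an accuracy of 0.9 but a balanced accuracy of 0.5 and a G-mean of 0, which exposes that it never detects the minority class.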
Main subject category:
Health Sciences
Keywords:
Support vector machine, Imbalanced classes, Supervised learning, Classification, Reproductive medicine
Index:
No
Number of index pages:
0
Contains images:
Yes
Number of references:
103
Number of pages:
125
File:
File access is restricted only to the intranet of UoA.

dissertation_evangelos_dimitriou.pdf
2 MB