Supervisors info:
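Georgakilas Alexandros, Associate Professor, School of Applied Mathematical and Physical Sciences (SEMFE), National Technical University of Athens (NTUA)
Trougakos Ioannis, Assistant Professor, Department of Biology, National and Kapodistrian University of Athens (NKUA)
Chatziioannou Aristotelis, Researcher (Grade C), Institute of Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation (NHRF)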
Summary:
This is a study on how Machine Learning (ML) techniques can be used to predict the outcome of ionizing
radiation on cells. The problem consists of predicting the quantities of Relative Biological Effectiveness
(RBE) and the α and β coefficients of the linear-quadratic (LQ) model. The quantities to be predicted are continuous,
so in essence it is a multi-variable regression problem. In our case, the RBE quantifies the effect the
radiation has on cell survival. The α and β coefficients represent the linear and quadratic contributions to
cell death, respectively. Three different ML algorithms were applied to two separate datasets: Gradient
Boosting Decision Trees (GBDT), Random Forest Regression (RF) and Support Vector Regression (SVR). Two
implementations of GBDT were used. As a further trial, a voting regression (VR) was implemented across all
previous algorithms; it predicts the dependent variables based on a consensus method. The algorithms employed
differ radically in design and approach. The aim was to combine different techniques and show that the combined
model fares better in predictive performance and generalization. The results show that VR fares noticeably
better in generalizing and predicting. Our datasets
consist mainly of HZE irradiation features, i.e., they contain cell-specific data such as cell line, cell phase and
tumorous state, as well as radiation features such as LET, specific energy and ion species. The datasets are
quite small, were compiled by different methods and generally cannot be treated as a black box. It is shown that
the datasets are somewhat noisy and contain multicollinearities. The whole work is meant to be a showcase
of the usefulness of ensemble and consensus techniques in predicting the aforementioned quantities. A bigger,
more cohesive and consistent dataset is required in order to predict the outcome of a radiotherapy treatment,
prior to that treatment, in a more robust and less arbitrary way. A more in-depth feature analysis is also
required to assess which features are essential to the predictions. To make our estimators suitable for more
general use in radiobiology, further integration with photon irradiation data is needed.
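
As a minimal illustration of the quantities being predicted, the sketch below evaluates the linear-quadratic survival curve S(D) = exp(-(αD + βD²)) and an RBE taken as the ratio of photon dose to ion dose at equal survival. The function names, the 10% survival level and the coefficient values are assumptions chosen for illustration only; they are not taken from the datasets or the code of this work, and other RBE definitions exist.

```python
import math


def surviving_fraction(dose_gy, alpha, beta):
    """LQ model: fraction of cells surviving a dose D, S = exp(-(alpha*D + beta*D^2))."""
    return math.exp(-(alpha * dose_gy + beta * dose_gy ** 2))


def dose_for_survival(survival, alpha, beta):
    """Invert the LQ model: dose (Gy) that leaves the given surviving fraction."""
    if not 0.0 < survival < 1.0:
        raise ValueError("survival must lie strictly between 0 and 1")
    effect = -math.log(survival)  # solve alpha*D + beta*D^2 = effect for D >= 0
    if beta == 0.0:
        return effect / alpha  # purely linear case
    return (-alpha + math.sqrt(alpha ** 2 + 4.0 * beta * effect)) / (2.0 * beta)


def rbe_at_survival(survival, ion_params, photon_params):
    """Iso-survival RBE: photon (reference) dose divided by ion dose."""
    photon_dose = dose_for_survival(survival, *photon_params)
    ion_dose = dose_for_survival(survival, *ion_params)
    return photon_dose / ion_dose


# Hypothetical (alpha [1/Gy], beta [1/Gy^2]) pairs, for illustration only.
photon = (0.2, 0.05)
ion = (0.8, 0.05)

rbe_10 = rbe_at_survival(0.10, ion, photon)
print(f"RBE at 10% survival: {rbe_10:.2f}")
```

In this picture a larger α means that less dose is needed to reach the same survival level, which is why the hypothetical ion parameters yield an RBE above 1 relative to the photon reference.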