Supervisors info:
Thesis supervisor:
Βασίλης Κατσούρος, Research Director, Institute for Language and Speech Processing, "Athena" Research Center
Supervisor email: vsk@athenarc.gr
Please use this email (vsk@athenarc.gr) and not vkatsouros@di.uoa.gr
Members of the thesis examination committee:
Βασίλης Παπαβασιλείου, Research Associate, Institute for Language and Speech Processing, "Athena" Research Center
Κυροδήμος Ευθύμιος, Associate Professor, School of Medicine, National and Kapodistrian University of Athens
Summary:
Objective: The purpose of this project is to propose a machine-learning-based classification model that identifies patients with stage IV head and neck squamous cell carcinoma (HNSCC) who are at high risk of decreased overall survival, based only on CT-derived muscle-related data. As part of the project, an automated method for segmenting the paravertebral muscle area (with and without intermuscular and intramuscular adipose tissue) will be developed and proposed. Our aim is not to achieve near-perfect classification results (an unrealistic goal given the complex medical background of the problem), but to identify a potentially high-risk group of patients who may benefit from targeted nutritional and other interventions. We therefore aim to develop an automated screening method based on CT-derived muscle-related data.
Material and Methods: A PET-CT collection of 298 patients with histologically proven head and neck cancer was retrieved from The Cancer Imaging Archive and used for this pilot study. We included only patients with stage IV cancer, a known primary tumour site and a minimum follow-up period of 5 years. These inclusion criteria yielded 74 patients. Further sub-cohorts (of 47 and 51 patients) were created by applying additional exclusion criteria to the group of patients with oropharyngeal carcinomas. Premature death was defined as death occurring while the survival probability was still higher than 75% on the survival curve of the corresponding primary site. Unsupervised machine learning methods were also used to assess the separability of our data and to test different feature selection strategies. Classification results after training on both manually and automatically segmented muscle areas were evaluated. The best-performing classifiers were tested on a validation set consisting of the three images per patient that had not been used for training.
Validation results were assessed in terms of the classifiers' ability to separate the survival curves of the low-risk and high-risk groups of patients with statistical significance. Survival analysis was performed using Kaplan-Meier survival curves.
Results: In unsupervised learning, we observed that when patients with oropharyngeal squamous cell carcinoma (OPSCC) without premature death were excluded, there appeared to be an inherent 3-cluster tendency in our dataset (one cluster with overrepresentation of low-risk patients and two clusters with overrepresentation of high-risk patients). Our classification results were encouraging: we trained classifiers that served the screening purpose of the problem well, achieving high recall while maintaining an acceptable F1-score. The best results on the validation set were obtained in the cohort of 47 patients, when classification models were trained with 7 principal components and a test ratio of 0.3. A soft voting ensemble model showed a trend towards a difference in survival curves between the two risk groups (p-value < 0.1) in 80% of the 40 different train-test splits of the dataset, and separated the two curves with statistical significance in 65% of the splits.
Conclusion: The proposed automatic method for segmentation, radiomic feature extraction and subsequent patient risk stratification, based on CT-derived skeletal-muscle-related data, constitutes a promising automatic screening method. The fact that results were evaluated on 40 different train-test splits of the dataset and that the proposed risk stratification was tested on a validation set using the same risk cut-off points rather than always the optimal ones, together with the consistent performance of the various classifiers, paves the way for potential generalization. However, more data are needed to establish risk stratification based on CT-derived skeletal-muscle-related data as a clinically useful biomarker.
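Illustrative sketch (not the code used in the study): the premature-death definition from Material and Methods can be expressed with a small Kaplan-Meier computation. The function name `premature_death_labels`, the toy follow-up data and the choice to compare the survival estimate just before each death time against the 75% threshold are assumptions made for illustration only; in the study the curve is built separately for each primary site, so such a function would be applied once per site.

```python
from collections import Counter


def premature_death_labels(times, events, threshold=0.75):
    """Label each patient 1 (premature death) or 0 (otherwise).

    times  : follow-up time per patient (e.g. months)
    events : 1 if the patient died at that time, 0 if censored
    Assumption: a death counts as premature when the Kaplan-Meier
    survival estimate just before that death time is above `threshold`.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv_before = {}  # survival estimate just before each death time
    surv = 1.0
    for t in sorted(deaths):
        surv_before[t] = surv
        at_risk = sum(1 for u in times if u >= t)  # patients still followed at t
        surv *= 1.0 - deaths[t] / at_risk          # Kaplan-Meier product step
    return [int(bool(e) and surv_before[t] > threshold)
            for t, e in zip(times, events)]


# Hypothetical toy cohort for a single primary site (not study data).
times = [3, 5, 7, 9, 12, 15, 20, 24, 30, 36]
events = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]
print(premature_death_labels(times, events))  # -> [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
```

In this toy cohort the survival estimate is 1.0, 0.9 and about 0.8 just before the deaths at times 3, 5 and 9, so those three deaths are labelled premature; the estimate has dropped to about 0.69 before the death at time 12, so later deaths are not.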
Keywords:
Radiomics, head and neck cancer, automatic segmentation, risk stratification, machine learning