Dissertation committee:
- Vangelis Karalis (Academic Supervisor)
Associate Professor, Department of Pharmacy, National and Kapodistrian University of Athens, Greece
- Georgia Karali (Member of the Advisory committee)
Associate Professor, Department of Mathematics, National and Kapodistrian University of Athens, Greece
- Evangelos Terpos (Member of the Advisory committee)
Professor, School of Medicine, National and Kapodistrian University of Athens, Greece
- Aleksandra Catic-Djordjević
Associate Professor, Department of Pharmacy, University of Nis, Serbia
- Ioannis Dotsikas
Associate Professor, Department of Pharmacy, National and Kapodistrian University of Athens, Greece
- Sophia Markantonis-Kyroudi
Emeritus Professor, Department of Pharmacy, National and Kapodistrian University of Athens, Greece
- Anastasia Pippa
Assistant Professor, Department of Pharmacy, National and Kapodistrian University of Athens, Greece
Summary:
Introduction
This dissertation aimed to apply machine learning and neural networks, the models at the forefront of deep learning, to improve the processes and outcomes of clinical trials and bioequivalence studies. The goal was to use these computational techniques to gain deeper insight and to improve the accuracy and efficiency of analyses in these areas of medical research.
To that end, the first two research papers analyzed the kinetics of neutralizing antibodies (NAbs) against SARS-CoV-2 using a kinetic model and identified their predictive factors using four machine learning algorithms. In the subsequent three research papers, a novel data augmentation framework was developed that combines generative neural networks with Monte Carlo simulations. The framework was applied to clinical trials and to bioequivalence testing of drugs with low, medium, and high variability. Its aim is to reduce the sample size required for these types of studies, which in turn lowers their cost and duration.
Methods
To investigate the kinetics of NAbs, a kinetic model was used to describe their elimination. The optimal model comprised a single compartment (the whole body) with linear elimination kinetics. To identify the individual characteristics that could predict NAb levels, four machine learning techniques were applied: principal component analysis, factor analysis of mixed data, K-means clustering, and random forest. The first two methods revealed the interactions between features and how these affect NAb levels, whereas the last two were used, respectively, to group individuals into distinct clusters and to quantify the predictive factors of NAb levels.
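As an illustration of a one-compartment model with linear (first-order) elimination, the minimal sketch below simulates an exponential decline and recovers the elimination rate constant by log-linear least squares. The function names, the starting level, the rate constant, and the sampling times are hypothetical and are not taken from the dissertation; the fitting procedure actually used in the work may differ.

```python
import math

def nab_level(t, c0, k):
    """One-compartment, first-order elimination: C(t) = C0 * exp(-k * t)."""
    return c0 * math.exp(-k * t)

def fit_log_linear(times, levels):
    """Estimate (c0, k) by ordinary least squares on ln C = ln C0 - k * t."""
    n = len(times)
    logs = [math.log(c) for c in levels]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -slope

# Hypothetical, noise-free example values (not study data)
times = [0.0, 1.0, 2.0, 3.0]  # months since the first measurement
levels = [nab_level(t, c0=100.0, k=0.2) for t in times]

c0_hat, k_hat = fit_log_linear(times, levels)
half_life = math.log(2) / k_hat  # elimination half-life implied by k_hat
```

Under this model, a larger rate constant `k` means a shorter half-life, so fitting separate values of `k` to successive time windows is one simple way to compare elimination speed between phases.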
For the use of generative algorithms in clinical research, a framework was developed that combines Monte Carlo simulations with a generative neural network, namely a variational autoencoder (VAE). The Monte Carlo simulations were used to replicate the conditions of the clinical trial and the bioequivalence (BE) study, whereas the VAE was applied to a subsample of the original dataset to generate new, synthetic data based on the real data. Various scenarios were tested and different VAE hyperparameters were explored to identify the optimal model.
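The overall workflow (simulate a study, take a subsample, generate synthetic data from it, and test each dataset) can be sketched roughly as follows. This is a simplified illustration under several assumptions that are not from the dissertation: a paired log-difference model in place of a full crossover analysis, a normal approximation to the 90% confidence interval, the conventional 80-125% acceptance limits, and a Gaussian placeholder where a trained VAE would be plugged in. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

LOWER, UPPER = math.log(0.80), math.log(1.25)  # conventional BE limits, log scale
Z90 = NormalDist().inv_cdf(0.95)               # normal approximation to the t quantile

def simulate_log_differences(n, gmr, cv_within, rng):
    """Draw n within-subject log(Test) - log(Reference) differences (simplified)."""
    sigma_w = math.sqrt(math.log(cv_within ** 2 + 1.0))
    return [rng.gauss(math.log(gmr), math.sqrt(2.0) * sigma_w) for _ in range(n)]

def passes_be(diffs):
    """True if the approximate 90% CI of the mean log-ratio lies inside the limits."""
    half_width = Z90 * stdev(diffs) / math.sqrt(len(diffs))
    centre = mean(diffs)
    return LOWER < centre - half_width and centre + half_width < UPPER

def gaussian_stand_in(sample, n_out, rng):
    """Placeholder generator: NOT a VAE, it only marks where a trained model would go."""
    mu, sd = mean(sample), stdev(sample)
    return [rng.gauss(mu, sd) for _ in range(n_out)]

def estimate_power(n, gmr, cv_within, n_trials, rng, subsample_frac=None, generator=None):
    """Fraction of simulated studies that pass; optionally subsample, then regenerate."""
    passed = 0
    for _ in range(n_trials):
        data = simulate_log_differences(n, gmr, cv_within, rng)
        if subsample_frac is not None:
            k = max(3, int(round(subsample_frac * n)))
            data = rng.sample(data, k)
            if generator is not None:
                data = generator(data, n, rng)
        passed += passes_be(data)
    return passed / n_trials

rng = random.Random(0)  # fixed seed so the sketch is reproducible
power_original = estimate_power(24, 1.0, 0.20, 500, rng)
power_subsample = estimate_power(24, 1.0, 0.20, 500, rng, subsample_frac=0.5)
power_generated = estimate_power(24, 1.0, 0.20, 500, rng, subsample_frac=0.5,
                                 generator=gaussian_stand_in)
```

The Gaussian placeholder carries no information beyond the subsample it is fitted to, so its output says nothing about how a trained VAE would perform; it only shows where the generative step sits in the simulation loop and how the three datasets (original, subsample, generated) would be compared.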
Results
The kinetic model identified three distinct kinetic phases according to the time elapsed since vaccination. NAbs decline relatively slowly at first, but their elimination rate becomes approximately six times greater from the third to the sixth month, indicating much faster elimination. K-means identified five distinct groups of individuals, each driven by different characteristics. Principal component analysis with two principal components explained 63.4% of the variability and identified a positive relationship between NAb levels at 3 months (M3) and 9 months (M9) after vaccination. This was corroborated by the random forest, which indicated that the NAb level at 3 months is the most important feature for predicting the NAb level at 9 months.
To reduce the required sample size in clinical studies, an innovative methodology was applied: using various forms of VAEs, we were able to create virtual samples very similar to the real ones. These synthetic data performed at least as well as the real data, even when only 30-40% of the real data was used. Notably, in scenarios with high variability, the VAE-generated data showed higher statistical power, effectively reducing “noise” and improving the reliability and robustness of the results. The findings in bioequivalence studies were equally favorable. The same methodology was applied while investigating different VAE parameters and testing multiple scenarios, including different levels of variability, original sample sizes, VAE-generated sample sizes, and average performance differences between the pharmaceutical products being compared. Generative algorithms, and VAEs in particular, achieved the same and in many cases better results with a significantly smaller sample than the original, thus substantially reducing the cost and time required to complete the studies. For highly variable drugs in particular, the synthetic data performed similarly to the real data with a smaller sample, even without scaling the confidence interval limits.
Discussion
Overall, the modeling of NAb kinetics showed three distinct kinetic phases based on the time since vaccination. Initially, NAbs decline relatively slowly, but their clearance rate increases approximately sixfold from the third to the sixth month. Principal component analysis showed a strong relationship between M3 and M9, and factor analysis of mixed data revealed that obesity and age have a negative effect on NAb levels, whereas gender had no effect in any of the five distinct groups identified by K-means. Random forest indicated that NAb levels at different time points after vaccination are more important than age, gender, and BMI for predicting NAb levels 9 months after vaccination.
The optimized VAEs outperformed the subsampled datasets and performed similarly to, and in many cases better than, the original datasets, indicating that similar BE testing results can be achieved with fewer samples.
The introduction of generative neural networks into clinical studies, combining Monte Carlo simulations with VAEs, demonstrated that VAEs can serve as a valuable tool in clinical trials and bioequivalence studies. Using VAEs, statistical power can be increased while the required sample size of a clinical study can be reduced by up to 30%, thereby reducing costs and time and addressing ethical issues related to human participation.
Conclusions
Overall, this dissertation demonstrated that machine learning enables the identification of complex patterns and trends in clinical studies that are difficult to detect by other means. Using machine learning methods, interactions were identified between an individual's characteristics and NAb levels nine months after vaccination, and their impact was quantified. Most notably, this dissertation proposes, for the first time, the use of VAEs to augment data in clinical and bioequivalence studies and to reduce the required sample size. It showed that VAEs are a modern and useful tool for clinical and bioequivalence studies that can significantly reduce the need for large sample sizes, lower costs, and shorten the completion times of clinical trials, while maintaining or even enhancing the quality and reliability of results.