Supervisors info:
Σάμης Τρέβεζας (Samis Trevezas), Assistant Professor, Department of Mathematics, National and Kapodistrian University of Athens (NKUA)
Summary:
This thesis addresses a problem from the agricultural sciences: predicting the phenological stage percentages of corn using large-scale data. Current state-of-the-art approaches to this problem include Hidden Markov Models and Generalized Linear Mixed-Effects Models. In this thesis, the problem is approached from a machine learning perspective; in particular, we investigate how the Random Forest (RF) algorithm and some of its variants can be applied to it.
We first introduce the problem and place it within the machine learning framework. We then study the induction of decision trees, the building blocks of RF, in both the univariate and the multivariate case. Next, we describe the Random Forest algorithm and present different sampling, splitting, and aggregation options, including subject-level bootstrapping, Extremely Randomized Trees, and Historical Random Forests. Finally, we compare these methods with one another on the task of predicting the phenological stage percentages of corn crops in the USA, and more specifically in the state of Nebraska.
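Of the sampling options listed above, subject-level bootstrapping is the one that departs most from the standard row-level bootstrap, so a minimal sketch may help. It assumes that each observation carries a subject identifier (the helper `subject_of` and the toy records are hypothetical, not taken from the thesis data) and resamples whole subjects with replacement, so that all repeated measurements of a drawn subject enter the sample together. This is an illustration of the general idea, not the exact procedure used in the thesis.

```python
import random
from collections import defaultdict


def subject_level_bootstrap(records, subject_of, rng):
    """Draw one bootstrap sample by resampling subjects rather than rows.

    records:    list of observations (repeated measures per subject).
    subject_of: hypothetical function mapping an observation to its subject id.
    rng:        random.Random instance, for reproducibility.
    Returns the bootstrap sample and the list of drawn subject ids.
    """
    # Group all repeated measurements belonging to the same subject.
    groups = defaultdict(list)
    for rec in records:
        groups[subject_of(rec)].append(rec)
    subjects = sorted(groups)

    # Draw as many subjects as there are, with replacement.
    drawn = rng.choices(subjects, k=len(subjects))

    # Each drawn subject contributes all of its measurements, once per draw.
    sample = []
    for s in drawn:
        sample.extend(groups[s])
    return sample, drawn


records = [("A", 1), ("A", 2), ("B", 3), ("C", 4), ("C", 5)]
sample, drawn = subject_level_bootstrap(records, lambda r: r[0], random.Random(0))
```

Because subjects are kept intact, the correlated measurements of a single subject never end up split between the bootstrap sample and the out-of-bag set; subjects that are never drawn form the out-of-bag data as whole units.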
Keywords:
Decision Trees, Random Forests, Precision agriculture, Corn phenological stages prediction, Repeated measures data