Summary:
This thesis concerns the implementation, pre-training, fine-tuning, and evaluation of a DistilBERT model specialized in Modern Greek. After an in-depth examination of the theoretical background in classical Machine Learning and in Deep Learning with Neural Networks, aimed at understanding the inner workings of Transformer networks, and of BERT and DistilBERT in particular, a detailed description of the development of such a model follows: from its pre-training on large Modern Greek corpora to its fine-tuning and evaluation on downstream Natural Language Processing tasks such as Named Entity Recognition, Part-of-Speech Tagging, and Natural Language Inference. Knowledge Distillation plays a central role in producing models that are faster and less computationally expensive than larger architectures of similar design, while exhibiting little or no loss in accuracy. The model developed for this thesis (DistilBERT-EL-KLD), a compressed version of GREEK-BERT, achieves performance and outputs very close to those of its predecessor.
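The Knowledge Distillation objective mentioned above can be illustrated with a minimal sketch of the standard distillation loss (in the style of Hinton et al.): a KL-divergence term between temperature-softened teacher and student distributions, combined with ordinary cross-entropy on the true labels. The function names, the temperature `T`, and the mixing weight `alpha` are illustrative choices, not the exact hyperparameters used in the thesis.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between the teacher's and the
    # student's temperature-softened distributions, scaled by T^2 so
    # gradient magnitudes stay comparable across temperatures.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    # Hard-target term: ordinary cross-entropy against the gold labels.
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels]).mean()
    return alpha * (T ** 2) * kl + (1.0 - alpha) * ce
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the cross-entropy term remains; during training, the student is pulled toward both the teacher's full output distribution and the gold labels.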
Keywords:
compression, knowledge distillation, neural networks, deep learning, classification