Manolis Koubarakis, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens (NKUA)
With the evolution of Artificial Intelligence (AI), every aspect of everyday life keeps changing. Fields such as education, transportation, health, gaming, social life, and the space industry keep adapting the way they work, using AI techniques that automate procedures, reduce human error, and make the delivery of products and services more efficient and accurate. Data plays a very important role: it lets machines learn from the past and better understand the present and the future.
Deep learning becomes relevant when we need to make good use of text documents, journals, papers, and other textual material, whether or not it is found on the web, from which knowledge is difficult to extract. Deep learning applies neural networks, which aim to recognize patterns by interpreting data through machine perception, data pre-processing, clustering, and classification.
Named Entity Recognition (NER) is one of the main tasks in information extraction. It aims to recognize entities in unstructured text and to classify them into pre-defined categories such as Person, Location, and Organization.
The body of research in this domain is large enough to raise the question of which technique, or combination of techniques, yields the best NER results. The goal of this thesis is to investigate existing progress in Natural Language Processing (NLP), with a focus on NER. Drawing on publicly available documents and applications, we explore the differences among them and evaluate their final results. We analyse multiple NLP techniques that have been implemented for NER shared tasks.
We will compare and evaluate state-of-the-art neural architectures (LSTM, CNN, RNN, CRF) for identifying the following entity types: Person, Location, Organization, and Misc (miscellaneous, a general category). Additionally, we will study the effect of using or omitting hand-crafted and lexical features. For a fair comparison, we will train the models on the same dataset, as described in the original papers. Each approach is evaluated using precision, recall, and F1 score.
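As an illustration of the evaluation metrics named above, the following is a minimal sketch of how precision, recall, and F1 could be computed for NER. It assumes exact-match, entity-level scoring, where a predicted entity counts as correct only if both its span and its type match a gold entity. That scoring convention, the `(start, end, type)` representation, the function name `precision_recall_f1`, and the toy data are assumptions made for this example and are not taken from the thesis.

```python
from typing import Set, Tuple

# Hypothetical representation: (start token index, end token index, entity type).
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level scores under exact matching of span and type (an assumption)."""
    # A true positive is a predicted entity that appears unchanged in the gold set.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
gold = {(0, 1, "Person"), (4, 4, "Location"), (7, 8, "Organization")}
predicted = {
    (0, 1, "Person"),        # correct
    (4, 4, "Organization"),  # right span, wrong type: counted as an error
    (7, 8, "Organization"),  # correct
    (10, 10, "Misc"),        # spurious entity
}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case two of the four predictions are correct and two of the three gold entities are found, so precision is 2/4, recall is 2/3, and F1 is 4/7. Other scoring conventions, such as partial-match or token-level scoring, would give different numbers for the same predictions.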
Named Entity Recognition, Entity Generation, Feature Extraction, Neural Networks, Deep Learning