Extending the temporal tagger HeidelTime for the Greek language

Postgraduate Thesis uoadl:2922239

Unit:
Specialization: Informatics in Medicine
Informatics
Deposit date:
2020-09-10
Year:
2020
Author:
Kapernaros Emmanouil
Supervisors info:
Manolis Koubarakis, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Original Title:
Extending the temporal tagger HeidelTime for the Greek language
Languages:
English
Greek
Translated title:
Extending the temporal tagger HeidelTime for the Greek language
Summary:
We describe our work on extending the temporal tagger HeidelTime for the Greek
language. HeidelTime is a multilingual, rule-based temporal tagger that performs the
full task of temporal tagging, including the extraction and normalization subtasks. It
achieves its multilingual capability by using language-specific resources that are
separate from its source code and can be easily adapted. It contains manually
developed resources for 13 languages and automatically generated resources for 200+
languages, including Greek. HeidelTime extracts temporal expressions, classifies them
into the date, time, duration and set classes, and normalizes each one to a value in a
standard format. For instance, when the expression "March 11, 2013" is detected, it is
extracted and normalized to the value "2013-03-11". The goal of the project is to manually
develop publicly available Greek resources that extend the automatic ones. We did this
by developing language-specific resources, which are .txt files with a specific syntax.
The development process required a manually annotated Greek corpus for
training. For this purpose we created WikiWarsEL, a corpus of annotated Greek temporal
expressions built from 19 documents about wars from Greek Wikipedia. Finally, we
evaluated the newly developed resources on additional war documents from Wikipedia
that were not used in the training process. The value F1-score we achieved
was 82.31%, a significant improvement over the 2.19% of the automatic resources.
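The extraction and normalization steps described in the summary can be illustrated with a minimal sketch. It is not HeidelTime's rule syntax or code; the names `MONTHS`, `DATE_PATTERN` and `tag_dates` are hypothetical, and the single pattern covers only English dates of the form "March 11, 2013".

```python
import re

# Hypothetical normalization resource: month name -> month number.
MONTHS = {
    name: number
    for number, name in enumerate(
        [
            "January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December",
        ],
        start=1,
    )
}

# Hypothetical extraction rule: "<Month> <day>, <year>".
DATE_PATTERN = re.compile(
    r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b"
)


def tag_dates(text):
    """Return (expression, class, normalized value) triples found in text."""
    results = []
    for match in DATE_PATTERN.finditer(text):
        month = MONTHS[match.group(1)]
        day = int(match.group(2))
        year = int(match.group(3))
        # Normalize to the YYYY-MM-DD format used in the summary's example.
        value = f"{year:04d}-{month:02d}-{day:02d}"
        results.append((match.group(0), "DATE", value))
    return results


if __name__ == "__main__":
    print(tag_dates("The treaty was signed on March 11, 2013."))
```

A full tagger would also need rules for the time, duration and set classes and for relative expressions, which this sketch does not attempt.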
Main subject category:
Technology - Computer science
Keywords:
time, temporal expressions, temporal tagger
Index:
Yes
Number of index pages:
4
Contains images:
Yes
Number of references:
8
Number of pages:
42
Thesis.pdf (489 KB)