Unit:
Department of Informatics and Telecommunications
Author:
PAPASTAMOU IOANNIS
Supervisors info:
Manolis Koubarakis, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens (NKUA)
Original Title:
Techniques for sentence-boundary detection in Greek legal text
Translated title:
Techniques for sentence-boundary detection in Greek legal text
Summary:
Sentence Boundary Detection (SBD), also known as sentence boundary disambiguation, is a key underlying task in Natural Language Processing (NLP). Although SBD is often considered a simple problem, it becomes more complex in specialized domains because of their unorthodox use of punctuation. For example, drug names in medical documents, case citations in legal text, and references in academic articles all use punctuation in ways that are rare in general-purpose text such as newswire. SBD is also language dependent: every language brings its own problems. SBD has generally received little attention in NLP research. This thesis examines different ways SBD can be applied to the Raptarchis dataset. We develop two SBD systems, each based on a different approach, and analyze their advantages and disadvantages. Finally, we use the better-performing system to produce a new version of the Raptarchis dataset with its sentences annotated.
Main subject category:
Technology - Computer science
Keywords:
Natural Language Processing, Legal Documents