Techniques for sentence-boundary detection in Greek legal text

Graduate Thesis uoadl:3309213

Unit:
Department of Informatics and Telecommunications
Informatics
Deposit date:
2023-03-22
Year:
2023
Author:
PAPASTAMOU IOANNIS
Supervisors info:
Manolis Koubarakis, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens (NKUA)
Original Title:
Techniques for sentence-boundary detection in Greek legal text
Languages:
English
Greek
Translated title:
Techniques for sentence-boundary detection in Greek legal text
Summary:
Sentence Boundary Detection (SBD), also known as sentence boundary disambiguation, is a key underlying task in Natural Language Processing (NLP). Although SBD is often considered a simple problem, it becomes more complex in specialized domains because of their unorthodox use of punctuation. For example, drug names in medical documents, case citations in legal text, and references in academic articles all use punctuation in ways that are uncommon in general-purpose text such as newswire. SBD is also language dependent: every language brings its own problems. SBD has generally not received much attention in NLP research. This thesis examines different ways of applying SBD to the Raptarchis dataset. We develop two SBD systems, each based on a different approach, and analyze their advantages and disadvantages. We conclude by using the better-performing system to produce a new version of the Raptarchis dataset with its sentences annotated.
Main subject category:
Technology - Computer science
Keywords:
Natural Language Processing, Legal Documents
Index:
Yes
Number of index pages:
3
Contains images:
Yes
Number of references:
15
Number of pages:
45
Thesis.pdf (2 MB)

resources.zip (51 MB)
File access is restricted.