Unit:
Division of Computer Systems and Applications
Library of the School of Science
Author:
Κωτσομητόπουλος Αριστοτέλης
Παπαπαναγιωτάκης-Μπουσύ Ιάσων
Supervisors info:
Ιζαμπώ Καράλη
Original Title:
Αυτόματη ανάκτηση παγκόσμιας ειδησεογραφίας και κατηγοριοποίηση της με χρήση εξαγόμενων μέτα-δεδομένων
Translated title:
Automatic retrieval of world news and classification based on extracted metadata
Summary:
This work presents the construction of a system that automatically retrieves
global news articles, extracts information from them, and categorizes them
into thematic groups with the greatest possible semantic similarity. To carry
out this task we dealt both with technical problems, such as those that arise
when automatically retrieving articles from around the Web, and with the
scientific fields of natural language processing and meta-information
extraction.
To build the system we evaluated multiple technologies and tools. The system
went through several phases and directions, some of which we later had to
revise. We believe this has led to a mature and well-functioning system.
The thematic groups it creates are of very good quality. We consider a
thematic group good when it contains as many articles as possible that deal
with the same theme, contains no irrelevant articles, and all of its articles
are closely related to a single event. Finally, our system has long execution
times, which we explain, and we recommend future directions for improvement.
Keywords:
information extraction, categorization, natural language processing, internet, world news
Number of index pages:
9,10,11
File:
File access is restricted only to the intranet of UoA.
document.pdf
2 MB
File access is restricted only to the intranet of UoA.
attachments.zip
34 MB