Automatic retrieval of world news and classification based on extracted metadata

Graduate Thesis uoadl:1324114

Unit:
Division of Computer Systems and Applications
Library of the School of Science
Deposit date:
2015-07-19
Year:
2015
Author:
Κωτσομητόπουλος Αριστοτέλης
Παπαπαναγιωτάκης-Μπουσύ Ιάσων
Supervisors info:
Ιζαμπώ Καράλη
Original Title:
Αυτόματη ανάκτηση παγκόσμιας ειδησεογραφίας και κατηγοριοποίηση της με χρήση εξαγόμενων μέτα-δεδομένων
Languages:
Greek
Translated title:
Automatic retrieval of world news and classification based on extracted metadata
Summary:
This work deals with the construction of a system that automatically retrieves
global news articles, extracts information from them, and categorizes them
into thematic groups with the greatest possible semantic similarity. To
carry out this task we dealt both with technical problems, such as those
that arise when automatically retrieving articles from around the Web, and
with the scientific fields of natural language processing and
meta-information extraction.
To build the system we evaluated multiple technologies and tools, and the system
went through several phases and directions, some of which we later had to
revise. We believe this has led us to a mature and well-functioning system.
We obtain very good results in terms of the quality of the thematic groups
created. In our case we consider a thematic group good when it contains as many
articles as possible that deal with the same theme, contains no irrelevant
articles, and all of its articles are closely related to a single event.
Finally, our system has long execution times, which we justify, and we
recommend future directions for improvement.
Keywords:
information extraction, categorization, natural language processing, internet, world news
Index:
Yes
Number of index pages:
9,10,11
Contains images:
Yes
Number of references:
40
Number of pages:
57
File:
File access is restricted only to the intranet of UoA.

document.pdf
2 MB
File access is restricted only to the intranet of UoA.

attachments.zip
34 MB
File access is restricted.