Pergamos - Library and Information Center of National and Kapodistrian University of Athens

Unit:

Division of Computer Systems and Applications
Library of the School of Science

Deposit date:

2015-07-19

Year:

2015

Author:

Κωτσομητόπουλος Αριστοτέλης
Παπαπαναγιωτάκης-Μπουσύ Ιάσων

Supervisors info:

Ιζαμπώ Καράλη

Original Title:

Αυτόματη ανάκτηση παγκόσμιας ειδησεογραφίας και κατηγοριοποίηση της με χρήση εξαγόμενων μέτα-δεδομένων

Languages:

Greek

Translated title:

Automatic retrieval of world news and classification based on extracted metadata

Summary:

This work deals with the construction of a system that automatically retrieves
global news articles, extracts information from them, and categorizes them
into thematic sections with the greatest possible semantic similarity. To
carry out this task we dealt both with technical problems, such as those
that arise when automatically retrieving articles from around the Web, and
with exploring the scientific fields of natural language processing and
meta-information extraction.
To build the system we evaluated multiple technologies and tools, and the system
went through several phases and directions, some of which we later had to
revise. This, we believe, has led us to create a mature and well-functioning system.
We obtain very good results in terms of the quality of the thematic groups created. In
our case, we consider a thematic group good when it contains as many articles as
possible that deal with the same theme, contains no irrelevant articles, and
its articles are all closely related to a single event. Finally, our
system has long execution times, which we explain, and we recommend future
directions for improvement.

Keywords:

information extraction, categorization, natural language processing, internet, world news

Index:

Yes

Number of index pages:

9,10,11

Contains images:

Yes

Number of references:

Number of pages:

File:

File access is restricted only to the intranet of UoA.

Persistent URL:

https://pergamos.lib.uoa.gr/uoa/dl/object/1324114

document.pdf
2 MB
File access is restricted only to the intranet of UoA.

attachments.zip
34 MB
File access is restricted.
