Observator: an extensible crawler for an archiving system

Postgraduate Thesis uoadl:1321303

Unit:
Track / Specialization: Computing Systems: Software and Hardware (ΣΥΣ)
Library of the School of Science
Deposit date:
2015-12-20
Year:
2015
Author:
Τσιλίγκος Κλεομένης
Supervisors info:
Μέμα Ρουσσοπούλου
Original Title:
Observator: ένας επεκτάσιμος περιηγητής συστήματος αρχειοθέτησης
Languages:
Greek
Summary:
The Web has become the primary medium of accessing new ideas and information.
Various research, social and historical reasons drive the digital preservation
of the available information. However, a combination of factors, such as the
World Wide Web’s increasing scale, its fragmentation, and its transformation
from a text repository into a multimedia platform, poses significant challenges
to the archiving process. For example, some web pages serve different content
and/or structure depending on the geographic location from which they are
accessed. Consequently, a modern
archival system requires the use of an appropriate crawler and storage system
that take into account the peculiarities of the web. In this thesis, we
describe the design, implementation and evaluation of such a system. The
crawler should be extensible and configurable with respect to its functionality
and its traversal policy for the target host, and it should complete its
workload in a reasonable amount of time while limiting the load it puts on
the target servers. The storage system, in turn, needs to
differentiate among the available versions of the same resource and efficiently
eliminate duplicate content. It also needs to facilitate rewriting a page’s
references to match the collection’s local structure, and to support fast
retrieval. Finally, the crawler and the storage system need to be loosely
coupled. The experimental evaluation of the
prototype provides proof that the above objectives are achieved, and presents
useful results regarding the efficiency of the system.
Keywords:
extensible crawler, storage system, traversal policy, duplicates elimination, presentation issues
Index:
Yes
Number of index pages:
53-54
Contains images:
Yes
Number of references:
13
Number of pages:
55
File:
File access is restricted.

document.pdf
1 MB