Distributed tag-set correlation calculation using storm

Graduate Thesis uoadl:1324517

Unit:
Division of Computer Systems and Applications (Τομέας Υπολογιστικών Συστημάτων και Εφαρμογών)
Library of the School of Science
Deposit date:
2015-07-29
Year:
2015
Author:
Φωκέας Σωτήριος
Supervisors info:
Αλέξης Δελής
Original Title:
Distributed tag-set correlation calculation using storm
Languages:
English
Translated title:
κατανεμημένος Υπολογισμός συσχετίσεων συνόλων ετικετών
Summary:
In this work, we aim to analyze data published on social media. Users of
social media networks post text, images, or videos and annotate each one with
a set of tags describing its content. We seek an efficient way to compute the
correlations of co-occurring tags over time. The volume and pace of published
posts make it necessary to parallelize the computations. Data is divided among
multiple nodes, making each one responsible for computing the correlations of
its own share. The challenge in such a setting is to ensure that each node
computes a subset of the coefficients and that the processing load is evenly
distributed across nodes. For this reason, a graph is devised that maintains
all essential data. This graph is dynamically and continuously partitioned
across nodes. The proposed approach seeks to build a model that not only
calculates correlations effectively but also divides the load among peers in a
natural, self-organizing way. Finally, the scheme is prototyped in Java using
the Apache Storm stream processing platform, demonstrating that the approach
is feasible.
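To make the idea of a tag co-occurrence correlation concrete, the following is a minimal single-node sketch in Java. The summary does not state which correlation coefficient the thesis uses, so the Jaccard coefficient here is an assumption chosen for illustration. The class and method names (`TagCorrelation`, `addPost`, `jaccard`) are hypothetical and not taken from the thesis. The sketch covers neither the time dimension nor the graph partitioning across nodes. It only shows what one node might compute over its own share of posts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: counts tags and tag pairs for the posts seen
// by one node, then derives a Jaccard coefficient per tag pair.
// The choice of Jaccard is an assumption, not the thesis's stated measure.
public class TagCorrelation {
    // Number of posts in which each tag appears.
    private final Map<String, Integer> tagCounts = new HashMap<>();
    // Number of posts in which each unordered tag pair appears together.
    private final Map<String, Integer> pairCounts = new HashMap<>();

    // Order-independent key for a pair of tags.
    // Assumes tags do not contain the '|' character.
    private static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    // Records one post, given as its set of tags.
    public void addPost(Set<String> tags) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(tags));
        for (String tag : sorted) {
            tagCounts.merge(tag, 1, Integer::sum);
        }
        for (int i = 0; i < sorted.size(); i++) {
            for (int j = i + 1; j < sorted.size(); j++) {
                pairCounts.merge(pairKey(sorted.get(i), sorted.get(j)), 1, Integer::sum);
            }
        }
    }

    // Jaccard coefficient: posts with both tags / posts with at least one.
    public double jaccard(String a, String b) {
        int both = pairCounts.getOrDefault(pairKey(a, b), 0);
        int either = tagCounts.getOrDefault(a, 0) + tagCounts.getOrDefault(b, 0) - both;
        return either == 0 ? 0.0 : (double) both / either;
    }
}
```

In a distributed setting such as the one the summary describes, each node would hold counters only for its own part of the tag graph. How those parts are assigned and rebalanced is the subject of the thesis and is not modeled here.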
Keywords:
graph, social media, tags, distributed systems, correlations
Index:
Yes
Number of index pages:
8,9,10,11
Contains images:
Yes
Number of references:
20
Number of pages:
50
document.pdf (488 KB)
attachments.zip
13 KB
File access is restricted.