Big Data Visual Analytics Architecture

Postgraduate Thesis uoadl:2780547

Unit:
Track / Specialization: Information and Data Management (ΔΕΔ)
Informatics
Deposit date:
2018-08-07
Year:
2018
Author:
Vlantis Panagiotis
Supervisors info:
Alexis Delis, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Maria Roussou, Assistant Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Original Title:
Αρχιτεκτονική Οπτικής Ανάλυσης Υψηλής Κλίμακας σε Υπολογιστικό Νέφος (literally: Large-Scale Visual Analytics Architecture in the Cloud)
Languages:
English
Translated title:
Big Data Visual Analytics Architecture
Summary:
Analyses based on data mining and knowledge discovery techniques are not always successful, as they occasionally yield no actionable results. This is especially true in the Big Data context, where we routinely deal with complex, heterogeneous, diverse and rapidly changing data. In this context, visual analytics plays a key role in helping both experts and users to readily comprehend and better manage analyses carried out on data stored on Infrastructure-as-a-Service (IaaS) platforms. To this end, humans should play a critical role in continually ascertaining the value of the processed information, and they are invariably deemed to be the instigators of actionable tasks. This is facilitated by sophisticated tools that let humans interface with the data through vision and interaction. When working with Big Data problems, both the scale and the nature of the data undoubtedly present a barrier to implementing responsive applications. In this thesis, we propose a software architecture that seeks to empower Big Data analysts with visual analytics tools atop large-scale data stored in and processed by IaaS platforms. Our key goal is not only to provide on-line analytical processing but also to offer facilities that let users effectively interact with the underlying IaaS machinery. Although we focus on hierarchical and spatio-temporal datasets here, our proposed architecture is general and can be applied to a wide range of application domains. The core design principles of our approach are: a) on-line processing in the cloud with Apache Spark; b) integration of interactive programming following the notebook paradigm through Apache Zeppelin; c) robust operation when the data and/or schema change on the fly. Through experimentation with a prototype of the proposed architecture, we demonstrate not only the viability of our approach but also its value in a use case involving publicly available crime data from the United Kingdom.
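As an illustration only, the following minimal, single-machine Python sketch shows the kind of hierarchical roll-up over spatio-temporal records that the summary refers to, written so that records with missing or extra fields (a changing schema) are still counted. It does not use Apache Spark or Apache Zeppelin and is not the thesis's implementation; the field names (`month`, `region`, `district`), the helper `roll_up`, and the sample records are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Hypothetical hierarchy levels, coarsest first; not taken from the thesis.
HIERARCHY = ("region", "district")


def roll_up(records: Iterable[Mapping[str, str]],
            levels: Sequence[str],
            time_key: str = "month") -> Counter:
    """Count records per (time bucket, hierarchy levels...) key.

    Missing fields map to "unknown" instead of raising, and extra fields
    are ignored, so records whose schema differs are still aggregated.
    """
    counts: Counter = Counter()
    for rec in records:
        key = (rec.get(time_key, "unknown"),) + tuple(
            rec.get(level, "unknown") for level in levels)
        counts[key] += 1
    return counts


# Hypothetical sample records with slightly different schemas.
records = [
    {"month": "2017-01", "region": "North", "district": "A", "crime_type": "burglary"},
    {"month": "2017-01", "region": "North", "district": "B"},
    {"month": "2017-01", "region": "North", "district": "A", "extra_field": "x"},
    {"month": "2017-02", "region": "South"},  # no district field
]

by_region = roll_up(records, HIERARCHY[:1])  # coarse level: month x region
by_district = roll_up(records, HIERARCHY)    # drill down: month x region x district
```

Here `by_region[("2017-01", "North")]` is 3, while drilling down to `by_district` splits that count into district `A` (2) and `B` (1); the record without a `district` field lands under `"unknown"` rather than breaking the aggregation. In a distributed setting the same grouping would be expressed with the processing engine's own aggregation operators.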
Main subject category:
Technology - Computer science
Keywords:
big data, visual analytics, spatio-temporal data, cloud infrastructure, apache spark, interactive programming
Index:
Yes
Number of index pages:
4
Contains images:
Yes
Number of references:
24
Number of pages:
38
thesis.pdf (2 MB)