Big Data Analytics for the Cloud

Postgraduate Thesis uoadl:1665218

Unit:
Specialization in Electronic Automation (Η/Α, with additional specialization in Informatics and information systems)
Library of the School of Science
Deposit date:
2017-06-13
Year:
2017
Author:
Gkatzios Nikolaos
Supervisors info:
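Άννα Τζανακάκη, Assistant Professor, Department of Physics, National and Kapodistrian University of Athens
Διονύσιος Ι. Ρεΐσης, Associate Professor, Department of Physics, National and Kapodistrian University of Athens
Έκτορας Νισταζάκης, Associate Professor, Department of Physics, National and Kapodistrian University of Athens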
Original Title:
Big Data Analytics for the Cloud
Languages:
English
Translated title:
Big Data Analytics for the Cloud
Summary:
The work for this master's thesis is divided into three parts. The first part focuses on the study and presentation of scalable data processing architectures for the Big Data challenge. The second focuses on the processing of a dataset comprising measurements collected by different sensors installed on a train. The last part covers the setup of a server running the open-source IoT platform SiteWhere, the dispatch of data to the server, the storage of the data in a NoSQL database, and the processing of these data on a Spark instance.
Chapter 1 provides an introduction. Chapter 2 presents the architecture and capabilities of SiteWhere as a holistic solution for IoT management. Chapter 3 introduces the basic notions behind the terms “Big Data” and “Cloud”, and presents different solutions to the Big Data challenge along with the scientific trends on this topic. Chapter 4 studies several clustering algorithms (KMeans, Birch, Mean Shift, DBSCAN), which are used to process the real dataset collected from the onboard train sensors. Chapter 5 introduces the notion of “time-series forecasting” and investigates the behavior of two types of neural networks (MLP, LSTM) on this task. Chapter 6 presents the work carried out on the SiteWhere platform. It begins by describing the dispatch of data to the server and continues with the visualization, in Grafana, of the train data stored in InfluxDB, a database that SiteWhere supports. The data are then retrieved from the database and processed on a Spark instance (through KMeans clustering and forecasting with MLP), and finally that process is compared with the same processing on the local system. Chapter 7 provides a summary and highlights some of the conclusions drawn in the previous chapters.
Main subject category:
Science
Other subject categories:
Algorithms and Theory of Computation
Technology - Computer science
Keywords:
Big Data, Cloud, IoT, Clustering, Forecasting
Index:
No
Number of index pages:
0
Contains images:
Yes
Number of references:
63
Number of pages:
96
Master Thesis.pdf (5 MB)