Summary:
Data analysis plays a particularly important role in acquiring new knowledge and in
the decision-making process. Because the volume of available data is so large, the data
must be processed before useful knowledge can be extracted from it.
Data analysis is a process of inspecting, cleaning, transforming, and modeling data in
order to discover useful information, draw conclusions, and support decision-making.
It has multiple facets and approaches, encompassing diverse techniques under a variety
of names, across business, science, and the social sciences. Data mining is a specific
data analysis technique that focuses on modeling and knowledge discovery for predictive
rather than purely descriptive purposes.
In this diploma thesis we examine the process of collecting and processing data from
sensors. Because the volume of data to be processed is large, we use technologies and
frameworks designed for processing big data.
Big data refers to data sets that are so large and complex that traditional data
processing software cannot handle them adequately. Major big data challenges include
data collection, storage, analysis, search, sharing, transfer, visualization, querying,
updating, privacy, and data provenance.
Apache Storm was used to process the data reliably in real time. In addition, Esper
was used to analyze series of events in order to produce useful conclusions.
The data used in this work was acquired from sensors. This kind of data can be
processed in two ways: batch processing and stream processing.
Keywords:
Data Analysis, Big Data, Data Mining, Apache Storm, sensor data, Kafka, Kibana, Elasticsearch