Unit:
Department of Informatics and Telecommunications (Informatics)
Supervisors info:
ΙΩΑΝΝΗΣ ΙΩΑΝΝΙΔΗΣ (Ioannis Ioannidis), Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens (ΕΚΠΑ)
Original Title:
Ροές Εργασιών σε Κατανεμημένα Συστήματα
Translated title:
Workflows in Distributed Systems
Summary:
In the modern era, often characterized as the information age, the need to retrieve, store, and process ever-increasing volumes of data and to execute complex, resource-intensive algorithms has led computation to migrate from single powerful computer systems to clusters, in which the workload is distributed evenly among the nodes to achieve fast results through parallelization.
The goal of this thesis is to study the systems that control, distribute, and process data and manage communication among the nodes. These systems are called Workflow Engines. We present some basic examples of such systems, namely Apache Spark, Apache Hadoop, and Apache Taverna, and attempt to compare them.
Finally, we present an application built on top of Spark that aims to simplify the process of creating and submitting Spark jobs to a Spark cluster.
Main subject category:
Technology - Computer science
Keywords:
Big Data, Apache Spark, HDFS, Apache Hadoop, Distributed Systems, cloud, clusters