Development of just-in-time query operators for complex data using partial evaluation

Postgraduate Thesis uoadl:3395963

Unit:
Specialization: Big Data and Artificial Intelligence
Informatics
Deposit date:
2024-04-09
Year:
2024
Author:
Zerntev Alexandros
Supervisors info:
Ντούλας Αλέξανδρος (Alexandros Ntoulas), Assistant Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Γουνόπουλος Δημήτριος (Dimitrios Gunopulos), Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Gaidioz Benjamin, PhD, RAW Labs SA
Original Title:
Development of just-in-time query operators for complex data using partial evaluation
Languages:
English
Greek
Translated title:
Development of just-in-time query operators for complex data using partial evaluation
Summary:
Every day, substantial quantities of data are generated globally. This data comes in various formats, including structured forms such as CSV, XML, and JSON, as well as unstructured formats. It is stored in a multitude of locations, such as databases of different types and cloud storage platforms. This diversity makes it difficult to query combined data sources efficiently, with minimal effort and fast execution. Snapi is an open-source query language developed by RAW Labs specifically to address this issue. Snapi allows the user to aggregate, join, and transform these distributed and heterogeneous datasets in real time. To improve query performance and effectively handle heterogeneous datasets, we adopted the GraalVM/Truffle framework for executing Snapi. This approach not only accelerates query processing but also offers a more maintainable solution than custom code generation, which we believe enhances overall system robustness. It involves supplying Truffle with a Snapi language implementation (an interpreter) and the Abstract Syntax Tree of the query as input. Truffle, in turn, generates highly optimized code that can dynamically adapt to the specific data being queried. We evaluate our system against a similar JVM-based code-generation approach. We achieve up to a 2.5x speedup in cold-start end-to-end execution and up to an 11.5x speedup in queries over data whose attributes can have more than one type.
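The abstract's central idea, an interpreter whose AST nodes adapt to the data they actually see, can be illustrated with a small self-specializing node. The sketch below is a plain-Python analogue written for illustration only: it does not use the Truffle API, and the names `Node`, `ReadField`, `AddConst`, and `run_query`, as well as the three-state scheme, are hypothetical. In Truffle, the specialization state would be treated as a constant during partial evaluation, so a node that has stabilized on `int` would compile down to the fast path alone, and a failed speculation would trigger deoptimization.

```python
from __future__ import annotations

from typing import Any


class Node:
    """Base class for a tiny expression AST evaluated once per input row."""

    def execute(self, row: dict[str, Any]) -> Any:
        raise NotImplementedError


class ReadField(Node):
    """Reads one attribute of the current row (e.g. a JSON field)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, row: dict[str, Any]) -> Any:
        return row[self.name]


class AddConst(Node):
    """Adds a constant to its child's value, specializing on the observed type.

    Hypothetical scheme: the state moves one way only,
    "uninitialized" -> "int" -> "generic".
    """

    def __init__(self, child: Node, const: int) -> None:
        self.child = child
        self.const = const
        self.state = "uninitialized"

    def execute(self, row: dict[str, Any]) -> Any:
        value = self.child.execute(row)
        if self.state == "int":
            if type(value) is int:
                return value + self.const  # fast path: speculation still holds
            self.state = "generic"  # speculation failed: widen permanently
        elif self.state == "uninitialized":
            # First execution: speculate on the type we observe.
            if type(value) is int:
                self.state = "int"
                return value + self.const
            self.state = "generic"
        return self._generic(value)

    def _generic(self, value: Any) -> Any:
        # Slow path: handles every type this attribute may take.
        if isinstance(value, (int, float)):
            return value + self.const
        if isinstance(value, str):
            return float(value) + self.const  # e.g. a number stored as text
        raise TypeError(f"unsupported type: {type(value).__name__}")


def run_query(rows: list[dict[str, Any]], expr: Node) -> list[Any]:
    """Evaluates the expression once per row, like a simple projection."""
    return [expr.execute(row) for row in rows]


expr = AddConst(ReadField("price"), 10)
print(run_query([{"price": 1}, {"price": 2}], expr), expr.state)  # [11, 12] int
print(run_query([{"price": "5"}, {"price": 3}], expr), expr.state)  # [15.0, 13] generic
```

Making the state transition one-way keeps the node from oscillating between specializations when an attribute alternates between types; every later row simply takes the generic path.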
Main subject category:
Technology - Computer science
Keywords:
Databases, Compilers, Virtual Machines, Query Languages
Index:
Yes
Number of index pages:
5
Contains images:
Yes
Number of references:
20
Number of pages:
68
Master_s_Thesis.pdf (1 MB)