Abstract:
Database systems serve a wide range of use cases efficiently, but
require data to be loaded and adapted to the system's execution engine.
This pre-processing step is a bottleneck to the analysis of
increasingly large and heterogeneous datasets. Therefore, numerous
research efforts advocate for querying each dataset in situ, i.e.,
without pre-loading it in a DBMS. On the other hand, performing analysis
over raw data entails numerous overheads because of the potentially
inefficient data representations.
In this paper, we investigate the effect of vector processing on raw
data querying. We enhance the operators of a query engine to use SIMD
operations. Specifically, we examine the effect of SIMD on two different
cases: the scan operators that perform the CPU-intensive task of input
parsing, and the part of the query pipeline that performs a selection
and computes an aggregate. We show that a vectorized approach has
considerable potential to improve performance, although this gain comes
with trade-offs.
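
To make the selection-plus-aggregate case concrete, the following is a minimal sketch in portable C, not the engine described in this paper. It contrasts a branching scalar loop with a lane-wise, mask-based formulation of the kind that maps onto SIMD registers. The function names, the LANES width of 8, and the "sum the values below a bound" query are all assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector width: 8 x 32-bit values, as in a 256-bit register. */
#define LANES 8

/* Scalar baseline: one data-dependent branch per value. */
int64_t sum_if_less_scalar(const int32_t *col, size_t n, int32_t bound) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (col[i] < bound) sum += col[i];
    }
    return sum;
}

/* Vector-style variant: handle LANES values per step, turn the predicate
   into an all-ones / all-zeros mask, and keep one partial sum per lane.
   The inner loop has no branch on the data, so a compiler may map it to
   SIMD compare, and, and add instructions (not guaranteed). */
int64_t sum_if_less_lanes(const int32_t *col, size_t n, int32_t bound) {
    int64_t partial[LANES] = {0};
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        for (size_t l = 0; l < LANES; l++) {
            int32_t v = col[i + l];
            int32_t mask = -(int32_t)(v < bound); /* -1 if selected, else 0 */
            partial[l] += (int64_t)(v & mask);
        }
    }
    /* Horizontal reduction of the per-lane partial sums. */
    int64_t sum = 0;
    for (size_t l = 0; l < LANES; l++) sum += partial[l];
    /* Scalar tail for the values that do not fill a whole vector. */
    for (; i < n; i++) {
        if (col[i] < bound) sum += col[i];
    }
    return sum;
}
```

The sketch also hints at one kind of trade-off: the masked variant does the same work for every value regardless of selectivity, so when few values qualify a branching loop may do less arithmetic.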