Making queries tractable on big data with preprocessing (through the eyes of complexity theory)Making queries tractable on big data with preprocessing (through the eyes of complexity theory)
Faculty of Sciences. Mathematics and Computer Science
Advanced Database Research and Modeling (ADReM)
Proceedings of the VLDB Endowment
6(2013):9, p. 685-696
University of Antwerp
A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME al- gorithms often become infeasible in practice. A traditional and eective approach to coping with this is to preprocess data o-line, so that queries in the class can be subsequently evaluated on the data eciently. This paper aims to pro- vide a formal foundation for this approach in terms of com- putational complexity. (1) We propose a set of -tractable queries, denoted by T0 Q, to characterize classes of queries that can be answered in parallel poly-logarithmic time (NC) after PTIME preprocessing. (2) We show that several natu- ral query classes are -tractable and are feasible on big data. (3) We also study a set TQ of query classes that can be ef- fectively converted to -tractable queries by re-factorizing its data and queries for preprocessing. We introduce a form of NC reductions to characterize such conversions. (4) We show that a natural query class is complete for TQ. (5) We also show that T0 Q P unless P = NC, i.e., the set T0 Q of all -tractable queries is properly contained in the set P of all PTIME queries. Nonetheless, TQ = P, i.e., all PTIME query classes can be made -tractable via proper re- factorizations. This work is a step towards understanding the tractability of queries in the context of big data.