ISG Talks are sponsored by Couchbase.
Yiming Lin: QUIP: Query-driven Missing Value Imputation
February 10, 2023 @ 1:00 pm - 2:00 pm
This paper develops a query-time missing value imputation framework, entitled QUIP, that minimizes the joint cost of imputation and query execution. QUIP achieves this by modifying how relational operators are processed. It adds a cost-based decision function to each operator that determines whether the operator should invoke imputation prior to execution or defer the imputations for downstream operators to resolve. QUIP implements a new approach to evaluating outer joins that preserves missing values during query processing, and a Bloom-filter-based index structure that reduces space and runtime overhead. We have implemented QUIP using ImputeDB, a specialized database engine for data cleaning. Extensive experiments on both real and synthetic data sets demonstrate the effectiveness and efficiency of QUIP, which outperforms the state-of-the-art ImputeDB by 2 to 10 times on different query sets and data sets, and achieves order-of-magnitude improvements over offline approaches.
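To make the impute-or-defer idea concrete, here is a minimal sketch of a filter operator that consults a cost-based decision function before touching a missing value. It is an illustration only, not QUIP's actual implementation: the `CostModel` fields, the two cost formulas, and the names `should_impute_now` and `filter_operator` are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
MISSING = None  # this sketch represents a missing value as None


@dataclass
class CostModel:
    """Hypothetical per-tuple cost estimates (not taken from the paper)."""
    impute_cost: float        # cost of imputing one missing value
    downstream_cost: float    # cost of carrying one tuple through later operators
    selectivity: float        # fraction of tuples expected to pass this predicate
    drop_probability: float   # chance a later operator discards a deferred tuple


def should_impute_now(model: CostModel) -> bool:
    """Return True when imputing now is cheaper in expectation than deferring."""
    # Impute now: pay for imputation, then only passing tuples flow downstream.
    cost_now = model.impute_cost + model.selectivity * model.downstream_cost
    # Defer: the tuple always flows downstream, and is imputed only if it survives.
    cost_defer = model.downstream_cost + (1 - model.drop_probability) * model.impute_cost
    return cost_now <= cost_defer


def filter_operator(
    rows: List[Row],
    column: str,
    predicate: Callable[[Any], bool],
    impute: Callable[[Row], Any],
    model: CostModel,
) -> List[Row]:
    """Apply a predicate, imputing or deferring each missing value."""
    out: List[Row] = []
    for row in rows:
        if row[column] is MISSING:
            if not should_impute_now(model):
                out.append(row)  # defer: keep the missing value for later operators
                continue
            row = {**row, column: impute(row)}  # impute before evaluating
        if predicate(row[column]):
            out.append(row)
    return out


rows = [{"age": 30}, {"age": MISSING}, {"age": 10}]
is_adult = lambda v: v >= 18
impute_age = lambda row: 40  # stand-in for a real imputation model

cheap = CostModel(impute_cost=1, downstream_cost=5, selectivity=0.5, drop_probability=0.1)
costly = CostModel(impute_cost=100, downstream_cost=1, selectivity=0.5, drop_probability=0.9)

eager_result = filter_operator(rows, "age", is_adult, impute_age, cheap)
lazy_result = filter_operator(rows, "age", is_adult, impute_age, costly)
```

With the cheap model the missing age is imputed immediately and filtered like any other value; with the costly model the tuple passes through with its value still missing, leaving the decision to a later operator.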