ISG Talks are sponsored by Couchbase.
- This event has passed.
Effective Filters and Linear Time Verification for Tree Similarity Joins
February 14, 2020 @ 3:00 pm - 4:00 pm
Thomas Hütter (University of Salzburg)
The tree similarity join computes all similar pairs in a collection of trees. Two trees are similar if their edit distance falls within a user-defined threshold. Previous algorithms, which are based on a filter-verify approach, suffer from the following two issues. First, ineffective filters produce a large number of candidates that must be further verified. Second, the candidates are verified by computing the tree edit distance, which is cubic in the number of tree nodes. Thus, these techniques fail to scale to large tree collections and are infeasible even for small collections when the trees are large. In this talk, a scalable solution for the tree similarity join, called TJoin, is presented that is based on (1) an effective indexing technique that leverages both the labels and the structure of trees to reduce the number of candidates, (2) an efficient upper bound filter that moves many of the candidates directly to the join result without additional verification, and (3) a linear time verification technique for the remaining candidates that avoids the expensive tree edit distance computation. Unlike previous solutions, TJoin scales to collections with millions of large trees and improves the overall join time by up to two orders of magnitude w.r.t. the state of the art.
Thomas Hütter received his bachelor’s and master’s degrees in Computer Sciences from the University of Salzburg, Austria, in 2014 and 2017, respectively. During his master’s degree, he spent a semester abroad as an exchange student at École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. His research career started in 2013 when he became a member of the Computational Systems Group supervised by Prof. Christoph Kirsch at the University of Salzburg. His work focused on memory management optimizations which resulted in two projects in collaboration with Google Munich.
Currently, he is pursuing his Ph.D. degree in the Database Research Group supervised by Prof. Nikolaus Augsten in Salzburg. His research interests include efficient algorithms for similarity queries on complex data structures. In 2019, Thomas Hütter won the “Young Investigators Award” for his work on tree similarity joins (published at ICDE 2019) and was awarded with the “Austrian Marshallplan Scholarship”.