Nikolaus Augsten, Michael Böhlen, Johann Gamper — VLDB'06
Several recent papers argue for approximate lookups in hierarchical data and propose index structures that support approximate searches in large sets of hierarchical data. These index structures must be updated if the underlying data changes. Since the performance of a full index re-construction is prohibitive, the index must be updated incrementally.
We propose a persistent and incrementally maintainable index for approximate lookups in hierarchical data. The index is based on small tree patterns, called pq-grams. It supports efficient updates in response to structure and value changes in hierarchical data and is based on the log of tree edit operations. We prove the correctness of the incremental maintenance for sequences of edit operations. Our algorithms identify a small set of pq-grams that must be updated to maintain the index. The experimental results with synthetic and real data confirm the scalability of our approach.
Slides: Screen [PDF] , Printer [PDF]
Source code: [approxlib]