Efficient Filtering of XML Documents with XPath Expressions


1 Efficient Filtering of XML Documents with XPath Expressions
Chee-Yong Chan, Pascal Felber, Minos Garofalakis, Rajeev Rastogi
Information Sciences Research Center, Bell Laboratories, Lucent Technologies

2 Motivation
Growing interest in content-based filtering & routing of data.
[Figure: Data Publishers send data to a Content-based Router (Subscription Table + Filtering Engine), which forwards the Subset of Relevant Data to Data Consumers.]
XML => More expressive XPath-based subscriptions (e.g., Intel’s NetStructure XML Accelerator).
Challenge: How to efficiently filter XML data with XPath-based subscriptions?

3 Problem Abstraction
[Figure: An XML document D and a set S of XPath expressions (XPEs) are fed to the XPath Filter, which outputs the subset of S that match D.]
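
As a point of reference for this abstraction, a naive filter simply evaluates every XPE in S against D, one at a time. The sketch below does that with Python's standard xml.etree.ElementTree, which supports only a limited XPath subset (for example, a predicate such as [c/*/d] is not supported). The function names naive_filter and matches are hypothetical, and this is a baseline for illustration, not the XTrie method.

```python
import xml.etree.ElementTree as ET


def matches(wrapper, xpe):
    """Return True if the XPE selects at least one element of the document.

    ElementTree rejects absolute paths on an Element, so the document root
    is placed under a synthetic wrapper and the XPE is made relative to it:
    "/a/b" becomes "./a/b" and "//a/b" becomes ".//a/b".
    """
    return wrapper.find("." + xpe) is not None


def naive_filter(xml_text, xpes):
    """Return the subset of xpes (in input order) that match the document."""
    wrapper = ET.Element("_document")      # hypothetical wrapper element name
    wrapper.append(ET.fromstring(xml_text))
    return [xpe for xpe in xpes if matches(wrapper, xpe)]


if __name__ == "__main__":
    doc = "<a><b><c/></b></a>"
    subscriptions = ["/a/b", "//b/c", "/a/b[c]", "//e/f", "/b"]
    print(naive_filter(doc, subscriptions))
```

The cost of this baseline grows with the number of XPEs, since each one is evaluated independently against the document.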

4 Challenges
Filtering with XPath expressions (XPEs) is non-trivial:
Complexity of XPEs -- tree-structured patterns that include “*” and “//” operators.
Need for both unordered & ordered matchings.
Example: p = //a/b[c/*/d]//e/f
[Figure: XPE tree of p, with nodes //a, /b, /c, /*/d, //e, /f.]

5 Our Solution: XTrie
Speed up XPE filtering with a novel index called XTrie.
Key idea: Decompose each complex, tree-structured XPE into a set of simple, linear patterns (substrings), then index the substrings with a trie to form the XTrie index.

6 Architecture of XTrie
[Figure: Complex, tree-structured XPEs feed the XTrie Index Construction Algorithm, which builds the XTrie Index. An XML document D passes through a SAX-based XML Parser, whose Start/End Element Events drive the XTrie Matching Algorithm; it uses the XTrie Index to output the set of XPEs that match D.]
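
The Start/End Element Events in this architecture are what a SAX parser delivers while streaming through a document. A minimal sketch with Python's standard xml.sax module follows; the class name PathTracker and the event tuple layout (kind, element name, depth) are assumptions made for illustration, and the matching step itself is left out.

```python
import xml.sax


class PathTracker(xml.sax.ContentHandler):
    """Records start/end element events together with the element depth."""

    def __init__(self):
        super().__init__()
        self.stack = []    # element names on the current root-to-node path
        self.events = []   # (kind, element name, depth) tuples

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.events.append(("start", name, len(self.stack)))

    def endElement(self, name):
        self.events.append(("end", name, len(self.stack)))
        self.stack.pop()


def element_events(xml_text):
    """Parse xml_text in one streaming pass and return the recorded events."""
    handler = PathTracker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in element_events("<a><b><c/></b><b/></a>"):
        print(event)
```

A matching algorithm driven by these events would replace the list append with its own per-event work, so the document never has to be held in memory as a tree.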

7 Architecture of XTrie
[Figure: Index construction in two steps: decompose the complex, tree-structured XPEs into a set of simple, linear patterns (substrings), then build the XTrie index, which consists of a Trie and a Substring Table. An XML document D passes through a SAX-based XML Parser, whose Start/End Element Events drive the XTrie Matching Algorithm; it uses the XTrie Index to output the set of XPEs that match D.]

8 Decomposition of XPEs
Decompose each XPE p into a set of substrings that “cover” p.
Substring = Sequence of element names along some path in the XPE tree, where each consecutive pair of nodes is related by a “/” operator (without any “*” or “//”).
Example: p = //a/b[c/*/d]//e/f
Substrings in p = {a, b, c, d, e, f, ab, bc, ef, abc}.
[Figure: XPE tree of p, with nodes //a, /b, /c, /*/d, //e, /f.]

9 Decomposition of XPEs
Example: p = //a/b[c/*/d]//e/f
Substrings in p = {a, b, c, d, e, f, ab, bc, ef, abc}.
One possible decomposition of p is {a, bc, d, ef}.
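
The substring set listed for p can be reproduced mechanically from the XPE tree. The sketch below is one way to do it, assuming each tree node records the operator that connects it to its parent ("/", "//", or "/*/"); the Node class and the substrings function are hypothetical names, and element names are concatenated without separators as on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                     # element name
    op: str                       # link to the parent: "/", "//", or "/*/"
    children: list = field(default_factory=list)


def substrings(root):
    """Collect every run of element names linked only by "/" operators."""
    found = set()

    def visit(node, parent_chains):
        # A "/" link extends every chain ending at the parent;
        # any other link ("//" or "/*/") starts over at this node.
        chains = [[node.name]]
        if node.op == "/":
            chains += [chain + [node.name] for chain in parent_chains]
        for chain in chains:
            found.add("".join(chain))
        for child in node.children:
            visit(child, chains)

    visit(root, [])
    return found


# XPE tree of p = //a/b[c/*/d]//e/f
p = Node("a", "//", [
    Node("b", "/", [
        Node("c", "/", [Node("d", "/*/")]),
        Node("e", "//", [Node("f", "/")]),
    ]),
])

if __name__ == "__main__":
    print(sorted(substrings(p), key=lambda s: (len(s), s)))
    # → ['a', 'b', 'c', 'd', 'e', 'f', 'ab', 'bc', 'ef', 'abc']
```

Note that "cd" and "be" are absent: c and d are separated by a "*" step, and b and e by a "//" step.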

10 Decomposition of XPEs
In general, there are many possible decompositions.
[Figure: Two copies of the XPE tree of p (nodes //a, /b, /c, /*/d, //e, /f), one labeled Single-Element Decomposition and one labeled Minimal Decomposition.]

11 Decomposition of XPEs
“Enhanced” min. decomp. = min. decomp. with a substring ending at each branching node.
[Figure: XPE tree of p (nodes //a, /b, /c, /*/d, //e, /f), with a range of decompositions labeled Single-Element Decomposition, ..., Minimal, “Enhanced”.]
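
The three decompositions named on these slides can be sketched in code under one stated assumption: that a minimal decomposition uses the longest possible "/"-only chains, so a chain ends only where no "/" child continues it. The slides do not spell this out, so treat it as a reading rather than the authors' definition. The names Node and decompositions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                     # element name
    op: str                       # link to the parent: "/", "//", or "/*/"
    children: list = field(default_factory=list)


def decompositions(root):
    """Return (single_element, minimal, enhanced) substring lists."""
    single, minimal, enhanced = [], [], []

    def visit(node, parent_chain):
        # Longest "/"-only chain ending at this node.
        chain = parent_chain + [node.name] if node.op == "/" else [node.name]
        text = "".join(chain)
        single.append(node.name)
        continues = any(child.op == "/" for child in node.children)
        if not continues:
            # Assumed meaning of "minimal": emit only maximal chains.
            minimal.append(text)
            enhanced.append(text)
        elif len(node.children) > 1:
            # "Enhanced": also emit a substring ending at each branching node.
            enhanced.append(text)
        for child in node.children:
            visit(child, chain)

    visit(root, [])
    return single, minimal, enhanced


# XPE tree of p = //a/b[c/*/d]//e/f
p = Node("a", "//", [
    Node("b", "/", [
        Node("c", "/", [Node("d", "/*/")]),
        Node("e", "//", [Node("f", "/")]),
    ]),
])

if __name__ == "__main__":
    single, minimal, enhanced = decompositions(p)
    print(single)    # → ['a', 'b', 'c', 'd', 'e', 'f']
    print(minimal)   # → ['abc', 'd', 'ef']
    print(enhanced)  # → ['ab', 'abc', 'd', 'ef']
```

Under this reading, the enhanced decomposition differs from the minimal one only by the extra substring ab, which ends at the branching node b.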

12 XTrie
XTrie index consists of 2 components: Trie and Substring Table.

13 XTrie
XPEs:
p = //a/a/b/*/a/b
q = /a/b[c]//b/c

14 XTrie
XPEs:
p = //a/a/b/*/a/b
q = /a/b[c]//b/c
[Figure: Decomposed Substrings of p and q, drawn on their XPE trees (node labels /a, /b, /c, //b, //a, /*/a).]

15 XTrie
[Figure: Decomposed Substrings of p and q (node labels /a, /b, /c, //b, //a, /*/a).]
[Table: Substring-Table with rows 1-5 for the substrings aab, ab, ab, abc, bc and columns Parent Row, Rel. Level, Rank, Num Child.]

16 XTrie
[Figure: Trie over the element names a, b, c, alongside the Substring-Table (substrings aab, ab, abc, bc; columns Parent Row, Rel. Level, Rank, Num Child, Next Row). Trie nodes carry Child Node Ptrs and a Substring Table Ptr.]

17 XTrie
[Figure: Trie over the element names a, b, c, alongside the Substring-Table (substrings aab, ab, abc, bc; columns Parent Row, Rel. Level, Rank, Num Child, Next Row). Trie nodes carry Child Node Ptrs, a Substring Table Ptr, and a Max. Suffix Ptr.]
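
To make the trie component more concrete, the sketch below builds a trie over the substrings aab, ab, abc, bc and computes, for every node, a pointer to the node spelling the longest proper suffix of its label that is also in the trie. This assumes the Max. Suffix Ptr plays the same role as a failure link in Aho-Corasick string matching, which the slides do not state explicitly. The names TrieNode, build_trie, and suffix_links are hypothetical, and a boolean flag stands in for the Substring Table Ptr.

```python
from collections import deque


class TrieNode:
    def __init__(self, label=""):
        self.label = label          # element names spelled from the root
        self.children = {}          # element name -> child node
        self.is_substring = False   # stand-in for the Substring Table Ptr
        self.max_suffix = None      # longest proper suffix present in the trie


def build_trie(substrings):
    """Build the trie, then set max_suffix on every node breadth-first."""
    root = TrieNode()
    for substring in substrings:    # each character is one element name
        node = root
        for name in substring:
            if name not in node.children:
                node.children[name] = TrieNode(node.label + name)
            node = node.children[name]
        node.is_substring = True

    queue = deque()
    for child in root.children.values():
        child.max_suffix = root     # a single name has only the empty suffix
        queue.append(child)
    while queue:
        node = queue.popleft()
        for name, child in node.children.items():
            # Follow the parent's suffix chain until one can be extended.
            link = node.max_suffix
            while link is not None and name not in link.children:
                link = link.max_suffix
            child.max_suffix = link.children[name] if link is not None else root
            queue.append(child)
    return root


def suffix_links(root):
    """Map each node label to the label its max_suffix pointer targets."""
    links, stack = {}, [root]
    while stack:
        node = stack.pop()
        for child in node.children.values():
            links[child.label] = child.max_suffix.label
            stack.append(child)
    return links


if __name__ == "__main__":
    trie = build_trie(["aab", "ab", "abc", "bc"])
    for label, target in sorted(suffix_links(trie).items()):
        print(repr(label), "->", repr(target))
```

For example, the node for aab points to ab and the node for abc points to bc, so a match of a longer substring also exposes the shorter substrings that end at the same element.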

18 Optimizations for XTrie
“Lazy” variant of XTrie: Reduce the number of accesses to the substring-table by probing it only when the matched substring is a leaf substring of some XPE.
XTrie for single-path XPEs: Optimize data structures & algorithms by exploiting the simpler structure of single-path XPEs.

19 Related Work
Commercial products (e.g., BEA, Intel).
XFilter [Altinel & Franklin, VLDB’00]:
Model single-path XPEs as finite state machines (FSMs). Example: p = /a//b/c, with states /a, //b, /c.
Build a hash index on the FSMs’ transitions (i.e., element names). [Figure: hash entries a, b, c, each with a candidate-list and a wait-list.]
Optimizations:
XFilter-LB = XFilter with list balancing.
Prefiltering = 2 parses over XML data to pre-filter some XPEs.

20 Experimental Evaluation
DTD: NITF (News Industry Text Format), 123 elements, 513 attributes.
XML data: Generated with IBM’s XML Generator (size = 20, 100, 1000 tag pairs).
XPath expressions: Generated using our own generator (P = #XPEs, L = max. depth, Pw = prob. of ‘*’, Pd = prob. of ‘//’, z = skew of element names).
Algorithms: Eager & Lazy XTrie, XFilter & XFilter-LB [Altinel & Franklin, VLDB’00].
System: Sun Ultra-250 (296MHz) with 512 MB memory running Solaris 2.7.

21 Scalability (# XPEs)

22 Scalability (# tags)

23 Conclusions
XTrie -- A novel index structure that supports the efficient filtering of streaming XML data based on XPath expressions.
Features:
Indexes both simple single-path and complex tree-structured XPath expressions.
Handles ordered, unordered, and hybrid modes of matching.

24 Speedup / # XPEs (m)

25 Wildcards (m)

26 Descendants (m)

27 Number of levels (m)

28 Skew (m)

