
On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.


1 On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen Ting, Ling Tok Wang

2 On Boosting Holism in XML Twig Pattern Matching using Structural Indexing
Outline
  Background
  Define our problem: XML twig pattern matching
  Previous two algorithms: TwigStack and TwigStackList
  Our holistic twig pattern matching algorithms
  Two refined indexing schemes: Tag+Level and PPS
  A generalized holistic matching algorithm: iTwigJoin
  Experiments
  Conclusion

3 XML Twig Pattern Matching
An XML document is commonly modeled as a rooted, ordered, and tagged tree.
[Figure: example document tree rooted at book, with preface, chapter, section, title, paragraph, and figure elements and the text values "Intro", "XML", and "Data".]

4 Regional Coding
Node label [1]: (startPos:endPos, LevelNum)
E.g.
  book (0:32, 1)
  preface (1:3, 2)
  Intro (2:2, 3)
  chapter (4:29, 2)
  section (5:28, 3)
  title (6:8, 4)
  Data (7:7, 3)
  section (9:17, 4)
  title (10:12, 5)
  XML (11:11, 3)
  paragraph (13:16, 5)
  figure (14:15, 6)
  section (18:23, 4)
  paragraph (19:22, 5)
  figure (20:21, 6)
  paragraph (24:27, 4)
  figure (25:26, 5)
  chapter (30:31, 2)
1. M. P. Consens and T. Milo. Optimizing queries on files. In Proceedings of ACM SIGMOD, 1994.
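The region labels make structural tests cheap. A minimal sketch, assuming the usual containment reading of region codes (the slide gives only the label format; the `Label` type and the function names are illustrative and not taken from the talk):

```python
from typing import NamedTuple


class Label(NamedTuple):
    start: int
    end: int
    level: int


def is_ancestor(a: Label, d: Label) -> bool:
    # a is an ancestor of d when a's region strictly contains d's region.
    return a.start < d.start and d.end < a.end


def is_parent(p: Label, c: Label) -> bool:
    # A parent is an ancestor that sits exactly one level above.
    return is_ancestor(p, c) and c.level == p.level + 1


# Labels copied from the slide.
book = Label(0, 32, 1)
preface = Label(1, 3, 2)
chapter = Label(4, 29, 2)
section = Label(5, 28, 3)

print(is_ancestor(book, section))   # book's region contains section's
print(is_parent(book, section))     # but book is two levels above it
print(is_parent(chapter, section))
```

Both tests need only the two labels, so no tree traversal is required at query time.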

5 What is a Twig Pattern?
A twig pattern is a small tree whose nodes are tags, attributes, or text values and whose edges are either Parent-Child (P-C) or Ancestor-Descendant (A-D) edges.
E.g. select Figure elements that are descendants of Paragraph elements, which in turn are children of Section elements having a child element Title.
XPath: Section[Title]/Paragraph//Figure
Twig pattern: Section with children Title and Paragraph; Paragraph with descendant Figure.
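One way to hold such a pattern in memory is a small tree whose nodes record the edge type to their parent. This is only a sketch; the `QueryNode` class, its `axis` field, and `root_to_leaf_paths` are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    tag: str
    axis: str = "root"  # edge from the parent: "PC" (child) or "AD" (descendant)
    children: list = field(default_factory=list)

    def add(self, tag: str, axis: str) -> "QueryNode":
        child = QueryNode(tag, axis)
        self.children.append(child)
        return child


def root_to_leaf_paths(node, prefix=()):
    # Enumerate the tag sequence of every root-to-leaf path of the twig.
    path = prefix + (node.tag,)
    if not node.children:
        return [path]
    paths = []
    for child in node.children:
        paths.extend(root_to_leaf_paths(child, path))
    return paths


# Section[Title]/Paragraph//Figure
section = QueryNode("Section")
section.add("Title", "PC")
paragraph = section.add("Paragraph", "PC")
paragraph.add("Figure", "AD")

print(root_to_leaf_paths(section))
```

The pattern decomposes into the two root-to-leaf paths Section/Title and Section/Paragraph//Figure.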

6 XML Twig Pattern Matching: Problem Statement
Given a query twig pattern Q and an XML database D, we need to compute ALL the answers to Q in D.
E.g. consider the following query and document.
Query: Section with branches title and figure (drawing not preserved).
Document: elements s1, s2, p1, t1, t2, f1 (tree drawing not preserved).
Query solutions: (s1, t1, f1), (s2, t2, f1), (s1, t2, f1)
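To make "compute ALL the answers" concrete, here is a brute-force sketch. The tree drawing is missing from the transcript, so the regions below are a hypothetical layout chosen only because it reproduces the three listed solutions. Treating both query edges as ancestor-descendant is likewise an assumption.

```python
from itertools import product


def contains(a, d):
    # Region a strictly contains region d.
    return a[0] < d[0] and d[1] < a[1]


# Hypothetical (start, end) regions; not taken from the slide.
regions = {
    "s1": (1, 12), "t1": (2, 3), "s2": (4, 11),
    "t2": (5, 6), "p1": (7, 10), "f1": (8, 9),
}
sections, titles, figures = ["s1", "s2"], ["t1", "t2"], ["f1"]


def all_answers():
    # Try every (Section, title, figure) combination and keep the ones
    # where the Section element contains both of the others.
    return [
        (s, t, f)
        for s, t, f in product(sections, titles, figures)
        if contains(regions[s], regions[t]) and contains(regions[s], regions[f])
    ]


print(all_answers())
```

This enumeration touches every combination, which is the cost the holistic algorithms in the rest of the talk are designed to avoid.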



10 Previous work: TwigStack
TwigStack [2]: a holistic approach
  Each element in the document is labeled using the region encoding scheme.
  The input is the labels of all elements whose tags occur in the query twig.
  The output is the set of matching solutions, each an n-tuple, where n is the number of nodes in the query.
  Each node in the query has a corresponding input stream.
  Each label in a stream is scanned only once; that is, the cursor of a stream never moves backward.
2. N. Bruno, D. Srivastava, and N. Koudas. Holistic twig joins: optimal XML pattern matching. In Proceedings of ACM SIGMOD, 2002.
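The stream model can be sketched as one forward-only cursor per query tag. The `Stream` class and `build_streams` helper are illustrative names, and the element list reuses the labels of the running example on slides 14 and 15:

```python
class Stream:
    """Labels of one tag in document order, read through a forward-only cursor."""

    def __init__(self, labels):
        self._labels = sorted(labels)  # sort by startPos
        self._pos = 0

    def eof(self):
        return self._pos >= len(self._labels)

    def head(self):
        return self._labels[self._pos]

    def advance(self):
        # The cursor only moves forward, so each label is scanned once.
        self._pos += 1


def build_streams(elements, query_tags):
    by_tag = {tag: [] for tag in query_tags}
    for tag, start, end, level in elements:
        if tag in by_tag:  # elements whose tag is not in the query are never read
            by_tag[tag].append((start, end, level))
    return {tag: Stream(labels) for tag, labels in by_tag.items()}


elements = [
    ("Section", 1, 12, 1), ("title", 2, 3, 2), ("Section", 4, 9, 2),
    ("title", 5, 6, 3), ("figure", 7, 8, 3), ("figure", 10, 11, 2),
]
streams = build_streams(elements, ["Section", "title", "figure"])
first_figure = streams["figure"].head()
streams["figure"].advance()
second_figure = streams["figure"].head()
print(first_figure, second_figure)
```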

11 Previous work: TwigStack
TwigStack [2] is a two-phase algorithm:
  Phase 1 (TwigJoin): intermediate root-to-leaf path solutions are output.
  Phase 2 (Merge): the intermediate path solutions are merged to get the final results.
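Phase 2 can be pictured as a join of the path solutions on their shared branching element. The sketch below assumes a two-branch query (Section with title and figure) and hand-written path solutions for illustration; `merge_paths` is a hypothetical helper, not the authors' merge routine:

```python
from collections import defaultdict


def merge_paths(left_paths, right_paths):
    # Group the right-hand paths by their root element, then pair every
    # left-hand path with each right-hand path that shares the same root.
    by_root = defaultdict(list)
    for root, leaf in right_paths:
        by_root[root].append(leaf)
    return [
        (root, left_leaf, right_leaf)
        for root, left_leaf in left_paths
        for right_leaf in by_root.get(root, [])
    ]


# Illustrative path solutions (Section, title) and (Section, figure).
title_paths = [("s1", "t1"), ("s1", "t2"), ("s2", "t2")]
figure_paths = [("s1", "f1"), ("s1", "f2"), ("s2", "f1")]

merged = merge_paths(title_paths, figure_paths)
print(merged)
```

A path solution whose root has no partner on the other branch simply disappears in the merge, which is why such paths count as wasted intermediate output.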

12 Previous work: TwigStack
A node q in a twig pattern Q is associated with a stack S_q.
Insertion and deletion in a stack S_q:
  Insertion: an element e_q from stream T_q is pushed onto its stack S_q if and only if
    e_q has a descendant e_qi in each stream T_qi, where q_i is a child of q, and
    each such e_qi recursively satisfies the same property.
  Deletion: an element e_q is popped from its stack once all matches involving it have been output.
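The insertion condition is a recursive existence test. The sketch below checks it by scanning whole lists rather than forward-only cursors, so it illustrates the condition but not the streaming evaluation. `has_extension` and the dictionary encoding of the query are assumptions made for the example.

```python
def contains(a, d):
    # Region a strictly contains region d (labels are (start, end, level)).
    return a[0] < d[0] and d[1] < a[1]


def has_extension(query, streams, tag, elem):
    # elem qualifies if, for every child tag of `tag` in the query, some
    # descendant of elem with that tag qualifies recursively.
    for child_tag in query.get(tag, []):
        if not any(
            contains(elem, c) and has_extension(query, streams, child_tag, c)
            for c in streams[child_tag]
        ):
            return False
    return True


query = {"Section": ["title", "figure"]}  # query node -> child query nodes
streams = {
    "Section": [(1, 12, 1), (4, 9, 2)],
    "title": [(2, 3, 2), (5, 6, 3)],
    "figure": [(7, 8, 3), (10, 11, 2)],
}

for section in streams["Section"]:
    print(section, has_extension(query, streams, "Section", section))
```

Note that the test looks only at descendants, never at levels, which is the source of the sub-optimality discussed later for parent-child edges.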

13-27 Example: TwigStack on a sample document (animation frames condensed; the tree drawing and the printed path solutions are not preserved)
Query: Section with branches title and figure.
Document elements: s1, s2, t1, t2, f1, f2.
Region labels by stream (start:end, level):
  Section: (1:12,1), (4:9,2)
  title: (2:3,2), (5:6,3)
  figure: (7:8,3), (10:11,2)
Frames:
  17: (1:12,1) is pushed.
  18: (2:3,2) is pushed.
  19: The first path solutions are output; (1:12,1) remains on the stack.
  20: (4:9,2) is pushed.
  21: (5:6,3) is pushed.
  22: More path solutions are output.
  23: (7:8,3) is pushed.
  24: More path solutions are output.
  25: (10:11,2) is pushed.
  26: The last path solutions are output.
  27: Merge: the path solutions are merged into the final answers.

28 Sub-optimality of TwigStack
If the query contains any parent-child relationship, TwigStack may output intermediate path solutions that cannot contribute to any final result. We say that TwigStack is sub-optimal for queries with parent-child relationships.

29-32 Example: sub-optimality of TwigStack (animation frames condensed; the tree drawing is not preserved)
Query: Section with branches title and figure.
Document elements: s1, s2, t1, t2, f1.
Region labels by stream (start:end, level):
  Section: (1:12,1), (4:9,2)
  title: (2:3,2), (5:6,3)
  figure: (7:8,3)
Frames:
  30: Because f1 and t1 are descendants of s1, s1 (1:12,1) is pushed onto the stack. Note that f1 is not a child of s1.
  31: (2:3,2) is pushed.
  32: A path solution is output, but it is a useless intermediate solution that does not contribute to any final solution.
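The effect can be reproduced on the labels of this example. The sketch below assumes both query edges are parent-child and imitates the first phase by admitting a Section element whenever it merely has title and figure descendants. It is a simplified illustration, not the TwigStack code.

```python
def contains(a, d):
    return a[0] < d[0] and d[1] < a[1]


def is_child(p, c):
    # Child = contained and exactly one level deeper.
    return contains(p, c) and c[2] == p[2] + 1


sections = [(1, 12, 1), (4, 9, 2)]   # s1, s2
titles = [(2, 3, 2), (5, 6, 3)]      # t1, t2
figures = [(7, 8, 3)]                # f1

# Phase 1 relaxation: the push test only asks for descendants.
pushed = [
    s for s in sections
    if any(contains(s, t) for t in titles) and any(contains(s, f) for f in figures)
]

# Path solutions are checked edge by edge with the real parent-child test.
title_paths = [(s, t) for s in pushed for t in titles if is_child(s, t)]
figure_paths = [(s, f) for s in pushed for f in figures if is_child(s, f)]

# Phase 2 merge on the shared Section element.
final = [(s, t, f) for s, t in title_paths for s2, f in figure_paths if s == s2]
matched_roots = {s for s, _ in figure_paths}
useless = [p for p in title_paths if p[0] not in matched_roots]

print("final:", final)
print("useless path solutions:", useless)
```

The path from s1 to t1 is produced because s1 passed the descendant-only test, yet s1 has no figure child, so the merge discards it.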

33 TwigStackList
The main problem with TwigStack is that its first phase treats all edges as ancestor-descendant relationships, so it is inefficient for queries with parent-child relationships.
Alternative: TwigStackList [3] (CIKM 2004)
TwigStackList improves on TwigStack by considering parent-child relationships in the first phase, and it is optimal for a larger class of queries than TwigStack.
3. J. Lu, T. Chen, and T. W. Ling. Efficient processing of XML twig patterns with parent child edges: a look-ahead approach. In CIKM, pages 533-542, 2004.

34 Optimal classes of TwigStack and TwigStackList
Algorithm        Optimal query class
TwigStack        All edges are ancestor-descendant relationships
TwigStackList    All edges connecting branching nodes and their children are ancestor-descendant relationships
Results on three example queries (query drawings not preserved):
                 Query 1   Query 2   Query 3
TwigStack        O         S         S
TwigStackList    O         O         S
O: optimal, S: sub-optimal

35 Challenges (1)
Although TwigStackList enlarges the optimal query class of TwigStack, it is still sub-optimal for a large class of twig queries.
For example, the slide shows two twig queries over Section, title, and figure that are sub-optimal for TwigStackList (query drawings not preserved).

36 Challenges (2)
To answer a twig query, TwigStack and TwigStackList need to read the labels of all elements whose tags occur in the query. Can we accelerate query processing by reading only some of them?
Query: Section, title, figure (drawing not preserved).
Document: s1, t1, f1, f2, ..., fn, drawn on Levels 1 to 3.
There is no answer in the document, since there are no figure elements in level 2. But the previous algorithms still need to read all figure elements in level 3.


38 Our solution
We propose two data streaming schemes: tag+level and prefix path streaming.
Basic idea: separate the elements with the same tag name into different streams.
  Tag+level: elements with the same tag and level are grouped together.
  Prefix path: elements with the same root-to-node path are grouped together.

39 Two Refined Streaming Schemes (1)
Tag+Level: elements with the same tag and level are grouped together.
Document: elements a1, a2, a3, b1, b2, c1, c2, d1, d2, d3 on Levels 1 to 4 (tree drawing not preserved).
Streams:
  a, Level 1: a1
  a, Level 2: a2, a3
  b, Level 2: b2
  b, Level 3: b1
  c, Level 4: c1, c2
  d, Level 3: d1, d2, d3
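Building Tag+Level streams amounts to partitioning the labels by the pair (tag, level). In the sketch below the region labels are the ones shown later in the iTwigJoin example, but the assignment of tags to labels is an assumption reconstructed to agree with the stream sizes on this slide. `tag_level_streams` is an illustrative name.

```python
from collections import defaultdict

# (tag, start, end, level); the tag assignment is an assumed reconstruction.
elements = [
    ("a", 1, 20, 1), ("a", 2, 5, 2), ("d", 3, 4, 3), ("a", 6, 13, 2),
    ("d", 7, 8, 3), ("b", 9, 12, 3), ("c", 10, 11, 4), ("b", 14, 19, 2),
    ("d", 15, 18, 3), ("c", 16, 17, 4),
]


def tag_level_streams(elements):
    streams = defaultdict(list)
    # Visit elements in document order so every stream stays sorted by start.
    for tag, start, end, level in sorted(elements, key=lambda e: e[1]):
        streams[(tag, level)].append((start, end, level))
    return dict(streams)


streams = tag_level_streams(elements)
for key in sorted(streams):
    print(key, streams[key])
```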

40 Two Refined Streaming Schemes (2)
Prefix Path Streaming (PPS): elements with the same root-to-node path are grouped together.
Document: elements a1, a2, a3, b1, b2, c1, c2, d1, d2, d3 on Levels 1 to 4 (tree drawing not preserved).
Streams:
  a: a1
  a/a: a2, a3
  a/b: b2
  a/a/b: b1
  a/a/b/c: c1
  a/b/d/c: c2
  a/a/d: d1, d2
  a/b/d: d3
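Prefix paths can be derived from region labels in one pass with a stack of open ancestors. This is a sketch under the same assumption as before, namely that the tag-to-label assignment below is a reconstruction consistent with the streams on this slide. `prefix_path_streams` is an illustrative name.

```python
# (tag, start, end, level); the tag assignment is an assumed reconstruction.
elements = [
    ("a", 1, 20, 1), ("a", 2, 5, 2), ("d", 3, 4, 3), ("a", 6, 13, 2),
    ("d", 7, 8, 3), ("b", 9, 12, 3), ("c", 10, 11, 4), ("b", 14, 19, 2),
    ("d", 15, 18, 3), ("c", 16, 17, 4),
]


def prefix_path_streams(elements):
    streams = {}
    open_ancestors = []  # stack of (tag, end) for elements still open
    for tag, start, end, level in sorted(elements, key=lambda e: e[1]):
        # Close every element whose region ended before this one starts.
        while open_ancestors and open_ancestors[-1][1] < start:
            open_ancestors.pop()
        path = "/".join([t for t, _ in open_ancestors] + [tag])
        streams.setdefault(path, []).append((start, end, level))
        open_ancestors.append((tag, end))
    return streams


streams = prefix_path_streams(elements)
for path in sorted(streams):
    print(path, streams[path])
```

Every element with the same root-to-node tag sequence lands in the same stream, so PPS is at least as fine-grained as Tag+Level.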

41 Two benefits of refined streaming schemes (1)
(1) Enlarge the optimal query classes
For example, for the document and query shown, the previous algorithms TwigStack and TwigStackList output one useless solution. With tag+level streaming that solution is not output, since we know there are no figure elements in level 2.
Query: Section with branches title and figure (drawing not preserved).
Document: s1, s2, t1, t2, f1 on Levels 1 to 3 (tree drawing not preserved).
Tag+level streams:
  Section, Level 1: s1
  Section, Level 2: s2
  title, Level 2: t1
  title, Level 3: t2
  figure, Level 3: f1

42 Two benefits of refined streaming schemes (2)
(2) Skip irrelevant elements
For the document and query shown, since there are no title elements in level 3, we may skip reading all figure elements in level 3.
Query: Section, title, figure (drawing not preserved).
Document: s1, t1, f1, f2, ..., fn, drawn on Levels 1 to 3.
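The skipping idea can be sketched as a level-compatibility check made before any figure label is read. The code assumes a query in which title and figure are both children of Section, and it assumes the document has Section on level 1, title on level 2, and figure on level 3. Both assumptions are a reading of the lost drawing, and `figure_levels_to_read` is a hypothetical helper.

```python
def figure_levels_to_read(levels):
    """levels maps a tag to the set of levels that have a non-empty stream.

    A figure stream on level L can contribute only if there is a Section
    stream on level L - 1 and a title stream on level L as well.
    """
    return {
        level + 1
        for level in levels["Section"]
        if level + 1 in levels["title"] and level + 1 in levels["figure"]
    }


# Assumed layout of the slide's document.
doc_levels = {"Section": {1}, "title": {2}, "figure": {3}}
print(figure_levels_to_read(doc_levels))

# Hypothetical variant with an extra figure stream on level 2.
variant = {"Section": {1}, "title": {2}, "figure": {2, 3}}
print(figure_levels_to_read(variant))
```

In the first case no figure stream needs to be opened at all, no matter how many figure elements sit on level 3.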


44 A general algorithm: iTwigJoin
We propose a general algorithm, called iTwigJoin, which can be used with various data streaming schemes.
Our key idea is to classify every current head element into one of three classes:
  Subtree-matching
  Useless
  Blocked

45 Classifying Head Elements
Subtree-matching element: an element e of tag E is called a subtree-matching element for query Q if
  e is in a match to Q_E (Q_E is the sub-tree of Q rooted at E), and
  e is NOT in any future match to Q_P, where P is the parent of E in Q.
Useless element: an element e is called a useless element if e is not in any future match to Q_E.
Blocked element: an element that is neither subtree-matching nor useless.

46-52 Examples: Classifying Head Elements (Tag+Level) (animation frames condensed; the document and query drawings are not preserved)
Document D: elements a1, a2, a3, b1, b2, c1, c2, d1, d2, d3.
Query Q1: nodes A, B, C, D.
Tag+Level streams: a (Levels 1 and 2), b (Levels 2 and 3), c (Level 4), d (Level 3).
Example 1 (slides 46-48), final classification of the head elements:
  Subtree-matching: none shown
  Useless: a2
  Blocked: d1, a1, b1, b2, c1
Example 2 (slides 49-52), final classification of the head elements:
  Subtree-matching: d1
  Useless: a1, a2, b2
  Blocked: c1, b1

53 Classifying Head Elements
A useless element can be discarded safely.
A subtree-matching element is pushed onto the corresponding stack.
A blocked element causes a problem:
  It CANNOT be discarded, because that may cause loss of results.
  It CANNOT be pushed onto a stack, because that may cause useless results.
When all head elements are blocked, optimal holistic matching CANNOT be guaranteed. In that case we push blocked elements onto the stack, which may result in useless intermediate results in some cases.
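The three rules can be read as a small dispatch step over the current head elements. The sketch below takes the classification as given and only chooses the next action. Handling useless heads first and picking the first listed blocked head are simplifying assumptions, and `Kind` and `next_action` are illustrative names.

```python
from enum import Enum


class Kind(Enum):
    SUBTREE_MATCHING = "subtree-matching"
    USELESS = "useless"
    BLOCKED = "blocked"


def next_action(heads):
    """heads: non-empty list of (element name, Kind) for the current stream heads."""
    for name, kind in heads:
        if kind is Kind.USELESS:
            return ("discard", name)       # safe: it is in no future match
    for name, kind in heads:
        if kind is Kind.SUBTREE_MATCHING:
            return ("push", name)          # safe: it is in a sub-tree match
    # Every head is blocked: fall back to pushing one of them, accepting
    # that this may later produce useless intermediate results.
    name, _ = heads[0]
    return ("push-blocked", name)


step1 = next_action([("a1", Kind.BLOCKED), ("a2", Kind.USELESS), ("d1", Kind.BLOCKED)])
step2 = next_action([("a1", Kind.BLOCKED), ("d1", Kind.BLOCKED)])
print(step1, step2)
```

The optimal query classes reported for each streaming scheme are those for which the "all heads blocked" fallback never produces useless output.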

54-74 An example of the iTwigJoin algorithm (animation frames condensed; the tree drawings and the printed path solutions are not preserved)
Query: nodes A, B, C, D.
Document: elements a1, a2, a3, b1, b2, c1, c2, d1, d2, d3.
Region labels (start:end, level): 1:20,1  2:5,2  3:4,3  6:13,2  7:8,3  9:12,3  10:11,4  14:19,2  15:18,3  16:17,4
Tag+Level streams: a (Levels 1 and 2), b (Levels 2 and 3), c (Level 4), d (Level 3).
Frames:
  56: Since a2 is a useless element, we discard a2 and scan a3.
  57: Now all elements are blocked. We push a1 onto the stack.
  58: d1 is pushed.
  59: The first intermediate path solutions are output.
  60: Since a3 is a subtree-matching element, we push a3 onto the stack.
  61: d2 is pushed.
  62: More intermediate path solutions are output.
  63: b1 is pushed.
  64: c1 is pushed.
  65: More intermediate path solutions are output.
  66: b2 is pushed.
  67: c2 is pushed.
  68: More intermediate path solutions are output.
  69: d3 is pushed.
  70: The intermediate path solutions are complete.
  71-74: The first through fourth final solutions are reported, one per frame.

75-78 Optimal classes of iTwigJoin for three streaming schemes (table built up over four frames)
Streaming scheme         Optimal class
Tag Streaming            A-D only pattern
Tag+Level Streaming      A-D/P-C only pattern
Prefix Path Streaming    A-D/P-C only or 1-branch node pattern
The more refined the streaming scheme, the larger the optimal class.


80 Experiments
Benchmarks:
  XMark: synthetic data
  TreeBank: real data from the Wall Street Journal

81 Experiments: I/O Performance
[Chart not preserved. Queries: Tree1: A-D only; Tree2: P-C only; Tree3: P-C only; Tree4: 1-branch node; Tree5: 1-branch node.]
By pruning irrelevant streams, PPS usually scans the fewest elements.

82 Experiments: Number of Intermediate Paths
[Chart not preserved. Queries: Tree1: A-D only; Tree2: P-C only; Tree3: P-C only; Tree4: 1-branch node; Tree5: 1-branch node.]
1. Tag+Level and PPS output fewer intermediate results than TwigStack and TwigStackList on the TreeBank data.
2. For TreeBank query 5 there are no matching results, so Tag+Level and PPS do not output any intermediate results.

83 Experiments: Running Time
[Chart not preserved. Queries: XMark1: path pattern; XMark2: A-D only; XMark3: P-C only; XMark4: 1-branch node; XMark5: non-optimal.]
Tag+Level and PPS perform better than TwigStack and TwigStackList on the XMark data.


85 Conclusions
We develop a general algorithm to perform holistic twig joins on the Tag+Level and PPS streaming schemes.
We identify two I/O optimal classes for the Tag+Level and PPS streaming schemes.
Since our experiments show that the Tag+Level streaming scheme produces very few useless intermediate results in most cases, we recommend using the Tag+Level scheme for efficient XML twig pattern matching.

86 END
Thank you!
Q & A

