
1 Approximate XML Query Answers
Presenter: Hongyu Guo
Authors: N. Polyzotis, M. Garofalakis, Y. Ioannidis

2 Outline of this talk
- Motivation
- TreeSketch Approach
- Experimental Results
- Contributions and Limitations

3 Outline
-> Motivation
- TreeSketch Approach
- Experimental Results
- Contributions and Limitations

4 Motivations
- XML is the de facto standard for data exchange
- Need to explore large XML data sets and get fast feedback from complex XML queries
- Conflict between fast ‘on-line’ response and query execution cost
-- Need fast feedback

5 XML Query Challenges
- Involve complex traversals of the XML data hierarchy
- Complex queries over massive tree-structured data are very expensive
- Approaches: optimize the query or optimize the data structure
- When exact results are not needed, we can instead return approximate query answers

6 Approximate Query Answers
- Obtain an approximation to the true result
- Currently employed successfully in relational systems
- Use the approximate result to get timely feedback
[Figure: the query evaluated over the XML data returns R; the query evaluated over the synopsis returns R’]

7 Outline
- Motivation
-> TreeSketch Approach
- Experimental Results
- Contributions and Limitations
-- A technique used to return fast, approximate results

8 Data and Query Model
[Figure: example XML document tree. Tag legend: a = author, n = name, b = book, p = paper, y = year, k = keyword, t = title]
-- Some background: XML document

9 Data and Query Process
-- Twig query, query tree, and nested result tree

10 Basic Query Scenario
[Figure: true XML data, its synopsis, and the approximate nesting tree]
- Key idea is to return fast, accurate feedback

11 Approximate Query Answers
- How to construct concise XML synopses that capture the statistical traits of the true data
- How to produce approximate query answers over the synopsis efficiently
-- Two key problems

12 TreeSketch Construction
- Step 1: Given an XML tree T, build a graph synopsis in which each node represents a set of same-tag elements; this synopsis can be large
- Step 2: Compress the synopsis by merging nodes with similar sub-structures (i.e., clustering of the XML elements)
- Step 3: Repeat Step 2 until the predefined space budget constraint is met
- Step 4: Return the TreeSketch synopsis
[Figure labels: Perfect, …, Space Budget]
-- Construction algorithm
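Steps 2 to 4 can be sketched as a greedy loop. The Python below is a simplified illustration, not the authors' algorithm: the budget is counted in synopsis nodes, and `distance` (squared difference of outgoing edge counts) is a hypothetical stand-in for whatever similarity measure the paper uses. The starting synopsis reuses the node and edge counts from the construction example slides; the names F1 and F2 for the two F(2) nodes, and the choice of which one has the E child, are assumptions.

```python
from itertools import combinations

# Example synopsis: node -> element count, node -> tag, (parent, child) -> mean children.
# "F1"/"F2" and the exact edge layout are assumptions inferred from the slide counts.
SIZES = {"R": 1, "P": 1, "S": 2, "F1": 2, "F2": 2, "C": 4, "E": 2}
TAGS = {"R": "R", "P": "P", "S": "S", "F1": "F", "F2": "F", "C": "C", "E": "E"}
EDGES = {("R", "P"): 1.0, ("P", "S"): 2.0, ("S", "F1"): 1.0, ("S", "F2"): 1.0,
         ("F1", "C"): 1.0, ("F2", "C"): 1.0, ("F2", "E"): 1.0}


def distance(edges, a, b):
    """Hypothetical similarity: squared difference of outgoing edge counts."""
    va = {v: c for (u, v), c in edges.items() if u == a}
    vb = {v: c for (u, v), c in edges.items() if u == b}
    return sum((va.get(v, 0.0) - vb.get(v, 0.0)) ** 2 for v in set(va) | set(vb))


def merge(sizes, edges, a, b):
    """Fold node b into node a; outgoing counts become a size-weighted average."""
    total = sizes[a] + sizes[b]
    new_edges = {}
    for (u, v), cnt in edges.items():
        if u in (a, b):
            cnt = cnt * sizes[u] / total  # weight by the share of merged elements
        u2 = a if u == b else u
        v2 = a if v == b else v
        new_edges[(u2, v2)] = new_edges.get((u2, v2), 0.0) + cnt
    new_sizes = {n: c for n, c in sizes.items() if n != b}
    new_sizes[a] = total
    return new_sizes, new_edges


def compress(sizes, tags, edges, budget):
    """Merge the most similar same-tag pair until at most `budget` nodes remain."""
    while len(sizes) > budget:
        pairs = [(a, b) for a, b in combinations(sorted(sizes), 2)
                 if tags[a] == tags[b]]
        if not pairs:
            break  # nothing left that may be merged
        a, b = min(pairs, key=lambda p: distance(edges, *p))
        sizes, edges = merge(sizes, edges, a, b)
    return sizes, edges


small_sizes, small_edges = compress(SIZES, TAGS, EDGES, budget=6)
print(small_sizes)
print(small_edges)
```

With a budget of six nodes, the only same-tag pair (F1, F2) is merged once, which reproduces the F(4) node and the 0.5 edge count shown on the merging slide.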

13 More Discussions
- Graph synopsis construction
  - Use a node to represent a set of same-tag elements
  - Queries can be answered with zero error
  - The size can become very large; it can easily be on the order of the original document size
- TreeSketch synopsis construction
  - Compress the synopsis by merging nodes
  - Bottom-up merging clustering algorithm
- Key technique to compress -> clustering
  - Based on structure
  - Model accuracy depends on the quality of clustering
  - Tight clusters -> accurate synopsis, but large model
  - Loose clusters -> less accuracy, but small model
-- of the construction procedure

14 Construction Example
[Figure: XML document with elements r, p1, s2, s3, f4, f5, f6, f7, e8, c9, e10, c11, c12, c13, and its graph synopsis with nodes R(1), P(1), S(2), F(2), F(2), C(4), E(2)]
- Synopsis node -> set of elements of the same tag
- Synopsis edge -> document edge(s)
-- Count same-tag elements

15 Construction Example
- Calculate the number of children for each edge
- Count[r, p]: mean number of children in p per element in r
[Figure: graph synopsis with nodes R(1), P(1), S(2), F(2), F(2), C(4), E(2) and edge counts 1, 2 (= 2 / 1), 1, 1, 1, 1, 1]
-- Calculate number of children per element
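The node and edge counts in this example can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the slide only shows the counts, so the parent/child layout in `DOC` is one document consistent with them, and the node names F1 and F2 for the two F(2) nodes are made up for illustration.

```python
from collections import Counter

# Hypothetical document: element -> (parent element, synopsis node).
# The layout is an assumption chosen to match the counts on the slide.
DOC = {
    "r":   (None, "R"),
    "p1":  ("r",  "P"),
    "s2":  ("p1", "S"),
    "s3":  ("p1", "S"),
    "f5":  ("s2", "F1"),
    "f6":  ("s3", "F1"),
    "f4":  ("s2", "F2"),
    "f7":  ("s3", "F2"),
    "c11": ("f5", "C"),
    "c12": ("f6", "C"),
    "e8":  ("f4", "E"),
    "c9":  ("f4", "C"),
    "e10": ("f7", "E"),
    "c13": ("f7", "C"),
}


def summarize(doc):
    """Return (node sizes, edge counts) for a document with assigned synopsis nodes."""
    # Node size: how many elements fall into each synopsis node.
    sizes = Counter(node for _, node in doc.values())
    # Total number of document edges between each pair of synopsis nodes.
    child_totals = Counter()
    for parent, node in doc.values():
        if parent is not None:
            child_totals[(doc[parent][1], node)] += 1
    # Count[u, v]: mean number of children in v per element in u.
    edges = {(u, v): total / sizes[u] for (u, v), total in child_totals.items()}
    return dict(sizes), edges


sizes, edges = summarize(DOC)
print(sizes)
print(edges)
```

For example, P has one element with two S children, so Count[P, S] = 2 / 1 = 2, matching the slide.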

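16 Merging Nodes
[Figure: graph synopsis with nodes R(1), P(1), S(2), F(2), F(2), C(4), E(2) and edge counts 1, 2, 1, 1, 1, 1, 1]
[Figure: more concise TreeSketch synopsis after merging, with nodes R(1), P(1), S(2), F(4), C(4), E(2) and edge counts 1, 2, 2, 1, 0.5]
-- Less space budget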
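The merged edge counts on this slide are consistent with a size-weighted average of the outgoing counts of the two F(2) nodes, and a sum for the incoming edge from S. The arithmetic below is a reading of the slide's numbers, not a formula quoted from the paper; F_1 (the F node with only C children) and F_2 (the F node with E and C children) are assumed labels.

```latex
\begin{align*}
\mathrm{count}[F, E] &= \frac{|F_1|\,\mathrm{count}[F_1, E] + |F_2|\,\mathrm{count}[F_2, E]}{|F_1| + |F_2|}
                      = \frac{2 \cdot 0 + 2 \cdot 1}{2 + 2} = 0.5 \\
\mathrm{count}[F, C] &= \frac{2 \cdot 1 + 2 \cdot 1}{2 + 2} = 1 \\
\mathrm{count}[S, F] &= \mathrm{count}[S, F_1] + \mathrm{count}[S, F_2] = 1 + 1 = 2
\end{align*}
```

The value 0.5 is where accuracy is lost: the merged synopsis now says every F element has half an E child on average, although in the original synopsis two F elements had one each and two had none.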

17 Compute Approximate Answers
- Travel down the tree
- Match a pattern in the structure and return a sub-tree
- TreeSketch: fast response
  - Concise synopsis
  - Keeps statistical information
    - Node: number of same-tag elements
    - Edge: number of children per element
-- More like the traditional way

18 Compute Approximate Answers
Query: //section.//equation.//caption (query nodes q0, q1, q2, q3)
[Figure: TreeSketch with nodes R(1), P(1), S(2), F(2), F(2), C(4), E(2) and edge counts 1, 2, 1, 1, 1, 1, 1]
[Figure: approximate nesting tree with nodes R, S, E, C and counts S: 1x2 = 2, E: 1x1 = 1, C: 1x1 + 1x1 = 2]
- Approximate results with structure
  1) Take advantage of the concise structure
  2) and the statistical data
-- Example
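The counts in the approximate nesting tree can be reproduced by multiplying edge counts along each synopsis path and summing over paths. The Python below is a sketch of that arithmetic only, not the paper's evaluation algorithm; it assumes an acyclic synopsis, and the node names F1 and F2 and the edge layout are inferred from the slide's numbers.

```python
# TreeSketch from the example: (parent node, child node) -> mean children per element.
# "F1"/"F2" and which F node has the E child are assumptions.
EDGES = {("R", "P"): 1.0, ("P", "S"): 2.0, ("S", "F1"): 1.0, ("S", "F2"): 1.0,
         ("F1", "C"): 1.0, ("F2", "C"): 1.0, ("F2", "E"): 1.0}
TAGS = {"R": "R", "P": "P", "S": "S", "F1": "F", "F2": "F", "C": "C", "E": "E"}


def descendants_per_element(edges, tags, start, target_tag):
    """Estimated number of `target_tag` descendants per element of node `start`.

    Multiplies edge counts along each path and sums over all paths.
    Assumes the synopsis has no cycles, otherwise the recursion would not end.
    """
    total = 0.0
    for (u, v), cnt in edges.items():
        if u != start:
            continue
        if tags[v] == target_tag:
            total += cnt  # direct children with the wanted tag
        total += cnt * descendants_per_element(edges, tags, v, target_tag)
    return total


sections = descendants_per_element(EDGES, TAGS, "R", "S")   # R -> P -> S
equations = descendants_per_element(EDGES, TAGS, "S", "E")  # S -> F2 -> E
captions = descendants_per_element(EDGES, TAGS, "S", "C")   # S -> F1 -> C and S -> F2 -> C
print(sections, equations, captions)
```

The three values correspond to the S, E, and C annotations in the approximate nesting tree: 1x2 = 2 sections under the root, then 1x1 = 1 equation and 1x1 + 1x1 = 2 captions per section.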

19 Outline
- Motivation
- TreeSketch Approach
-> Experimental Results
- Contributions and Limitations

20 Experimental Setup
- Focus on
  - the quality of the approximate answers generated
  - the efficiency of the construction process
- Data Set
  - Data sets: XMark, DBLP, IMDB, SwissProt
  - Workload: 1000 random twig queries

21 Evaluation Methods
- Error -> distance between R’ and R
- Popular metric: tree-edit distance
  - Min-cost sequence of operations that transforms R’ to R
  - Argument: it does not capture structural similarity
- New evaluation metric: ESD (Element Simulation Distance)
  - Calculates the number of children for each edge in the tree to capture the complete structure of the tree
  - Models how well the structures of two trees match each other
  - “Degree” of simulation between two trees
- Average ESD is used for evaluation

22 Experimental Results
-- Approximate answers, compared with twig-XSketches

23 Experimental Results
-- Relative errors < 5%, i.e., 95% accuracy

24 Outline
- Motivation
- TreeSketch Approach
- Experimental Results
-> Contributions and Limitations
-- Strengths and weaknesses

25 TreeSketch Approach
- Proposes an effective XML-summarization mechanism
- Captures the complete tree structure of large XML data
- Experimental results: produces fast and accurate approximate query answers
- Author claim: the first work to address the timely problem of producing approximate tree-structured answers for complex XML queries
- Comparison with the related work: 2 options
  - Either compute the exact answer to a path query: expensive
  - Or use an approach such as twig-XSketch, which does not capture the complete tree structure of the underlying XML database
-- In this paper

26 Limitations
- Difficult to optimize some predefined parameters, such as the space budget
  - which is directly related to the accuracy of the approximate query answers
  - too large -> affects efficiency; too small -> affects the quality of the answers; the right value depends on the query, data set, and computing resources
- An incremental model construction process is expected
  - XML data grows incrementally, so the synopsis model needs to be constructed incrementally
- More experiments or some real applications are needed to justify the scalability of this technique
-- Nice research; next steps for further investigation

27 Thank You / Merci

