XPathLearner: An On-Line Self-Tuning Markov Histogram for XML Path Selectivity Estimation

1 XPathLearner: An On-Line Self-Tuning Markov Histogram for XML Path Selectivity Estimation. Authors: Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey Scott Vitter, Ronald Parr. Speaker: Ho Wai Shing

2 Contents: Introduction (the problems in XML path selectivity estimation); XPathLearner (its properties and details); Experiment Results; Conclusions; Future Work

3 Introduction: XML is becoming the standard for data exchange. We need to query both the structure and the text data of XML documents. Selectivity estimates are essential for optimizing query evaluation plans.

4 Introduction Example:

5 Introduction Example:
    FOR $b IN document("*")//book
    WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
    RETURN $b/title
The path expressions in this query:
    //book/publisher = "Morgan Kaufmann"
    //book/year = "1998"
    //book/title

6 Introduction: We need a structure that stores statistics about the data, from which the estimated selectivity is then calculated. Problem: estimate the selectivity of simple, single-value, and multi-value path expressions within limited space.

7 Related Work: Path Trees; Markov Tables; k-RO (in Lore)

8 Path Trees: Aggregate siblings that have the same tag. Only tag names are kept (no data values).

9 Markov Table: The selectivity of every short path, up to length k, is stored. The selectivity of longer paths is estimated using a Markov model.
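As an illustration of what such a table holds, here is a minimal sketch that counts every tag path of length 1 to k in a small XML document. The function name `build_markov_table` and the sample document are assumptions made for this example; they do not come from the slides or the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def build_markov_table(xml_text, k=2):
    """Count every tag path of length 1..k that occurs in the document."""
    root = ET.fromstring(xml_text)
    table = Counter()

    def visit(node, ancestors):
        path = ancestors + [node.tag]
        # Record each suffix of the root-to-node path that has at most k tags.
        for length in range(1, min(k, len(path)) + 1):
            table[tuple(path[-length:])] += 1
        for child in node:
            visit(child, path)

    visit(root, [])
    return table


# Hypothetical sample document, used only for illustration.
doc = "<A><B><C/><C/></B><B><C/></B></A>"
table = build_markov_table(doc, k=2)
print(table[("B", "C")])  # number of C elements whose parent is a B
```

Note that this sketch scans the whole document, which is exactly the offline step that the later slides argue against; it is shown only to make the stored statistics concrete.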

10 k-RO: Used in the Lore system. Very similar to a Markov table. Data values are also objects stored as a graph.

11 Twigs: Can answer "twig" queries, i.e., structural queries with a small branch. Based on a suffix tree (for simple paths) plus signatures in each node (for estimating branching).

12 Problems Faced: Existing methods are offline (the whole repository must be scanned beforehand to gather statistics, which is infeasible if the data is remote and extremely large), can handle only simple path expressions (SPEs) or else are too large, and ignore data values.

13 Problems Faced: Not adaptive to the query workload (much space is wasted on infrequently asked paths). No quick update (a periodic rescan of the repository is needed).

14 Objective: XPathLearner uses a Markov-based approach; uses an online algorithm; is adaptive to the workload; can answer simple paths, single-value paths (//A/B='3'), and multi-value paths (//A='2'/B='3'); considers data values; and can be easily updated.
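The three kinds of path expression named on this slide can be told apart mechanically. The sketch below is a hypothetical helper (`classify_path` is not from the paper) that splits a path into (tag, value) steps. It assumes values contain no "/" character, and it treats any path with a value on a step other than the last one as multi-value.

```python
def classify_path(expr):
    """Split a path such as //A='2'/B='3' into (tag, value) steps and name its kind."""
    steps = []
    for step in expr.lstrip("/").split("/"):
        tag, sep, value = step.partition("=")
        # A step without '=' carries no value predicate.
        steps.append((tag, value.strip("'\"") if sep else None))
    valued = [i for i, (_, v) in enumerate(steps) if v is not None]
    if not valued:
        kind = "simple"
    elif valued == [len(steps) - 1]:
        kind = "single-value"
    else:
        kind = "multi-value"
    return kind, steps


for expr in ("//A/B", "//A/B='3'", "//A='2'/B='3'"):
    print(expr, classify_path(expr)[0])
```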

15 XPathLearner

16 Architecture

17 A More Detailed Example

18 What to Store? A Markov table (1st order in this discussion).

19 What to Store? The table may be large if there are many data values. Solution: only "tag-tag", "tag", and top-k value entries are stored exactly; other entries are stored within buckets. Default is 1.

20 What is Actually Stored? A compressed 1st-order Markov table (or Markov histogram). Assumption: v1-v4 start with 'a', v5-v8 start with 'b', k = 1.
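The compression step can be sketched as follows: the k most frequent (tag, value) entries stay exact and the rest are merged into buckets. Grouping by the first character of the value follows the assumption stated on the slide. Storing a (total, distinct count) pair per bucket and answering lookups with the bucket average are assumptions of this sketch, as are the names `compress_values` and `lookup` and the sample counts. Values are assumed to be non-empty strings.

```python
from collections import defaultdict


def compress_values(value_counts, k=1):
    """Keep the k most frequent (tag, value) entries exact; bucket the rest."""
    ranked = sorted(value_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    exact = dict(ranked[:k])
    buckets = defaultdict(lambda: [0, 0])  # (tag, first char) -> [total, distinct]
    for (tag, value), count in ranked[k:]:
        bucket = buckets[(tag, value[0])]
        bucket[0] += count
        bucket[1] += 1
    return exact, dict(buckets)


def lookup(exact, buckets, tag, value):
    """Return the exact count if stored, else the bucket average (0 if no bucket)."""
    if (tag, value) in exact:
        return exact[(tag, value)]
    total, distinct = buckets.get((tag, value[0]), (0, 1))
    return total / distinct


# Hypothetical value frequencies for a tag D.
counts = {("D", "a1"): 10, ("D", "a2"): 2, ("D", "a3"): 4,
          ("D", "b1"): 3, ("D", "b2"): 1}
exact, buckets = compress_values(counts, k=1)
print(lookup(exact, buckets, "D", "a1"), lookup(exact, buckets, "D", "a2"))
```

The trade-off is the usual one for histograms: frequent values keep their exact counts, while rare values share a bucket and lose precision in exchange for space.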

21 How to Retrieve Selectivity? Use this formula. σ: selectivity; t_1, t_2, ..., t_n: tags; t_1 t_2 ... t_n: the path with these tags; N: total number of data items.

22 How to Retrieve Selectivity? Use this formula (it's what we calculate). σ: selectivity; t_1, t_2, ..., t_n: tags; t_1 t_2 ... t_n: the path with these tags; f(p): frequency of the path p.
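The formulas themselves did not survive in this transcript. The following is a reconstruction under the standard first-order Markov assumption, using the symbols defined on slides 21 and 22. It agrees with the worked example on slide 29, where the estimate for ACD is f(AC) / f(C) × f(CD), but it should be read as a sketch and not as the exact equations shown on the slides.

```latex
\sigma(t_1 t_2 \cdots t_n) = \frac{f(t_1 t_2 \cdots t_n)}{N},
\qquad
f(t_1 t_2 \cdots t_n) \approx f(t_1 t_2) \prod_{i=2}^{n-1} \frac{f(t_i t_{i+1})}{f(t_i)}
```

For n = 3 the product has a single factor, giving f(t_1 t_2) × f(t_2 t_3) / f(t_2).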

23 How to Retrieve Selectivity? Use this formula (if it's multi-valued). σ: selectivity; t_1, t_2, ..., t_n: tags; t_1 t_2 ... t_n: the path with these tags; f(t,v): frequency of the value v in tag t.

24 Retrieval Example: estimated selectivity for the path //B/C/D and for the path //B/C/D=v3 [computed values not preserved in the transcript]

25 How to Update? Get the query feedback, e.g., (BCD, 5). Update the histogram entries that are contained in the query so that future estimates are more accurate, e.g., update B, C, D, BC, CD so that the estimate is nearer to 5 than before. Two update approaches: the Heavy-Tail Rule and the Delta Rule.

26 Heavy Tail Rule: Put more correction towards the end (tail) of the path. Equation: f_k() refers to the frequency before the update; f_{k+1}() refers to the frequency after the update. Suggestion: w_i = 2^i.

27 Heavy Tail Rule updating those one-'tag' entries safeguards the terms that were set by exact query feedback

28 Heavy Tail Rule: A reminder of what is stored.

29 Heavy Tail Rule Example: query feedback = (ACD, 6). By the table, estimate = f(AC) / f(C) × f(CD) = 3 / 7 × 6 ≈ 3.

30 Heavy Tail Rule Updates: new estimate = 4 / 8 × 8 = 4.
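The arithmetic on these two slides can be checked with a short sketch of a first-order Markov estimator. The function name `estimate_frequency` and the dictionary layout are assumptions of this example. The frequencies are the ones quoted on the slides: f(AC) = 3, f(C) = 7, f(CD) = 6 before the update, and 4, 8, 8 after it. The sketch assumes every tag frequency it divides by is non-zero.

```python
def estimate_frequency(table, path):
    """First-order Markov estimate of f(path) from 1- and 2-tag frequencies."""
    if len(path) <= 2:
        return table[tuple(path)]  # short paths are stored directly
    estimate = table[(path[0], path[1])]
    for i in range(1, len(path) - 1):
        # Multiply by the conditional frequency of the next step.
        estimate *= table[(path[i], path[i + 1])] / table[(path[i],)]
    return estimate


before = {("A", "C"): 3, ("C",): 7, ("C", "D"): 6}
after = {("A", "C"): 4, ("C",): 8, ("C", "D"): 8}
print(round(estimate_frequency(before, "ACD"), 2))  # 2.57, shown as 3 on the slide
print(estimate_frequency(after, "ACD"))  # 4.0
```

After the update the estimate moves from about 3 to 4, closer to the observed feedback of 6 without jumping all the way to it.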

31 Delta Rule: First proposed by Rumelhart et al. Basic idea: [equation not preserved in the transcript]
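Since the slide's equation is missing, the sketch below shows only the generic form of a delta-rule update: one gradient-descent step on the squared error E = 0.5 × (observed − estimate)², applied to the first-order Markov estimate. The learning rate, the function names, and the decision to update only the entries that appear in the estimate are assumptions; the exact update used by XPathLearner may differ. The sketch assumes a path of at least two tags whose table entries are distinct and non-zero.

```python
def markov_estimate(table, path):
    """Product of tag-pair frequencies divided by the inner single-tag frequencies."""
    estimate = 1.0
    for i in range(len(path) - 1):
        estimate *= table[(path[i], path[i + 1])]
    for i in range(1, len(path) - 1):
        estimate /= table[(path[i],)]
    return estimate


def delta_rule_update(table, path, observed, rate=0.1):
    """One gradient step on E = 0.5 * (observed - estimate)^2; returns the old estimate."""
    pairs = [(path[i], path[i + 1]) for i in range(len(path) - 1)]
    singles = [(path[i],) for i in range(1, len(path) - 1)]
    estimate = markov_estimate(table, path)
    error = observed - estimate
    # d(estimate)/d(pair) = estimate / pair; d(estimate)/d(single) = -estimate / single
    steps = {key: rate * error * estimate / table[key] for key in pairs}
    steps.update({key: -rate * error * estimate / table[key] for key in singles})
    for key, step in steps.items():
        table[key] += step
    return estimate


table = {("A", "C"): 3.0, ("C",): 7.0, ("C", "D"): 6.0}
old = delta_rule_update(table, "ACD", observed=6)
print(old, markov_estimate(table, "ACD"))
```

With a small learning rate each feedback nudges the estimate towards the observed value; a large rate can overshoot or drive a frequency negative, so a real implementation would need to clamp the entries.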

32 Experiments

33 Data Set: DBLP (other experiments were done but are not included in the paper). Metrics: average absolute error and average relative error.
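The two metrics can be computed over a query workload as in the sketch below. The function name and the sample numbers are illustrative only and are not results from the paper. Skipping queries whose true selectivity is zero when averaging the relative error is an assumption of this sketch.

```python
def average_errors(estimates, actuals):
    """Return (average absolute error, average relative error) over a workload."""
    abs_errors = [abs(e - a) for e, a in zip(estimates, actuals)]
    # Relative error is undefined when the true value is 0, so skip those queries.
    rel_errors = [err / a for err, a in zip(abs_errors, actuals) if a != 0]
    avg_abs = sum(abs_errors) / len(abs_errors)
    avg_rel = sum(rel_errors) / len(rel_errors) if rel_errors else 0.0
    return avg_abs, avg_rel


# Illustrative values only: two queries with estimates 3 and 4, true counts 6 and 4.
print(average_errors([3, 4], [6, 4]))  # (1.5, 0.25)
```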

34 Experiments

39 Conclusions: XPathLearner is a new method for estimating the selectivity of path expressions. It is online, is based on query feedback, and does not need a database scan. It uses Markov histograms to store statistics.

40 Future Work: Change from a fixed-length Markov table to a variable-length Markov table. Choose the paths to be stored more carefully. Apply the update method to other areas, e.g., graph-based structures, to answer branching queries, etc.

41 References: [1] Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey Scott Vitter, Ronald Parr. XPathLearner: An On-Line Self-Tuning Markov Histogram for XML Path Selectivity Estimation. VLDB'02.