Processing XML Streams with Deterministic Automata Denis Mindolin Gaurav Chandalia.


1 Processing XML Streams with Deterministic Automata Denis Mindolin Gaurav Chandalia

2 Introduction [Figure: an XML data stream feeds an XML stream router, which serves Consumer 1, Consumer 2, and Consumer 3, each subscribed with its own XPath query (XPath query 1, XPath query 2, XPath query 3).]
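As a point of reference for the routing problem, here is a minimal, non-streaming sketch of a router in Python. It assumes queries use only /N, //N and *, parses each whole document with ElementTree, and checks every subscription against every root-to-element label path; the class and function names are hypothetical. Its cost grows with the number of queries, which is what the automata-based approaches in the following slides avoid.

```python
import re
import xml.etree.ElementTree as ET


def path_regex(xpath):
    """Translate a linear XPath over /N, //N and * into a regex that matches
    root-to-element label paths such as '/a/x/b' (simplified subset)."""
    pattern = ""
    for axis, label in re.findall(r"(//|/)([^/]+)", xpath):
        step = "[^/]+" if label == "*" else re.escape(label)
        # '//' may skip any number of intermediate labels; '/' skips none.
        pattern += ("(?:/[^/]+)*/" if axis == "//" else "/") + step
    return re.compile(pattern)


def label_paths(xml_text):
    """Yield the root-to-element label path of every element in document order."""
    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        yield path
        for child in elem:
            yield from walk(child, path)
    yield from walk(ET.fromstring(xml_text), "")


class NaiveRouter:
    """Hypothetical router: a document goes to every consumer whose query
    matches at least one element of it."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, consumer, xpath):
        self.subscriptions[consumer] = path_regex(xpath)

    def route(self, xml_text):
        paths = list(label_paths(xml_text))
        return sorted(c for c, rx in self.subscriptions.items()
                      if any(rx.fullmatch(p) for p in paths))


router = NaiveRouter()
router.subscribe("Consumer 1", "/a/b")
router.subscribe("Consumer 2", "/a//d")
router.subscribe("Consumer 3", "//c/*")
print(router.route("<a><b/><x><d/></x></a>"))  # ['Consumer 1', 'Consumer 2']
```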

3 Related Work The problem was introduced in [Altinel and Franklin 2000] with the XFilter system. [Chan et al. 2002] describes techniques based on a trie (XTrie). [Diao et al. 2003] discusses a method based on optimized NFAs (YFilter). [Green et al. 2003] shows how to solve the problem using a lazy DFA.

4 DFA approach in general
1. Convert the set of XPath expressions into a set of NFAs.
2. Combine the set of NFAs into a single NFA.
3. Convert the single NFA into a DFA.
4. Process the XML data stream with the DFA (using the SAX model).
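The last step can be sketched with Python's xml.sax: the handler keeps a stack of DFA states, takes one transition per start tag, and pops on each end tag, so closing an element restores the parent's state without any recomputation. The two-query DFA below (Q1 = /a/b, Q2 = /a/c) is hand-built and hypothetical, standing in for the output of the first three steps.

```python
import xml.sax

# Hypothetical hand-built DFA for Q1 = /a/b and Q2 = /a/c.
# State 0 is the start state; "dead" is a sink for paths that can match nothing.
TRANSITIONS = {(0, "a"): 1, (1, "b"): 2, (1, "c"): 3}
ACCEPTING = {2: {"Q1"}, 3: {"Q2"}}


class DFARunner(xml.sax.ContentHandler):
    """Drives a DFA with SAX events, keeping one state per open element."""

    def __init__(self, transitions, accepting, start=0):
        super().__init__()
        self.transitions = transitions
        self.accepting = accepting
        self.stack = [start]   # top = state after reading the current root-to-element path
        self.matches = []      # (query, element name) in document order

    def startElement(self, name, attrs):
        state = self.transitions.get((self.stack[-1], name), "dead")
        self.stack.append(state)
        for query in sorted(self.accepting.get(state, ())):
            self.matches.append((query, name))

    def endElement(self, name):
        # Returning to the parent's state is a single pop.
        self.stack.pop()


def run(xml_bytes):
    runner = DFARunner(TRANSITIONS, ACCEPTING)
    xml.sax.parseString(xml_bytes, runner)
    return runner.matches


print(run(b"<a><b/><c/><d><b/></d></a>"))  # [('Q1', 'b'), ('Q2', 'c')]
```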

5 DFA approach in general (cont.) Linear XPath expressions:
P ::= /N | //N | PP
N ::= E | A | * | text() | text() = S
where E is an element label, A is an attribute label, / is the child axis, // is the descendant axis, * is the wildcard, and S is a constant string.
What about predicates? They are decomposed into linear XPath expressions.
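A minimal parser for this linear grammar might look as follows. It assumes attribute labels are written @A, as in standard XPath syntax, and that S is a double-quoted string; the Step type and parse_linear are hypothetical names. Predicates are rejected, on the premise that they have already been decomposed into linear expressions.

```python
import re
from typing import NamedTuple, Optional


class Step(NamedTuple):
    axis: str               # "/" (child) or "//" (descendant)
    kind: str               # "element", "attribute", "wildcard" or "text"
    label: Optional[str]    # element or attribute label, else None
    value: Optional[str]    # constant S in text() = S, else None


# Assumption: attributes are written @A and S is enclosed in double quotes.
STEP_RE = re.compile(r'(//|/)(?:(text\(\))(?:\s*=\s*"([^"]*)")?|(@?)([\w.-]+|\*))')


def parse_linear(xpath):
    """Split a linear XPath into steps; raise ValueError on anything else."""
    steps, pos = [], 0
    while pos < len(xpath):
        m = STEP_RE.match(xpath, pos)
        if not m:
            raise ValueError(f"not a linear XPath step: {xpath[pos:]!r}")
        axis, text_fn, const, at, name = m.groups()
        if text_fn:
            steps.append(Step(axis, "text", None, const))
        elif at:
            steps.append(Step(axis, "attribute", name, None))
        elif name == "*":
            steps.append(Step(axis, "wildcard", None, None))
        else:
            steps.append(Step(axis, "element", name, None))
        pos = m.end()
    return steps


for step in parse_linear('/datasets//*/text()="Galaxy"'):
    print(step)
```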

6 DFA approach in general (cont.) Consider two XPath expressions:
/datasets/dataset[//tableHead//*/text()="Galaxy"]/title
/datasets/dataset[/history]/tableHead[/field]
Corresponding query tree:
$D IN $R/datasets/dataset
$H IN $D/history
$T IN $D/title (sax_f = true)
$TH IN $D/tableHead (sax_f = true)
$N IN $D//tableHead//*
$F IN $TH/field
$V IN $N/text()="Galaxy"

7 Conversion of XPath expressions into NFA and DFA Query tree:
$X IN $R/a
$Y IN $X//*/b
$Z IN $X/b/*
$U IN $Z/d
[Figure panels: query tree, query NFA, query DFA]
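A compact sketch of the NFA and the standard subset construction for queries over /N, //N and *. Assumptions: each query gets its own chain of NFA states (no prefix sharing as in the query NFA on the slide), a // step is modeled as a self-loop on the state before it, and the symbol OTHER stands for every element label not named in any query. The function names are hypothetical.

```python
import re
from collections import deque

OTHER = "#other"  # stands for every element label not named in any query


def parse(xpath):
    """Split a linear XPath over /N, //N and * into (axis, label) steps."""
    return re.findall(r"(//|/)([^/]+)", xpath)


def nfa_step(queries, states, symbol):
    """One NFA move. NFA state (q, i) means query q has matched its first i steps."""
    nxt = set()
    for q, i in states:
        steps = queries[q]
        if i == len(steps):
            continue                 # final position: no outgoing transitions
        axis, label = steps[i]
        if axis == "//":
            nxt.add((q, i))          # self-loop: skip intermediate elements
        if label in (symbol, "*"):
            nxt.add((q, i + 1))
    return frozenset(nxt)


def eager_dfa(xpaths):
    """Subset construction over the labels used by the queries plus OTHER."""
    queries = [parse(x) for x in xpaths]
    alphabet = sorted({lab for steps in queries for _, lab in steps if lab != "*"})
    alphabet.append(OTHER)
    start = frozenset((q, 0) for q in range(len(queries)))
    delta, seen, todo = {}, {start}, deque([start])
    while todo:
        state = todo.popleft()
        for symbol in alphabet:
            target = nfa_step(queries, state, symbol)
            delta[(state, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A DFA state accepts every query whose final NFA state it contains.
    accepting = {s: {xpaths[q] for q, i in s if i == len(queries[q])} for s in seen}
    return start, delta, seen, accepting


def matched_queries(dfa, path):
    """Run the DFA over a root-to-element label path."""
    start, delta, _, accepting = dfa
    state = start
    for label in path:
        symbol = label if (state, label) in delta else OTHER
        state = delta[(state, symbol)]
    return accepting[state]


dfa = eager_dfa(["/a//*/b", "/a/b/*/d"])
print(len(dfa[2]), "DFA states (including the empty, dead state)")
print(matched_queries(dfa, ["a", "b", "x", "d"]))
```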

8 Eager DFA vs. Lazy DFA A DFA is eager if it is obtained by the standard algorithm for converting an NFA into a DFA [Hopcroft and Ullman 1979]. A DFA is lazy if it is constructed at run time, on demand: initially it has a single state, and whenever a transition leads to a state that does not exist yet, that state is computed and the transition is added.
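The lazy variant can be sketched by memoizing the subset step: a DFA state (a set of NFA states) and each outgoing transition are computed only when the input first needs them. This is a minimal illustration with hypothetical names, assuming the same /N, //N, * subset and using the queries /a//*/b and /a/b/*/d; it is not the implementation described in this presentation.

```python
import re


class LazyDFA:
    """Hypothetical lazy DFA: transitions and their target states are built
    from the NFA only the first time the input needs them, then cached."""

    def __init__(self, xpaths):
        self.xpaths = xpaths
        self.queries = [re.findall(r"(//|/)([^/]+)", x) for x in xpaths]
        self.start = frozenset((q, 0) for q in range(len(self.queries)))
        self.states = {self.start}   # the lazy DFA starts with a single state
        self.table = {}              # (state, label) -> state, filled on demand

    def _nfa_step(self, states, label):
        nxt = set()
        for q, i in states:
            steps = self.queries[q]
            if i < len(steps):
                axis, test = steps[i]
                if axis == "//":
                    nxt.add((q, i))          # self-loop for the descendant axis
                if test in (label, "*"):
                    nxt.add((q, i + 1))
        return frozenset(nxt)

    def step(self, state, label):
        key = (state, label)
        if key not in self.table:            # missing transition: compute it now
            target = self._nfa_step(state, label)
            self.states.add(target)
            self.table[key] = target
        return self.table[key]

    def run(self, path):
        """Follow a root-to-element label path; return the queries it matches."""
        state = self.start
        for label in path:
            state = self.step(state, label)
        return {self.xpaths[q] for q, i in state if i == len(self.queries[q])}


dfa = LazyDFA(["/a//*/b", "/a/b/*/d"])
print(dfa.run(["a", "b", "x", "d"]), len(dfa.states), "states so far")
```

Re-running a path that has already been seen only reads the cached table, so the state count grows with the distinct paths in the data rather than with the full subset construction.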

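9 Eager DFA Let \(P = p_0 // p_1 // \cdots // p_k\) with \(p_i = N_1 / N_2 / \cdots / N_{n_i}\), where:
\(k\) = number of //'s in \(P\)
\(n_i\) = length of \(p_i\), \(i = 0, \ldots, k\)
\(m\) = maximum number of *'s in any \(p_i\)
\(n\) = length (or depth) of \(P\), i.e. \(n = \sum_{i=0}^{k} n_i\)
\(s\) = alphabet size \(|\Sigma|\)
Theorem. Given a linear XPath expression \(P\), define \(\mathrm{prefix}(P) = n_0\), and \(\mathrm{body}(P) = \ldots\) when \(k > 0\), and \(\mathrm{body}(P) = 1\) when \(k = 0\). Then the eager DFA for \(P\) has at most \(\mathrm{prefix}(P) + \mathrm{body}(P)\) states. In particular, if \(m = 0\) and \(k \le 1\), then the DFA has at most \(n + 1\) states.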
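For the case k = 0 the bound follows directly from the definitions: P = p_0 contains no //, so n = n_0 and body(P) = 1.

```latex
% k = 0: P = p_0, hence n = \sum_{i=0}^{0} n_i = n_0 and body(P) = 1
\#\text{states} \;\le\; \mathrm{prefix}(P) + \mathrm{body}(P) \;=\; n_0 + 1 \;=\; n + 1
```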

10 Lazy DFA. Example Queries: /a//*/b and /a/b/*/d [Figure panels: lazy DFA with numbered states and transitions labeled a, b, d, and *; sample XML document]

11 Lazy DFA Graph schema (based on a DTD): d is the maximum number of simple cycles that a simple path can intersect, and D is the total number of nonempty simple paths starting at the root. For the example schema on this slide, d = 2 and D = 13.
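Both schema parameters can be computed by brute force for small schemas. The sketch assumes the schema graph is an adjacency dictionary with every node as a key, that a nonempty path has at least one edge, and that a path intersects a cycle when the two share a node; the graph and the function names are hypothetical. Enumeration is exponential in general, which is fine for small illustrative graphs.

```python
# Hypothetical schema graph: r -> a, a <-> b (a cycle), a -> c, c -> c (self-loop).
SCHEMA = {"r": ["a"], "a": ["b", "c"], "b": ["a"], "c": ["c"]}


def simple_paths(graph, root):
    """All simple paths from root with at least one edge (assumption)."""
    paths = []

    def extend(path):
        for nxt in graph[path[-1]]:
            if nxt not in path:
                paths.append(path + [nxt])
                extend(path + [nxt])

    extend([root])
    return paths


def simple_cycles(graph):
    """Each simple cycle exactly once, listed starting from its smallest node."""
    order = {v: i for i, v in enumerate(sorted(graph))}
    cycles = []

    def extend(path):
        for nxt in graph[path[-1]]:
            if nxt == path[0]:
                cycles.append(list(path))
            elif order[nxt] > order[path[0]] and nxt not in path:
                extend(path + [nxt])

    for v in sorted(graph):
        extend([v])
    return cycles


def schema_parameters(graph, root):
    """Return (d, D) for the graph schema."""
    paths = simple_paths(graph, root)
    cycles = simple_cycles(graph)
    # Assumption: a path "intersects" a cycle when they share at least one node.
    d = max((sum(1 for c in cycles if set(c) & set(p)) for p in paths), default=0)
    return d, len(paths)


print(schema_parameters(SCHEMA, "r"))  # (2, 3)
```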

12 Lazy DFA (cont.) Theorem. Consider a graph schema with parameters d and D, and let Q be a set of XPath expressions of maximum depth n. Then on any XML input satisfying the schema, the lazy DFA has at most \(1 + D(1+n)^d\) states.
Corollary. The number of states of the lazy DFA does not depend on the number of XPath expressions, only on their depth. For example, with n = 10, 100,000 XPath expressions, and the schema of the previous slide (d = 2, D = 13):
- the eager DFA may have about \(2^{100{,}000}\) states;
- the lazy DFA will have at most \(1 + 13 \cdot 11^2 = 1574\) states.
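The corollary's arithmetic can be checked directly: with d = 2, D = 13 and n = 10 the bound is 1 + 13 · 11² = 1574, and the number of queries does not appear in it. The function name is hypothetical; the digit count of 2^100,000 is computed with a logarithm only to convey its scale.

```python
import math


def lazy_dfa_bound(D, d, n):
    """Upper bound 1 + D * (1 + n)**d on the number of lazy DFA states."""
    return 1 + D * (1 + n) ** d


# The number of XPath expressions is not a parameter of the bound.
print(lazy_dfa_bound(D=13, d=2, n=10))  # 1574

# For scale: 2**100000 has floor(100000 * log10(2)) + 1 decimal digits.
print(math.floor(100_000 * math.log10(2)) + 1)  # 30103
```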

13 Lazy DFA. Implementation The implementation processes the XML stream using the SAX model. It considers the following subset of XPath:
- No text() or attribute value tests
- Only child and descendant axes
- All predicates of a query must fire before the target element

14 Restrictions of the implementation XPath queries:
//courses[level]/section
//courses[days]/section
//courses[credits]/section
[Figure: sample XML document with course listings]
Relative to the target element, predicates can fire in three ways:
1. All predicates fire before the target element.
2. Predicates fire between the start and end tags of the target element.
3. Predicates fire after the target element.
The implementation supports only case 1.
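To see why case 3 is the costly one, here is a small sketch that evaluates //parent[pred]/target over ElementTree's start/end events. A target that closes before its parent's predicate has fired must be buffered until the predicate fires or the parent closes. The function name and sample document are hypothetical, and treating a predicate as firing at the start tag of the predicate element is an assumption.

```python
import xml.etree.ElementTree as ET
from io import BytesIO


def filter_with_predicate(xml_bytes, parent, pred, target):
    """Evaluate //parent[pred]/target over start/end events.

    Returns (texts of matching targets, number of targets that had to be buffered).
    """
    results, buffered_count = [], 0
    stack = []  # one frame per open element: [tag, predicate_fired, buffered_texts]
    for event, elem in ET.iterparse(BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            if stack and stack[-1][0] == parent and elem.tag == pred:
                frame = stack[-1]
                frame[1] = True
                results.extend(frame[2])     # predicate fired: flush the buffer
                frame[2].clear()
            stack.append([elem.tag, False, []])
        else:
            tag = stack.pop()[0]
            if tag == target and stack and stack[-1][0] == parent:
                frame = stack[-1]
                if frame[1]:
                    results.append(elem.text)      # predicate already fired
                else:
                    frame[2].append(elem.text)     # case 3: wait for the predicate
                    buffered_count += 1
            # When a parent closes, its frame (and any buffered targets) is dropped.
    return results, buffered_count


DOC = (b"<catalog>"
       b"<courses><level>1</level><section>A</section></courses>"
       b"<courses><section>B</section><level>2</level></courses>"
       b"<courses><section>C</section></courses>"
       b"</catalog>")
print(filter_with_predicate(DOC, "courses", "level", "section"))  # (['A', 'B'], 2)
```

Section A is emitted at once (case 1), while B and C must be held in memory: B until its level element arrives, C until its parent closes without a level. Restricting queries to case 1 removes the need for such buffers.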

15 Processing attributes When processing a stream, all attributes are converted into elements. For example:
<section name="Se 101" description=""/>
<hours start="1:30pm" end="5:20pm"/>
[Figure: the converted form, showing the values Se 101, 1:30pm, and 5:20pm]
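One way to realize this conversion is inside the SAX handler itself: each attribute is reported as a child element whose text is the attribute value, emitted right after the owner's start tag. The sketch records the resulting events; the handler name and the convention that an empty value yields an empty element are assumptions, not necessarily what the presented implementation does.

```python
import xml.sax


class AttributesAsElements(xml.sax.ContentHandler):
    """Records SAX events, reporting every attribute as a child element."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        for attr in attrs.getNames():
            # Assumed convention: <e a="v"/> is treated as <e><a>v</a></e>.
            self.events.append(("start", attr))
            value = attrs.getValue(attr)
            if value:                  # an empty value yields an empty element
                self.events.append(("text", value))
            self.events.append(("end", attr))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        if content.strip():            # ignore whitespace-only text
            self.events.append(("text", content))


def attribute_events(xml_bytes):
    handler = AttributesAsElements()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


print(attribute_events(b'<hours start="1:30pm" end="5:20pm"/>'))
```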

16 Testing
Reference implementation: Galax 1.0.3.5
Test XML stream: World geographic database (Mondial), http://www.cs.washington.edu/research/xmldatasets/data/mondial/mondial-3.0.xml (1 MB)
Maximum depth of the XML stream: 6
Number of queries: 14
Query depth: 1 to 5
Number of predicates: 0 to 3
Predicate depth: 1 to 4

Method | Number of states
NFA | 22
Eager DFA | 87
Lazy DFA | 22

17 References
Green, T. J. et al. 2004. Processing XML streams with deterministic automata and stream indexes. ACM Transactions on Database Systems, December 2004.
Altinel, M. and Franklin, M. 2000. Efficient filtering of XML documents for selective dissemination of information. In Proceedings of VLDB. Cairo, Egypt.
Chen, J. et al. 2000. NiagaraCQ: a scalable continuous query system for Internet databases. In Proceedings of the ACM SIGMOD Conference on Management of Data.
Diao, Y. and Franklin, M. 2003. Query processing for high-volume XML message brokering. In Proceedings of VLDB. Berlin, Germany.
Hopcroft, J. E. and Ullman, J. D. 1979. Introduction to Automata Theory, Languages, and Computation.

