A Query Algebra for Fragmented XML Stream Data (Bose, Fegaras, Levine, Chaluvadi; DBPL 2003)


1 A Query Algebra for Fragmented XML Stream Data
Sujoe Bose, Leonidas Fegaras, David Levine, Vamsi Chaluvadi
University of Texas at Arlington
DBPL 2003
http://lambda.uta.edu/XStreamCast/

2 Processing Streamed XML Data
Most web servers are pull-based: a client submits a request, and the server returns the requested data. This does not scale well to large numbers of clients and large query results.
Alternative method: push-based dissemination
– The server broadcasts/multicasts data in a continuous stream
– The client connects to multiple streams and evaluates queries locally
– No handshaking, no error correction
– All processing is done at the client side
– The only task performed by the server is slicing, scheduling, and broadcasting data:
   – Critical data may be repeated more often than non-critical data
   – Invalid data may be revoked
   – New updates may be broadcast as soon as they become available

3 A Framework for Processing XML Streams
The server slices an XML data source into XML fragments. Each fragment:
– is a filler that fills a hole
– may contain holes, which can be filled by other fragments
– is wrapped with control information, such as its unique hole ID, the path that reaches this fragment, etc.
The client opens connections to streams and evaluates XQueries against these streams.
– For large streams, it is a bad idea to reconstruct the streamed data in the client's memory; fragments need to be processed as soon as they become available from the server
– There are blocking operators that require unbounded memory:
   – Sorting
   – Joins between two streams, or self-joins
   – Group-by with aggregation

4 The Fragmented Hole-Filler Model
[Figure: an XML document split into fragments. The markup did not survive; the remaining text values are: Wal-Mart ...; PDA HP PalmPilot 315.25; Calculator Casio FX-100 50.25]
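The hole-filler idea can be illustrated with a minimal Python sketch. Everything about the representation is an assumption made for illustration: the classes `Hole`, `Element`, and `Filler`, the helper `holes`, and the tag names (`vendor`, `item`, `make`, ...) are hypothetical; only the text values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    id: int  # id of the filler expected to fill this hole


@dataclass(frozen=True)
class Element:
    tag: str
    children: tuple  # mix of str, Element, and Hole


@dataclass(frozen=True)
class Filler:
    id: int  # the hole id this fragment fills (0 is assumed to be the root)
    content: Element


def holes(node):
    """Return the ids of all holes inside a node, in document order."""
    if isinstance(node, Hole):
        return [node.id]
    if isinstance(node, Element):
        return [h for child in node.children for h in holes(child)]
    return []  # plain text contains no holes


# Hypothetical fragmentation of the slide's data (tag names are assumptions).
root = Filler(0, Element("vendor", (Element("name", ("Wal-Mart",)), Hole(1), Hole(2))))
pda = Filler(1, Element("item", (
    Element("name", ("PDA",)), Element("make", ("HP",)),
    Element("model", ("PalmPilot",)), Element("price", ("315.25",)))))
calc = Filler(2, Element("item", (
    Element("name", ("Calculator",)), Element("make", ("Casio",)),
    Element("model", ("FX-100",)), Element("price", ("50.25",)))))

print(holes(root.content))  # [1, 2]
print(holes(pda.content))   # []
```

The root fragment can be broadcast once while the item fragments are sent, repeated, or replaced independently, since each one is addressed only by its hole id.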

5 An Algebra for Stored XML Data
Based on the nested-relational algebra:
$\mathrm{source}_v(T)$: access the XML data source $T$ using $v$
$\sigma_{pred}(X)$: select fragments from $X$ that satisfy $pred$
$\pi_{v_1,\ldots,v_n}(X)$: project $X$
$X \cup Y$: merge
$X \bowtie_{pred} Y$: join
$\mu^{v,path}_{pred}(X)$: unnest (retrieve descendants of elements)
$\Delta^{\oplus,h}_{pred}(X)$: apply $h$ and reduce by $\oplus$
$\Gamma^{gs,pred}_{v,\oplus,h}(X)$: group by $gs$, apply $h$ to each group, and reduce each group by $\oplus$

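6 Semantics
$\mathrm{source}_v(T) = \{\, \langle v = T \rangle \,\}$
$\sigma_{pred}(X) = \{\, t \mid t \in X,\ pred(t) \,\}$
$\pi_{v_1,\ldots,v_n}(X) = \{\, \langle v_1 = t.v_1, \ldots, v_n = t.v_n \rangle \mid t \in X \,\}$
$X \cup Y = X \mathbin{+\!\!+} Y$
$X \bowtie_{pred} Y = \{\, t_x \circ t_y \mid t_x \in X,\ t_y \in Y,\ pred(t_x, t_y) \,\}$
$\mu^{v,path}_{pred}(X) = \{\, t \circ \langle v = w \rangle \mid t \in X,\ w \in \mathrm{PATH}(t, path),\ pred(t, w) \,\}$
$\Delta^{\oplus,h}_{pred}(X) = \oplus / \{\, h(t) \mid t \in X,\ pred(t) \,\}$
$\Gamma^{gs,pred}_{v,\oplus,h}(X) = \ldots$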
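The set-comprehension semantics above map almost directly onto list comprehensions. The following Python sketch is an illustration only, under these assumptions: a tuple is a `dict` from variable names to values, a collection is a `list`, the `path` argument is a function standing in for PATH, and all function names and the sample book records are made up for the example.

```python
from functools import reduce as fold


def select(pred, X):
    """Keep the tuples of X that satisfy pred."""
    return [t for t in X if pred(t)]


def project(names, X):
    """Keep only the listed variables of every tuple."""
    return [{v: t[v] for v in names} for t in X]


def merge(X, Y):
    """Merge is concatenation (X ++ Y)."""
    return X + Y


def join(pred, X, Y):
    """Concatenate every pair of tuples that satisfies pred."""
    return [{**tx, **ty} for tx in X for ty in Y if pred(tx, ty)]


def unnest(v, path, X, pred=lambda t, w: True):
    """Bind v to each value w reached by path(t), extending t."""
    return [{**t, v: w} for t in X for w in path(t) if pred(t, w)]


def reduce_(acc, zero, h, X, pred=lambda t: True):
    """Apply h to each qualifying tuple and fold the results with acc."""
    return fold(acc, (h(t) for t in X if pred(t)), zero)


# Illustrative data: one tuple per book, bound to the variable "b".
books = [
    {"b": {"title": "TCP/IP Illustrated", "publisher": "Addison-Wesley", "year": 1994}},
    {"b": {"title": "Data on the Web", "publisher": "Morgan Kaufmann", "year": 1999}},
]
titles = reduce_(
    lambda a, b: a + b, [], lambda t: [t["b"]["title"]], books,
    pred=lambda t: t["b"]["publisher"] == "Addison-Wesley" and t["b"]["year"] > 1991,
)
print(titles)  # ['TCP/IP Illustrated']
```

Group-by is left out because the slide elides its definition.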

7 Example #1
[XQuery listing: only the keyword "where" survives in the transcript.]

8 Example #1 (cont.)
[Algebraic plan, drawn as a tree. Recoverable labels, from the root down:]
– reduce with head element("book", $b/title)
– predicate: $b/publisher = "Addison-Wesley" and $b/@year > 1991
– unnest binding $b over $v/bib/book
– source binding $v to document("http://www.bn.com")

9 Example #2
for $u in document("users.xml")//user_tuple
return <user>
  { $u/name }
  { for $b in document("bids.xml")//bid_tuple[userid=$u/userid]/itemno,
        $i in document("items.xml")//item_tuple[itemno=$b]
    return <bid>{ $i/description/text() }</bid> sortby(.) }
</user> sortby(name)
[Algebraic plan, drawn as a tree. Recoverable labels:]
– source binding $us to document("users.xml"); unnest binding $u over $us/users/user_tuple
– source binding $bs to document("bids.xml"); unnest binding $c over $bs/bids/bid_tuple; unnest binding $b over $c/itemno
– source binding $is to document("items.xml"); unnest binding $i over $is/items/item_tuple
– predicates: $c/userid = $u/userid and $i/itemno = $b
– inner step: sort, elem("bid", $i/description/text())
– outer step: sort($u/name), elem("user", $u/name ++ result of the inner step)

10 XPath Expressions
Path evaluation is central to the algebra:
PATH: (XML-data, simple-XPath) $\to$ set(XML-data)
Some rules for stored XML data:
$\mathrm{PATH}(\langle A\rangle x \langle /A\rangle,\ A/path) = \mathrm{PATH}(x, path)$
$\mathrm{PATH}(\langle A\rangle x \langle /A\rangle,\ A) = \{\, x \,\}$
$\mathrm{PATH}(x_1\, x_2,\ path) = \mathrm{PATH}(x_1, path) \cup \mathrm{PATH}(x_2, path)$
$\mathrm{PATH}(x, path) = \emptyset$ otherwise
Predicates have existential semantics:
$\$v/A/B = \text{``text''} \;\equiv\; \exists x \in \mathrm{PATH}(v, A/B) : x = \text{``text''}$
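The PATH rules for stored data can be sketched in a few lines of Python. The representation is an assumption for illustration: XML data is a list of nodes, a node is either text (`str`) or a `(tag, children)` pair, and a simple path is a list of tag names. The result is a list rather than a set because child lists are not hashable; `exists_eq` is a hypothetical helper showing the existential reading of predicates.

```python
def PATH(x, path):
    """Evaluate a simple path (list of tag names) over a sequence of nodes."""
    result = []
    for node in x:  # PATH(x1 x2, p) is the union of PATH(x1, p) and PATH(x2, p)
        if isinstance(node, tuple) and node[0] == path[0]:
            _tag, children = node
            if len(path) == 1:
                result.append(children)  # PATH(<A>x</A>, A) = {x}
            else:
                # PATH(<A>x</A>, A/p) = PATH(x, p)
                result.extend(PATH(children, path[1:]))
        # every other node contributes the empty set
    return result


def exists_eq(v, path, text):
    """$v/path = text holds if at least one match has exactly that text."""
    return any(w == [text] for w in PATH(v, path))


v = [("A", [("B", ["text"]), ("B", ["other"])]), ("C", ["text"])]
print(PATH(v, ["A", "B"]))               # [['text'], ['other']]
print(exists_eq(v, ["A", "B"], "text"))  # True
print(exists_eq(v, ["A", "B"], "none"))  # False
```

The second `B` element does not prevent the predicate from holding, which is the point of existential semantics: one matching value is enough.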

11 The Streamed XML Algebra
Much like the stored XML algebra, but works on streams. A stream $s$ takes one of the forms:
$t \,;\, s'$: a fragment $t$ followed by the rest of the stream $s'$
$\mathit{eos}$: end-of-stream
Each stored XML algebraic operator has a streamed counterpart, e.g.:
$\sigma_{pred}(t \,;\, s) = t \,;\, \sigma_{pred}(s)$ if $pred$ is true for $t$
$\sigma_{pred}(t \,;\, s) = \sigma_{pred}(s)$ otherwise
$\sigma_{pred}(\mathit{eos}) = \mathit{eos}$
but … we may not be able to validate $pred$ due to holes in $t$
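The three selection rules correspond closely to a Python generator: each incoming fragment is either passed on or dropped, and the generator ends when the stream ends. This is a sketch that assumes hole-free fragments; the name `stream_select` and the integer stream are illustrative only.

```python
from itertools import count, islice


def stream_select(pred, stream):
    """Streamed selection over fragments without holes."""
    for t in stream:   # the stream has the form "t ; rest"
        if pred(t):
            yield t    # emit t, then continue with the rest
        # otherwise drop t and continue with the rest
    # falling out of the loop corresponds to eos


# Works on an unbounded stream because nothing is buffered.
evens = stream_select(lambda t: t % 2 == 0, count())
print(list(islice(evens, 3)))  # [0, 2, 4]
```

No fragment is retained after it has been tested, so the operator needs constant memory as long as every predicate can be decided immediately.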

12 Streamed Algebra Semantics
To keep the suspended fragments, each streamed algebraic operator has
– one state $\omega_0$ for the output and
– optional state(s) $\omega_1$ / $\omega_2$ for the input(s)
The result of PATH may now be unspecified. For a hole with id $m$:
$\mathrm{PATH}(\mathrm{hole}_m,\ path) = \mathrm{PATH}(\omega_1(m),\ path)$ if $m \in \omega_1$
$\mathrm{PATH}(\mathrm{hole}_m,\ path) = \{\, \bot \,\}$ otherwise
When it appears in predicates, $\bot$ requires 3-value logic.
Incomplete fragments are suspended when necessary, e.g.:
$\sigma_{pred}(t \,;\, s) = t \,;\, \sigma_{pred}(s)$ if $\mathit{true} \in \mathrm{PATH}(t, pred)$
$\sigma_{pred}(t \,;\, s) = \sigma_{pred}(s)$ otherwise
$\omega_0 \leftarrow \omega_0 \cup \{ t \}$ if $\bot \in \mathrm{PATH}(t, pred)$
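A selection that suspends incomplete fragments might look like the Python sketch below. It rests on several assumptions that are not in the slides: stream events are either `("tuple", t)` or `("fill", m, value)`, a tuple is a `dict` whose tested field may hold a `Hole`, `None` plays the role of the unknown truth value, and suspended tuples are re-tested whenever a new filler arrives. The slide only says that such tuples are kept in the output state; the release policy here is one possible choice.

```python
class Hole:
    """Placeholder for a value that a later filler will supply."""

    def __init__(self, id):
        self.id = id


def stream_select(field, test, stream):
    fillers = {}    # input state: filler id -> value received so far
    suspended = []  # output state: tuples whose predicate is still unknown

    def evaluate(t):
        """Three-valued test: True, False, or None when a hole is unfilled."""
        v = t[field]
        if isinstance(v, Hole):
            if v.id not in fillers:
                return None
            v = fillers[v.id]
        return test(v)

    for event in stream:
        if event[0] == "fill":
            _, m, value = event
            fillers[m] = value
            waiting, suspended = suspended, []
            for t in waiting:  # re-test everything that was waiting
                r = evaluate(t)
                if r is None:
                    suspended.append(t)
                elif r:
                    yield t
        else:
            t = event[1]
            r = evaluate(t)
            if r is None:
                suspended.append(t)  # cannot decide yet: suspend
            elif r:
                yield t


events = [
    ("tuple", {"title": "A", "year": Hole(1)}),
    ("tuple", {"title": "B", "year": 1990}),
    ("tuple", {"title": "C", "year": 1995}),
    ("fill", 1, 1994),
]
out = [t["title"] for t in stream_select("year", lambda y: y > 1991, events)]
print(out)  # ['C', 'A']
```

Tuple A is emitted after C because it had to wait for filler 1, which illustrates why output order can differ from arrival order once holes are involved.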

13 Join
Much like a main-memory symmetric join. States:
– $\omega_0$: all suspended output tuples due to unfilled holes
– $\omega_1$: all tuples from the left stream
– $\omega_2$: all tuples from the right stream
A tuple from the left stream:
$(t_1 \,;\, s_1) \bowtie_{pred} s_2 = \{\, t_1 \circ t_2 \mid t_2 \in \omega_2,\ \mathit{true} \in \mathrm{PATH}(t_1 \circ t_2, pred) \,\} \,;\, (s_1 \bowtie_{pred} s_2)$
$\omega_1 \leftarrow \omega_1 \cup \{ t_1 \}$
$\omega_0 \leftarrow \omega_0 \cup \{\, t_1 \circ t_2 \mid t_2 \in \omega_2,\ \bot \in \mathrm{PATH}(t_1 \circ t_2, pred) \,\}$
A tuple from the right stream:
$s_1 \bowtie_{pred} (t_2 \,;\, s_2) = \{\, t_1 \circ t_2 \mid t_1 \in \omega_1,\ \mathit{true} \in \mathrm{PATH}(t_1 \circ t_2, pred) \,\} \,;\, (s_1 \bowtie_{pred} s_2)$
$\omega_2 \leftarrow \omega_2 \cup \{ t_2 \}$
$\omega_0 \leftarrow \omega_0 \cup \{\, t_1 \circ t_2 \mid t_1 \in \omega_1,\ \bot \in \mathrm{PATH}(t_1 \circ t_2, pred) \,\}$
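The symmetric join can be sketched in Python as follows. The sketch makes several assumptions for illustration: the two input streams are interleaved into one list of `(side, tuple)` events, tuples are `dict`s, a field value of `None` stands in for an unfilled hole, and the predicate returns `True`, `False`, or `None` (unknown). The names `symmetric_join` and `same_user` and the sample data are hypothetical.

```python
def symmetric_join(pred, events, suspended):
    """Join two interleaved streams; pairs with unknown predicate go to `suspended`."""
    left, right = [], []  # all tuples seen so far on each side
    for side, t in events:
        if side == "L":
            pairs = [(t, t2) for t2 in right]  # probe the right state
            left.append(t)
        else:
            pairs = [(t1, t) for t1 in left]   # probe the left state
            right.append(t)
        for t1, t2 in pairs:
            r = pred(t1, t2)
            if r is True:
                yield {**t1, **t2}               # definite match: emit now
            elif r is None:
                suspended.append({**t1, **t2})   # undecided because of a hole


def same_user(t1, t2):
    """Three-valued equality on user ids; None means a hole is still unfilled."""
    if t1["u_id"] is None or t2["b_uid"] is None:
        return None
    return t1["u_id"] == t2["b_uid"]


events = [
    ("L", {"u_id": 1, "name": "Ann"}),
    ("R", {"b_uid": 1, "itemno": 7}),
    ("R", {"b_uid": None, "itemno": 9}),  # user id not yet known
    ("L", {"u_id": 2, "name": "Bob"}),
]
suspended = []
out = list(symmetric_join(same_user, events, suspended))
print(out)             # [{'u_id': 1, 'name': 'Ann', 'b_uid': 1, 'itemno': 7}]
print(len(suspended))  # 2
```

Both input states grow with the streams, which is why the join is listed among the operators that need unbounded memory.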

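14 Reconstructing the XML Data
$\rho$ : set(int $\times$ XML-data) is an environment that binds filler ids to XML.
$x \triangleleft \rho$ replaces holes with fillers in $x$ using the environment $\rho$:
$\langle A\rangle x \langle /A\rangle \triangleleft \rho = \langle A\rangle\, (x \triangleleft \rho)\, \langle /A\rangle$
$(x_1\, x_2) \triangleleft \rho = (x_1 \triangleleft \rho)\,(x_2 \triangleleft \rho)$
$\mathrm{hole}_m \triangleleft \rho = \rho[m]$ if $m \in \rho$
$x \triangleleft \rho = x$ otherwise
$R(s)$ returns a pair $(a, \rho)$, where $a$ is $\rho[0]$ (the reconstructed data). If $R(s) = (a, \rho)$, then for a filler with id $m$ and content $x$:
$R(\mathrm{filler}_m(x) \,;\, s) = (x \triangleleft \rho,\ \rho)$ if $m = 0$
$R(\mathrm{filler}_m(x) \,;\, s) = (a \triangleleft \rho',\ \rho')$ if $m \neq 0$, where $\rho' = \{ (m,\ x \triangleleft \rho) \} \cup \rho[m/x]$
$R(\mathit{eos}) = (\emptyset, \emptyset)$
Basically, $R(t \,;\, s) = f(R(s))$.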
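Reconstruction can be sketched in Python under simplifying assumptions. A node is text, a `("hole", m)` pair, or a `(tag, children)` pair, so no real element may be named `hole`; a stream is a list of `(filler id, content)` pairs; fillers are assumed not to refer to each other cyclically. Unlike the slide's recursive definition, this version first collects the whole environment and then substitutes starting from filler 0. The names `fill` and `reconstruct` are hypothetical.

```python
def fill(x, env):
    """Replace every hole whose id is bound in env; leave other holes in place."""
    if isinstance(x, str):
        return x
    if x[0] == "hole":
        m = x[1]
        return fill(env[m], env) if m in env else x
    tag, children = x
    return (tag, [fill(c, env) for c in children])


def reconstruct(stream):
    """Bind filler ids to content, then rebuild the document from filler 0."""
    env = {}
    for m, x in stream:
        env[m] = x  # a later filler with the same id replaces the earlier one
    return fill(env[0], env) if 0 in env else None


# Fragments may arrive out of order.
stream = [
    (2, ("item", ["Calculator"])),
    (0, ("vendor", ["Wal-Mart", ("hole", 1), ("hole", 2)])),
    (1, ("item", ["PDA"])),
]
print(reconstruct(stream))
# ('vendor', ['Wal-Mart', ('item', ['PDA']), ('item', ['Calculator'])])
```

Because fillers are looked up by id, arrival order does not matter, and a repeated or replacement filler simply overwrites the earlier binding.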

15 Equivalence Between Stored & Streamed Algebras
If we reconstruct the XML document from the streamed fragments and evaluate a query using the stored algebra, we get the same result as when we use the equivalent streamed algebra over the streamed XML fragments and reconstruct the result.
[Diagram: XML fragments → (reconstruction) → XML document → (stored XML algebra) → result; XML fragments → (streamed XML algebra) → XML fragments → (reconstruction) → result]
Proof sketch: We prove $R(\hat{\sigma}_p(s)) = \sigma_p(R(s))$ inductively, where $\hat{\sigma}_p$ is the stream version of $\sigma_p$. If $\mathit{true} \in \mathrm{PATH}(t, pred)$, then
$R(\hat{\sigma}_p(t \,;\, s)) = R(t \,;\, \hat{\sigma}_p(s)) = f(R(\hat{\sigma}_p(s))) = f(\sigma_p(R(s))) = \sigma_p(f(R(s))) = \sigma_p(R(t \,;\, s))$ …

16 Conclusion
Fragmented XML data are easier to handle and synchronize than an infinitely long stream.
Associating holes with fillers takes care of out-of-sequence transmission, repetitions, replacements, and removals.
Our streamed algebra has operators similar to those of our stored algebra, but with different semantics.
Our algebra can capture most non-recursive XQueries.
Our future work includes
– the development of main-memory algorithms for processing XML data streams under memory and power constraints
– the development of a comprehensive approach to optimizing XQueries that utilizes our main-memory algorithms

