A Transducer-Based XML Query Processor Bertram Ludäscher, SDSC/CSE UCSD Pratik Mukhopadhyay, CSE UCSD Yannis Papakonstantinou, CSE UCSD.


1 A Transducer-Based XML Query Processor Bertram Ludäscher, SDSC/CSE UCSD Pratik Mukhopadhyay, CSE UCSD Yannis Papakonstantinou, CSE UCSD

2 Overview Motivation Architecture Framework: Streams + XQuery XSM (XML Stream Machine) XSM Networks Network Composition Conclusions

3 Efficient Processing of Sequentially Accessed XML Data XML Message Transformer Transformed XML message Web Service XML message Web Service Implementations & RMI

4 Web Front-End Efficient Processing of Sequentially Accessed XML Data XML-to-XHTML Transformer XML file Web Development XHTML page

5 Efficient Processing of Sequentially Accessed XML Data Archive Transformation & ETL (Extraction Transformation & Loading) Applications XML Processor XML archive file XML target file

6 Efficient Processing of Sequentially Accessed XML Data Sensor Data Processor Stream Acting/ Mining Software XML Sensor Data Analysis

7 Bandwidth & Connectivity will Increase the Amount of Data … XML Sensor Data Processor XML stream XML stream XML stream XML stream XML

8 …Hardware Advances do not Favor Conventional Architectures [Plot: magnitude vs. year for CPU speed, CPU-to-memory speed, and bandwidth]

9 Overview Motivation Architecture Framework: Streams + XQuery XSM (XML Stream Machine) XSM Networks Network Composition Conclusions

10 Transducer-Based Processing: On-the-Fly & Minimal Memory [Diagram: input buffer → XML Stream Machine (buffers; transitions labelled Condition | Action) → output buffer]

11 XML Stream Machine (XSM) High-Level Architecture: XQuery (plus optional input DTD) → XQuery Compiler → XSM → XSM-to-C Compiler → C program

12 Components of the XQuery Compiler: XQuery-to-Network Translation (XQuery → XSM network); XSM Composition (XSM network → single XSM); Schema Optimization (uses the optional input DTD)

13 Overview Motivation Architecture Framework: Streams + XQuery XSM (XML Stream Machine) XSM Networks Network Composition Conclusions

14 XQuery Subset: for-where-return expressions, path expressions, element construction, concatenation. Example: for $X in $R/a return for $Y in $X/b return $Y, $X
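To make the example query concrete, here is a minimal in-memory sketch in Python. It only illustrates what the query returns, not how an XSM evaluates it on a stream. It assumes the reading used on the later network slides, where each binding of $Y is followed by its enclosing $X. The function name run_query and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def run_query(root):
    """Evaluate: for $X in $R/a return for $Y in $X/b return $Y, $X.

    Assumption: the return clause is read as the pair ($Y, $X), so every
    b-child is followed by a copy of its enclosing a-element.
    """
    out = []
    for x in root.findall("a"):       # $X ranges over $R/a
        for y in x.findall("b"):      # $Y ranges over $X/b
            out.append(y)             # $Y
            out.append(x)             # $X
    return out


# Hypothetical sample document bound to $R.
doc = ET.fromstring("<r><a><b>5</b><b>1</b></a><a><c/></a></r>")
result = [ET.tostring(e, encoding="unicode") for e in run_query(doc)]
print(result)
```

The second a-element has no b-children, so it contributes nothing to the result.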

15 XML Stream: Tags, Data & Control Tokens. An XML stream is a sequence of data tokens (e.g. 5, 1), open-tag and close-tag tokens, and control tokens (S_$R, E_$R).
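A small Python sketch of such a token stream follows. The tuple encoding, the function names tokens and binding_stream, and the reading of S and E as markers for the start and end of one variable binding are assumptions made for illustration; the slides do not fix a concrete representation.

```python
import xml.etree.ElementTree as ET


def tokens(elem):
    """Yield open-tag, data, and close-tag tokens for one element, depth-first."""
    yield ("open", elem.tag)
    if elem.text and elem.text.strip():
        yield ("data", elem.text.strip())
    for child in elem:
        yield from tokens(child)
        if child.tail and child.tail.strip():
            yield ("data", child.tail.strip())
    yield ("close", elem.tag)


def binding_stream(var, elem):
    """Wrap one binding of `var` between control tokens S_var and E_var (assumed encoding)."""
    yield ("S", var)
    yield from tokens(elem)
    yield ("E", var)


stream = list(binding_stream("$R", ET.fromstring("<a><b>5</b><b>1</b></a>")))
for tok in stream:
    print(tok)
```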

16 Overview Motivation Architecture Framework: Streams + XQuery XSM (XML Stream Machine) XSM Networks Network Composition Conclusions

17 XML Stream Machine (XSM): concatenation of bindings of Y, X into bindings of Z. States: 0, 1, 2, 3. Transitions (condition | action):
*y = S_y | y++
*x = S_x | w(z, S_z), x++
*y = E_y | y++
*y != E_y | w(z, *y), y++
*x != E_x | w(z, *x), x++
*x = E_x | w(z, E_z), x++
Input buffer X (read pointer x) holds bindings delimited by S_x … E_x; input buffer Y (read pointer y) holds bindings delimited by S_y … E_y. Output buffer Z: S_z 5 5 1 E_z

18-26 XML Stream Machine (XSM), animation of the machine on slide 17: output buffer Z starts empty (slides 18-19), then holds S_z (slides 20-21), S_z 5 (slides 22-24), S_z 5 5 1 (slide 25), and finally S_z 5 5 1 E_z (slide 26).
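The concatenation machine can be simulated in a few lines of Python. This is a sketch under stated assumptions: the transcript lists the six condition | action pairs but not which states they connect, so the assignment of transitions to states 0-3 below is a guess chosen to reproduce the output buffer S_z 5 5 1 E_z. The string token names ("Sx", "Ex", ...) and the function name run_concat_xsm are hypothetical.

```python
def run_concat_xsm(x_buf, y_buf):
    """Simulate the concatenation XSM on input buffers X and Y; return output buffer Z.

    Assumed state layout (not given in the slides):
      0: wait for S_x    1: wait for S_y    2: copy Y binding    3: copy X binding
    """
    x = y = 0      # read pointers into the input buffers
    z = []         # output buffer; w(z, t) is modelled as z.append(t)
    state = 0
    while x < len(x_buf):
        if state == 0 and x_buf[x] == "Sx":      # *x = S_x | w(z, S_z), x++
            z.append("Sz")
            x += 1
            state = 1
        elif state == 1 and y_buf[y] == "Sy":    # *y = S_y | y++
            y += 1
            state = 2
        elif state == 2 and y_buf[y] != "Ey":    # *y != E_y | w(z, *y), y++
            z.append(y_buf[y])
            y += 1
        elif state == 2:                         # *y = E_y | y++
            y += 1
            state = 3
        elif state == 3 and x_buf[x] != "Ex":    # *x != E_x | w(z, *x), x++
            z.append(x_buf[x])
            x += 1
        elif state == 3:                         # *x = E_x | w(z, E_z), x++
            z.append("Ez")
            x += 1
            state = 0
        else:
            raise ValueError(f"no transition enabled in state {state}")
    return z


z_buf = run_concat_xsm(["Sx", "5", "1", "Ex"], ["Sy", "5", "Ey"])
print(z_buf)
```

With X holding the binding 5 1 and Y holding the binding 5, the simulated output buffer matches the final animation frame.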

27 Comparison of XSM against State Automata & Transducers. State automata: do not construct; do not store intermediate results; sufficient for XPath only. Transducers: finite alphabets; state is the only memory; no reset of input pointers. XSM: unbounded alphabet; buffers; pointer reset.

28 Overview Motivation Architecture Framework: Streams + XQuery XSM (XML Stream Machine) XSM Networks Network Composition Conclusions

29 XSM Networks: Intermediate Step in Translating Queries to XSMs XQuery-to-Network Translation XSM Composition XSM Network Single XSM XQuery

30 XSM Network for the query: for $X in $R/a return for $Y in $X/b return $Y, $X. Network: $R $R/a $X $X/b $Y For $Y [$Y,$X] → [$Y’,$X’] $X’ $Y’ $Z $O $Y’,$X’ $Z

31 From XQueries to XSM Networks: Non-FLWR Expressions $Y, $X $X $Y $O $Z $Y,$X $X $Y $Z $O

32 From XQueries to XSM Networks: FLWRs without Free Variables for $X in G return expr($X) $X $R G expr($X) $O

33 From XQueries to XSM Networks: FLWRs with Free Variables. for $Y in $X/b return $Y, $X (free variable: $X). Network: $X $X/b $Y For $Y [$Y,$X] → [$Y’,$X’] $X’ $Y’ $Y’, $X’ $O

34 Overview Motivation Architecture Framework: Streams + XQuery XSM (XML Stream Machine) XSM Networks Network Composition Conclusions

35 Composition Merges Two XSMs into One. Network before composition: $R $R/a $X $X/b $Y For $Y [$Y,$X] → [$Y’,$X’] $X’ $Y’ $Z $O $Y’,$X’ $Z

36 Composition Merges Two XSMs into One. Network after composition: $R $R/a $X $X/b $Y For $Y [$Y,$X] → [$Y’,$X’] $X’ $Y’ $O $Y’, $X’

37 XSM Composition: “State Product” Emulates Producer-Consumer. Producer M1 (state q1) feeds consumer M2 (state q2); the “state product” M3 = (M2 o M1) has states (q1, q2).

38 Naive Composition. M1 has transitions q1 → q1’ on ø1 | A1; M2 has transitions q2 → q2’ on ø2 | A2. The composed machine M3 = (M2 o M1) has states (q1, q2): it takes an M2 step, (q1, q2) → (q1, q2’) on ø2 | A2, if the test on q2 holds, and an M1 step, (q1, q2) → (q1’, q2) on ø1 | A1, if the test does not hold. The test on q2 is ¬AE(r1) ∧ ... ∧ ¬AE(rn) = “no shared read-pointer ri of q2 is At End”.
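The scheduling rule of naive composition (consumer steps whenever its shared read pointer is not at end, producer steps otherwise) can be sketched as a single loop in Python. This is a deliberate simplification and not the slides' construction: each machine is collapsed to one step function, so the state pair (q1, q2) is implicit, and there is a single shared read pointer r. The names naive_compose, m1_step, and m2_step are hypothetical.

```python
def naive_compose(m1_step, m2_step, source):
    """Run producer M1 and consumer M2 as one machine over a shared buffer.

    m1_step(token) -> list of tokens M1 writes to the shared buffer.
    m2_step(token) -> list of tokens M2 writes to the final output.
    """
    shared, out = [], []
    i = 0    # M1's read pointer into its own input
    r = 0    # M2's read pointer into the shared buffer
    while True:
        at_end = r == len(shared)        # AE(r): nothing unread in the shared buffer
        if not at_end:                   # runtime test passes: M2 step
            out.extend(m2_step(shared[r]))
            r += 1
        elif i < len(source):            # otherwise: M1 step
            shared.extend(m1_step(source[i]))
            i += 1
        else:                            # input exhausted and buffer drained
            return out


# Hypothetical machines: M1 filters out "x" tokens, M2 upper-cases what it reads.
composed_out = naive_compose(
    lambda t: [] if t == "x" else [t],
    lambda t: [t.upper()],
    ["a", "x", "b"],
)
print(composed_out)
```

Every iteration evaluates the at-end test at runtime, which is the cost that the smart composition on the following slides tries to avoid.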

39 Smart Composition. Normalization assumptions: #(read-pointers-into-shared-buffer(q2)) ≤ 1; atomic actions only. Basic idea: avoid runtime tests (“At-End”) whenever the outcome can be determined at compile time. Different “modes”: go: consumer M2 proceeds (full buffer); no: producer M1 proceeds (empty buffer), and the consumer may be able to follow immediately; ae: do the runtime check AE.

40 Smart Composition: no Case (shared buffer is empty). Given M1: q1 → q1’ on ø1 | A1 and M2: q2 → q2’ on ø2 | A2, transitions inserted by case: if A1 does not write to the shared buffer, (q1, q2, no) → (q1’, q2, no) on ø1 | A1; if M2 does not wait on the shared buffer, (q1, q2, no) → (q1, q2’, no) on ø2 | A2.

41 Smart Composition: Producer Fills Buffer. Transitions inserted by case: if A1 writes a token to the shared buffer and M2 consumes the token, (q1, q2, no) → (q1’, q2’, no) on ø12 | A12; if A1 writes to the shared buffer but M2 doesn’t advance its read pointer, (q1, q2, no) → (q1’, q2’, go) on ø12 | A12. Here A12 is the combination of A1 with A2, and ø12 the combination of ø1 with ø2.
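The case analysis for no mode on slides 40 and 41 can be written as a small function that, given one transition of M1 and one of M2, returns the product transitions to insert. This is one reading of the garbled slide diagrams, not a confirmed reconstruction. The Step record, its boolean flags, and the string form of combined conditions and actions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    src: str
    dst: str
    cond: str
    action: str
    writes_shared: bool = False    # M1 only: action writes to the shared buffer
    reads_shared: bool = False     # M2 only: step waits on / reads the shared buffer
    advances_read: bool = False    # M2 only: action advances the shared read pointer


def no_mode_transitions(t1, t2):
    """Product transitions leaving (t1.src, t2.src, 'no'), as read from slides 40-41."""
    src = (t1.src, t2.src, "no")
    result = []
    if not t1.writes_shared:       # M1 step alone; buffer stays empty
        result.append((src, t1.cond, t1.action, (t1.dst, t2.src, "no")))
    if not t2.reads_shared:        # M2 step alone; it does not need the buffer
        result.append((src, t2.cond, t2.action, (t1.src, t2.dst, "no")))
    if t1.writes_shared and t2.reads_shared:
        # Combined step: token consumed -> buffer empty again (no),
        # token left unread -> buffer known to be non-empty (go).
        mode = "no" if t2.advances_read else "go"
        cond = f"{t1.cond} and {t2.cond}"
        action = f"{t1.action}; {t2.action}"
        result.append((src, cond, action, (t1.dst, t2.dst, mode)))
    return result


producer = Step("q1", "q1'", "c1", "A1", writes_shared=True)
consumer = Step("q2", "q2'", "c2", "A2", reads_shared=True, advances_read=True)
print(no_mode_transitions(producer, consumer))
```

Because the mode is part of the product state, the decision between no and go is made once at composition time rather than by a runtime at-end check.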

42 Smart Composition: go - ae - no. In go mode, an M2 step ø2 | A2 from (q1, q2, go) leads to (q1, q2’, no) if A2 advances the read pointer into the shared buffer, and to (q1, q2’, go) if A2 does not advance the read pointer into the shared buffer.

43 Smart Composition: go - ae - no. In ae mode: insert transitions for an M2 step if possible... If ø2; A2 has no read from the shared buffer, (q1, q2, ae) → (q1, q2’, ae) on ø2 | A2; if ø2; A2 has a read from the shared buffer, (q1, q2, ae) → (q1, q2’) on ¬AE(r) ∧ ø2 | A2.

44 Smart Composition: go - ae - no. ... AND transitions corresponding to an M1 step, taken from (q1, q2, ae) to (q1’, q2) on AE(r) ∧ ø1 | A1. Three cases are distinguished: A1 has no write into the shared buffer, A1 has one write into the shared buffer, or A1 has more than one write into the shared buffer. [Diagram: target modes ae, no, and go]

45 Performance Datapoint (Transformation Query on DBLP) Data Size (KB) Xalan (ms) XSM Java XSM C 466326630 500070312360312 2000010271082661156 80000320784640

46 Conclusions & Future Work Novel query processor model Success in filtering & transformation To be extended for joins & aggregations Memory footprint questions Facilitated by model’s simplicity

47 Related Work Relational Data Streams & Sequence Data Models Pipelined Join Operators Aggregates & Approximations Fast XPath on streams Memory requirements of validating XML

48 Smart Composition: go - ae - no. In no mode: execute an M1 step... If A1 does not advance the shared write pointer, (q1, q2, no) → (q1’, q2, no) on ø1 | A1; if A1 does advance the shared write pointer, (q1, q2, no) → (q1’, q2, ae) on ø1 | A1. ... AND possibly an M2 step, with the simplified composed condition ø1 ∧ ø2 and action (A1; A2), written ø12 | A12: if A2 advances the shared read pointer, (q1, q2, no) → (q1’, q2’, no); if A2 does not advance the shared read pointer, (q1, q2, no) → (q1’, q2’, go).

