CSE 636 Data Integration XML Distributed Query Processing Slides by Yannis Papakonstantinou.


1 CSE 636 Data Integration XML Distributed Query Processing Slides by Yannis Papakonstantinou

2 2 Overview The Virtual XML View Approach towards Data Integration Query Processing in XML Mediators –Issues Overview –An Algebra-Based Architecture –Navigation-driven Evaluation

3 3 Data Integration Requirements in eBusiness Applications. It starts with… “Provide Application X to customers, partners, and employees,” where X may be in Business Intelligence, Customer Support, … Then the problem comes up… “The application uses information assets widely distributed across my enterprise.” If only… “Give the application a single place to go to access all the information required. Requirements are evolving, so make sure the system can be easily maintained and upgraded.”

4 4 View-Based Approach: Wrappers Export Basic Source Views. [Figure: a Client Application sits on top of a Mediator that exposes an Integrated (XML) View; below the Mediator, Wrappers export (XML) Views of the Orders and Customers relational databases.] Example: the relational rows (John, 56, Chicago), (George, 58, Chicago), … are exported as customer_table [ customer (name John, id 56, city Chicago), customer (name George, id 58, city Chicago), … ]

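5 5 Wrappers Export Basic Source Views. [Figure: a Client Application sits on top of a Mediator that exposes an Integrated (XML) View; below the Mediator, Wrappers export (XML) Views of the Orders and Customers relational databases.] Example: the Orders wrapper exports order_table [ order (id 1034, cid 56, item chips), order (id 1567, cid 56, item salsa), … ]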

6 6 Mediators Export Integrated Views, Tailored to Application Needs. [Figure: a Client Application sits on top of a Mediator that exposes an Integrated (XML) View; below the Mediator, Wrappers export (XML) Views of the Orders and Customers relational databases.] Source views: order_table [ order (id 1034, cid 56, item chips), order (id 1567, cid 56, item salsa), … ] and customer_table [ customer (name John, id 56, city Chicago), customer (name George, id 58, city Chicago), … ]. Integrated view: customers [ customer (name John, id 56, city Chicago, orders [ order (id 1034, item chips), order … ]), customer … ]

7 7 Virtual Views: Query-Driven Mediator Operation. [Figure: Application, Mediator, Wrappers, Orders Database, Customers Database.] Application query: “Find all Chicago customer names, along with their ordered items.” Request to the Customers source: “Retrieve Chicago customer names and ids.” Request to the Orders source: “Retrieve all cids and item names of orders.”

8 8 On-Demand (Query-Driven) Mediator Operation. [Figure: Application, Mediator, Wrappers, Orders Database, Customers Database.] Customers source returns: customer (name John, id 56), … Orders source returns: order (cid 56, item chips), order (cid 56, item salsa), … Mediator returns: customers [ customer (name John, ordered_items [ item chips, item salsa ]), customer … ]
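The query-driven operation on slides 7 and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two wrappers are replaced by in-memory lists, and the names `customers_wrapper`, `orders_wrapper`, and `mediator_query` are hypothetical, not part of any system described in the slides.

```python
# Hypothetical in-memory stand-ins for the two wrapped relational sources.
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]


def customers_wrapper(city):
    """Source request 1: names and ids of customers in the given city."""
    return [{"name": c["name"], "id": c["id"]} for c in CUSTOMERS if c["city"] == city]


def orders_wrapper():
    """Source request 2: cid and item name of every order."""
    return [{"cid": o["cid"], "item": o["item"]} for o in ORDERS]


def mediator_query(city):
    """Answer the client query by combining both source answers on id = cid."""
    items_by_cid = {}
    for order in orders_wrapper():
        items_by_cid.setdefault(order["cid"], []).append(order["item"])
    return [
        {"name": c["name"], "ordered_items": items_by_cid.get(c["id"], [])}
        for c in customers_wrapper(city)
    ]


result = mediator_query("Chicago")
```

Nothing is materialized ahead of time: the integrated view exists only as the result of `mediator_query`, computed when the client asks.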

9 9 Multiple Plans are Possible Retrieve customers For each customer find matching orders
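As a rough illustration of why several plans exist, the sketch below answers the same query in two ways: one bulk request followed by a join at the mediator, or one parameterized request per customer. The `OrdersSource` class and its request counter are assumptions made for this example only.

```python
CUSTOMERS = [{"name": "John", "id": 56}, {"name": "George", "id": 58}]
ORDER_ROWS = [{"cid": 56, "item": "chips"}, {"cid": 56, "item": "salsa"}]


class OrdersSource:
    """Hypothetical orders wrapper that counts how many requests it receives."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def all_orders(self):
        self.calls += 1
        return list(self.rows)

    def orders_for(self, cid):
        self.calls += 1
        return [r for r in self.rows if r["cid"] == cid]


def plan_bulk(customers, source):
    """Plan A: fetch all orders once, then join at the mediator."""
    orders = source.all_orders()
    return [
        (c["name"], [o["item"] for o in orders if o["cid"] == c["id"]])
        for c in customers
    ]


def plan_per_customer(customers, source):
    """Plan B: for each customer, ask the source for the matching orders."""
    return [
        (c["name"], [o["item"] for o in source.orders_for(c["id"])])
        for c in customers
    ]


source_a, source_b = OrdersSource(ORDER_ROWS), OrdersSource(ORDER_ROWS)
result_a = plan_bulk(CUSTOMERS, source_a)
result_b = plan_per_customer(CUSTOMERS, source_b)
```

Both plans return the same answer; they differ in the number of source requests and in the amount of data shipped, which is what a cost-based choice between them would weigh.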

10 10 A New Kind of Query Processing Problem: Build and Run “Optimal” Plan, consisting of operators that –Collect source info using supported queries and commands –Combine info into XML result

11 11 Operate within the Limited and Different Capabilities of the Sources –Describe sets of supported queries –Use most efficient supported queries Optimize plans/queries sent to sources –Estimate Costs of Plans –Adapt Plans Along the Way –Beyond Conjunctive Queries –Compose Queries/Views Efficiently Schema inference & optimization Combine navigation & querying Challenges in Query Processing & Optimization

12 12 Queries supported by mediator Answering Queries Using Views But with Infinite Sets of Views Increasing Relevance due to Web Services Source Data & Schema all queries over schema Queries supported by wrapper Source Data & Schema From Limited Wrappers to Efficient Plans for Extended Query Sets

13 13 Operate within the Limited and Different Capabilities of the Sources –Describe sets of supported queries –Use most efficient supported queries Optimize plans/queries sent to sources –Estimate Costs of Plans –Adapt Plans Along the Way –Beyond Conjunctive Queries –XQuery processing Schema inference & optimization Combine navigation & querying –Build iterator models for low memory footprint Challenges in Query Processing & Optimization

14 14 order_table order id 1034 cid 56 item chips order id 1567 cid 56 item salsa … customer_table customer name John id 56 city Chicago customer name George id 58 city Chicago … customers customer name John id 56 city Chicago orders order id 1034 item chips order … customer … Navigation-Driven Evaluation of Query Result

15 15 Navigation-Driven Evaluation. [Figure: a Client navigates the XML result of a Lazy Mediator placed over XML sources s1 … sn.] View definition: ans = q( s1 … sn ). Input: client navigations, e.g. down(p) and right(p) from a node p. Output: source navigations.
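A minimal sketch of the lazy-mediator idea follows. It assumes a deliberately simple view q that concatenates the top-level children of all sources under a new root named "ans"; the `Source` and `LazyMediator` classes and their pointer encoding are hypothetical, chosen only to show client navigations being translated into source navigations on demand.

```python
class Source:
    """Hypothetical XML source answering only down(p) and right(p)."""

    def __init__(self, tree):
        self.tree = tree          # nested (label, [children]) tuples
        self.navigations = 0      # number of navigation commands received

    def root(self):
        return ([self.tree], 0)   # pointer = (sibling list, index)

    def label(self, p):
        siblings, i = p
        return siblings[i][0]

    def down(self, p):
        self.navigations += 1
        siblings, i = p
        children = siblings[i][1]
        return (children, 0) if children else None

    def right(self, p):
        self.navigations += 1
        siblings, i = p
        return (siblings, i + 1) if i + 1 < len(siblings) else None


class LazyMediator:
    """Virtual view ans = q(s1 ... sn); here q concatenates top-level children."""

    ROOT = ("root", None, 0)      # client pointer = (source index, source pointer, depth)

    def __init__(self, sources):
        self.sources = sources

    def label(self, p):
        k, sp, _ = p
        return "ans" if k == "root" else self.sources[k].label(sp)

    def _first_child_from(self, k):
        # Probe sources k, k+1, ... until one has a top-level child.
        for j in range(k, len(self.sources)):
            sp = self.sources[j].down(self.sources[j].root())
            if sp is not None:
                return (j, sp, 1)
        return None

    def down(self, p):
        k, sp, depth = p
        if k == "root":
            return self._first_child_from(0)
        child = self.sources[k].down(sp)
        return (k, child, depth + 1) if child is not None else None

    def right(self, p):
        k, sp, depth = p
        if k == "root":
            return None
        sibling = self.sources[k].right(sp)
        if sibling is not None:
            return (k, sibling, depth)
        # Past the last top-level child of source k: continue in the next source.
        return self._first_child_from(k + 1) if depth == 1 else None


s1 = Source(("customer_table", [("customer", [("name", [])]), ("customer", [("name", [])])]))
s2 = Source(("order_table", [("order", [])]))
mediator = LazyMediator([s1, s2])

p = mediator.down(LazyMediator.ROOT)   # first client navigation
navigations_after_first_step = (s1.navigations, s2.navigations)
```

After the first client step only s1 has been contacted; s2 is touched only once the client navigates right past the last customer.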


20 20 Mixing Querying & Navigation customers customer name John id 56 city Chicago orders order id 1034 item chips order … customer … Find details of all salsa orders below visited node

21 21 Challenges in Mixing Querying & Navigation: Two-dimensional navigation –Reminiscent of cursors, but there are multiple continuation points. Controlling size + shape. Contextualizing queries by navigation.

22 22 Overview The Virtual XML View Approach towards Data Integration Query Processing in XML Mediators –Issues Overview –An Algebra-Based Architecture –Navigation-driven Evaluation

23 23 Translation to Algebra Rewriter/Optimizer Algebra Plan Physical Algebra Plan Queries & Fetch Requests to Sources Source Description Function Description Functions Source Schemas & Types Navigation Requests Results Client Plan Execution Engine An Algebra-Based Query Processor Architecture XQuery Views XQuery

24 24 Query Processing on Tuple-Oriented Algebra Enables… –Well-known efficient physical implementations of the operators –Join optimization –Nested data by nested plans or group-by –Efficient iterator model

25 25 XQuery: Queries & Views for XML
{ for $cust in document("db")/customer
  return { $cust/id,
           for $order in document("db")/order
           where $order/cid = $cust/id
           return { $order/id } } }

26 26 Access and Navigation
Source tree: customer_table (ct) with children customer c1 (name John, id i1 = 56) and customer c2 (name George, id i2 = 58)
source db, [$db1]
  $db1
  ct
getD $db1, customer → $cust
  $db1  $cust
  ct    c1
  ct    c2
getD $cust, id → $cust_id
  $db1  $cust  $cust_id
  ct    c1     i1
  ct    c2     i2

27 27 Simplification Using Schema Inference
Source tree: customer_table (ct) with children customer (name John, id i1 = 56) and customer (name George, id i2 = 58)
source db, [$db1]
  $db1
  ct
getD $db1, customer/id → $cust_id
  $db1  $cust_id
  ct    i1
  ct    i2
Since $cust_id  $cust and $cust is “useless” otherwise
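The source and getD operators, and the simplification that fuses two getD steps into one path, can be sketched as follows. This is an assumption-laden toy: tuples are Python dicts of variable bindings, elements are (label, content) pairs, and the helper `descend` is hypothetical.

```python
# Elements are (label, content); content is a list of children or a text value.
ct = ("customer_table", [
    ("customer", [("name", "John"), ("id", "56")]),
    ("customer", [("name", "George"), ("id", "58")]),
])


def source(doc, var):
    """source db, [$var]: a single tuple binding var to the document root."""
    return [{var: doc}]


def descend(node, path):
    """All nodes reached from node by following the child labels in path."""
    nodes = [node]
    for step in path.split("/"):
        nodes = [c for n in nodes if isinstance(n[1], list) for c in n[1] if c[0] == step]
    return nodes


def getD(tuples, in_var, path, out_var):
    """getD $in_var, path -> $out_var: one output tuple per reached node."""
    return [{**t, out_var: n} for t in tuples for n in descend(t[in_var], path)]


# Plan of slide 26: two getD steps, binding $cust along the way.
two_step = getD(getD(source(ct, "$db1"), "$db1", "customer", "$cust"), "$cust", "id", "$cust_id")
# Plan of slide 27: one getD over the composed path; $cust is never bound.
one_step = getD(source(ct, "$db1"), "$db1", "customer/id", "$cust_id")

ids_two_step = [t["$cust_id"] for t in two_step]
ids_one_step = [t["$cust_id"] for t in one_step]
```

On this data both plans produce the same $cust_id bindings, which is what makes dropping the intermediate $cust variable safe when nothing else uses it.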

28 28 Nested Plans for $part $db1 $cust_id ct i 1 ct i 2 $db1 $cust_id $part ct i 1 ct i 2 $db1 $cust_id ct i 1 $db1 $cust_id ct i 2 apply $part, p  $orders nestedSrc $part $db1 $cust_id ct i 1 … Plan p $db1 $cust_id $orders ct i 1 [o 11 …] $db1 $cust_id ct i 2 ct i 2 [o 21 …]

29 29 Joins and Selections
nestedSrc $part
  $db1  $cust_id
  ct    i1
source db, [$db2]
getD $db2, order → $order
getD $order, cid → $cust_id2
getD $order, id → $order_id
  $db2  $order  $cust_id2  $order_id
  …
Condition: $cust_id2=? $cust_id
  $db1  $cust_id  $db2  $order  $cust_id2  $order_id
  …
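A nested plan with a join condition can be sketched as below, under simplifying assumptions: each input tuple is treated as its own partition, variables hold plain values rather than XML nodes, and `make_order_plan` is a hypothetical helper standing in for plan p.

```python
def apply(tuples, plan, out_var):
    """apply $part, p -> $out_var: run the nested plan once per input tuple."""
    return [{**t, out_var: plan([t])} for t in tuples]


def make_order_plan(order_tuples):
    """Build plan p: combine nestedSrc with the order tuples, then select matches."""

    def plan(nested_src):
        # nested_src plays the role of `nestedSrc $part` inside the nested plan.
        combined = [{**outer, **o} for outer in nested_src for o in order_tuples]
        selected = [t for t in combined if t["$cust_id2"] == t["$cust_id"]]
        return [t["$order_id"] for t in selected]

    return plan


customer_tuples = [{"$cust_id": 56}, {"$cust_id": 58}]
order_tuples = [
    {"$cust_id2": 56, "$order_id": 1034},
    {"$cust_id2": 56, "$order_id": 1567},
]

with_orders = apply(customer_tuples, make_order_plan(order_tuples), "$orders")
```

Each outer tuple keeps its own bindings and gains a list-valued $orders, which is how the nested for-loop of the XQuery view shows up in the algebra.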

30 30 Constructors
crList $order_id → $oidL
  …  $order_id  $oidL
  …  o1         [o1]
  …  o2         [o2]
crEl order, $oidL → $oidE
  …  $oidL  $oidE
  …  [o1]   e1
  …  [o2]   e2
listify $oidE → $orders
  $orders
  [e1, e2]
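The three constructor operators can be mimicked over dict-based tuples as follows. This is a sketch, not the platform's implementation: elements are modeled as (label, children) pairs and the function signatures are assumptions.

```python
def crList(tuples, in_var, out_var):
    """crList $in -> $out: wrap each tuple's value in a singleton list."""
    return [{**t, out_var: [t[in_var]]} for t in tuples]


def crEl(tuples, label, in_var, out_var):
    """crEl label, $in -> $out: build an element with that label and those children."""
    return [{**t, out_var: (label, t[in_var])} for t in tuples]


def listify(tuples, in_var, out_var):
    """listify $in -> $out: collapse all tuples into one tuple holding the list of values."""
    return [{out_var: [t[in_var] for t in tuples]}]


order_ids = [{"$order_id": "o1"}, {"$order_id": "o2"}]
step1 = crList(order_ids, "$order_id", "$oidL")
step2 = crEl(step1, "order", "$oidL", "$oidE")
step3 = listify(step2, "$oidE", "$orders")
```

crList and crEl work tuple by tuple, while listify is the only step that changes the number of tuples, turning many bindings into a single list-valued one.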

31 31 Algebra Example

32 32 Plan Decomposition Within Rewriting Optimizer Rules replacing “leaf” trees May move commutable parts Catch: No projection limitation

33 33 Plan After Decomposition

34 34 p2p2 p2p2 p1p1 p1p1 for $part apply $part, p  $R nestedSrc $part p3p3 p3p3 p1p1 p1p1 p2p2 p2p2 groupBy S(p1)  $part apply $part, p  $R nestedSrc $part p3p3 p3p3 Replacing Nested Plans with GroupBy/Outerjoin Combinations
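The rewriting on this slide can be illustrated by computing the same nested result in two ways: with a per-tuple nested plan, and with a left outer join followed by a group-by. The function names and the flat dict representation are assumptions for this sketch; the slide's own operator notation is not reproduced here.

```python
def nested_plan(customers, orders):
    """Per-customer nested plan: re-scan the orders for every outer tuple."""
    return [
        {"$cust_id": c["$cust_id"],
         "$orders": [o["$order_id"] for o in orders if o["$cust_id2"] == c["$cust_id"]]}
        for c in customers
    ]


def outerjoin_groupby(customers, orders):
    """Left outer join on $cust_id = $cust_id2, then group by $cust_id."""
    joined = []
    for c in customers:
        matches = [o for o in orders if o["$cust_id2"] == c["$cust_id"]]
        if matches:
            joined.extend({**c, **o} for o in matches)
        else:
            # The outer join keeps customers without orders, padded with None.
            joined.append({**c, "$cust_id2": None, "$order_id": None})
    groups = {}
    for t in joined:
        group = groups.setdefault(t["$cust_id"], [])
        if t["$order_id"] is not None:
            group.append(t["$order_id"])
    return [{"$cust_id": k, "$orders": v} for k, v in groups.items()]


customers = [{"$cust_id": 56}, {"$cust_id": 58}]
orders = [{"$cust_id2": 56, "$order_id": 1034}, {"$cust_id2": 56, "$order_id": 1567}]

nested_result = nested_plan(customers, orders)
flat_result = outerjoin_groupby(customers, orders)
```

The outer join matters: with a plain inner join, customer 58 would disappear instead of appearing with an empty $orders list. Once the plan is flat, the join inside it can use any standard join algorithm.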

35 35 Multiple Possible Plans

36 36 Overview The Virtual XML View Approach towards Data Integration Query Processing in XML Mediators –Issues Overview –An Algebra-Based Architecture –Navigation-driven Evaluation

37 37 Source access Source access Source Client Building Navigation-Driven Evaluation on the Algebra

38 38 customer_table customer name John id 56 customer name George id 58 c1c1 c2c2 $db1 $cust ct c 1 ct c 2 getD $cust, id  $cust_id $db1 $cust $cust_id ct c 1 i 1 ct c 2 i 2 i1i1 i2i2 root tuple $db1 $cust $cust_id tuple $db1 $cust $cust_id Think of Each Operator as a Lazy Mediator

39 39 Navigation-Driven Evaluation of Operators. [Figure: a Lazy Operator sits above the results of the operators below it (s1 … sn) and exposes its own result.] Input: client navigations. Output: source navigations. The result of the operator below is augmented with nextTuple(p) and p.attr.
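Python generators give a compact approximation of lazy operators: each call to `next` plays the role of nextTuple and pulls only as much from the operator below as is needed. The `scan` and `select` operators and the access log are assumptions for this sketch.

```python
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]


def scan(rows, log):
    """Leaf operator: hands out one source tuple per request, logging each access."""
    for row in rows:
        log.append(row["id"])
        yield row


def select(tuples, predicate):
    """Lazy selection: pulls from the operator below only when asked for a tuple."""
    for t in tuples:
        if predicate(t):
            yield t


access_log = []
plan = select(scan(CUSTOMERS, access_log), lambda t: t["city"] == "Chicago")

first = next(plan)                  # analogous to nextTuple on the top operator
log_after_first = list(access_log)  # snapshot of the source accesses so far
second = next(plan)
```

Only one source tuple is read to produce the first result, so memory use stays bounded by what the client has actually navigated.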

40 40 r/d( ) Operator State V 1 : V 2 : … V n : Other: … f1f2…fnf1f2…fn Proceed down/right Operator State V 1 : V 2 : … V n : Other: … f’ 1 f’ 2 … f’ n Use of Semantic Id’s in Navigation- Driven Evaluation

41 41 lineitem Hole 1 order customer root name, “John” oid, 123 Hole 3 Fragments Reduce the “Set State” – “Produce State” Overhead Hole 2

42 42 lineitem order ordnum=16 Hole 4 Hole 5 lineitem Hole 1 order customer root name, “John” oid, 123 Hole 3 Fragments Reduce the “Set State” – “Produce State” Overhead

43 43 Source access Source access listify Source Client Client-Server Interaction Controller Controlling the Size and Shape of Fragments

44 44  Fragment Size causes  Memory Footprint causes  Performance

45 45 Fragmentation Strategies Fixed Fragment Size –Ideal for depth-first, left-to-right navigation Adaptive Fragment Size –Assign larger pieces to those who use them
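The trade-off between the two strategies can be sketched by counting round trips while a client walks a list of children left to right. The doubling policy used for the adaptive case is only one possible reading of "assign larger pieces to those who use them"; `fetch_fragment` and `traverse` are hypothetical names.

```python
def fetch_fragment(children, start, size):
    """Return up to `size` children plus the position of the hole left behind."""
    end = min(start + size, len(children))
    hole = end if end < len(children) else None   # None: nothing left to fetch
    return children[start:end], hole


def traverse(children, next_size):
    """Visit every child; next_size(previous_size) picks each fragment size."""
    seen, round_trips, hole = [], 0, 0
    size = next_size(0)
    while hole is not None:
        fragment, hole = fetch_fragment(children, hole, size)
        seen.extend(fragment)
        round_trips += 1
        size = next_size(size)
    return seen, round_trips


def fixed(previous_size):
    """Fixed fragment size: always two children per fragment."""
    return 2


def adaptive(previous_size):
    """Adaptive size (assumed policy): start at 1 and double on every request."""
    return 1 if previous_size == 0 else previous_size * 2


children = list(range(10))
fixed_seen, fixed_trips = traverse(children, fixed)
adaptive_seen, adaptive_trips = traverse(children, adaptive)
```

With ten children the fixed policy needs five round trips and the doubling policy four, while a client that stops after the first child pays for only a small first fragment under the adaptive policy.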

46 46 Response Performance for Breadth-First and Depth-First. [Plots: depth-first traversal; breadth-first traversal.]

47 47 References
–Bertram Ludäscher, Yannis Papakonstantinou, Pavel Velikhov. Navigation-Driven Evaluation of Virtual Mediated Views. EDBT 2000.
–Yannis Papakonstantinou, Vasilis Vassalos. Architecture and Implementation of an XQuery-based Information Integration Platform. IEEE Data Eng. Bull. 25(1), 2002.
–Yannis Papakonstantinou, Vinayak R. Borkar, Maxim Orgiyan, Konstantinos Stathatos, Lucian Suta, Vasilis Vassalos, Pavel Velikhov. XML queries and algebra in the Enosys integration platform. Data Knowl. Eng. 44(3), 2003.

