
1 Lecture 24 15-829A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture 24 – Part 2 XML Query Processing Phil Gibbons April 15, 2003

2 XML Query Processing: Outline
- XML vs. Relational
- XML on Relational DB: Shanmugasundaram et al., “Relational Databases for Querying XML Documents: Limitations and Opportunities”, VLDB’99, plus follow-on papers
- LegoDB, STORED, Edge (2 slides)
- To be continued in my next lecture…

3 XML vs. SQL for Sensor Databases
- IrisNet represents data in XML (semi-structured model): hierarchical documents, queries in XPath
- TinyDB represents data in the relational model: tables, queries in SQL
- What are the pros and cons of each approach?
- How does it depend on the sensing context?

4 Why IrisNet Uses XML
- Rich, heterogeneous data
  - Hard to capture in a rigid data model
  - Self-describing tags are useful
- Schema evolution
  - XML supports on-the-fly schema changes
- Wide-area sensing => hierarchical organization
  - Good match for XML, bad for relational
- Standard data exchange format

5 Disadvantages of XML
- Query languages are lacking
  - Missing some minimal features, e.g., aggregates, updates
  - Query processors not available for XQuery
- Query processing is SLOW
- Key research question: Can we store XML in a relational DB, and use a relational database system to process queries?

6 Why Use Relational DB Systems?
- Highly reliable, scalable, optimized for performance, advanced functionality
  - Result of 30+ years of research & development
  - XML database systems are not “industrial strength” … and not expected to be in the foreseeable future
- Existing data and applications
  - XML applications have to inter-operate with existing relational data and applications
  - Not enough incentive to move all existing business applications to XML database systems
- Lessons from object-oriented database systems?
Adapted from slides ©Jayavel Shanmugasundaram

7 XML Query Processing: Outline
- XML vs. Relational
- XML on Relational DB: Shanmugasundaram et al., “Relational Databases for Querying XML Documents: Limitations and Opportunities”, VLDB’99, plus follow-on papers
- LegoDB, STORED, Edge (2 slides)
- To be continued…

8 Storing and Querying XML Documents
[Figure: an XML Translation Layer sits on top of a Relational Database System and keeps Translation Information. Labels in the figure:]
- XML Schema -> Relational Schema
- XML Documents -> Tuples
- XML Query -> SQL Query
- Relational Result -> XML Result
Adapted from slides ©Jayavel Shanmugasundaram

9 Relational Data

PurchaseOrder
Id    Customer    Day   Month  Year
200I  Cars R Us   10    June   1999
300I  Bikes R Us  null  July   1999

Payment
Pid   Installment  Percentage
200I  1            40%
200I  2            60%
300I  1            100%

Item
Pid   Name            Quantity  Cost
200I  Firestone Tire  50        2000.00
200I  Goodyear Tire   200       8000.00
300I  Trek Tire       20        400.00
300I  Schwinn Tire    100       2500.00

Adapted from slides ©Jayavel Shanmugasundaram

10 SQL Query
Find all the items bought by “Cars R Us” in the year 1999:

    SELECT it.name
    FROM PurchaseOrder po, Item it
    WHERE po.customer = 'Cars R Us'
      AND po.year = 1999
      AND po.id = it.pid

- Predicates: po.customer = 'Cars R Us' and po.year = 1999
- Join: po.id = it.pid
The query runs over the PurchaseOrder, Payment and Item tables shown on the previous slide.
Adapted from slides ©Jayavel Shanmugasundaram
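To make the query concrete, here is a minimal sketch that loads the slide's PurchaseOrder and Item rows into an in-memory SQLite database and runs it. The column types, the use of SQLite, and the `ORDER BY` clause (added only to make the output order deterministic) are assumptions, not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PurchaseOrder (id TEXT PRIMARY KEY, customer TEXT,
                            day INTEGER, month TEXT, year INTEGER);
CREATE TABLE Item (pid TEXT, name TEXT, quantity INTEGER, cost REAL);
""")

# Rows copied from the "Relational Data" slide.
conn.executemany("INSERT INTO PurchaseOrder VALUES (?, ?, ?, ?, ?)", [
    ("200I", "Cars R Us", 10, "June", 1999),
    ("300I", "Bikes R Us", None, "July", 1999),
])
conn.executemany("INSERT INTO Item VALUES (?, ?, ?, ?)", [
    ("200I", "Firestone Tire", 50, 2000.00),
    ("200I", "Goodyear Tire", 200, 8000.00),
    ("300I", "Trek Tire", 20, 400.00),
    ("300I", "Schwinn Tire", 100, 2500.00),
])

# Two selection predicates on PurchaseOrder plus one join with Item.
rows = conn.execute("""
    SELECT it.name
    FROM PurchaseOrder po, Item it
    WHERE po.customer = 'Cars R Us'
      AND po.year = 1999
      AND po.id = it.pid
    ORDER BY it.name  -- assumption: added for a deterministic order
""").fetchall()

item_names = [name for (name,) in rows]
print(item_names)
```

Only the two items of purchase order 200I are returned; the join condition filters out the items of "Bikes R Us".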

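11 XML Document
[Figure: XML purchase-order document; element and attribute markup lost in extraction. Surviving text values: 10, June, 1999 (date); 50, 200 (item quantities); 40%, 60% (payments)]
Annotations on the document:
- Nested structure
- Self-describing tags
- Nested sets
- Order
Adapted from slides ©Jayavel Shanmugasundaram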
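The markup of the document on this slide did not survive, so the sketch below uses a plausible reconstruction. It assumes, based on the tree on slide 14 and the schema graph on slide 23, that `id`, `customer`, `name` and `cost` are attributes and that `Date`, `Item`, `Quantity` and `Payment` are elements. Treat the exact document as hypothetical. It shows how nested structure and path expressions look on the XML side, using Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the purchase-order document.
DOC = """
<PurchaseOrder id="200I" customer="Cars R Us">
  <Date><Day>10</Day><Month>June</Month><Year>1999</Year></Date>
  <Item name="Firestone Tire" cost="2000.00"><Quantity>50</Quantity></Item>
  <Item name="Goodyear Tire" cost="8000.00"><Quantity>200</Quantity></Item>
  <Payment>40%</Payment>
  <Payment>60%</Payment>
</PurchaseOrder>
"""

root = ET.fromstring(DOC)

# Path expression PurchaseOrder/Date/Year, evaluated relative to the root.
year = root.findtext("Date/Year")

# Nested sets: repeated Item and Payment children, kept in document order.
item_names = [item.get("name") for item in root.findall("Item")]
payments = [p.text for p in root.findall("Payment")]

print(year, item_names, payments)
```

Here the path is followed by walking the tree directly; the rest of the lecture is about answering the same kind of path query with SQL over tables.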

12 XML Schema
PurchaseOrder: Date (Item)* (Payment)*
Date: Day? Month Year
Day: {integer}
Month: {string}
Year: {integer}
Item: Quantity
… and so on
Adapted from slides ©Jayavel Shanmugasundaram

13 Schemas to Relations: Issues
- Complex schema specifications
- Two-level nature of relational schema (tuples and attributes) vs. arbitrary nesting of XML Schema
- Recursion
Adapted from slides ©Jayavel Shanmugasundaram

14 Naïve Approach
Store the document as a tree of nodes (legend: element node, attribute node):
PurchaseOrder
  Id (200I)              attribute
  Customer (Cars R Us)   attribute
  Date
    Day (10)
    Month (June)
    Year (1999)
  Item
  Payment (40%)
  …
Adapted from slides ©Jayavel Shanmugasundaram

15 Naïve Approach (Contd.)

Edges
Id  ParentId  Type       Name           Value      Ordinal
0   null      Element    PurchaseOrder  null       null
1   0         Attribute  Id             200I       0
2   0         Attribute  Customer       Cars R Us  1
3   0         Element    Date           null       2
4   3         Element    Day            10         0
5   3         Element    Month          June       1
6   3         Element    Year           1999       2
…   …         …          …              …          …

Problem: many joins for queries (one per hop), e.g., PurchaseOrder/Date/Year
Adapted from slides ©Jayavel Shanmugasundaram
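The "one join per hop" problem can be seen by translating a path expression into SQL over the Edges table. The sketch below is an illustration under assumptions: the helper `path_query_sql` and the SQLite column names are hypothetical, not taken from the lecture or the paper. Each step of the path becomes one more self-join of Edges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Edges (id INTEGER PRIMARY KEY, parentId INTEGER,
                type TEXT, name TEXT, value TEXT, ordinal INTEGER)""")
# Rows copied from the Edges table on the slide.
conn.executemany("INSERT INTO Edges VALUES (?, ?, ?, ?, ?, ?)", [
    (0, None, "Element", "PurchaseOrder", None, None),
    (1, 0, "Attribute", "Id", "200I", 0),
    (2, 0, "Attribute", "Customer", "Cars R Us", 1),
    (3, 0, "Element", "Date", None, 2),
    (4, 3, "Element", "Day", "10", 0),
    (5, 3, "Element", "Month", "June", 1),
    (6, 3, "Element", "Year", "1999", 2),
])

def path_query_sql(path):
    """Translate a simple path a/b/c into SQL with one Edges alias per step."""
    steps = path.strip("/").split("/")
    aliases = [f"e{i}" for i in range(len(steps))]
    from_clause = ", ".join(f"Edges {a}" for a in aliases)
    conds = [f"{aliases[0]}.parentId IS NULL"]  # first step must be the root
    for i, alias in enumerate(aliases):
        conds.append(f"{alias}.name = ?")
        if i > 0:
            # One parent-child join per hop in the path.
            conds.append(f"{alias}.parentId = {aliases[i - 1]}.id")
    sql = (f"SELECT {aliases[-1]}.value FROM {from_clause} "
           f"WHERE {' AND '.join(conds)}")
    return sql, steps

sql, params = path_query_sql("PurchaseOrder/Date/Year")
years = [value for (value,) in conn.execute(sql, params)]
print(sql)
print(years)
```

A three-step path touches three copies of Edges, i.e. two joins, and every extra step adds another one.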

16 Desired Properties of Generated Relational Schema R
- All XML documents conforming to the XML schema should be “mappable” to tuples in R
- All queries over XML documents should be “mappable” to SQL queries over R
- Not required: ability to re-generate the XML schema from R
Adapted from slides ©Jayavel Shanmugasundaram

17 XML Schema: Further Examples
PurchaseOrder: Date? (Item | Payment)*
PurchaseOrder: (Date | Payment*) (Item (Item Item)* Payment)*
[Figure: recursive schema; production boundaries not recoverable. Labels: Date, Item, (PurchaseOrder)*, Payment, PurchaseOrder]
Adapted from slides ©Jayavel Shanmugasundaram

18 Simplifying XML Schemas
- XML schemas can be “simplified” for translation purposes
- Without undermining storage and query functionality
Original:   PurchaseOrder: (Date | (Payment)*) (Item (Item Item)* Payment)*
Simplified: PurchaseOrder: Date? (Item)* (Payment)*
Adapted from slides ©Jayavel Shanmugasundaram

19 Simplification Desiderata
Simplify structure, but preserve differences that matter in the relational model:
- Single occurrence (attribute)
- Zero or one occurrences (nullable attribute)
- Zero or more occurrences (relation)
Original:   PurchaseOrder: (Date | (Payment)*) (Item (Item Item)* Payment)*
Simplified: PurchaseOrder: Date? (Item)* (Payment)*
Adapted from slides ©Jayavel Shanmugasundaram

20 Simplification Rules
Flattening transformations:
  (e1 e2)*  ->  e1* e2*
  (e1 e2)?  ->  e1? e2?
  (e1 | e2) ->  e1? e2?
Simplification transformations:
  e**  ->  e*
  e*?  ->  e*
  e?*  ->  e*
  e??  ->  e?
Grouping transformations:
  e1* e2* e1*  ->  e1* e2*
  …etc
Also:
  e+  ->  e*
What is lost?
Adapted from slides ©Jayavel Shanmugasundaram
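The net effect of these rules is that every child element of a production ends up with exactly one of three occurrence classes: `1`, `?` or `*`. The sketch below computes that result directly instead of rewriting step by step. The tuple-based representation of content models and the function name `simplify` are assumptions made for illustration, and the handling of a name that appears more than once (it becomes `*`) is my reading of the grouping rules and their "…etc".

```python
ONE, OPT, STAR = "1", "?", "*"

def simplify(model, ctx=ONE, out=None):
    """Return {element name: '1' | '?' | '*'} for one content model.

    A model is an element name (str) or a tuple (op, child, ...) with
    op in {"seq", "alt", "opt", "star", "plus"}. `ctx` is the occurrence
    class imposed by the enclosing operators.
    """
    if out is None:
        out = {}
    if isinstance(model, str):
        # Grouping: a name seen twice in one production becomes '*'.
        out[model] = STAR if model in out else ctx
        return out
    op, *children = model
    if op in ("star", "plus"):
        ctx = STAR                       # (e1 e2)* -> e1* e2*, e+ -> e*
    elif op in ("opt", "alt"):
        ctx = STAR if ctx == STAR else OPT   # e?* -> e*, (e1 | e2) -> e1? e2?
    elif op != "seq":
        raise ValueError(f"unknown operator: {op}")
    for child in children:
        simplify(child, ctx, out)
    return out

# (Date | (Payment)*) (Item (Item Item)* Payment)*
original = ("seq",
            ("alt", "Date", ("star", "Payment")),
            ("star", ("seq", "Item",
                      ("star", ("seq", "Item", "Item")),
                      "Payment")))
simplified = simplify(original)
print(" ".join(n + ("" if q == ONE else q) for n, q in simplified.items()))
```

For the slide's example this gives Date as `?` and Item and Payment as `*`, which matches the simplified production Date? (Item)* (Payment)* up to the order of the children. Relative order and exact repetition counts of siblings are among the things the simplified schema no longer records.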

21 Result: Translation Normal Form
An XML schema production is either of the form:
  $P \rightarrow \{\mathit{type}\}$
… or of the form:
  $P \rightarrow a_1 \ldots a_p \; a_{p+1}? \ldots a_q? \; a_{q+1}^{*} \ldots a_r^{*}$, where $a_i \neq a_j$ for $i \neq j$
Adapted from slides ©Jayavel Shanmugasundaram

22 Simplified XML Schema
PurchaseOrder: Date (Item)* (Payment)*
Date: Day? Month Year
Day: {integer}
Month: {string}
Year: {integer}
Item: Quantity
… and so on
Adapted from slides ©Jayavel Shanmugasundaram

23 Relational Schema Generation
Schema graph (each edge is labeled with the occurrence of the child: 1, ? or *):
PurchaseOrder (id, customer)
  1  Date
       ?  Day
       1  Month
       1  Year
  *  Item (name, cost)
       1  Quantity
  *  Payment
Minimize: number of joins for simple path expressions (of the form /a/b/c)
Satisfy: tables are normalized
Adapted from slides ©Jayavel Shanmugasundaram

24 Generated Relational Schema and Shredded XML Document

PurchaseOrder
Id    Customer   Day  Month  Year
200I  Cars R Us  10   June   1999

Item
Pid   Order  Name            Quantity  Cost
200I  1      Firestone Tire  50        2000.00
200I  3      Goodyear Tire   200       8000.00

Payment
Pid   Order  Value
200I  2      40%
200I  4      60%

Adapted from slides ©Jayavel Shanmugasundaram
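A minimal sketch of the shredding step for this schema follows. It assumes a hypothetical reconstruction of the purchase-order document (attributes `id`, `customer`, `name`, `cost`; elements `Date`, `Item`, `Quantity`, `Payment`), uses SQLite, and fills the `ord` column with each child's 1-based position under PurchaseOrder. That numbering is an assumption and need not match the Order values on the slide. The function `shred` is illustrative, not the paper's algorithm.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the purchase-order document.
DOC = """
<PurchaseOrder id="200I" customer="Cars R Us">
  <Date><Day>10</Day><Month>June</Month><Year>1999</Year></Date>
  <Item name="Firestone Tire" cost="2000.00"><Quantity>50</Quantity></Item>
  <Item name="Goodyear Tire" cost="8000.00"><Quantity>200</Quantity></Item>
  <Payment>40%</Payment>
  <Payment>60%</Payment>
</PurchaseOrder>
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Date, Day, Month, Year are inlined into PurchaseOrder (Day is nullable).
CREATE TABLE PurchaseOrder (id TEXT PRIMARY KEY, customer TEXT,
                            day INTEGER, month TEXT, year INTEGER);
-- Set-valued children get their own relations; Quantity is inlined into Item.
CREATE TABLE Item (pid TEXT, ord INTEGER, name TEXT, cost REAL, quantity INTEGER);
CREATE TABLE Payment (pid TEXT, ord INTEGER, value TEXT);
""")

def shred(xml_text, conn):
    """Convert one PurchaseOrder document into tuples of the three tables."""
    po = ET.fromstring(xml_text)
    pid = po.get("id")
    day = po.findtext("Date/Day")  # optional element: None when absent
    conn.execute("INSERT INTO PurchaseOrder VALUES (?, ?, ?, ?, ?)",
                 (pid, po.get("customer"),
                  int(day) if day is not None else None,
                  po.findtext("Date/Month"),
                  int(po.findtext("Date/Year"))))
    for position, child in enumerate(po, start=1):
        if child.tag == "Item":
            conn.execute("INSERT INTO Item VALUES (?, ?, ?, ?, ?)",
                         (pid, position, child.get("name"),
                          float(child.get("cost")),
                          int(child.findtext("Quantity"))))
        elif child.tag == "Payment":
            conn.execute("INSERT INTO Payment VALUES (?, ?, ?)",
                         (pid, position, child.text))

shred(DOC, conn)

# PurchaseOrder/Date/Year needs no join: Year is a column of PurchaseOrder.
years = [y for (y,) in conn.execute("SELECT year FROM PurchaseOrder")]
items = conn.execute("SELECT name, quantity FROM Item ORDER BY ord").fetchall()
payments = [v for (v,) in conn.execute("SELECT value FROM Payment ORDER BY ord")]
print(years, items, payments)
```

Compared with one row per edge, the path PurchaseOrder/Date/Year becomes a single-table lookup, and a join is needed only when a query crosses into a set-valued child such as Item.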

25 Example Schema Graph
[Figure: example schema graph]
Not just a tree
Adapted from slides ©Jayavel Shanmugasundaram

26 Shared Inlining Technique
- Thus far, works well for trees only
- Intuition: inline as many sub-elements as possible
  - Do not inline only if it is a shared, recursive or set sub-element
- Technique: necessary and sufficient condition for a shared/recursive element: in-degree >= 2 in the (simplified) schema graph
Adapted from slides ©Jayavel Shanmugasundaram
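The rule on this slide can be sketched as a small function over the simplified schema graph. The graph encoding (a dict from parent to `(child, occurrence)` pairs), the function name `relations_for`, and the second example graph are assumptions for illustration. The sketch follows only the condition stated on the slide: an element gets its own relation if it is the root, is reached by a `*` edge, or has in-degree of at least 2; everything else is inlined into its parent.

```python
from collections import Counter

def relations_for(schema_graph, root):
    """Return the set of elements that get their own relation."""
    edges = [(child, occ)
             for children in schema_graph.values()
             for child, occ in children]
    in_degree = Counter(child for child, _ in edges)
    set_valued = {child for child, occ in edges if occ == "*"}
    nodes = set(schema_graph) | set(in_degree)
    return {n for n in nodes
            if n == root or n in set_valued or in_degree[n] >= 2}

# Schema graph from the "Relational Schema Generation" slide.
graph = {
    "PurchaseOrder": [("Date", "1"), ("Item", "*"), ("Payment", "*")],
    "Date": [("Day", "?"), ("Month", "1"), ("Year", "1")],
    "Item": [("Quantity", "1")],
}
relations = relations_for(graph, "PurchaseOrder")

# Hypothetical variant: Payment also contains a Date, so Date is shared.
shared_graph = dict(graph, Payment=[("Date", "1")])
relations_shared = relations_for(shared_graph, "PurchaseOrder")

print(sorted(relations))
print(sorted(relations_shared))
```

With the tree-shaped graph only PurchaseOrder, Item and Payment become relations. Once Date has two parents it can no longer be inlined into either one and becomes a relation of its own.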

27 Relational Schema Generation and XML Document Shredding
- Any XML Schema X can be mapped to a relational schema R, and …
- Any XML document XD conforming to X can be converted to tuples in R
- Further, XD can be recovered from the tuples in R
- What do you think of the approach, for IrisNet?
- Exercise: What would the Parking Space Finder relational schema look like? Would there be many or few joins in queries?
Adapted from slides ©Jayavel Shanmugasundaram

28 Path Expression with Length 3
[Figure not recovered]
Adapted from slides ©Jayavel Shanmugasundaram

29 Varying Path Expression Length
[Figure with two panels: Group 1 DTD; Group 3 DTD]
Adapted from slides ©Jayavel Shanmugasundaram

30 Storing and Querying XML Documents
[Figure: an XML Translation Layer sits on top of a Relational Database System and keeps Translation Information. Labels in the figure:]
- XML Schema -> Relational Schema
- XML Documents -> Tuples
- XML Query -> SQL Query
- Relational Result -> XML Result
Adapted from slides ©Jayavel Shanmugasundaram

31 XPERANTO
[Architecture figure. Labels in the figure:]
- Operations on the repository: Create XML Document Repository; Store XML Documents; Query over Stored XML Documents
- Components: Relational Schema Generator; XML Document Shredder; Query Processor for XML views of Relational Data; Relational Schema Information
- Operations on the database: Create tables; Store rows in tables; Query over tables
- Relational Database System: Table 1 … Table n
- XML view over tables to reconstruct shredded XML documents

32 XML Query Processing: Outline
- XML vs. Relational
- XML on Relational DB: Shanmugasundaram et al., “Relational Databases for Querying XML Documents: Limitations and Opportunities”, VLDB’99, plus follow-on papers
- LegoDB, STORED, Edge (2 slides)
- To be continued…

33 LegoDB [Bohannon et al., ICDE’02]
- An optimization approach:
  - automatically explores a space of possible mappings
  - selects the mapping which has the lowest cost for a given application
- Important features:
  - Application-driven: takes into account schema, data statistics, and query workload
  - Logical/physical independence: interface is XML-based (XML Schema, XQuery, XML data statistics)
  - Leverage existing technology: XML standards; XML-specific operations for generating space of mappings; relational optimizer for evaluating configurations
Adapted from slides ©Juliana Freire

34 But What If There’s No Schema?
- Revert to one row per edge [Florescu, Kossmann, Data Engineering Bulletin, 1999]
- STORED [Deutsch, Fernandez, Suciu, SIGMOD’99]: looks at data, finds highly supported patterns for tables

Id  ParentId  Type       Name           Value      Ordinal
0   null      Element    PurchaseOrder  null       null
1   0         Attribute  Id             200I       0
2   0         Attribute  Customer       Cars R Us  1
3   0         Element    Date           null       2
4   3         Element    Day            10         0
5   3         Element    Month          June       1
6   3         Element    Year           1999       2
…   …         …          …              …          …

Adapted from slides ©Jayavel Shanmugasundaram

35 XML Query Processing: Outline
- XML vs. Relational
- XML on Relational DB: Shanmugasundaram et al., “Relational Databases for Querying XML Documents: Limitations and Opportunities”, VLDB’99, plus follow-on papers
- LegoDB, STORED, Edge (2 slides)
- To be continued… (Thurs): Updates, Native XML DBMS
- Also in next lecture: Historical queries

