Efficiently Publishing Relational Data as XML Documents Jayavel Shanmugasundaram University of Wisconsin-Madison/ IBM Almaden Research Center Joint work.


1 Efficiently Publishing Relational Data as XML Documents
Jayavel Shanmugasundaram
University of Wisconsin-Madison / IBM Almaden Research Center
Joint work with: Rimon Barr, Michael Carey, Bruce Lindsay, Hamid Pirahesh, Berthold Reinwald, Eugene Shekita

2 Outline Why? How? Which? Hence

3 XML Example
[XML markup lost in extraction; surviving element content: John, Mary, Internet, Recycling]

4 What is the big deal about XML?
Elegantly models complex, hierarchical/graph-structured data
Domain-specific tags (unlike HTML)
Simple!
Fast emerging as the dominant standard for data exchange on the WWW

5 Why Relational Data?
Most business data stored in relational databases
Unlikely to change in the near future
–Scalability, Reliability, Performance, Tools
Need efficient means to publish relational data as XML documents

6 Usage Scenario
[Diagram] An application/user sends a query to produce XML documents, over the Internet, to an existing database system (RDBMS). The XML result comes back and is processed or displayed in a browser.

7 Example Relational Schema
Table Department
DeptId  DeptName
10      Purchasing
Table Project
ProjId  DeptId  ProjName
888     10      Internet
795     10      Recycling
Table Employee
EmpId  DeptId  EmpName  Salary
101    10      John     50K
91     10      Mary     70K

8 XML Representation
[XML markup lost in extraction; surviving element content: John, Mary, Internet, Recycling]

9 Main Issues
Relational data is flat, XML is a tagged graph
How do we specify translation from flat model to a graph model?
–A query language to map from relations to XML
How do we transform flat representations to tagged nested representations?
–Efficient implementation strategies

10 Outline Why? How? –Language? –Mechanism? Which? Hence

11 Transformation Languages Two obvious choices: –XML Query Language –SQL

12 Example Relational Schema
Table Department
DeptId  DeptName
10      Purchasing
Table Project
ProjId  DeptId  ProjName
888     10      Internet
795     10      Recycling
Table Employee
EmpId  DeptId  EmpName  Salary
101    10      John     50K
91     10      Mary     70K

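13 XMLQL: Default XML View
[XML markup lost in extraction; the view lists the rows of each table]
Department: 10 Purchasing
Employee: 101 10 John 50K; 91 10 Mary 70K
Project: 888 10 Internet; 795 10 Recycling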

14 XMLQL: Query Over Default View
[Element patterns lost in extraction; only the variables and keywords survive]
WHERE $did $dname IN DefaultView
CONSTRUCT
  { WHERE $did $ename IN DefaultView CONSTRUCT $ename }
  { WHERE $did $pname IN DefaultView CONSTRUCT $pname }

15 XMLQL: Query Result
[XML markup lost in extraction; surviving element content: John, Mary, Internet, Recycling]

16 XMLQL: Pros and Cons
Pros:
–Natural for XML users
–Infrastructure to build hierarchies of XML views
–One query language for XML and relational data
Cons:
–Ignores existing API (JDBC), tools, support
–Need to mature new query language (aggregates etc.)

17 SQL: Key Ideas
Sub-queries to specify nesting
Scalar functions to specify tags/attributes
–XML Constructors
Aggregate functions to group child elements

18 SQL: Query to publish XML
Select DEPT(d.DeptName, [employee sub-query], [project sub-query])
From Department d

19 SQL: XML Constructor
Define XML Constructor DEPT(dname: varchar(20), emplist: xml, projlist: xml) As (
  [XML markup lost in extraction] $emplist $projlist
)

20 SQL: Query to publish XML
Select DEPT(d.DeptName, [employee sub-query], [project sub-query])
From Department d

21 SQL: Query to publish XML
Select DEPT(d.DeptName,
  (Select XMLAGG(EMP(e.EmpName)) From Employee e Where e.DeptId = d.DeptId),
  [project sub-query])
From Department d

22 SQL: XML Constructor
Define XML Constructor EMP(ename: varchar(20)) As (
  [XML markup lost in extraction] $ename
)

23 SQL: Query to publish XML
Select DEPT(d.DeptName,
  (Select XMLAGG(EMP(e.EmpName)) From Employee e Where e.DeptId = d.DeptId),
  [project sub-query])
From Department d

24 SQL: Query to publish XML
Select DEPT(d.DeptName,
  (Select XMLAGG(EMP(e.EmpName)) From Employee e Where e.DeptId = d.DeptId),
  (Select XMLAGG(PROJ(p.ProjName)) From Project p Where p.DeptId = d.DeptId))
From Department d
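The nested query above can be tried out on SQLite by registering the constructors as user-defined functions. This is only a sketch under stated assumptions: the tag names `department`, `employee` and `project` are hypothetical (the slides' markup did not survive), `XmlAgg` is a plain string-concatenating stand-in for XMLAGG, and SQLite stands in for the engine used in the talk.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


class XmlAgg:
    """Stand-in for XMLAGG: concatenates the XML fragments of one group."""

    def __init__(self):
        self.parts = []

    def step(self, fragment):
        if fragment is not None:
            self.parts.append(fragment)

    def finalize(self):
        return "".join(self.parts)


def emp(ename):
    # Hypothetical tag name; the original constructor body is not recoverable.
    return f"<employee>{escape(ename)}</employee>"


def proj(pname):
    return f"<project>{escape(pname)}</project>"


def dept(dname, emplist, projlist):
    # A sub-query over an empty group may yield NULL (None), hence the "or".
    return (f"<department name={quoteattr(dname)}>"
            f"{emplist or ''}{projlist or ''}</department>")


def make_db():
    """In-memory database holding the example schema from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER,
                               EmpName TEXT, Salary TEXT);
        CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (101, 10, 'John', '50K'),
                                    (91, 10, 'Mary', '70K');
        INSERT INTO Project VALUES (888, 10, 'Internet'),
                                   (795, 10, 'Recycling');
    """)
    conn.create_function("EMP", 1, emp)
    conn.create_function("PROJ", 1, proj)
    conn.create_function("DEPT", 3, dept)
    conn.create_aggregate("XMLAGG", 1, XmlAgg)
    return conn


QUERY = """
    SELECT DEPT(d.DeptName,
        (SELECT XMLAGG(EMP(e.EmpName)) FROM Employee e
          WHERE e.DeptId = d.DeptId),
        (SELECT XMLAGG(PROJ(p.ProjName)) FROM Project p
          WHERE p.DeptId = d.DeptId))
    FROM Department d
"""


def publish(conn):
    """Return one XML fragment per department."""
    return [row[0] for row in conn.execute(QUERY)]


if __name__ == "__main__":
    for fragment in publish(make_db()):
        print(fragment)
```

The order of children inside each department is whatever order the engine feeds rows to the aggregate, so a real deployment would need an explicit ordering if document order matters.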

25 Query Result
[XML markup lost in extraction; surviving element content: John, Mary, Internet, Recycling]

26 SQL: Pros and Cons
Pros:
–Reuses SQL infrastructure/API
–Natural for SQL users
–Efficient execution inside relational engine
Cons:
–Limited support for XML View Composition

27 Outline Why? How? –Language? –Mechanism? Which? Hence

28 Relations to XML: Issues
Two main differences:
–Nesting (structuring)
–Tagging
Space of alternatives:
–Late Tagging vs. Early Tagging
–Late Structuring vs. Early Structuring
–Inside Engine vs. Outside Engine

29 Stored Procedure Approach
Early Tagging, Early Structuring, Outside Engine
Issue queries for sub-structures and tag them
Could be a Stored Procedure
Queries against the DBMS engine return: Department (10, Purchasing); Employee (John), (Mary); Project (Internet), (Recycling)
Problem: Too many SQL queries!
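A minimal sketch of this approach, written as client-side Python rather than an actual stored procedure, makes the query count visible: one query for the departments plus two more per department. The tag names and the function name `publish_with_nested_queries` are illustrative assumptions, not taken from the talk.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def make_db():
    """In-memory database holding the example schema from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER,
                               EmpName TEXT, Salary TEXT);
        CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (101, 10, 'John', '50K'),
                                    (91, 10, 'Mary', '70K');
        INSERT INTO Project VALUES (888, 10, 'Internet'),
                                   (795, 10, 'Recycling');
    """)
    return conn


CHILD_QUERIES = (
    ("SELECT EmpName FROM Employee WHERE DeptId = ? ORDER BY EmpId",
     "employee"),
    ("SELECT ProjName FROM Project WHERE DeptId = ? ORDER BY ProjId",
     "project"),
)


def publish_with_nested_queries(conn):
    """Return (xml_text, number_of_queries_issued)."""
    queries = 1
    depts = conn.execute(
        "SELECT DeptId, DeptName FROM Department ORDER BY DeptId").fetchall()
    out = []
    for dept_id, dept_name in depts:
        # Tags are emitted as soon as each sub-structure is fetched
        # (early tagging, early structuring).
        out.append(f"<department name={quoteattr(dept_name)}>")
        for sql, tag in CHILD_QUERIES:
            queries += 1  # one more round trip per department and child kind
            for (name,) in conn.execute(sql, (dept_id,)):
                out.append(f"<{tag}>{escape(name)}</{tag}>")
        out.append("</department>")
    return "".join(out), queries


if __name__ == "__main__":
    xml_text, issued = publish_with_nested_queries(make_db())
    print(issued, "queries:", xml_text)
```

With N departments and two kinds of children the loop issues 1 + 2N queries, which is the cost the slide flags.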

30 Correlated CLOB Approach
Early Tagging, Early Structuring, Inside Engine
Select DEPT(d.DeptName,
  (Select XMLAGG(EMP(e.EmpName)) From Employee e Where e.DeptId = d.DeptId),
  (Select XMLAGG(PROJ(p.ProjName)) From Project p Where p.DeptId = d.DeptId))
From Department d
Problem: Correlated execution of sub-queries

31 De-Correlated CLOB Approach
Early Tagging, Early Structuring, Inside Engine
With EmpStruct (deptname, empinfo) AS (
  Select d.deptname, XMLAGG(EMP(e.empname))
  From department d left join employee e on d.deptid = e.deptid
  Group By d.deptname),
ProjStruct (deptname, projinfo) AS (
  Select d.deptname, XMLAGG(PROJ(p.projname))
  From department d left join project p on d.deptid = p.deptid
  Group By d.deptname)
Select DEPT(d1.deptname, d1.empinfo, d2.projinfo)
From EmpStruct d1 full join ProjStruct d2 on d1.deptname = d2.deptname
Problem: CLOBs during processing
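The de-correlated form can be approximated on SQLite with common table expressions. In this sketch `group_concat` stands in for XMLAGG and string concatenation stands in for the XML constructors; the tag names are hypothetical, values are not XML-escaped, and the CTEs are keyed on DeptId (a small deviation from the slide, which groups by department name). The final join is an inner join instead of the slide's full join, which is equivalent here because both CTEs start from Department with a left join and so contain every department.

```python
import sqlite3


def make_db():
    """In-memory database holding the example schema from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER,
                               EmpName TEXT, Salary TEXT);
        CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (101, 10, 'John', '50K'),
                                    (91, 10, 'Mary', '70K');
        INSERT INTO Project VALUES (888, 10, 'Internet'),
                                   (795, 10, 'Recycling');
    """)
    return conn


# Each CTE builds one tagged fragment per department with a single
# join + group-by, so no sub-query is re-run per department row.
DECORRELATED = """
    WITH EmpStruct (DeptId, DeptName, EmpInfo) AS (
        SELECT d.DeptId, d.DeptName,
               group_concat('<employee>' || e.EmpName || '</employee>', '')
        FROM Department d LEFT JOIN Employee e ON d.DeptId = e.DeptId
        GROUP BY d.DeptId, d.DeptName),
    ProjStruct (DeptId, ProjInfo) AS (
        SELECT d.DeptId,
               group_concat('<project>' || p.ProjName || '</project>', '')
        FROM Department d LEFT JOIN Project p ON d.DeptId = p.DeptId
        GROUP BY d.DeptId)
    SELECT '<department name="' || s1.DeptName || '">'
           || COALESCE(s1.EmpInfo, '') || COALESCE(s2.ProjInfo, '')
           || '</department>'
    FROM EmpStruct s1 JOIN ProjStruct s2 ON s1.DeptId = s2.DeptId
"""


def publish_decorrelated(conn):
    """Return one tagged fragment per department."""
    return [row[0] for row in conn.execute(DECORRELATED)]


if __name__ == "__main__":
    for fragment in publish_decorrelated(make_db()):
        print(fragment)
```

The intermediate `EmpInfo` and `ProjInfo` values are exactly the large character objects the slide warns about: they are carried through the join before the final concatenation.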

32 Late Tagging, Late Structuring
XML document content produced without structure (in arbitrary order)
Tagger enforces order as final step
Pipeline: Relational Query Processing -> Unstructured content -> Tagging -> Result XML Document

33 Redundant Relation Approach
Late Tagging, Late Structuring
How do we represent nested content as relations?
Department: (10, Purchasing)
Project: (10, Internet), (10, Recycling)
Employee: (10, John), (10, Mary)
Join of all three relations:
(Purchasing, John, Internet)
(Purchasing, John, Recycling)
(Purchasing, Mary, Internet)
(Purchasing, Mary, Recycling)
Problem: Large relation due to data redundancy!
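The redundancy is easy to reproduce: joining a department with both of its child tables yields every employee/project combination. The sketch below assumes SQLite and the example schema; `redundant_relation` is an illustrative name.

```python
import sqlite3


def make_db():
    """In-memory database holding the example schema from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER,
                               EmpName TEXT, Salary TEXT);
        CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (101, 10, 'John', '50K'),
                                    (91, 10, 'Mary', '70K');
        INSERT INTO Project VALUES (888, 10, 'Internet'),
                                   (795, 10, 'Recycling');
    """)
    return conn


def redundant_relation(conn):
    """Join Department with both child tables in a single relation.

    Inner joins are used for brevity, so a department without employees
    or without projects would drop out of this simplified version.
    """
    return conn.execute("""
        SELECT d.DeptName, e.EmpName, p.ProjName
        FROM Department d
        JOIN Employee e ON e.DeptId = d.DeptId
        JOIN Project p ON p.DeptId = d.DeptId
    """).fetchall()


if __name__ == "__main__":
    for row in redundant_relation(make_db()):
        print(row)
```

For a department with m employees and n projects the result has m × n rows, even though only m + n child elements end up in the document.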

34 Outer Union Approach
Late Tagging, Late Structuring
How do we represent nested content as relations?
Department: (10, Purchasing)
Department joined with Project: (Purchasing, Internet), (Purchasing, Recycling)
Department joined with Employee: (Purchasing, John), (Purchasing, Mary)
Outer union of the two join results, padded with nulls (last column tags the tuple type):
(Purchasing, null, Internet, 0)
(Purchasing, null, Recycling, 0)
(Purchasing, John, null, 1)
(Purchasing, Mary, null, 1)
Problem: Wide tuples (having many columns)
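An outer union of this shape can be written in plain SQL as a UNION ALL of the two joins, each padded with NULLs for the columns it does not supply. The sketch assumes SQLite and the example schema; the column alias `Kind` for the type tag is an illustrative name.

```python
import sqlite3


def make_db():
    """In-memory database holding the example schema from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER,
                               EmpName TEXT, Salary TEXT);
        CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (101, 10, 'John', '50K'),
                                    (91, 10, 'Mary', '70K');
        INSERT INTO Project VALUES (888, 10, 'Internet'),
                                   (795, 10, 'Recycling');
    """)
    return conn


# Kind = 0 marks a project tuple and Kind = 1 an employee tuple,
# matching the last column of the tuples shown on the slide.
OUTER_UNION = """
    SELECT d.DeptName, NULL AS EmpName, p.ProjName, 0 AS Kind
    FROM Department d JOIN Project p ON p.DeptId = d.DeptId
    UNION ALL
    SELECT d.DeptName, e.EmpName, NULL, 1
    FROM Department d JOIN Employee e ON e.DeptId = d.DeptId
"""


def outer_union(conn):
    """Return the unsorted outer-union tuples."""
    return conn.execute(OUTER_UNION).fetchall()


if __name__ == "__main__":
    for row in outer_union(make_db()):
        print(row)
```

The row count grows as m + n per department instead of m × n, at the price of one extra column per kind of child, which is the "wide tuples" problem.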

35 Hash-based Tagger
Late Tagging, Late Structuring
Results not structured early
–In arbitrary order
Tagger has to enforce order during tagging
–Hash-based approach
Inside/Outside engine tagger
Problem: Requires memory for entire document
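A hash-based tagger can be sketched as a dictionary keyed on the parent element, filled from tuples in any order and only serialized once all input has been read. The tuple layout `(dept, emp, proj, kind)` follows the outer-union tuples above; the tag names and the function name `hash_tag` are assumptions for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def hash_tag(rows):
    """Group unordered outer-union tuples by department, then emit tags.

    Each row is (dept_name, emp_name, proj_name, kind), where kind is 1 for
    an employee tuple and 0 for a project tuple.
    """
    groups = {}  # dept name -> (employee names, project names)
    for dept, emp, proj, kind in rows:
        emps, projs = groups.setdefault(dept, ([], []))
        if kind == 1:
            emps.append(emp)
        else:
            projs.append(proj)

    # Nothing can be emitted before this point: a late row may still
    # belong to the first department. The whole document sits in memory.
    out = []
    for dept, (emps, projs) in groups.items():
        out.append(f"<department name={quoteattr(dept)}>")
        out.extend(f"<employee>{escape(e)}</employee>" for e in emps)
        out.extend(f"<project>{escape(p)}</project>" for p in projs)
        out.append("</department>")
    return "".join(out)


if __name__ == "__main__":
    print(hash_tag([
        ("Purchasing", None, "Recycling", 0),
        ("Purchasing", "Mary", None, 1),
        ("Purchasing", None, "Internet", 0),
        ("Purchasing", "John", None, 1),
    ]))
```

The input order is irrelevant to correctness, which is why no sort is needed, but the `groups` dictionary grows with the document.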

36 Late Tagging, Early Structuring
Structured XML document content produced
Tagger just adds tags (constant space)
Pipeline: Relational Query Processing -> Structured content -> Tagging -> Result XML Document

37 Sorted Outer Union Approach
Late Tagging, Early Structuring
Outer union tuples (n = null), one column per element type:
A  B  C  D  E  F  G
A  B  n  D  n  n  n
A  B  n  n  E  n  n
A  n  C  n  n  F  n
A  n  C  n  n  n  G
Sort By: Aid, Bid, Cid
Problem: Only partial ordering required

38 Constant Space Tagger
Late Tagging, Early Structuring
Detects changes in XML document hierarchy
Adds appropriate opening/closing tags
Inside/outside engine
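When the tuples arrive already sorted by their parent, the tagger only needs to remember the current parent and react when it changes. The generator below is a minimal sketch of that idea for the two-level department example; the tuple layout `(dept, emp, proj, kind)`, the tag names, and the name `stream_tags` are assumptions, not the talk's implementation.

```python
from xml.sax.saxutils import escape, quoteattr


def stream_tags(sorted_rows):
    """Yield XML fragments for rows that are already sorted by department.

    Each row is (dept_name, emp_name, proj_name, kind), where kind is 1 for
    an employee tuple and 0 for a project tuple. Only the current department
    name is kept as state, so memory use does not grow with the input.
    """
    current = None
    for dept, emp, proj, kind in sorted_rows:
        if dept != current:
            # A change in the parent column marks an element boundary.
            if current is not None:
                yield "</department>"
            yield f"<department name={quoteattr(dept)}>"
            current = dept
        if kind == 1:
            yield f"<employee>{escape(emp)}</employee>"
        else:
            yield f"<project>{escape(proj)}</project>"
    if current is not None:
        yield "</department>"


if __name__ == "__main__":
    rows = [
        ("Purchasing", None, "Internet", 0),
        ("Purchasing", "John", None, 1),
        ("Purchasing", "Mary", None, 1),
        ("Purchasing", None, "Recycling", 0),
    ]
    # Sort by department, employees (kind 1) before projects (kind 0);
    # inside an engine this would be an ORDER BY on the outer union.
    rows.sort(key=lambda r: (r[0], -r[3]))
    print("".join(stream_tags(rows)))
```

If the input is not sorted, the same department is opened and closed more than once, which is why this tagger depends on early structuring.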

39 Classification of Alternatives
                    Late Tagging                              Early Tagging
Early Structuring
  Inside Engine     Sorted Outer Union (Tagging inside)       Correlated CLOB, De-Correlated CLOB
  Outside Engine    Sorted Outer Union (Tagging outside)      Stored Procedure
Late Structuring
  Inside Engine     Unsorted Outer Union (Tagging inside)     (none)
  Outside Engine    Unsorted Outer Union (Tagging outside)    (none)

40 Outline Why? How? –Language? –Mechanism? Which? Hence

41 Performance Evaluation Query Depth Query Fan Out Database Size

42 Inside vs. Outside Engine

43 Where Does Time Go?

44 Effect of Query Fan Out

45 Effect of Query Depth

46 Memory Considerations Sorted outer union more robust Relational sort highly scalable!

47 Outline Why? How? –Language? –Mechanism? Which? Hence

48 Conclusion
Publishing XML from relational sources is important on the Internet
Language alternatives:
–SQL based
–XML query language based
Implementation alternatives:
–Inside engine >> Outside engine
–Unsorted Outer Union: when sufficient main memory is available
–Sorted Outer Union: otherwise


