A Cost-based Approach For Converting Relational Schemas To XML Ramon Lawrence University of Iowa


1 A Cost-based Approach For Converting Relational Schemas To XML Ramon Lawrence University of Iowa ramon-lawrence@uiowa.edu http://www.cs.uiowa.edu/~rlawrenc/

2 Page 2 The University of Iowa. Copyright© 2005 Ramon Lawrence - A Cost-based Approach For Converting Relational Schemas to XML
Overview
The goal of this research is to convert existing relational schemas to XML schemas with minimal user input.
The contributions of this work are:
- A fully automated relational to XML schema conversion algorithm that minimizes a user-specified cost function (in this case space efficiency) and does not require the user to annotate the relational schema or map to an intermediate model.
- The algorithm incorporates user preferences during the mapping if they are present. Thus, the mapping can be completed with no user involvement, or with as much involvement as the user desires.
- Empirical results demonstrating how XML nesting improves on flat translation by creating more space-efficient schemas without introducing redundancy.

3 Motivation
Many applications are being migrated from using relational databases to using XML. This conversion can be very costly if the application designers must also re-engineer the schema.
The ability to automatically convert a relational schema to an XML schema is valuable. Further, it is useful if the translation is not a "flat translation" but rather uses nesting to improve the efficiency and readability of the resulting XML schema.
Previous conversion algorithms required the user to specify a complete mapping or did not allow user-specified constraints.

4 Previous Work
The previous work can be put in 4 categories:
- 1) Flat translation - converts to XML without using nesting.
- 2) Query-based translation - conversion occurs by using a query extraction language that is an extension of SQL or a new XML query language, e.g. SilkRoute [Fernandez02] and XML publishing [Carey00].
- 3) Model-based translation - converts the relational model to an intermediate model and then maps to XML using translation rules, e.g. [Bird00,Du01,Embley01].
- 4) Dependency-based translation - uses dependency information in the schema to translate to XML, e.g. NeT and CoT [Lee02].
No system is fully automated and yet at the same time respects user constraints on the final schema result. No previous work uses a cost function to return the optimal nesting.

5 Nesting and Efficiency
Exploiting nesting in XML has three major advantages:
- 1) Improves readability by grouping related concepts.
- 2) Improves space efficiency by avoiding repetition and encoding of foreign keys that can be implicitly defined by the hierarchical structure.
- 3) Improves query efficiency by clustering concepts together and avoiding joins.

6 Space Efficiency Metric
Space-efficient XML documents can be achieved by nesting, as it eliminates foreign keys. For example, in TPC-H, order information can be nested under customer information, and the foreign key orders.o_custkey is then not represented explicitly.
Amount of space savings:
- The number of data values not stored equals the size of the relation (in this case orders) × (number of foreign key attributes).
- Let |R| denote the number of tuples of R. The space savings of nesting relational schema R under S on key K is denoted by savings(R,S,K) and is approximated by sizeOf(K) × |R|, where sizeOf(K) is the maximum schema size of the foreign key K of R.
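The approximation on this slide can be written as a one-line function. This is a minimal sketch: the function name, the per-attribute size list, and the numbers in the example are illustrative assumptions, not values from the slides.

```python
def savings(num_tuples_r: int, fk_attr_sizes: list[int]) -> int:
    """Approximate savings(R, S, K) = sizeOf(K) * |R|.

    num_tuples_r  -- |R|, the number of tuples in the nested relation R
    fk_attr_sizes -- schema size of each attribute of the foreign key K
                     (assumption: sizeOf(K) is the sum over K's attributes)
    """
    return sum(fk_attr_sizes) * num_tuples_r


# Hypothetical example: 1,000 tuples with a single 4-character foreign key.
print(savings(1_000, [4]))
```

A composite key is handled by passing one size per attribute, for example `savings(10, [4, 4])`.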

7 Nest Graph
A nest graph represents the nesting possibilities and their desirability by using existing relational schema information.
- A Nest-Graph G = (V,E) is a directed, weighted graph consisting of a set of nodes V and a directed edge set E, such that for each relation R in the database schema there exists a node V_R ∈ V, and for each non-nullable foreign key FK_R(S) there is an edge e = (V_S, V_R, w) ∈ E, where w = savings(R, S, FK_R(S)).
[Figure: Nest-graph for TPC-H]
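The definition above can be sketched as a small graph builder. The `ForeignKey` record, the function name, and the toy cardinalities are assumptions made for illustration; edges are stored as (S, R, w) triples to mirror the slide's notation.

```python
from typing import NamedTuple


class ForeignKey(NamedTuple):
    child: str      # relation R that holds the foreign key
    parent: str     # referenced relation S
    size: int       # sizeOf(K), the schema size of the key
    nullable: bool  # nullable keys do not produce an edge


def build_nest_graph(sizes: dict[str, int], fks: list[ForeignKey]):
    """Return (nodes, edges) with one node per relation and one edge
    (S, R, w) per non-nullable foreign key, w = sizeOf(K) * |R|."""
    nodes = set(sizes)
    edges = [
        (fk.parent, fk.child, fk.size * sizes[fk.child])
        for fk in fks
        if not fk.nullable
    ]
    return nodes, edges


# Toy cardinalities (hypothetical, not TPC-H figures).
sizes = {"Nation": 2, "Customer": 3, "Order": 10}
fks = [
    ForeignKey("Customer", "Nation", 4, False),
    ForeignKey("Order", "Customer", 4, False),
]
nodes, edges = build_nest_graph(sizes, fks)
print(edges)
```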

8 XML Mapping Algorithm
The mapping from the relational model to an XML schema has these general steps:
- Extract relational schema information, including foreign keys and relation sizes, from the database.
- Use the information to build a Nest-Graph G.
- Use the classical algorithm by Edmonds to calculate the maximum weight arborescence T of G.
- The maximum weight arborescence is a maximal spanning tree (MST) T. Use T to generate the nesting in the XML schema.
  - Assume that given an edge e = (S,R,w) of T, the two relations are S(A_1, A_2, ..., A_m) and R(B_1, B_2, ..., B_n) respectively. Let A_1 = PK_S, B_1 = PK_R, and B_k = FK_R(S).
  - The primary keys and foreign keys are encoded as XML attributes, and all other relational attributes are encoded as XML elements.
  - Note: Introduce a new node r' to G, and a set of edges E' where each e = (r', V_i, 0) ∈ E' for all i = 1..|V|. This guarantees that an MST will exist.
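The arborescence step can be illustrated on tiny graphs without implementing Edmonds' algorithm. The sketch below is a brute-force stand-in, not Edmonds' algorithm: it adds the virtual root r' with zero-weight edges as in the note above, tries every choice of one incoming edge per node, and keeps the heaviest cycle-free choice. Function names and the example graph are assumptions for illustration, and the approach is only practical for a handful of relations.

```python
from itertools import product

ROOT = "r'"  # virtual root node from the slide's note


def reaches_root(node, parent):
    """Follow parent pointers; False if a cycle is hit before the root."""
    seen = set()
    while node != ROOT:
        if node in seen:
            return False
        seen.add(node)
        node = parent[node]
    return True


def max_weight_arborescence(nodes, edges):
    """Exhaustive search over one incoming edge per node.

    edges are (S, R, w) triples with non-negative weights.
    Returns (parent_map, total_weight).
    """
    all_edges = list(edges) + [(ROOT, v, 0) for v in nodes]
    order = sorted(nodes)
    incoming = [[e for e in all_edges if e[1] == v] for v in order]
    best, best_weight = None, -1
    for choice in product(*incoming):
        parent = {child: par for par, child, _ in choice}
        if all(reaches_root(v, parent) for v in order):
            weight = sum(w for _, _, w in choice)
            if weight > best_weight:
                best, best_weight = parent, weight
    return best, best_weight


# Two relations referencing each other: only one nesting can be kept.
parent, weight = max_weight_arborescence({"A", "B"}, [("A", "B", 5), ("B", "A", 3)])
print(parent, weight)
```

For real schemas an implementation of Edmonds' algorithm would replace the exhaustive loop; the input and output shapes can stay the same.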

9 XML Mapping Algorithm (2)
Using the maximal spanning tree T:
- For each edge e = (S,R,w) of T, we nest R under S (and omit B_k).
- For each edge e = (S,R,w) where e ∈ G and e ∉ T, the relationship is captured using ID/IDREF: the element R carries an IDREF attribute that references S.
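The two rules above can be sketched as a function that turns a parent map into an indented outline, marking the relationships that fall back to ID/IDREF. The parent-map representation, the function name, and the output format are assumptions for illustration; the slides' actual DTD generation is not reproduced here.

```python
def nesting_outline(parent, edges, root="r'"):
    """Return indented lines: tree edges become nesting, and graph edges
    that are not in the tree are listed as IDREF references."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    lines = []

    def visit(node, depth):
        for child in sorted(children.get(node, [])):
            # Edges into `child` whose source is not its tree parent.
            idrefs = sorted(s for s, r, _ in edges if r == child and parent[child] != s)
            suffix = f" (IDREF to: {', '.join(idrefs)})" if idrefs else ""
            lines.append("  " * depth + child + suffix)
            visit(child, depth + 1)

    visit(root, 0)
    return lines


# Hypothetical tree and graph; weights are irrelevant for the outline.
parent = {"Customer": "r'", "Order": "Customer", "LineItem": "Order", "Supplier": "r'"}
edges = [("Customer", "Order", 1), ("Order", "LineItem", 1), ("Supplier", "LineItem", 1)]
outline = nesting_outline(parent, edges)
print("\n".join(outline))
```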

10 Maximal Spanning Tree Result
[Figure: maximal spanning tree for TPC-H]
Space savings achieved:
- For TPC-H at scale factor 1, the space savings is 57,849,820 characters or 14,462,455 data values. Equivalently, the savings is 43.5% of all foreign keys or 12.4% of all data values.
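As a quick arithmetic check, the two savings figures on this slide imply a fixed number of characters per omitted key value. The variable names are illustrative; the two input numbers are taken from the slide.

```python
chars_saved = 57_849_820   # characters saved, from the slide
values_saved = 14_462_455  # data values saved, from the slide

# Characters per omitted foreign key value implied by the two figures.
chars_per_value = chars_saved / values_saved
print(chars_per_value)
```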

11 XML Result DTD

12 User-Directed Mapping
The algorithm determines the optimal space-efficient nesting of relational schemas into an XML schema given no constraints.
- However, the user can control the XML schema output by specifying constraints that the XML schema should satisfy, without having to specify a total mapping.
Four types of constraints can be incorporated into the mapping algorithm by appropriate modifications to the Nest-Graph:
- 1) Non-nested nodes - remove the node's incoming edges.
- 2) Edges (nestings) that must be present - set the edge weight to ∞.
- 3) Edges (nestings) that must not be present - remove the edge.
- 4) Handling nullable foreign keys - the user decides on the encoding; may use a null-record.
The user may also specify any cost function, not just the space efficiency one discussed here.
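The first three constraint types are plain edits to the edge list, which can be sketched as follows. The function and parameter names are assumptions for illustration, and the nullable foreign key case (type 4) is not modelled because it depends on the encoding the user picks.

```python
import math


def apply_constraints(edges, non_nested=(), required=(), forbidden=()):
    """Rewrite (S, R, w) edges according to user constraints.

    non_nested -- relations that must stay at the top level
    required   -- (S, R) nestings that must be present (weight -> infinity)
    forbidden  -- (S, R) nestings that must not be present
    """
    non_nested, required, forbidden = set(non_nested), set(required), set(forbidden)
    result = []
    for s, r, w in edges:
        if r in non_nested or (s, r) in forbidden:
            continue  # drop incoming edges of non-nested nodes and forbidden edges
        if (s, r) in required:
            w = math.inf  # a required edge dominates every alternative
        result.append((s, r, w))
    return result


# Hypothetical weights.
edges = [
    ("Nation", "Customer", 3),
    ("Customer", "Order", 10),
    ("Order", "LineItem", 20),
    ("Supplier", "LineItem", 5),
]
constrained = apply_constraints(
    edges,
    non_nested=["Customer"],
    required=[("Supplier", "LineItem")],
    forbidden=[("Order", "LineItem")],
)
print(constrained)
```

The constrained edge list can then be passed to the same arborescence computation as the unconstrained graph.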

13 Experimental Results
Results summary:
- 40-50% of foreign keys do not have to be encoded. Elimination of all foreign keys is only possible if the Nest-Graph is a tree.
- 10-30% of all attributes are not encoded.

14 Future Work and Conclusions
Developed an efficient and user-friendly tool for automating construction of XML schemas from relational schemas. The system is an improvement over current approaches, which either do not have formal methods for capturing user constraints or require the user to exactly specify the entire XML schema.
- The space efficiency cost metric proposed is useful, although other metrics are possible, including those that factor in query costs as well.
Future work involves expanding the algorithm to consider cost functions and nestings that introduce redundancy to increase query performance at the sacrifice of space efficiency. Also, an investigation of nesting query results will be performed.

15 A Cost-based Approach For Converting Relational Schemas To XML Ramon Lawrence University of Iowa ramon-lawrence@uiowa.edu http://www.cs.uiowa.edu/~rlawrenc/ Thank You!

16 Extra Slides...

17 TPC-H Schema
Table      Attributes
Region     regionkey, name, comment
Nation     nationkey, name, regionkey, comment
LineItem   orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, returnflag, tax, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment
Order      orderkey, custkey, orderstatus, totalprice, orderdate, orderpriority, clerk, shippriority, comment
Customer   custkey, name, address, nationkey, phone, acctbal, mktsegment, comment
PartSupp   partkey, suppkey, availqty, supplycost, comment
Supplier   suppkey, name, address, nationkey, phone, acctbal, comment
Part       partkey, name, mfgr, brand, type, size, container, retailprice, comment

18 Foreign Keys in TPC-H
Table with Foreign Key   Table Referenced   Functional Dependencies (Foreign Keys → Primary Keys)
Nation                   Region             n_regionkey → r_regionkey
Order                    Customer           o_custkey → c_custkey
Customer                 Nation             c_nationkey → n_nationkey
Supplier                 Nation             s_nationkey → n_nationkey
PartSupp                 Supplier           ps_suppkey → s_suppkey
PartSupp                 Part               ps_partkey → p_partkey
LineItem                 Order              l_orderkey → o_orderkey
LineItem                 PartSupp           l_partkey, l_suppkey → ps_partkey, ps_suppkey
LineItem                 Supplier           l_suppkey → s_suppkey
LineItem                 Part               l_partkey → p_partkey

19 Query Result Nesting
A related problem is converting the result of a query over a relational schema into XML. Space savings can be achieved by taking the de-normalized query result and applying some normalization by exploiting nesting.
A query example:
    SELECT n_name, c_name, o_orderdate, o_orderkey, l_partkey, l_quantity, s_name
    FROM customer C, orders O, lineitem L, supplier S, nation N
    WHERE C.c_custkey = O.o_custkey
      AND L.l_orderkey = O.o_orderkey
      AND L.l_suppkey = S.s_suppkey
      AND C.c_nationkey = N.n_nationkey
- This query returns over 6 million records. Using flat translation wastes space. For instance, since the cardinality of nation is 25, the XML encoding would redundantly encode approximately 6 million nation names.
- The algorithm can be used with queries involving selection, projection, and joins between relations on primary/foreign keys.
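The effect of nesting a de-normalized result can be sketched with a simple grouping routine that counts data values before and after. This is a plain group-by illustration under stated assumptions, not the algorithm from the slides: the function names, the fixed nesting order, and the sample rows are hypothetical, and grouping by name columns assumes those names are unique.

```python
def nest_rows(rows, levels):
    """Group flat rows into a hierarchy.

    rows   -- list of dicts (one per result record)
    levels -- tuples of column names, outermost level first
    """
    if not levels:
        return []
    head, rest = levels[0], levels[1:]
    groups = {}
    for row in rows:
        key = tuple(row[c] for c in head)
        groups.setdefault(key, []).append(row)
    return [
        {"values": dict(zip(head, key)), "children": nest_rows(members, rest)}
        for key, members in groups.items()
    ]


def count_values(nested):
    """Number of data values stored in the nested form."""
    return sum(len(n["values"]) + count_values(n["children"]) for n in nested)


# Hypothetical sample of a joined result.
rows = [
    {"n_name": "CANADA", "c_name": "C1", "o_orderkey": 1},
    {"n_name": "CANADA", "c_name": "C1", "o_orderkey": 2},
    {"n_name": "CANADA", "c_name": "C2", "o_orderkey": 3},
]
nested = nest_rows(rows, [("n_name",), ("c_name",), ("o_orderkey",)])
flat_count = sum(len(r) for r in rows)
nested_count = count_values(nested)
print(flat_count, nested_count)
```

In the flat form every record repeats the nation and customer names; in the nested form each appears once per group.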

20 Query Nesting Results
Results summary:
- For the example query, nesting the query result reduces the number of data values in the XML document by 50%, which corresponds to 20,904,839 fewer values.
- For other benchmark TPC-H queries, 20-30% of data values were not encoded. Even with compression, this is significant as it results in smaller files to compress.

