Dec. 13, 2002 WISE2002 Processing XML View Queries Including User-defined Foreign Functions on Relational Databases Yoshiharu Ishikawa Jun Kawada Hiroyuki.

1 Dec. 13, 2002 WISE2002 Processing XML View Queries Including User-defined Foreign Functions on Relational Databases Yoshiharu Ishikawa Jun Kawada Hiroyuki Kitagawa University of Tsukuba {ishikawa,kitagawa}@is.tsukuba.ac.jp

2 Presentation Overview
- Background
  - XML Views
  - Support for foreign functions
- Overview of Our Approach
- The XPERANTO Approach
- Extension to the XPERANTO Approach
- Query Processing Architecture
- Experimental Evaluation
- Conclusions and Future Work

3 XML Views
- XML: content-description language on the Web
- XML views over RDBs
  - Constructing virtual XML views over RDBs
  - Data items are stored in RDBs
  - Selecting and transforming data items into appropriate XML formats
- XML views are constructed using middleware technologies
  - Effective use of the data management and query processing facilities of the underlying RDBMSs
  - XPERANTO (IBM) [3,7,8], SilkRoute (AT&T) [4,5]

4 Virtual XML Views and Middleware
- Middleware provides a virtual XML view facility
- A user query against an XML view is specified in an XML query language (e.g., XQuery)
- Middleware creates a query plan and issues an SQL query to the RDBMS
- Middleware transforms the SQL query result into the final XML format
  - adds XML tags
  - may perform remaining query tasks
Diagram: User Query -> Middleware (virtual XML view) -> SQL Query -> RDBMS -> SQL Query Result -> Middleware -> Result
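
The workflow on this slide can be illustrated with a minimal sketch. It assumes SQLite as a stand-in RDBMS and Python's standard XML library as the tagger; the function name `run_view_query` and the exact output layout are illustrative assumptions, not part of XPERANTO or of the authors' system.

```python
import sqlite3
import xml.etree.ElementTree as ET


def run_view_query(conn, min_population):
    """Issue one SQL query, then tag the rows as XML in the middleware."""
    # The selection is evaluated by the RDBMS.
    rows = conn.execute(
        "SELECT cname, population FROM city WHERE population >= ? ORDER BY cid",
        (min_population,),
    ).fetchall()
    # The middleware only adds XML tags around the SQL result.
    root = ET.Element("cities")
    for cname, population in rows:
        city = ET.SubElement(root, "city")
        ET.SubElement(city, "cname").text = cname
        ET.SubElement(city, "population").text = str(population)
    return ET.tostring(root, encoding="unicode")


# Sample rows taken from the city table of the example database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (cid TEXT PRIMARY KEY, cname TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [("C0100", "A", 16), ("C0101", "B", 13), ("C0103", "D", 4)],
)
result_xml = run_view_query(conn, 10)
```

Here the population filter runs inside the database and the middleware does nothing but tagging, which is the division of labor the slide describes.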

5 Example of Relational Database (Fig. 2)
Table city:
  cid    cname  population
  C0100  A      16
  C0101  B      13
  C0103  D      4
  ...
Table location:
  cid    x    y
  C0100  100  400
  C0100  100  200
  ...
  C0101  110  250
  C0101  150  200
  ...
Table facility:
  fid    fname      cid
  I0015  E Mall     C0100
  I0016  F Park     C0100
  I0017  G Stadium  C0100
  I0018  H Library  C0101
  I0019  I Mall     C0102
  I0020  J Park     C0103
  ...

6 Example of XML View (Fig. 1)
<cities> A 16 100 400 100 200 ... E Mall F Park ... </cities>

7 Presentation Overview
- Background
  - XML Views
  - Support for foreign functions
- Overview of Our Approach
- The XPERANTO Approach
- Extension to the XPERANTO Approach
- Query Processing Architecture
- Experimental Evaluation
- Conclusions and Future Work

8 User Query with Foreign Function
- User query may contain foreign functions
- Example in XQuery (Fig. 3):
    {
      for $city in view("cities")/cities/city
      where isWider($city/location, 10000, "km")
        and $city/population >= 10
      return
        $city/cname
        $city/facility
    }

9 User-defined Foreign Functions
- Usually coded in a general-purpose programming language (e.g., Java)
- Receive an in-memory representation of the target XML document fragments (e.g., DOM)
- The middleware must evaluate foreign functions, since conventional RDBMSs do not support such facilities
Diagram: the middleware passes an XML fragment (<location> 100 400 100 200 ... </location>) to the foreign function
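
As a rough illustration of a foreign function that receives a DOM fragment, here is a minimal Python sketch. The slides do not define what `isWider` computes or how the `<location>` fragment is structured, so everything here is an assumption: the fragment is taken to hold alternating `<x>` and `<y>` children, the function compares the bounding-box area of the points with a threshold, and the unit argument of the original call is omitted.

```python
from xml.dom.minidom import parseString


def is_wider(location, threshold):
    """Hypothetical foreign function: True if the bounding box of the
    points in the <location> fragment has an area above the threshold."""
    xs = [float(n.firstChild.data) for n in location.getElementsByTagName("x")]
    ys = [float(n.firstChild.data) for n in location.getElementsByTagName("y")]
    if not xs or not ys:
        return False  # no coordinates, so nothing can be "wider"
    area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return area > threshold


# The middleware would build this DOM fragment from SQL result tuples.
# The element layout is an assumption made for this sketch.
fragment = parseString(
    "<location>"
    "<x>110</x><y>250</y>"
    "<x>150</x><y>200</y>"
    "</location>"
).documentElement

wide_enough = is_wider(fragment, 1000)
```

Because the function works on a DOM node rather than on relational tuples, it cannot be handed to the RDBMS, which is why the middleware has to evaluate it.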

10 Presentation Overview
- Background
- Overview of Our Approach
- The XPERANTO Approach
- Extension to the XPERANTO Approach
- Query Processing Architecture
- Experimental Evaluation
- Conclusions and Future Work

11 Our Approach
- Processing XML view queries including foreign functions
  - By the cooperation of a conventional RDBMS and a middleware system
  - Extension of the XPERANTO framework
- Proposal of two query processing methods
  - Two-step processing method
  - One-step processing method

12 Two-Step Processing Method
Diagram: the middleware (query planning, query execution, foreign function evaluation, result XML generation) receives the user query, issues two SQL queries to the RDBMS, and returns the result
- 1st query: to evaluate foreign functions
- 2nd query: to generate the result

13 One-Step Processing Method
Diagram: the middleware (query planning, query execution, foreign function evaluation, result XML generation) receives the user query, issues a single SQL query to the RDBMS, and returns the result
- combined query: to evaluate foreign functions and to generate the result

14 Presentation Overview
- Background
- Overview of Our Approach
- The XPERANTO Approach
- Extension to the XPERANTO Approach
- Query Processing Architecture
- Experimental Evaluation
- Conclusions and Future Work

15 XML Views in XPERANTO
- A default XML view is automatically created from the underlying relational tables
- An XML view is defined over the default XML view using XQuery
Diagram: Relational Tables -> (automatic derivation) -> Default XML View <db>...</db> -> (View Definition XQuery) -> virtual XML View

16 Default XML View (Fig. 6)
- A default XML view is automatically created from the underlying relational tables
- Each row element corresponds to a relational tuple
    <db>
      <city>
        <row> <cid>C0100</cid> <cname>A</cname> <population>16</population> </row>
        ...
      </city>
      <location>
        <row> <cid>C0100</cid> <x>100</x> <y>400</y> </row>
        ...
      </location>
      ...
    </db>
  The first row element corresponds to a city tuple
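
The automatic derivation of a default view can be sketched as follows. This is only an illustration under assumptions: SQLite stands in for the RDBMS, the helper name `default_view` is made up, and the `db/<table>/row/<column>` layout is inferred from the paths used in the view definition (`view("default")/city/row`), not taken from the XPERANTO implementation. Unlike a real virtual view, the sketch materializes the whole document.

```python
import sqlite3
import xml.etree.ElementTree as ET


def default_view(conn, tables):
    """Build <db><table><row><column>value</column>...</row>...</table></db>."""
    db = ET.Element("db")
    for table in tables:
        table_el = ET.SubElement(db, table)
        # Table names come from the caller and are trusted in this sketch.
        cursor = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            # One <row> element per relational tuple.
            row_el = ET.SubElement(table_el, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
    return db


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (cid TEXT, cname TEXT, population INTEGER);
    CREATE TABLE location (cid TEXT, x INTEGER, y INTEGER);
    INSERT INTO city VALUES ('C0100', 'A', 16), ('C0101', 'B', 13);
    INSERT INTO location VALUES ('C0100', 100, 400), ('C0100', 100, 200);
""")
db = default_view(conn, ["city", "location"])
xml_text = ET.tostring(db, encoding="unicode")
```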

17 XML View Definition (Fig. 7)
    create view cities as {
      for $city in view("default")/city/row
      return
        $city/cname
        $city/population
        for $location in view("default")/location/row
        where $city/cid = $location/cid
        return
          $location/x $location/y
        ...
    }
- An XML view is defined over the default XML view using XQuery
- This view definition creates the XML view shown in Fig. 1

18 Query Processing in XPERANTO
Diagram: the view definition (XQuery over the default XML view <db>...</db>) and the user query are each converted into an XQGM graph; view composition produces a composed XQGM graph; XQGM graph transformation (computation pushdown, tagger pull-up) then generates the SQL query and the XML tag operators

19 View Definition XQGM Graph (Fig. 8)
- Each node corresponds to an extended relational operator (shown in Table 1)
- Contains a correlated join operator
- Node 11 is an abbreviated representation of node 12
- Node 12 contains tag operators shown in Table 2

20 User Query XQGM Graph Fig. 9

21 Presentation Overview
- Background
- Overview of Our Approach
- The XPERANTO Approach
- Extension to the XPERANTO Approach
- Query Processing Architecture
- Experimental Evaluation
- Conclusions and Future Work

22 Some Problems and Our Extension
- The original XPERANTO approach [7] has some problems
  - It shows only simple transformation examples
  - It only considers
    - Computation pushdown processing for top-level relations
    - Simple XQuery queries: selection queries with one where clause
    - Simple where clauses with one condition
    - Simple return clauses
  - It does not consider foreign functions
- Our extension to the XPERANTO approach
  - Devised user query graph generation (Fig. 9)
  - Incorporation of the intersection operator for multiple conditions in a where clause
  - Query translation that considers our extension

23 Query Translation
- Our extension to the XPERANTO approach
  - Inclusion of foreign functions in the where clause
  - Handling output specifications in the return clause
  - Treatment of multiple conditions in the where clause
  - Treatment of computation pushdown to subrelations
- Query translation consists of the following steps
  - Decorrelation
  - View composition
  - Computation pushdown
  - Tagger pull-up

24 Decorrelation of View Definition XQGM Graph
- A correlated join operator has a high execution cost
- The decorrelation step eliminates correlated join operators
- The view definition XQGM graph (Fig. 8) is translated into the graph in Fig. 10

25 Decorrelation of User Query XQGM Graph
- The user query XQGM graph (Fig. 9) is likewise translated into the graph in Fig. 11

26 View Composition
- Compose a view definition XQGM graph and a user query XQGM graph, then apply the function composition rules in Table 3
- This step is almost the same as in the original XPERANTO approach
- Composition of Figs. 10 and 11 yields Fig. 12

27 Computation Pushdown
- Push XQGM operators down towards the leaves of the graph as far as possible
  - For efficient evaluation using the query processing power of the RDBMS
- However, foreign function evaluation cannot be pushed down
  - Foreign function evaluation is performed in the middleware
  - Evaluation in the middleware requires XML fragments
- Therefore, push down all computation except foreign function evaluation
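
The pushdown rule can be pictured as a partition of the where-clause conditions. The sketch below is a deliberate simplification: it works on a flat list of conditions rather than on XQGM operators, and the `Condition` type and `split_conditions` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    text: str         # the condition as written in the where clause
    is_foreign: bool  # True if it calls a user-defined foreign function


def split_conditions(conditions):
    """Return (pushed, kept): conditions the RDBMS can evaluate in SQL,
    and foreign function calls that must stay in the middleware."""
    pushed = [c.text for c in conditions if not c.is_foreign]
    kept = [c.text for c in conditions if c.is_foreign]
    return pushed, kept


# The two conditions of the example user query (Fig. 3).
where_clause = [
    Condition('isWider($city/location, 10000, "km")', is_foreign=True),
    Condition("$city/population >= 10", is_foreign=False),
]
pushed, kept = split_conditions(where_clause)
```

For the example query, only the population test is pushed to the RDBMS, while the `isWider` call stays in the middleware.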

28 Tagger Pull-Up (Fig. 13)
- Replace XML functions with tag operators
- Pull tag operators up as far as possible
- Two SQL queries are generated
  - SQL-1 from the where clause
  - SQL-2 from the return clause

29 Presentation Overview
- Background
- Overview of Our Approach
- The XPERANTO Approach
- Extension to the XPERANTO Approach
- Query Processing Architecture
- Experimental Evaluation
- Conclusions and Future Work

30 Two-Step Processing Method (1)
Diagram: middleware components (query planning, query execution, foreign function evaluator, tagger) and the RDBMS, with flows labeled User Query, SQL-1/SQL-2, Tagger-1/Tagger-2, Tuple-1/Tuple-2, Fragment, Keys, and Result XML
- SQL-1 and Tagger-1 are used to retrieve the tuples (Tuple-1) needed to evaluate foreign functions
- The qualified key value set Keys is combined with SQL-2 to select the result tuples

31 Two-Step Processing Method (2)
- Two approaches for the generation of SQL-2
- Two-step processing method (where)
  - The qualified key value set Keys (obtained by foreign function evaluation) is embedded into the where clause of SQL-2 (e.g., "where fid in Keys")
- Two-step processing method (tmp)
  - First, a one-column temporary table is created from the key values in Keys
  - Then a join operation with the temporary table is incorporated into SQL-2
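
The two variants of SQL-2 can be compared in a small sketch. It rests on several assumptions that are not in the slides: SQLite replaces the RDBMS, the key column is `cid` rather than `fid`, the SQL text is hand-written instead of generated from XQGM graphs, and `bbox_area` is a hypothetical stand-in for the foreign function that works directly on tuples instead of XML fragments.

```python
import sqlite3


def make_db():
    """Load a few rows of the example database (Fig. 2) into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE city (cid TEXT PRIMARY KEY, cname TEXT, population INTEGER);
        CREATE TABLE location (cid TEXT, x INTEGER, y INTEGER);
        CREATE TABLE facility (fid TEXT PRIMARY KEY, fname TEXT, cid TEXT);
        INSERT INTO city VALUES ('C0100','A',16), ('C0101','B',13), ('C0103','D',4);
        INSERT INTO location VALUES ('C0100',100,400), ('C0100',100,200),
                                    ('C0101',110,250), ('C0101',150,200);
        INSERT INTO facility VALUES ('I0015','E Mall','C0100'),
                                    ('I0016','F Park','C0100'),
                                    ('I0018','H Library','C0101');
    """)
    return conn


def bbox_area(points):
    """Hypothetical foreign function body: bounding-box area of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def qualified_keys(conn, threshold):
    """Step 1: run SQL-1, then evaluate the foreign function per city."""
    points = {}
    sql_1 = ("SELECT l.cid, l.x, l.y FROM city c JOIN location l ON c.cid = l.cid "
             "WHERE c.population >= 10")
    for cid, x, y in conn.execute(sql_1):
        points.setdefault(cid, []).append((x, y))
    return sorted(cid for cid, pts in points.items() if bbox_area(pts) > threshold)


def step2_where(conn, keys):
    """Two-step (where): embed Keys into the where clause of SQL-2."""
    if not keys:
        return []
    placeholders = ", ".join("?" for _ in keys)
    sql_2 = ("SELECT c.cname, f.fname FROM city c JOIN facility f ON c.cid = f.cid "
             f"WHERE c.cid IN ({placeholders}) ORDER BY f.fid")
    return conn.execute(sql_2, keys).fetchall()


def step2_tmp(conn, keys):
    """Two-step (tmp): load Keys into a one-column temporary table and join."""
    conn.execute("CREATE TEMP TABLE tmp_keys (cid TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO tmp_keys VALUES (?)", [(k,) for k in keys])
    sql_2 = ("SELECT c.cname, f.fname FROM tmp_keys k "
             "JOIN city c ON c.cid = k.cid "
             "JOIN facility f ON f.cid = c.cid ORDER BY f.fid")
    rows = conn.execute(sql_2).fetchall()
    conn.execute("DROP TABLE tmp_keys")
    return rows


conn = make_db()
keys = qualified_keys(conn, 1000)
rows_where = step2_where(conn, keys)
rows_tmp = step2_tmp(conn, keys)
```

Both variants return the same rows; they differ only in how Keys reaches the RDBMS, which is the difference the experiments measure.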

32 One-Step Processing Method
Diagram: middleware components (query planning, query execution, foreign function evaluator, tagger) and the RDBMS, with flows labeled User Query, SQL-1/SQL-2, Tagger-1/Tagger-2, Tuple, Fragment, Keys, and Result XML
- SQL-1 and SQL-2 are integrated into one SQL query (SQL-3)
- The middleware selects the tuples of the final result using the qualified key set Keys
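
A minimal sketch of the one-step idea follows. The slides do not show how SQL-1 and SQL-2 are integrated, so the `UNION ALL` form of `SQL_3` below is purely an assumption, as are the use of SQLite, the `cid` key column, and the `bbox_area` stand-in for the foreign function.

```python
import sqlite3

# Assumed shape of the combined query: location rows (needed by the foreign
# function) and facility rows (needed for the result) arrive in one result set.
SQL_3 = """
SELECT c.cid, c.cname, 'location' AS kind, l.x AS a, l.y AS b
  FROM city c JOIN location l ON l.cid = c.cid WHERE c.population >= 10
UNION ALL
SELECT c.cid, c.cname, 'facility', f.fname, NULL
  FROM city c JOIN facility f ON f.cid = c.cid WHERE c.population >= 10
ORDER BY 1, 3
"""


def bbox_area(points):
    """Hypothetical foreign function body: bounding-box area of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def one_step(conn, threshold):
    """Run SQL-3 once, then filter the result tuples in the middleware."""
    names, points, facilities = {}, {}, {}
    for cid, cname, kind, a, b in conn.execute(SQL_3):
        names[cid] = cname
        if kind == "location":
            points.setdefault(cid, []).append((a, b))
        else:
            facilities.setdefault(cid, []).append(a)
    # Keys: cities that satisfy the foreign function condition.
    keys = sorted(cid for cid, pts in points.items() if bbox_area(pts) > threshold)
    return [(names[cid], fname) for cid in keys for fname in facilities.get(cid, [])]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (cid TEXT PRIMARY KEY, cname TEXT, population INTEGER);
    CREATE TABLE location (cid TEXT, x INTEGER, y INTEGER);
    CREATE TABLE facility (fid TEXT PRIMARY KEY, fname TEXT, cid TEXT);
    INSERT INTO city VALUES ('C0100','A',16), ('C0101','B',13), ('C0103','D',4);
    INSERT INTO location VALUES ('C0100',100,400), ('C0100',100,200),
                                ('C0101',110,250), ('C0101',150,200);
    INSERT INTO facility VALUES ('I0015','E Mall','C0100'),
                                ('I0016','F Park','C0100'),
                                ('I0018','H Library','C0101');
""")
result = one_step(conn, 1000)
```

The RDBMS does the same work whatever the foreign function selects, since filtering by Keys happens only afterwards in the middleware.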

33 Presentation Overview
- Background
- Overview of Our Approach
- The XPERANTO Approach
- Extension to the XPERANTO Approach
- Query Processing Architecture
- Experimental Evaluation
- Conclusions and Future Work

34 Outline of Experiments (1)
- PostgreSQL on a Linux PC
- Relational tables
  - No. of city tuples: N = 1000
  - No. of location tuples: 10N and 100N
  - No. of facility tuples: 10N and 100N
- Four types of queries
  - Q1: For each city whose area is larger than X, show its name and facilities
  - Q2: For each city whose area is larger than X, show its name, location information, and facilities
  - Q3: For each city whose area is larger than X and whose population is larger than Y, show its name and facilities (Q1 + additional selection condition)
  - Q4: Q2 + additional selection condition

35 Outline of Experiments (2)
- Selectivity factors
  - Area condition: S_a = 0.1, 0.3, 0.5, 0.7, and 0.9
  - Population condition: S_p = 0.1 and 0.3
- Processing costs
  - The costs of foreign function evaluation and XML generation are relatively small and nearly the same in both methods
  - Cost of the two-step processing method: processing cost of SQL-1 and SQL-2
  - Cost of the one-step processing method: processing cost of SQL-3 (SQL-1 + SQL-2)

36 Q3 with S_p = 0.3 (no. of facility tuples = 10N) (Fig. 20)
- Q3: For each city whose area is larger than X and whose population is larger than Y, show its name and facilities
- No. of location tuples = 10N or 100N
- The three methods have similar costs

37 Q3 with S_p = 0.3 (no. of facility tuples = 100N) (Fig. 21)
- Two-step methods are better if the selectivity of the foreign function is low
  - filtering works well
- Two-step method (where) is worse than (tmp)
  - embedding of key values is not efficient

38 Q3 with S_p = 0.1 (no. of facility tuples = 10N) (Fig. 22)
- The selectivity of the population attribute is small, so pushdown to the RDBMS can reduce the number of tuples
- One-step method is better
  - two-step methods have overheads

39 Q4 with S_p = 0.3 (no. of facility tuples = 10N) (Fig. 23)
- Q4: For each city whose area is larger than X and whose population is larger than Y, show its name, location information, and facilities
- If the number of location tuples is large (100N) and the selectivity of the foreign function is small, two-step method (tmp) is better

40 Summary of Experiments
- Two-step processing method (where) is worse than two-step processing method (tmp) in most situations
  - key value embedding is not a good idea
- The cost of one-step processing does not depend on the selectivity of the foreign function
- If a query contains only a foreign function condition (Q1 and Q2), two-step processing method (tmp) is generally efficient when the selectivity of the foreign function is small
- If a query contains additional conditions (Q3 and Q4), the efficiency depends on the selectivity factors
  - If the processing cost of the RDBMS is small, the one-step processing method is efficient

41 Guideline of Usage
- Do not use two-step method (where)
- If the processing cost in the RDBMS is quite small, use the one-step method
- If the query contains only foreign functions, use two-step method (tmp)
- If the query contains additional filtering conditions
  - we have to select an appropriate one from the one-step method and two-step method (tmp)
  - the selection depends on the selectivity factors
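
The guideline can be written down as a tiny decision helper. This is a loose sketch: the slide gives no numeric thresholds and no precedence among its rules, so the boolean inputs, the rule order, and the name `choose_methods` are all assumptions made for illustration.

```python
def choose_methods(only_foreign_conditions, rdbms_cost_is_small):
    """Return the candidate methods suggested by the guideline.

    Two-step (where) is never returned, following the first rule.
    """
    if rdbms_cost_is_small:
        return ["one-step"]
    if only_foreign_conditions:
        return ["two-step (tmp)"]
    # With additional filtering conditions the choice depends on the
    # selectivity factors, so both candidates remain.
    return ["one-step", "two-step (tmp)"]


candidates_q1 = choose_methods(only_foreign_conditions=True, rdbms_cost_is_small=False)
candidates_q3 = choose_methods(only_foreign_conditions=False, rdbms_cost_is_small=False)
```

In the last case the helper returns both candidates on purpose, because the slides leave that choice to selection heuristics that are listed as future work.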

42 Presentation Overview
- Background
- Overview of Our Approach
- The XPERANTO Approach
- Extension to the XPERANTO Approach
- Query Processing Architecture
- Experimental Evaluation
- Conclusions and Future Work

43 Conclusions
- Processing methods for XML views over relational databases, especially when queries include foreign function calls
  - Cooperation approach of middleware and RDBMS
  - Extension of the XPERANTO framework
- Proposal of two query processing methods
  - Two-step processing method (where/tmp)
  - One-step processing method
- Performance evaluation based on experiments

44 Future Work
- Broadening the supportable XML queries
- Query optimization
  - redundancy reduction
- Development of other query processing approaches
  - bitmap indexes
  - WITH clause in SQL:1999
- Selection of an appropriate query processing method
  - development of selection heuristics

45 Thank you

46 Q1 (no. of facility tuples = 100N) (Fig. 26)
- Q1: No filtering condition on population attribute

