Evaluating XML-Extended OLAP Queries Based on a Physical Algebra Xuepeng Yin and Torben B. Pedersen Department of Computer Science Aalborg University.


1 Evaluating XML-Extended OLAP Queries Based on a Physical Algebra
Xuepeng Yin and Torben B. Pedersen
Department of Computer Science, Aalborg University

2 Problem
- OLAP systems are good for complex analysis queries
  - Easy to use
  - Fast
  - Used in business, science, ...
- Problems with physical integration in existing OLAP systems
  - Integrating new data requires a (partial) cube rebuild, which is too slow
  - Problems arise with dynamic data: stock quotes, competitors' prices, disease info, ...
- Data will often be available in Extensible Markup Language (XML) format: weather data, map info, price lists, ...

3 Solution
- Allows the use of external XML data as virtual dimensions
  - Decoration (extra info) → type information
  - Selection → conditions on XML data
  - Grouping → categories given by XML data
- Goal: flexible access to XML data from OLAP systems

4 Overview
- Contributions
- Architecture of the federation
- Linking OLAP and XML
- The federation query semantics
  - The logical algebra
- The physical algebra
  - Conversion from logical to physical plans
  - Plan execution
- Query optimization
  - The query optimizer
  - Execution of an optimized plan
- Performance
- Conclusion

5 Contributions of This Paper
- Previous OLAP-XML federation efforts
  - A logical algebra
  - A partial, straightforward implementation
- Problems with previous work
  - The logical algebra does not accurately reflect query execution tasks
  - Query optimization is done at an abstract level
  - The implementation is very limited
- Novelties of this paper
  - A physical algebra and simplified query semantics
  - Practical query optimization techniques
  - A fully functional, robust query engine
  - Experiments with the query engine

6 Architecture of the Federation
- OLAP and XML components
- Auxiliary components
- Query engine
  - Query analyzer
  - Query optimizer
  - Query evaluator

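7 Linking OLAP and XML
- Links: a relation between a set of dimension values and a set of XML nodes
  - Example: Nlink = {(DK, n1), (CN, n2), (UK, n3)}
- Level expressions of the form level/link/xpath specify a concrete link usage
  - Nation/Nlink/Population links nations to their populations
[Figure: the example cube, with the dimensions Time, Orders, Suppliers and Parts (levels such as Year, Quarter, Month, Customer, Order, Region, Nation, Supplier, Manufacturer, Brand and Part) and the measure Quantity, linked by Nlink to XML nodes, e.g., Denmark with population 5.3]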
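
To make the link idea concrete, here is a minimal Python sketch that evaluates Nation/Nlink/Population over a small XML document. The XML layout (the element names `Nation` and `Population` and the `id` attribute) and the function `evaluate_level_expression` are hypothetical; only Nlink and the population values come from the slides. The XPath part is evaluated with the limited path syntax of the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML source: element names and the id attribute are assumptions;
# the node ids follow Nlink and the population values follow the slides.
NATIONS_XML = """
<Nations>
  <Nation id="n1"><Name>Denmark</Name><Population>5.3</Population></Nation>
  <Nation id="n2"><Name>China</Name><Population>1264.5</Population></Nation>
  <Nation id="n3"><Name>United Kingdom</Name><Population>19.1</Population></Nation>
</Nations>
"""

# The link as a relation between dimension values and XML node ids.
NLINK = {("DK", "n1"), ("CN", "n2"), ("UK", "n3")}


def evaluate_level_expression(link, root, xpath):
    """Evaluate level/link/xpath for every linked value of the level.

    Each dimension value is mapped to the text of all nodes that the
    relative XPath reaches from the XML node the link associates it with.
    """
    nodes_by_id = {node.get("id"): node for node in root.iter() if node.get("id")}
    result = {}
    for value, node_id in link:
        node = nodes_by_id[node_id]
        result.setdefault(value, []).extend(e.text for e in node.findall(xpath))
    return result


root = ET.fromstring(NATIONS_XML.strip())
populations = evaluate_level_expression(NLINK, root, "Population")
print(sorted(populations.items()))
# → [('CN', ['1264.5']), ('DK', ['5.3']), ('UK', ['19.1'])]
```

Because a link is a general relation, one dimension value may be linked to several XML nodes; the sketch therefore collects a list of values per dimension value.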

8 The Federation Query Semantics
- The logical algebra: decoration, federation selection, federation generalized projection
- The federation query language SQLXM, e.g.:
    SELECT SUM(Quantity), Brand(Part), Nation/Nlink/Population
    FROM TC
    WHERE Nation/Nlink/Population < 30
    GROUP BY Brand(Part), Nation/Nlink/Population
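
The three logical operators can be sketched over plain Python dictionaries to show how the example SQLXM query is answered. This is an illustrative in-memory reading of the semantics, not the paper's formal definition; the fact rows and dimension mappings are taken from the plan-execution slide, and the helper names (`decorate`, `federation_select`, `generalized_projection`) are hypothetical.

```python
from collections import defaultdict

# Example facts and dimension data from the slides; the dict-based
# representation is an assumption made for this sketch.
FACTS = [
    {"Quantity": 17, "Supplier": "S1", "Part": "P3"},
    {"Quantity": 28, "Supplier": "S2", "Part": "P4"},
    {"Quantity": 2, "Supplier": "S3", "Part": "P3"},
    {"Quantity": 26, "Supplier": "S4", "Part": "P2"},
]
SUPPLIER_NATION = {"S1": "DK", "S2": "DK", "S3": "CN", "S4": "UK"}
PART_BRAND = {"P2": "B2", "P3": "B3", "P4": "B4"}
NATION_POPULATION = {"DK": 5.3, "CN": 1264.5, "UK": 19.1}  # Nation/Nlink/Population


def decorate(facts, name, value_of):
    """Decoration: attach a new attribute computed for each fact."""
    return [{**f, name: value_of(f)} for f in facts]


def federation_select(facts, predicate):
    """Federation selection: keep facts whose (decorated) data satisfy predicate."""
    return [f for f in facts if predicate(f)]


def generalized_projection(facts, group_by, measure):
    """Generalized projection: group by the given attributes and SUM the measure."""
    sums = defaultdict(int)
    for f in facts:
        sums[tuple(f[a] for a in group_by)] += f[measure]
    return dict(sums)


cube = decorate(FACTS, "Population",
                lambda f: NATION_POPULATION[SUPPLIER_NATION[f["Supplier"]]])
# Brand(Part) is an ordinary roll-up in the cube; here it is simply looked up.
cube = decorate(cube, "Brand", lambda f: PART_BRAND[f["Part"]])
cube = federation_select(cube, lambda f: f["Population"] < 30)
result = generalized_projection(cube, ["Brand", "Population"], "Quantity")
print(result)
```

The result maps (Brand, Population) to the summed quantity: 17 for (B3, 5.3), 28 for (B4, 5.3) and 26 for (B2, 19.1), which agrees with the result table on the plan-execution slide. The fact for supplier S3 is dropped because China's population (1264.5) fails the predicate.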

9 The Physical Algebra
- Includes data retrieval and data manipulation operators
- A physical plan models real execution tasks, i.e., when, where and how data is processed
- Nine physical operators
  - Querying the OLAP component: cube selection and cube generalized projection
  - Data transfer between components: fact-, dimension- and XML-transfer operators
  - Temporary data manipulations: decoration, federation selection and federation generalized projection
  - Inlining XML data: inlining

10 Querying the OLAP Component
- Cube selection
  - Has no references to XML data
  - Performs selection over the OLAP cube
  - Intuitively, a SQL SELECT statement
- Cube generalized projection
  - Has no references to XML data
  - Rolls up dimensions and aggregates the specified measures at the specified levels
  - Intuitively, a SQL SELECT statement with a GROUP BY clause
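
Since both OLAP-side operators are characterized as SQL statements, a tiny SQLite database can stand in for the OLAP component to illustrate them. This is a sketch under the assumption of a relational star schema; the table names `fact` and `part_dim` are hypothetical and the rows are taken from the example slides.

```python
import sqlite3

# A tiny relational stand-in for the OLAP component (an assumption: the
# actual OLAP server need not be relational). Rows follow the slides.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact (Quantity INTEGER, ExtPrice INTEGER, Supplier TEXT, Part TEXT);
CREATE TABLE part_dim (Part TEXT, Brand TEXT);
INSERT INTO fact VALUES (17, 17954, 'S1', 'P3'), (28, 29983, 'S2', 'P4'),
                        (2, 2388, 'S3', 'P3'), (26, 26374, 'S4', 'P2');
INSERT INTO part_dim VALUES ('P2', 'B2'), ('P3', 'B3'), ('P4', 'B4');
""")

# Cube selection: a plain SELECT whose predicate refers to cube data only.
selected = con.execute("SELECT * FROM fact WHERE Part = 'P3'").fetchall()

# Cube generalized projection: roll Part up to Brand and SUM the measure.
rolled_up = con.execute("""
    SELECT p.Brand, SUM(f.Quantity)
    FROM fact f JOIN part_dim p ON f.Part = p.Part
    GROUP BY p.Brand
    ORDER BY p.Brand""").fetchall()
print(selected)
print(rolled_up)
```

Neither query mentions XML data, which is exactly what lets these two operators run entirely inside the OLAP component.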

11 Data Transfer Between Components
- Fact-transfer
  - Transfers the OLAP fact data to the temporary component
  - The temporary facts can then be decorated
  - Intuitively, a SQL SELECT INTO statement
- Dimension-transfer
  - Transfers dimension data to the temporary component
  - Used when higher-level dimension data is required in the temporary component
- XML-transfer
  - Transfers XML data to the temporary component
  - Uses XPath expressions to identify the XML nodes that hold decoration values
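
The transfer operators can be sketched with two in-memory SQLite databases, one standing in for the OLAP component and one for the temporary component. SQLite has no SELECT INTO, so the hypothetical `transfer` helper creates the temporary table and copies the rows explicitly. The XML layout, the node ids and the table names are assumptions; the link and the values follow the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

olap = sqlite3.connect(":memory:")  # stand-in for the OLAP component
temp = sqlite3.connect(":memory:")  # stand-in for the temporary component
olap.executescript("""
CREATE TABLE fact (Quantity INTEGER, Supplier TEXT);
CREATE TABLE supplier_dim (Supplier TEXT, Nation TEXT);
INSERT INTO fact VALUES (17, 'S1'), (28, 'S2'), (2, 'S3'), (26, 'S4');
INSERT INTO supplier_dim VALUES ('S1', 'DK'), ('S2', 'DK'), ('S3', 'CN'), ('S4', 'UK');
""")


def transfer(query, table, columns):
    """Run `query` on the OLAP stand-in and store its rows in a temporary table.

    Table and column names are trusted constants here, not user input.
    """
    rows = olap.execute(query).fetchall()
    temp.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    marks = ", ".join("?" * len(columns))
    temp.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)


# Fact-transfer and dimension-transfer.
transfer("SELECT Quantity, Supplier FROM fact", "t_fact", ["Quantity", "Supplier"])
transfer("SELECT Supplier, Nation FROM supplier_dim", "t_supplier_nation",
         ["Supplier", "Nation"])

# XML-transfer: a path expression locates the nodes holding decoration values,
# and the link (dimension value -> node id) attaches them to dimension values.
doc = ET.fromstring(
    "<Nations>"
    "<Nation id='n1'><Population>5.3</Population></Nation>"
    "<Nation id='n2'><Population>1264.5</Population></Nation>"
    "<Nation id='n3'><Population>19.1</Population></Nation>"
    "</Nations>")
NLINK = {"DK": "n1", "CN": "n2", "UK": "n3"}
population_by_id = {n.get("id"): float(n.findtext("Population"))
                    for n in doc.findall("./Nation")}
temp.execute("CREATE TABLE t_nation_population (Nation, Population)")
temp.executemany("INSERT INTO t_nation_population VALUES (?, ?)",
                 [(nation, population_by_id[nid]) for nation, nid in NLINK.items()])
```

After these three transfers, everything the decoration step needs (facts, the supplier-to-nation mapping and the nation-to-population values) sits in the temporary component.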

12 Temporary Data Manipulations
- Decoration
  - Decorates the cube by adding a new dimension
  - Intuitively, adds a table relating dimension values to decoration (XML) data:
      SELECT * FROM t_supplier_nation t1, t_nation_population t2
      WHERE t1.nation = t2.nation
- Federation selection
  - Performs selection over the cube in the temporary component
  - Intuitively, a SQL selection over the temporary tables:
      SELECT t1.* FROM t_fact t1, t_supplier_population t2
      WHERE t1.supplier = t2.supplier AND population < 30
- Federation generalized projection
  - Rolls up and aggregates the cube in the temporary component
  - Intuitively, a SQL selection with a GROUP BY clause:
      SELECT SUM(Quantity), t2.population FROM t_fact t1, t_supplier_population t2
      WHERE t1.supplier = t2.supplier GROUP BY t2.population
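
The three SQL statements can be run against a small in-memory SQLite database to see what each manipulation produces. This sketch assumes SQLite as the temporary component; the table names stand in for the subscripted temporary tables, and the decoration query lists its columns explicitly instead of using SELECT * so that its result can be stored as a table.

```python
import sqlite3

# Temporary component as an in-memory SQLite database (an assumption).
temp = sqlite3.connect(":memory:")
temp.executescript("""
CREATE TABLE t_fact (quantity INTEGER, supplier TEXT);
CREATE TABLE t_supplier_nation (supplier TEXT, nation TEXT);
CREATE TABLE t_nation_population (nation TEXT, population REAL);
INSERT INTO t_fact VALUES (17, 'S1'), (28, 'S2'), (2, 'S3'), (26, 'S4');
INSERT INTO t_supplier_nation VALUES ('S1','DK'), ('S2','DK'), ('S3','CN'), ('S4','UK');
INSERT INTO t_nation_population VALUES ('DK', 5.3), ('CN', 1264.5), ('UK', 19.1);
""")

# Decoration: join dimension data with XML data to get (supplier, population).
temp.execute("""
    CREATE TABLE t_supplier_population AS
    SELECT t1.supplier, t2.population
    FROM t_supplier_nation t1, t_nation_population t2
    WHERE t1.nation = t2.nation""")

# Federation selection: keep facts whose decoration value satisfies the predicate.
selected = temp.execute("""
    SELECT t1.* FROM t_fact t1, t_supplier_population t2
    WHERE t1.supplier = t2.supplier AND t2.population < 30
    ORDER BY t1.supplier""").fetchall()

# Federation generalized projection: group the facts by the decoration value.
projected = temp.execute("""
    SELECT SUM(t1.quantity), t2.population
    FROM t_fact t1, t_supplier_population t2
    WHERE t1.supplier = t2.supplier
    GROUP BY t2.population
    ORDER BY t2.population""").fetchall()
print(selected)
print(projected)
```

As on the slide, the generalized projection is shown on its own, without the preceding selection, so the group for China (population 1264.5) still appears in its output.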

13 Inlining XML Data
- Comparing federated data in the temporary component is expensive
- Inlining refers to integrating XML data into the OLAP selections
- The resulting predicate
  - Only references dimension levels and constants
  - Can be evaluated in the OLAP component
- Example: given the XML data (Nation, Population) = (DK, 5.3), (CN, 1264.5), (UK, 19.1), the predicate Nation/Nlink/Population < 30 is inlined as Nation = 'DK' OR Nation = 'UK'
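
The inlining step can be sketched as a small rewrite from the XML values to a predicate over dimension values only. The helper `inline_predicate` is hypothetical and builds the condition as an SQL string; it assumes the dimension values contain no quote characters.

```python
# Nation -> Population values as obtained from the XML component (from the slide).
NATION_POPULATION = {"DK": 5.3, "CN": 1264.5, "UK": 19.1}


def inline_predicate(level, values, predicate):
    """Rewrite a predicate on XML values into one on dimension values only.

    Returns an SQL condition over `level` that lists every dimension value
    whose linked XML value satisfies `predicate`. Values are assumed not to
    contain quote characters.
    """
    matching = sorted(v for v, x in values.items() if predicate(x))
    if not matching:
        return "1 = 0"  # no dimension value qualifies
    return " OR ".join(f"{level} = '{v}'" for v in matching)


print(inline_predicate("Nation", NATION_POPULATION, lambda p: p < 30))
# → Nation = 'DK' OR Nation = 'UK'
```

The rewritten condition references only the Nation level and constants, so the OLAP component can evaluate it without seeing any XML data.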

14 From Logical to Physical Plans

15 Plan Execution
[Figure: step-by-step execution of the physical plan for the example query; the recoverable tables are:]
  Facts:
  Quantity | ExtPrice | Supplier | Part | Order | Day
  17 | 17954 | S1 | P3 | 11 | 2/12/96
  28 | 29983 | S2 | P4 | 42 | 30/3/94
  2 | 2388 | S3 | P3 | 4 | 8/12/96
  26 | 26374 | S4 | P2 | 20 | 10/11/93

  XML data:
  Nation | Population
  DK | 5.3
  CN | 1264.5
  UK | 19.1

  Supplier dimension:
  Supplier | Nation
  S1 | DK
  S2 | DK
  S3 | CN
  S4 | UK

  Decoration:
  Supplier | Population
  S1 | 5.3
  S2 | 5.3
  S3 | 1264.5
  S4 | 19.1

  Facts after federation selection (Population < 30):
  Quantity | ExtPrice | Population | Part | Order | Day
  17 | 17954 | 5.3 | P3 | 11 | 2/12/96
  28 | 29983 | 5.3 | P4 | 42 | 30/3/94
  26 | 26374 | 19.1 | P2 | 20 | 10/11/93

  Part dimension:
  Part | Brand
  P2 | B2
  P3 | B3
  P4 | B4

  Result:
  Quantity | Population | Brand
  17 | 5.3 | B3
  28 | 5.3 | B4
  26 | 19.1 | B2

16 The Query Optimizer
- Based on the Volcano optimizer
- Four optimization phases, performed in a single stage:
  - Enumeration of logically equivalent plans
  - One-to-one conversion of logical plans to physical plans
  - Cost estimation of the physical plans
  - Cost-based pruning of the plan space
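
The four phases can be illustrated with a much-simplified branch-and-bound loop in Python. This is not the Volcano algorithm itself: the two candidate logical plans are written by hand, and the operator names, the logical-to-physical mapping and all cost figures are hypothetical. The candidates are listed so that the second one is pruned as soon as its running cost reaches that of the best complete plan.

```python
import math

# Hypothetical cost estimates per physical operator (arbitrary units).
# FederationSelection is made expensive because comparing federated data
# in the temporary component is expensive.
COST = {
    "XMLTransfer": 4, "FactTransfer": 10, "Decoration": 5,
    "FederationSelection": 20, "FederationGenProjection": 6,
    "Inlining": 1, "CubeSelection": 2, "CubeGenProjection": 3,
}

# One-to-one logical-to-physical mapping, keyed by logical operator and
# the component that evaluates it (None for transfers).
TO_PHYSICAL = {
    ("xml_transfer", None): "XMLTransfer",
    ("fact_transfer", None): "FactTransfer",
    ("decoration", "temp"): "Decoration",
    ("selection", "temp"): "FederationSelection",
    ("projection", "temp"): "FederationGenProjection",
    ("inlining", "olap"): "Inlining",
    ("selection", "olap"): "CubeSelection",
    ("projection", "olap"): "CubeGenProjection",
}

# Stand-in for phase 1: two logically equivalent plans for the example query.
LOGICAL_PLANS = [
    [("xml_transfer", None), ("inlining", "olap"), ("selection", "olap"),
     ("projection", "olap"), ("fact_transfer", None), ("decoration", "temp"),
     ("projection", "temp")],
    [("xml_transfer", None), ("fact_transfer", None), ("decoration", "temp"),
     ("selection", "temp"), ("projection", "temp")],
]


def optimize(logical_plans):
    """Return the cheapest physical plan and its estimated cost."""
    best_plan, best_cost = None, math.inf
    for logical in logical_plans:
        physical, cost = [], 0
        for op in logical:
            physical.append(TO_PHYSICAL[op])  # phase 2: convert one-to-one
            cost += COST[physical[-1]]        # phase 3: estimate cost
            if cost >= best_cost:             # phase 4: prune this candidate
                break
        else:                                 # candidate survived pruning
            best_plan, best_cost = physical, cost
    return best_plan, best_cost


plan, cost = optimize(LOGICAL_PLANS)
print(cost, plan)
```

With these made-up costs, the plan that inlines the selection wins with cost 31, and the temporary-component plan is abandoned at its federation selection step (running cost 39).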

17 An Optimized Query Plan

18 Execution of the Optimized Plan
[Figure: step-by-step execution of the optimized plan; the recoverable tables are:]
  XML data:
  Nation | Population
  DK | 5.3
  CN | 1264.5
  UK | 19.1

  Selected facts:
  Quantity | ExtPrice | Supplier | Part | Order | Day
  17 | 17954 | S1 | P3 | 11 | 2/12/96
  28 | 29983 | S2 | P4 | 42 | 30/3/94
  26 | 26374 | S4 | P2 | 20 | 10/11/93

  Facts rolled up to Nation and Brand:
  Quantity | Nation | Brand
  17 | DK | B3
  28 | DK | B4
  26 | UK | B2

  Result:
  Quantity | Population | Brand
  17 | 5.3 | B3
  28 | 5.3 | B4
  26 | 19.1 | B2
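
The optimized plan can be replayed in SQLite to check that it yields the same result as the unoptimized one. This sketch assumes that a single in-memory database plays both components and that the inlined predicate is Nation = 'DK' OR Nation = 'UK'; the table names are hypothetical. Selection and roll-up to (Nation, Brand) happen on the OLAP side, so only three aggregated rows need to be decorated.

```python
import sqlite3

# One SQLite database stands in for both components (an assumption);
# the data is the running example from the slides.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE fact (Quantity INTEGER, Supplier TEXT, Part TEXT);
CREATE TABLE supplier_dim (Supplier TEXT, Nation TEXT);
CREATE TABLE part_dim (Part TEXT, Brand TEXT);
CREATE TABLE t_nation_population (Nation TEXT, Population REAL);
INSERT INTO fact VALUES (17,'S1','P3'), (28,'S2','P4'), (2,'S3','P3'), (26,'S4','P2');
INSERT INTO supplier_dim VALUES ('S1','DK'), ('S2','DK'), ('S3','CN'), ('S4','UK');
INSERT INTO part_dim VALUES ('P2','B2'), ('P3','B3'), ('P4','B4');
INSERT INTO t_nation_population VALUES ('DK',5.3), ('CN',1264.5), ('UK',19.1);
""")

# Step 1 (OLAP side): the inlined predicate replaces Nation/Nlink/Population < 30,
# so selection and roll-up to (Nation, Brand) happen before any data transfer.
db.execute("""
    CREATE TABLE t_fact AS
    SELECT SUM(f.Quantity) AS Quantity, s.Nation, p.Brand
    FROM fact f
    JOIN supplier_dim s ON f.Supplier = s.Supplier
    JOIN part_dim p ON f.Part = p.Part
    WHERE s.Nation = 'DK' OR s.Nation = 'UK'
    GROUP BY s.Nation, p.Brand""")

# Step 2 (temporary side): decorate with Population, then group by (Population, Brand).
result = db.execute("""
    SELECT SUM(t.Quantity), x.Population, t.Brand
    FROM t_fact t JOIN t_nation_population x ON t.Nation = x.Nation
    GROUP BY x.Population, t.Brand
    ORDER BY t.Brand""").fetchall()
print(result)
```

The final regrouping matters in general: two nations with the same population would otherwise produce separate rows for the same (Population, Brand) group.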

19 Performance
- One experiment compared:
  a. Our federated solution
  b. Physical integration
  c. Federating cached XML data
- Data: 100M fact data based on the TPC-H benchmark; 11MB and 2KB of XML data
- Queries
- Result: (a) is comparable to (b) for small amounts of data; use (c) for large amounts of data

20 Related Work
- Generic data integration
  - Relational, XML, semi-structured, OO, ... and combinations
  - Does not consider OLAP database properties such as automatic aggregation, dimension hierarchies and correct aggregation
- OLAP-object federations
  - The current solution offers much more general use of external data
  - The current solution is not restricted to rigid object schemas
  - The current solution allows irregular data
- Previous OLAP-XML federation efforts
  - A logical algebra
  - A partial, straightforward implementation

21 Conclusion
- OLAP handles schema changes and dynamic data poorly
- Solutions
  - Logical federation of OLAP and XML
  - A physical algebra that models actual execution tasks
  - Optimized query evaluation
- Experiments suggest feasibility
- Future work
  - More optimization techniques
  - Advanced evaluation techniques
  - Cooperative development with an OLAP query tool vendor