Query Optimization For OLAP-XML Federations Dennis Pedersen Karsten Riiis Torben Bach Pedersen Nykredit Center for Database Research Department of Computer.

Slides:



Advertisements
Similar presentations
Lukas Blunschi Claudio Jossen Donald Kossmann Magdalini Mori Kurt Stockinger.
Advertisements

Evaluating XML-Extended OLAP Queries Based on a Physical Algebra Xuepeng Yin and Torben B. Pedersen Department of Computer Science Aalborg University.
Nov DOLAP 2002 McLean USA A Multidimensional and Multiversion Structure for OLAP Applications Mathurin Body 1,2, Maryvonne Miquel 2, Yvan Bédard.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
TIMBER A Native XML Database Xiali He The Overview of the TIMBER System in University of Michigan.
Relational Databases for Querying XML Documents: Limitations & Opportunities VLDB`99 Shanmugasundaram, J., Tufte, K., He, G., Zhang, C., DeWitt, D., Naughton,
Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle.
Achieving Adaptivity for OLAP-XML Federations Torben Bach Pedersen Aalborg University Joint work with Dennis Pedersen, TARGIT.
Manish Bhide, Manoj K Agarwal IBM India Research Lab India {abmanish, Amir Bar-Or, Sriram Padmanabhan IBM Software Group, USA
Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver.
© Copyright 2011 John Wiley & Sons, Inc.
Xyleme A Dynamic Warehouse for XML Data of the Web.
1 Distributed Databases Chapter Two Types of Applications that Access Distributed Databases The application accesses data at the level of SQL statements.
Supervised by Prof. LYU, Rung Tsong Michael Department of Computer Science & Engineering The Chinese University of Hong Kong Prepared by: Chan Pik Wah,
Benchmarking XML storage systems Information Systems Lab HS 2007 Final Presentation © ETH Zürich | Benchmarking XML.
Database management concepts Database Management Systems (DBMS) An example of a database (relational) Database schema (e.g. relational) Data independence.
Advanced Topics COMP163: Database Management Systems University of the Pacific December 9, 2008.
Overview Distributed vs. decentralized Why distributed databases
FACT: A Learning Based Web Query Processing System Hongjun Lu, Yanlei Diao Hong Kong U. of Science & Technology Songting Chen, Zengping Tian Fudan University.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
Automatic Data Ramon Lawrence University of Manitoba
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Chapter 13 The Data Warehouse
An Array-Based Algorithm for Simultaneous Multidimensional Aggregates
Indexing XML Data Stored in a Relational Database VLDB`2004 Shankar Pal, Istvan Cseri, Gideon Schaller, Oliver Seeliger, Leo Giakoumakis, Vasili Vasili.
Query Processing Presented by Aung S. Win.
XCube XML For Data Warehouses By Sven Groot. Data warehouses Contains data drawn from several databases and external sources Contains data drawn from.
OMAP: An Implemented Framework for Automatically Aligning OWL Ontologies SWAP, December, 2005 Raphaël Troncy, Umberto Straccia ISTI-CNR
XML, distributed databases, and OLAP/warehousing The semantic web and a lot more.
Week 6 Lecture The Data Warehouse Samuel Conn, Asst. Professor
Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.
Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.
Systems analysis and design, 6th edition Dennis, wixom, and roth
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Database Performance Tuning and Query Optimization.
NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
25th VLDB, Edinburgh, Scotland, September 7-10, 1999 Extending Practical Pre-Aggregation for On-Line Analytical Processing T. B. Pedersen 1,2, C. S. Jensen.
Ohio State University Department of Computer Science and Engineering Automatic Data Virtualization - Supporting XML based abstractions on HDF5 Datasets.
Loading a Cache with Query Results Laura Haas, IBM Almaden Donald Kossmann, Univ. Passau Ioana Ursu, IBM Almaden.
XML as a Boxwood Data Structure Feng Zhou, John MacCormick, Lidong Zhou, Nick Murphy, Chandu Thekkath 8/20/04.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
Access Path Selection in a Relational Database Management System Selinger et al.
1 © 2012 OpenLink Software, All rights reserved. Virtuoso - Column Store, Adaptive Techniques for RDF Orri Erling Program Manager, Virtuoso Openlink Software.
Database Management 9. course. Execution of queries.
OnLine Analytical Processing (OLAP)
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
Cracow Grid Workshop, October 27 – 29, 2003 Institute of Computer Science AGH Design of Distributed Grid Workflow Composition System Marian Bubak, Tomasz.
OLAP & DSS SUPPORT IN DATA WAREHOUSE By - Pooja Sinha Kaushalya Bakde.
Object Persistence Design Chapter 13. Key Definitions Object persistence involves the selection of a storage format and optimization for performance.
© ETH Zürich Eric Lo ETH Zurich a joint work with Carsten Binnig (U of Heidelberg), Donald Kossmann (ETH Zurich), Tamer Ozsu (U of Waterloo) and Peter.
ICDL 2004 Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer Science Old Dominion University.
Ahsan Abdullah 1 Data Warehousing Lecture-10 Online Analytical Processing (OLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center.
Designing Aggregations. Performance Fundamentals - Aggregations Pre-calculated summaries of data Intersections of levels from each dimension Tradeoff.
11th SSDBM, Cleveland, Ohio, July 28-30, 1999 Supporting Imprecision in Multidimensional Databases Using Granularities T. B. Pedersen 1,2, C. S. Jensen.
The Volcano Optimizer Generator Extensibility and Efficient Search.
Keyword Searching Weighted Federated Search with Key Word in Context Date: 10/2/2008 Dan McCreary President Dan McCreary & Associates
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
1 Biometric Databases. 2 Overview Problems associated with Biometric databases Some practical solutions Some existing DBMS.
1 On-Line Analytic Processing Warehousing Data Cubes.
Data Integration Hanna Zhong Department of Computer Science University of Illinois, Urbana-Champaign 11/12/2009.
Dec. 13, 2002 WISE2002 Processing XML View Queries Including User-defined Foreign Functions on Relational Databases Yoshiharu Ishikawa Jun Kawada Hiroyuki.
Feb 24-27, 2004ICDL 2004, New Dehli Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer.
CS 540 Database Management Systems
Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt,
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
Chapter 13 The Data Warehouse
COSC 6340 Projects & Homeworks Spring 2002
Database management concepts
Relax and Adapt: Computing Top-k Matches to XPath Queries
Presentation transcript:

Query Optimization For OLAP-XML Federations Dennis Pedersen Karsten Riiis Torben Bach Pedersen Nykredit Center for Database Research Department of Computer Science, Aalborg University

DOLAP 2002 Workshop# 2 November 8, 2002 Overview Motivation OLAP-XML federations Federation system Inlining Optimizing XML retrieval Caching and prefetching Experimental results Related work Conclusion and future work

DOLAP 2002 Workshop# 3 November 8, 2002 Motivation OLAP-systems are good for complex analysis queries Easy-to-use Fast Business, science... More and more data is needed for the analyses Problems with physical integration in existing OLAP systems Integrating new data requires (partial) cube rebuild => too slow Problems arise with Short term and varying data needs  Demographic info, disease info... Dynamic data  Stock quotes, competitors prices, disease info... Data with limited access  Product information, public databases, scientific data banks

DOLAP 2002 Workshop# 4 November 8, 2002 Motivation Data will often (in the future) be available in XML format Info syndicators, public institutions,… Scientific sources: SWISS-PROT, TrEMBL, GenBank,… Goal: flexible access to XML data from OLAP systems Aalborg University Town Sonofon Nokia … Hierarchical Elements Start-tag, end-tag Describes the semantics of the data Irregular structure

DOLAP 2002 Workshop# 5 November 8, 2002 OLAP-XML Federation Using of external XML data as virtual dimensions δ: decoration (add extra dimension) π: generalized projection (grouping) σ: selection Loosely-coupled federation Data stored in the ”best” place Supports ad-hoc integration Rapid prototyping of DWs High degree of autonomy Data is always up-to-date Performance problem? SQL XM language SELECT City[ALL]/Category, SUM(Price) FROM Sales GROUP BY City[ALL]/Category WHERE City[ALL]/Population>200,000

DOLAP 2002 Workshop# 6 November 8, 2002 OLAP-XML Federation Aalborg 160,000 University Town Sonofon Nokia Time T Year Quarter Month Products T Family Brand Product Stores T Country City Store Amount, Price

DOLAP 2002 Workshop# 7 November 8, 2002 OLAP-XML Federation Aalborg 160,000 University Town Sonofon Nokia Time T Year Quarter Month Products T Family Brand Product Stores T Country City Store Amount, Price Link

DOLAP 2002 Workshop# 8 November 8, 2002 Prototype Architecture SQL XM queries handled by Federation Manager (FM) XML Data stored in XML Component (Tamino) OLAP data stored in OLAP Component (MS Analysis) Link, Temp, and Meta-data stored in RDBMS FM fetches relevant data from components and processes SQL XM queries using Temp data

DOLAP 2002 Workshop# 9 November 8, 2002 Query Processing and Optimization Naive strategy Fetch necessary XML data and store in Temp DB Fetch necessary OLAP data and store in Temp DB Join XML and OLAP data + perform XML-specific part of SQL XM query Problem: too much data transferred to Temp DB Rule-based optimization Based on algebraic transformation rules Partition query tree into OLAP part, XML part, and Temp part Push as much evaluation to components as possible (always valid) Cost-based optimization Inline literal XML data values in OLAP queries Efficient fetch of XML data over XPath interfaces Caching/Prefetching in this particular setting Focus of this paper

DOLAP 2002 Workshop# 10 November 8, 2002 Overview Motivation OLAP-XML federations Federation system Inlining Optimizing XML retrieval Caching and prefetching Experimental results Related work Conclusion and future work

DOLAP 2002 Workshop# 11 November 8, 2002 Inlining Problem: decorations used for selections are expensive A lot of low-level OLAP data must be transferred to Temp Solution: inlining Get level expression predicate data from XML component Transform predicate to refer to literal data values Allows selection+aggregation to take place in OLAP component Predicate length will be linear/quadratic in no of dim values Queries may get too long Split into partial queries Inlining is particularly effective for OLAP systems Allows more aggregation in OLAP component

DOLAP 2002 Workshop# 12 November 8, 2002 Inlining Problem: what predicates to inline? Exponential number of possibilities Theorem: choosing the best inlining strategy is NP-hard No obvious best choice Often, an exhaustive search is feasible Otherwise, two heuristics are used Top down (see right above) Start from no inlining and add Will often fail Bottom up Start with all inlined and remove Less likely to fail General strategy Generate all bottom up, except if cost goes up very much

DOLAP 2002 Workshop# 13 November 8, 2002 Optimizing XML Retrieval XML component queries retrieve (locator,level exp) tables Example tuple: (Aalborg, 160,000) Easy to do in XQuery, but not in XPath XPath can fetch only entire existing nodes Many systems provide only an XPath interface One query for every dimension value? Often too much overhead because of many queries Combining with OR/IN not possible as XPath result is unordered Common parent of locator and level exp can be fetched Larger data volume may have to be returned Data for several level exps may be fetched together by combining XPath expressions Algorithm for doing this in full paper Cost analysis chooses between these options

DOLAP 2002 Workshop# 14 November 8, 2002 Caching and pre-fetching General technique specialized for this particular domain Caching/prefetching not feasible for very dynamic data Given a new query tree the cache is checked Identical query tree found: use cache (always fastest) Lower part of query tree found: compare cost of using cache to not using it (not using cache may be faster due to OLAP pre-aggregation) Caching and prefetching is done for component queries only Entire federation queries take up too much space and has too little probability of reuse XML cache Raw XML data + temporary XML mapping tables OLAP cache Intermediate OLAP result tables LRU cache replacement strategy used

DOLAP 2002 Workshop# 15 November 8, 2002 Cost Model Federation cost model Parallel exec of queries 1-4 OLAP waits for 3+4 (inlined) OLAP query not split Results combined in Temp OLAP cost model Cost OLAP = t OLAPOH + t OLAPEval + t OLAPTrans Based on network+disk rate, selectivity, fact size, rollup fraction, cube size, and possibilities for using pre-aggregation XML cost model Based on assumption of constant data rate from a given document Cost XML = t XMLOH + t XMLProc Temp component used standard relational cost model All models adapt based on real execution times Details + parameter estimation in another paper XML 1 XML 2 XML 3 XML 4 OLAP TEMP Time

DOLAP 2002 Workshop# 16 November 8, 2002 Experiments Test data 50 MB TPC-H OLAP, 10MB preagg, 10 MB cache, 3 MB XML, Behavior of different query types Decoration, grouping, selection Temp time generally short compared to OLAP and XML time OLAP time generally highest Inlining Compared three cases (no, simple, and cost-based inlining) Simple inlining 5 times faster than no inlining Cost-based inlining  3 times faster than simple inlining  15 times faster than no inlining Caching/prefetching Remote data versus all data stored locally Local evaluation 4-25 times faster than remote

DOLAP 2002 Workshop# 17 November 8, 2002 Experiments Another experiment compared: Our solution (flexibility optimal) Physical integration (performance optimal) Data 1GB fact data based on TPC-H 10MB XML data 100 MB pre-aggregated data Decoration, grouping, selection queries All optimizations used For ”typical” queries The overhead of our solution was only 25-50% Some queries will perform badly

DOLAP 2002 Workshop# 18 November 8, 2002 Related Work Generic data integration Relational, XML, semi-structured, OO,… + combinations Do not consider OLAP DB properties such as automatic aggregation, dimension hierarchies and correct aggregation Query processing and optimization for: Data warehousing/OLAP Federated, distributed, and multi-databases Heterogeneous data XML and semi-structured data Does not address the special case of a federated OLAP environment OLAP-object federations (own work) Simple form of inlining Does not consider cost-based optimization

DOLAP 2002 Workshop# 19 November 8, 2002 Conclusion and Future Work OLAP handles schema changes and dynamic data poorly Solution OLAP-XML federations Cost-based optimization needed Three techniques presented Inlining Optimized XML retrieval Caching and prefetching Made OLAP-XML federation perform very well Technology being implemented by OLAP query tool vendor Future work Further experimentation Capturing document order Getting measures from XML data