Hopkins Storage Systems Lab, Department of Computer Science Automated Physical Design in Database Caches T. Malik, X. Wang, R. Burns Johns Hopkins University.


Hopkins Storage Systems Lab, Department of Computer Science Automated Physical Design in Database Caches T. Malik, X. Wang, R. Burns Johns Hopkins University D. Dash, A. Ailamaki Carnegie Mellon University

Outline
Target Application: Proxy caches for SkyQuery
Physical Design in Proxy Caches
– Need for vertical partitioning
– Workload evolution
Online Vertical Partitioning
– Simple scenario: Two configurations
– General scenario: N configurations
Experiments

SkyQuery
Publicly accessible federation of sky surveys (a virtual telescope with terabyte data sets)
Autonomous, heterogeneous, and geographically distributed sites
Data-intensive, read-only workload
Scaling through proxy caching
– Minimize network traffic
– Offload query processing

Bypass Caching (Malik et al., ICDE’05)
Proxy database cache for SkyQuery
– Brings columns closer to users
– Economic model for bypassing queries

The Need for Vertical Partitioning
Poor I/O performance in the cache
– Mirrors the backend DB design
  – Largest relation groups 446 columns
– Index-free environment
  – Auxiliary data structures (indices/views) pollute cache
Offsets response time benefits from network savings
– 6x benefit with partitioning alone
  – Performance without redundant data

Is Partitioning Feasible?

Why Not Existing Solutions? (e.g., DB tuning advisor, AutoPart)
Require representative workloads
– Not readily available
– Astronomy workloads exhibit evolution
Offline in nature
– Invoked periodically
– Costly to run
– Ignore the cost of partitioning
Static design
– Output a single configuration for the input workload
– Ignore incremental changes within the workload

Workload Evolution

Online Vertical Partitioning Problem

Two Configuration Scenario
Algorithm: Given current config C and an alternative C’, transition if remaining in C incurs substantial overhead
Capturing overhead
– Penalty:
– Cumulative penalty:
– Max cumulative penalty:
Transition if
3-competitive
– After k transitions, 2Conf incurs (3k/2)(d(C,C’)+d(C’,C))
– OPT incurs at least (k/2)(d(C,C’)+d(C’,C))
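The two-configuration rule can be sketched in a few lines of Python. The transcript does not preserve the penalty formulas or the exact transition condition, so everything here is an assumption chosen to be consistent with the 3-competitive bound on the slide: the penalty of a query is taken to be its cost in the current configuration minus its cost in the alternative, the max cumulative penalty is a running sum clamped at zero, and the threshold is d(C,C’)+d(C’,C). The names `two_conf`, `cost`, and `d` are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

Config = Hashable
Query = Hashable


def two_conf(
    queries: Iterable[Query],
    cost: Callable[[Query, Config], float],
    d: Callable[[Config, Config], float],
    start: Config,
    other: Config,
) -> Tuple[Config, List[int]]:
    """Run a query stream and return (final config, indices of transitions).

    Assumptions (not taken from the slides):
      penalty(q)    = cost(q, current) - cost(q, alternative)
      max cumulative penalty = running penalty sum, clamped at zero
      transition when it exceeds d(current, alt) + d(alt, current)
    """
    current, alternative = start, other
    max_cumulative = 0.0
    transitions: List[int] = []
    for i, q in enumerate(queries):
        penalty = cost(q, current) - cost(q, alternative)
        # Clamping at zero keeps the best-scoring recent run of queries.
        max_cumulative = max(0.0, max_cumulative + penalty)
        threshold = d(current, alternative) + d(alternative, current)
        if max_cumulative > threshold:
            transitions.append(i)
            current, alternative = alternative, current
            max_cumulative = 0.0
    return current, transitions
```

With a round-trip transition cost of 30 and a steady penalty of 6 per query, this sketch switches configurations on the sixth query and then stays put, because the penalty in the new configuration is negative.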

NConf: Extending to N-Configurations
Let C_y be the current config., C_x be the previous config., transition to the first C_z (C_z ≠ C_y) satisfying:
Number of configurations is exponential (51 trillion ways to partition 20 attributes)
Pruning heuristics
– Neighboring configurations
– Attribute groups
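The "51 trillion" figure matches the number of ways to split a set of 20 attributes into non-empty, disjoint groups, which is the 20th Bell number. The short check below computes it with the Bell triangle; the function name `bell_number` is illustrative, and treating a vertical partitioning as a plain set partition (ignoring any replicated key columns) is an assumption.

```python
def bell_number(n: int) -> int:
    """Count the partitions of an n-element set using the Bell triangle."""
    row = [1]  # row 0 of the triangle; its first entry is B(0)
    for _ in range(n):
        # Each new row starts with the last entry of the previous row,
        # and every later entry adds the value above-left of it.
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[0]


print(bell_number(20))  # 51724158235372, i.e. about 51.7 trillion
```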

Neighboring Configurations
[Figure: current configuration C_y over attributes A1, A2, A3, A4 and the neighbors of C_y]
Small, incremental partitions
Lower threshold to overcome

Attribute Groups
[Figure: current configuration C_y over attributes A1, A2, A3, A4 and candidate configurations derived from attribute groups]
Query q_k: {A1,A3,A4}
Attr. groups: {A1}, {A3}, {A4}, {A1,A3}, {A1,A4}, {A3,A4}, {A1,A3,A4}
weight += q_k(C_y)
Candidate config if n.weight > d(C_x,C_y)+d(C_y,n)
Candidates with high weight benefit from repartitioning
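The attribute-group bookkeeping on this slide can be illustrated with a small Python sketch. It enumerates every non-empty subset of the attributes a query touches, adds the query's cost in the current configuration to each group's weight, and applies the candidate test n.weight > d(C_x,C_y)+d(C_y,n). Keying the weights directly by attribute group, and the names `attribute_groups`, `add_query_weight`, and `is_candidate`, are assumptions made for illustration; the slide does not say how a group maps to a candidate configuration n.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

Group = FrozenSet[str]


def attribute_groups(query_attrs: Iterable[str]) -> List[Group]:
    """Return all non-empty subsets of the attributes a query accesses."""
    attrs = sorted(set(query_attrs))
    return [
        frozenset(combo)
        for size in range(1, len(attrs) + 1)
        for combo in combinations(attrs, size)
    ]


def add_query_weight(
    weights: Dict[Group, float],
    query_attrs: Iterable[str],
    cost_in_current: float,
) -> None:
    """Apply weight += q_k(C_y) to every attribute group of the query."""
    for group in attribute_groups(query_attrs):
        weights[group] = weights.get(group, 0.0) + cost_in_current


def is_candidate(weight: float, d_prev_to_cur: float, d_cur_to_cand: float) -> bool:
    """Candidate test from the slide: weight > d(C_x, C_y) + d(C_y, n)."""
    return weight > d_prev_to_cur + d_cur_to_cand


# The query from the slide touches three attributes, giving seven groups.
print(len(attribute_groups({"A1", "A3", "A4"})))  # 7
```

Because the number of groups doubles with each attribute a query touches, a sketch like this is only practical for queries that access a small number of columns.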

Experiments
TPC benchmark in SQL Server 2000
Partition orders relation using select queries
Two 10k workloads
– Wkld_Sky: Evolving access pattern that approximates SkyQuery
– Wkld_Const: Access pattern remains unchanged
AutoPart: an offline partitioning tool (Papadomanolakis et al.)

Query Performance

Estimated I/O and Transitions

Future Work
Impact of cache replacement policy
– Database state changes periodically
– Reuse work to find new partitions
Scaling to SkyQuery with thousands of attributes
Fast techniques for cost estimation
Integration of index selection in caches

Conclusion
Proxy caches present a dynamic environment
Vertical partitioning improves performance without adding redundant data
Online vertical partitioning
– Balances query execution performance with cost of transitioning
Experiments show 17% improvement by partitioning a single table

Questions?