
1. The Problem
Context: interactive data exploration
- Consecutive queries are typically correlated
- The workload is characterized by phases
- Selection predicates are a dominant element
Issue: indices are crucial for ensuring efficiency
Problem: select a good set of indices
- Current approaches are off-line
- A representative workload is necessary
- Not suitable for dynamic environments

2. Our Solution: COLT
COLT: Continuous On-Line Tuning
- Monitors the workload continuously
- Selects indices on-line based on the latest workload traits
Features:
- Tight coupling with the query optimizer
- Adaptive allocation of profiling resources
- Self-regulated tuning
- Controllable overhead
- Generic framework, applicable to many domains

3. System Architecture
Query optimizer:
- Notifies COLT of new queries
- Provides the what-if interface
Profiler:
- Operates continuously
- Measures index benefit through the what-if optimizer
Self Organizer:
- Activated periodically
- Selects the indices to materialize
- Sets up candidates for profiling
- Determines the profiling budget
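
The interaction between the three components can be sketched as below. This is a hypothetical skeleton, not COLT's PostgreSQL code: the class names, the `on_query`/`on_epoch_end` hooks, and the assumption that an epoch is a fixed number of queries (`epoch_length`) are illustrative choices, since the slides only say that the Self Organizer is activated periodically.

```python
from typing import List


class Profiler:
    """Runs continuously: invoked once per incoming query."""

    def __init__(self) -> None:
        self.queries_seen: List[str] = []

    def on_query(self, query: str) -> None:
        # This is where index benefits would be measured through the
        # optimizer's what-if interface; the sketch only records the query.
        self.queries_seen.append(query)


class SelfOrganizer:
    """Activated periodically, at the end of each epoch."""

    def __init__(self) -> None:
        self.activations = 0

    def on_epoch_end(self, profiler: Profiler) -> None:
        # Would pick the materialized set, set up profiling candidates,
        # and fix the profiling budget for the next epoch.
        self.activations += 1


class QueryOptimizerHook:
    """Stands in for the optimizer: notifies COLT of each new query."""

    def __init__(self, epoch_length: int) -> None:
        self.epoch_length = epoch_length  # hypothetical: queries per epoch
        self.profiler = Profiler()
        self.organizer = SelfOrganizer()
        self._count = 0

    def submit(self, query: str) -> None:
        self.profiler.on_query(query)
        self._count += 1
        if self._count % self.epoch_length == 0:
            self.organizer.on_epoch_end(self.profiler)


hook = QueryOptimizerHook(epoch_length=10)
for i in range(25):
    hook.submit(f"SELECT ... WHERE a = {i}")
print(hook.organizer.activations)  # 2
```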

4. Index Organization
Cold: indices that are not promising
- Benefit is measured with a crude metric
- A candidate index always starts as Cold
Hot: promising indices that are candidates for materialization
- Benefit is measured accurately with what-if calls
Materialized: indices that COLT has materialized
- Benefit is measured accurately with reverse what-if calls
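
A minimal sketch of the three index states and the two kinds of optimizer calls, assuming the what-if interface can be modelled as a function `cost(query, index_set)` returning the optimizer's estimated cost under a hypothetical index configuration. The names `State`, `what_if_gain`, `reverse_what_if_gain` and the toy cost model are illustrative assumptions; the slides do not define the crude metric for cold indices, so the sketch refuses to make what-if calls for them.

```python
from enum import Enum
from typing import Callable, FrozenSet

# Hypothetical what-if interface: estimated cost of a query under an index set.
Cost = Callable[[str, FrozenSet[str]], float]


class State(Enum):
    COLD = "cold"          # not promising; crude benefit metric
    HOT = "hot"            # materialization candidate; what-if calls
    MATERIALIZED = "mat"   # built by COLT; reverse what-if calls


def what_if_gain(cost: Cost, query: str, materialized: FrozenSet[str], index: str) -> float:
    """Estimated saving if a hot index were added to the materialized set."""
    return cost(query, materialized) - cost(query, materialized | {index})


def reverse_what_if_gain(cost: Cost, query: str, materialized: FrozenSet[str], index: str) -> float:
    """Estimated loss if a materialized index were dropped."""
    return cost(query, materialized - {index}) - cost(query, materialized)


def measured_gain(state: State, cost: Cost, query: str,
                  materialized: FrozenSet[str], index: str) -> float:
    if state is State.HOT:
        return what_if_gain(cost, query, materialized, index)
    if state is State.MATERIALIZED:
        return reverse_what_if_gain(cost, query, materialized, index)
    raise ValueError("cold indices use a crude metric, not what-if calls")


def toy_cost(query: str, indices: FrozenSet[str]) -> float:
    """Toy model: each relevant index present halves a 100-unit query cost."""
    relevant = [i for i in indices if i in query]
    return 100.0 / (2 ** len(relevant))


mat = frozenset({"orders.o_date"})
q = "filter on orders.o_date and lineitem.l_qty"
print(measured_gain(State.HOT, toy_cost, q, mat, "lineitem.l_qty"))          # 25.0
print(measured_gain(State.MATERIALIZED, toy_cost, q, mat, "orders.o_date"))  # 50.0
```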

5. Profiler
1. The Profiler is notified of a new query's arrival.
2. The Profiler updates the benefit metric for cold indices.
3. The Profiler selects hot and materialized indices for profiling.
4. The optimizer returns the measured what-if gains.
5. The Profiler updates the benefit metric for the profiled indices.
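
The five steps can be read as a single per-query routine. In this hypothetical sketch the crude metric, the selection policy, and the what-if call are passed in as plain functions (`crude_metric`, `select_for_profiling`, `what_if_gain`); the slides do not define them, so their signatures are assumptions.

```python
from typing import Callable, Dict, List, Sequence


def profile_query(
    query: str,
    cold_benefit: Dict[str, float],
    crude_metric: Callable[[str, str], float],
    select_for_profiling: Callable[[str], Sequence[str]],
    what_if_gain: Callable[[str, str], float],
    measured: Dict[str, List[float]],
) -> int:
    """One profiler step; returns the number of what-if calls made."""
    # Step 1: this function is invoked when a new query arrives.
    # Step 2: cheap, approximate benefit update for every cold index.
    for idx in cold_benefit:
        cold_benefit[idx] += crude_metric(query, idx)
    # Step 3: choose which hot/materialized indices to profile.
    chosen = select_for_profiling(query)
    # Steps 4-5: ask the optimizer for what-if gains and record them.
    for idx in chosen:
        measured.setdefault(idx, []).append(what_if_gain(query, idx))
    return len(chosen)


cold = {"t.a": 0.0, "t.b": 0.0}
measured: Dict[str, List[float]] = {}
calls = profile_query(
    "SELECT * FROM t WHERE a = 1",
    cold,
    crude_metric=lambda q, idx: 1.0 if idx.split(".")[1] + " =" in q else 0.0,
    select_for_profiling=lambda q: ["t.c"],
    what_if_gain=lambda q, idx: 12.5,
    measured=measured,
)
print(calls, cold, measured)  # 1 {'t.a': 1.0, 't.b': 0.0} {'t.c': [12.5]}
```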

6. Profiling Model
Queries are grouped in clusters
- A cluster represents queries with similar traits
- Clustering captures redundancy in the workload
An index is selected for profiling if:
- It is relevant to the cluster of the current query
- Its past what-if measurements in the cluster have high variance
- Its potential benefit is high
The selection method is based on adaptive sampling
Goal: maximize the value of each what-if call
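
One way to turn the three selection criteria into adaptive sampling is to give each candidate a weight and sample without replacement in proportion to it. This is an assumed scoring rule, not necessarily COLT's: the weight is zero for indices not relevant to the cluster, infinite (profiled first) for indices with fewer than two past measurements in the cluster, and otherwise the variance of past gains plus the best observed gain. `budget` is a hypothetical per-query allowance of what-if calls.

```python
import random
import statistics
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical store: (cluster id, index name) -> past what-if gains in that cluster.
Measurements = Dict[Tuple[int, str], List[float]]


def profiling_weight(cluster: int, index: str, relevant: Set[str], m: Measurements) -> float:
    if index not in relevant:              # not relevant to this query's cluster
        return 0.0
    past = m.get((cluster, index), [])
    if len(past) < 2:                      # too few measurements: variance unknown
        return float("inf")
    spread = statistics.pvariance(past)    # high variance -> a new call is informative
    potential = max(0.0, max(past))        # high potential benefit -> worth checking
    return spread + potential


def choose_indices(cluster: int, candidates: Sequence[str], relevant: Set[str],
                   m: Measurements, budget: int, rng: random.Random) -> List[str]:
    weights = {i: profiling_weight(cluster, i, relevant, m) for i in candidates}
    pool = [i for i in candidates if weights[i] > 0]
    # Indices with unknown variance are profiled first, up to the budget.
    chosen = [i for i in pool if weights[i] == float("inf")][:budget]
    rest = [i for i in pool if i not in chosen]
    # Fill the remaining budget by weighted sampling without replacement.
    while len(chosen) < budget and rest:
        pick = rng.choices(rest, weights=[weights[i] for i in rest], k=1)[0]
        chosen.append(pick)
        rest.remove(pick)
    return chosen


m = {(0, "a"): [1.0, 5.0], (0, "b"): [2.0, 2.0], (0, "c"): [0.0, 0.0]}
picked = choose_indices(0, ["a", "b", "c", "d", "z"], {"a", "b", "c", "d"}, m,
                        budget=2, rng=random.Random(0))
print(picked)
```

Here `d` is always picked first because it has no history in cluster 0, `c` and `z` are never picked (zero weight), and the second slot goes to `a` or `b` with odds 9 : 2.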

7. Self Organizer
Activated at the end of an epoch
- Determines the new materialized set from H (hot indices) and M (materialized indices)
- Indices are compared based on predicted benefit
Promotes promising cold indices to H
- Indices are promoted based on a 2-cluster grouping
[Figure: cold indices grouped by benefit into "Stay in Cold" and "Promoted to Hot"; axis: cold index benefit]
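
The slides do not say which clustering algorithm produces the 2-cluster grouping. The sketch below assumes an exact one-dimensional split: sort the cold indices by benefit, try every cut point, keep the cut with the smallest within-group sum of squared deviations (the 2-means objective), and promote the upper group. `promote_cold` is a hypothetical name.

```python
from typing import Dict, List


def _sse(xs: List[float]) -> float:
    """Sum of squared deviations from the group mean."""
    if not xs:
        return 0.0
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def promote_cold(cold_benefit: Dict[str, float]) -> List[str]:
    """Split cold indices into two benefit groups; return the high group."""
    ranked = sorted(cold_benefit, key=cold_benefit.get)
    values = [cold_benefit[i] for i in ranked]
    if len(values) < 2 or values[0] == values[-1]:
        return []  # nothing to separate
    best_cut, best_cost = 1, float("inf")
    for cut in range(1, len(values)):  # low group = values[:cut]
        cost = _sse(values[:cut]) + _sse(values[cut:])
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    return ranked[best_cut:]


print(promote_cold({"i1": 0.5, "i2": 1.0, "i3": 0.8, "i4": 9.0, "i5": 10.0}))  # ['i4', 'i5']
```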

8. Predicted Index Benefit
Prediction is based on measured benefit
- Predicted benefit is high if the past benefit is consistently high
- Resilience to noise
The cost of materialization is discounted
[Figure: observed benefit and predicted benefit over time]
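
A hedged sketch of a prediction with the two listed properties: it stays high only when recent benefit is consistently high, and it charges unbuilt indices for their creation cost. The formula (mean minus `k` standard deviations over the last `window` epochs, with the build cost amortized over `horizon` epochs) is an assumption for illustration, not COLT's published estimator.

```python
import statistics
from typing import Sequence


def predicted_benefit(epoch_benefits: Sequence[float], window: int = 4, k: float = 1.0) -> float:
    """Conservative per-epoch prediction: high only if recent benefit is steady."""
    recent = list(epoch_benefits[-window:])
    if not recent:
        return 0.0
    mean = statistics.fmean(recent)
    spread = statistics.pstdev(recent)
    return max(0.0, mean - k * spread)  # a noisy spike is damped by its own spread


def net_benefit(epoch_benefits: Sequence[float], creation_cost: float,
                materialized: bool, horizon: int = 10) -> float:
    """Discount the one-time build cost, spread over `horizon` epochs, for unbuilt indices."""
    gain = predicted_benefit(epoch_benefits)
    return gain if materialized else gain - creation_cost / horizon


steady = [50, 52, 48, 50]
spiky = [0, 0, 0, 200]
print(round(net_benefit(steady, creation_cost=100, materialized=False), 2))  # 38.59
print(round(net_benefit(spiky, creation_cost=100, materialized=False), 2))   # -10.0
```

Both histories average 50 per epoch, but the single spike in `spiky` is mostly variance, so its prediction falls to zero and only the discounted build cost remains.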

9. Adaptive Tuning
Profiling budget: maximum number of what-if calls per epoch
- The budget is reset at the end of each epoch
- The budget increases if hot indices can improve performance, and decreases otherwise
- The potential of hot indices is based on statistical models
The end result is that self-tuning is:
- Intensified when the workload shifts
- Suspended when the system is well tuned
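
The budget rule can be sketched as a small controller. The additive `step`, the bounds, and the boolean `hot_can_improve` (standing in for the statistical models on the slide) are hypothetical parameters; the sketch only shows the per-epoch reset and the increase/decrease behaviour, including a budget of zero that suspends profiling.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetController:
    budget: int = 20        # hypothetical initial budget: what-if calls per epoch
    step: int = 5           # hypothetical additive adjustment per epoch
    min_budget: int = 0     # a budget of 0 suspends profiling
    max_budget: int = 100
    remaining: int = field(init=False)

    def __post_init__(self) -> None:
        self.remaining = self.budget

    def try_spend(self) -> bool:
        """Consume one what-if call if the current epoch still has budget."""
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False

    def end_epoch(self, hot_can_improve: bool) -> None:
        """Adjust the budget, then reset the per-epoch allowance."""
        if hot_can_improve:
            self.budget = min(self.max_budget, self.budget + self.step)
        else:
            self.budget = max(self.min_budget, self.budget - self.step)
        self.remaining = self.budget


ctl = BudgetController(budget=10)
for _ in range(3):
    ctl.end_epoch(hot_can_improve=False)
print(ctl.budget)                  # 0: tuning suspended
ctl.end_epoch(hot_can_improve=True)
print(ctl.budget, ctl.remaining)   # 5 5
```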

10. Performance of COLT
- COLT adapts to each phase of the workload
- Better performance than off-line tuning (Offline: the index set that is optimal for the complete workload)
Workload:
- 4 phases of 300 queries each
- 50 transition queries between phases
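
Assuming the 50 transition queries occur once between each pair of consecutive phases (and not before the first or after the last phase), the workload length can be worked out with a small hypothetical helper:

```python
from typing import List, Tuple


def build_schedule(phases: int = 4, per_phase: int = 300,
                   transition: int = 50) -> List[Tuple[str, int]]:
    """Ordered (segment label, query count) pairs for the phased workload."""
    schedule: List[Tuple[str, int]] = []
    for p in range(1, phases + 1):
        schedule.append((f"phase {p}", per_phase))
        if p < phases:  # assumed: transitions only between consecutive phases
            schedule.append((f"transition {p}->{p + 1}", transition))
    return schedule


schedule = build_schedule()
print(sum(n for _, n in schedule))  # 1350 = 4 * 300 + 3 * 50
```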

11. Overhead of COLT
- Overhead spikes occur at the transitions between phases
- Overhead decreases when the system is well tuned

12. The Demonstration
COLT inside PostgreSQL
- Live operation as queries arrive
- Demonstration of COLT internals
Set-up:
- Simulated exploration of a TPC-H-like data set
- Two workloads with distinct query distributions
- Workload noise
Highlights:
- On-line tuning to different workloads
- Resilience of COLT against noise