By: Peter J. Haas and Joseph M. Hellerstein, published in June 1999. Presented by: Sthuti Kripanidhi, 9/28/2010, CSE 6339 - Data Exploration.



Overview
- What the paper is all about
- Traditional Algorithms
- Online Aggregation
- Ripple Joins: Introduction
- How different is Ripple join
- Ripple Join variants
- Aspect ratios
- Future Work

What the paper is about
- The paper introduces ripple joins, a class of join algorithms for the online processing of multi-table aggregation queries.
- It shows how to join several tables and compute SUM, COUNT, or AVG aggregates (optionally with GROUP BY), displaying approximate results and their confidence intervals immediately, starting from the first few tuples retrieved.

Traditional Algorithms
- Traditional algorithms take a long time because they must process the entire tables (relations).
- Users have to wait a long time before any results are returned.
- A better method is online aggregation.

Online Aggregation
- A running estimate of the final aggregates is continuously displayed to the user.
- The goal is quick approximate results rather than minimal time to completion.
- The proximity of the running estimate to the final result is also displayed to the user as a confidence interval.
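The running estimate and confidence interval described above can be sketched for the simplest case, an AVG over a single table read in random order. This is a minimal illustration and not the paper's estimator for joins: the function name `running_avg`, the `z = 1.96` multiplier (roughly a 95% interval), and the large-sample normal approximation are all assumptions made here.

```python
import math
import random


def running_avg(values, z=1.96, seed=0):
    """Yield (n, estimate, half_width) after each sampled value.

    Hypothetical helper: a large-sample (normal approximation) interval
    for a single-table AVG, not the join estimator from the paper.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # simulate retrieving tuples in random order

    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            variance = m2 / (n - 1)
            half_width = z * math.sqrt(variance / n)
        else:
            half_width = float("inf")  # no spread information yet
        yield n, mean, half_width
```

A user interface would redraw the estimate and interval after every yielded triple, and the user could stop the query once the interval is narrow enough.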

GUI (figure)

Ripple Joins: Introduction
- Generalize the traditional block nested-loops and hash joins.
- Non-blocking.
- Square ripple join: samples are drawn from both relations at the same rate.
- Rectangular ripple join: samples are drawn from one relation at a higher rate than from the other.

Ripple Join: Introduction
Typical query form:
SELECT op(expression) FROM R1, R2, …, RK WHERE predicate GROUP BY columns;

How different is Ripple join?
- A traditional hash join blocks until the entire query output is finished.
- A ripple join reports approximate results after each sampling step and allows user intervention.
- In a traditional nested-loops join, the inner loop scans an entire table; a ripple join instead expands the sample set incrementally.
- Ripple joins avoid a complete scan of the relations.

How Ripple Join works
Assume a ripple join of relations R and S. At each sampling step:
- Select a random tuple r from R.
- Join r with the previously selected S tuples.
- Select a random tuple s from S.
- Join s with the previously selected R tuples.
- Join r and s.

Ripple Join: Square two-table join (figure: the sampled region of the R × S grid after N = 1, N = 2, and N = 3 sampling steps)

Ripple Join Algorithm
for (max = 1 to infinity) {
    for (i = 1 to max-1)
        if (predicate(R[i], S[max])) output(R[i], S[max]);
    for (i = 1 to max)
        if (predicate(R[max], S[i])) output(R[max], S[i]);
}
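The pseudocode above can be turned into a small runnable sketch. It assumes both inputs are in-memory lists that are already in random order, uses 0-based indexing, and stops when the shorter input is exhausted; the name `square_ripple_join` is chosen here for illustration.

```python
def square_ripple_join(R, S, predicate):
    """Yield matching (r, s) pairs in square ripple order.

    Sketch only: R and S are lists assumed to be in random order, and
    the loop ends when the shorter one runs out.
    """
    steps = min(len(R), len(S))
    for mx in range(steps):
        # Newest S tuple against the R tuples seen in earlier steps.
        for i in range(mx):
            if predicate(R[i], S[mx]):
                yield R[i], S[mx]
        # Newest R tuple against every S tuple seen so far,
        # including the newest one.
        for i in range(mx + 1):
            if predicate(R[mx], S[i]):
                yield R[mx], S[i]
```

For example, `list(square_ripple_join([1, 2, 3], [2, 3, 4], lambda r, s: r == s))` produces the matches in the order they are discovered, so a consumer can update its running aggregate after each pair.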

Ripple Join Iterator
- An iterator-based DBMS invokes an iterator's next() method each time an output tuple is needed.
- The iterator needs to store the next position to be fetched from each of its inputs R and S.
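The state that such an iterator has to keep between calls can be made explicit with a small class. This is a sketch under assumptions, not the paper's implementation: the class name `RippleJoinIterator` and its fields are hypothetical, the inputs are in-memory lists in random order, and Python's `__next__` stands in for the DBMS `next()` call.

```python
class RippleJoinIterator:
    """Square ripple join that returns one output tuple per call.

    Hypothetical sketch: the stored state is the current step (mx),
    the position within the current pass (i), and which input's
    newest tuple is currently being joined (new_side).
    """

    def __init__(self, R, S, predicate):
        self.R, self.S, self.predicate = R, S, predicate
        self.limit = min(len(R), len(S))
        self.mx = 0          # index of the newest tuple on each side
        self.i = 0           # next position to fetch in this pass
        self.new_side = "S"  # start by joining the newest S tuple

    def __iter__(self):
        return self

    def __next__(self):
        while self.mx < self.limit:
            if self.new_side == "S":
                if self.i < self.mx:
                    r, s = self.R[self.i], self.S[self.mx]
                    self.i += 1
                    if self.predicate(r, s):
                        return r, s
                else:
                    # Pass over old R tuples is done; switch sides.
                    self.new_side, self.i = "R", 0
            else:
                if self.i <= self.mx:
                    r, s = self.R[self.mx], self.S[self.i]
                    self.i += 1
                    if self.predicate(r, s):
                        return r, s
                else:
                    # Step finished; advance to the next sampling step.
                    self.new_side, self.i = "S", 0
                    self.mx += 1
        raise StopIteration
```

Because all progress is held in `mx`, `i`, and `new_side`, the caller can stop pulling tuples at any point and resume later without losing or repeating pairs.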

Pipelining
- Can easily be pipelined for multiple binary joins.
- Cannot do three-table joins as two binary ripple joins.

Ripple Join Variants
- Block Ripple Join
- Hash Ripple Join
- Index Ripple Join

Block Ripple Join
- Processes disk blocks of R and S in turn, rather than individual tuples.
- Read a disk block of R and join it against the old S tuples, then evict it from memory.
- Read a block of S and compare it with the older R tuples.
- Saves I/O because tuples are read a block at a time.

Index and Hash Ripple Joins
- Index ripple join: identical to an index-enhanced nested-loops join.
- Hash ripple join: used only for equijoin queries.
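The restriction of the hash variant to equijoins is easiest to see in code: each new tuple probes a hash table of the tuples already seen from the other input, which only works when the predicate is key equality. The sketch below is an illustration under assumptions (in-memory lists in random order, one tuple per input per step, hypothetical name `hash_ripple_join` and key-extractor parameters), not the paper's implementation.

```python
from collections import defaultdict


def hash_ripple_join(R, S, key_r, key_s):
    """Yield (r, s) pairs with key_r(r) == key_s(s) in ripple order.

    Sketch only: both hash tables are kept in memory, and the loop
    stops when the shorter input is exhausted.
    """
    seen_r = defaultdict(list)  # join key -> R tuples seen so far
    seen_s = defaultdict(list)  # join key -> S tuples seen so far
    for r, s in zip(R, S):
        # New S tuple probes the R tuples from earlier steps.
        k = key_s(s)
        for old_r in seen_r.get(k, ()):
            yield old_r, s
        seen_s[k].append(s)
        # New R tuple probes all S tuples seen so far, including s.
        k = key_r(r)
        for any_s in seen_s.get(k, ()):
            yield r, any_s
        seen_r[k].append(r)
```

Each step costs time proportional to the number of matches rather than to the number of tuples seen, which is the benefit over rescanning the old samples.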

Statistical Considerations
- Goal: to provide efficient, accurate, interactive estimation.
- The estimator is unbiased and consistent.
- The running average is biased but consistent.
- Capable of giving tight confidence intervals.
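One way to picture these estimators is to scale the aggregate over the sampled pairs up to the full cross product, and to form the running average as a ratio of the sampled sum to the sampled count. The sketch below works under that assumption; the function name `running_estimates` and its parameters are hypothetical, and the confidence-interval formulas are not reproduced here.

```python
def running_estimates(R, S, predicate, expression, n_r, n_s):
    """Return (sum_est, count_est, avg_est) from the first n_r tuples
    of R and the first n_s tuples of S.

    Sketch under assumptions: R and S are lists in random order, and
    SUM and COUNT are scaled by |R||S| / (n_r * n_s).
    """
    total = 0.0
    matches = 0
    for r in R[:n_r]:
        for s in S[:n_s]:
            if predicate(r, s):
                total += expression(r, s)
                matches += 1
    scale = (len(R) * len(S)) / (n_r * n_s)
    sum_est = scale * total
    count_est = scale * matches
    # The running average is a ratio of two sampled quantities.
    avg_est = total / matches if matches else None
    return sum_est, count_est, avg_est
```

When `n_r` and `n_s` reach the full table sizes, the scale factor becomes 1 and the estimates coincide with the exact answers.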

Aspect Ratios
- Aspect ratio: how many tuples are retrieved from each base relation per sampling step, e.g. β1 = 1, β2 = 3, …
- Ripple join adjusts the aspect ratio according to the sizes of the base relations.
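A two-table sketch shows how the aspect ratio changes the sampling pattern: at every step the join pulls `beta_r` new tuples from R and `beta_s` new tuples from S, so the sampled region grows as a rectangle rather than a square. The names `rectangular_ripple_join`, `beta_r`, and `beta_s` are hypothetical, the ratios are fixed rather than adjusted during the run, and the inputs are assumed to be in-memory lists in random order.

```python
def rectangular_ripple_join(R, S, predicate, beta_r=1, beta_s=1):
    """Yield matching (r, s) pairs, drawing beta_r tuples of R and
    beta_s tuples of S per sampling step.

    Sketch only: fixed aspect ratios, in-memory inputs; when one input
    runs out, the other keeps advancing until the join is complete.
    """
    seen_r = 0  # number of R tuples sampled so far
    seen_s = 0  # number of S tuples sampled so far
    while seen_r < len(R) or seen_s < len(S):
        new_r = R[seen_r:seen_r + beta_r]
        new_s = S[seen_s:seen_s + beta_s]
        # New S tuples against R tuples from earlier steps.
        for s in new_s:
            for r in R[:seen_r]:
                if predicate(r, s):
                    yield r, s
        seen_s += len(new_s)
        # New R tuples against all S tuples seen so far.
        for r in new_r:
            for s in S[:seen_s]:
                if predicate(r, s):
                    yield r, s
        seen_r += len(new_r)
```

With `beta_r=1` and `beta_s=3`, the first step joins one R tuple with three S tuples, matching the example ratios on the slide.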

Why is it called Ripple Join?
1. The algorithm seems to ripple out from a corner of the join.
2. Acronym: "Rectangles of Increasing Perimeter Length"

Performance (figure)

Conclusions and Future Work
- A complete implementation of online aggregation must be able to handle multi-table queries.
- This paper introduces ripple joins, a family of join algorithms designed to meet the performance needs of an online aggregation system.
- Though ripple joins are symmetric, it is still not clear how a query optimizer should choose among the ripple join variants, nor how it should order a sequence of ripple joins.

References
- Haas & Hellerstein, “Ripple Joins for Online Aggregation” (SIGMOD ’99)
- Haas & Hellerstein, “Online Query Processing: A Tutorial”
- P. J. Haas, J. M. Hellerstein, and H. J. Wang, “Online Aggregation.” In Proc. ACM SIGMOD Intl. Conf. Management of Data.