Lecture 11: Main Memory Databases, Midterm Review. Time breakdown for Shore DBMS. Source: “OLTP Through the Looking Glass”, SIGMOD 2008.

Presentation transcript:

Lecture 11 Main Memory Databases Midterm Review

Time breakdown for Shore DBMS
Source: “OLTP Through the Looking Glass”, SIGMOD 2008
- Systematically removed code for locking, logging, and disk I/O
- What remains is the useful work

Cost Sources
- Managing disk contents
  - Logging
  - Locking
- Managing memory and short-term
  - Latching
  - Buffer pool manager
Solution: In-Memory, Single-Threaded Execution Model

What about concurrency?
- Need to make use of multi-core
- But want to minimize overhead
Solution: partition the data; most transactions are single-partition

TPC-C: most queries go to one warehouse
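The partitioning idea can be sketched in a few lines. This is only an illustration under stated assumptions: `PartitionedStore`, `add_stock`, and the modulo routing on a warehouse id are hypothetical names and choices, not something the slides specify. Each partition is a plain dictionary that would be owned by exactly one thread, so a single-partition transaction runs to completion without locks or latches.

```python
class PartitionedStore:
    """Hypothetical sketch: one in-memory dict per partition.

    In a real system each partition would be owned by one thread that
    executes its transactions serially; here everything runs inline.
    """

    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]

    def partition_of(self, warehouse_id):
        # Assumed routing rule: hash-partition on the warehouse id.
        return warehouse_id % len(self.partitions)

    def run(self, warehouse_id, txn):
        # A single-partition transaction only sees its own partition's data.
        data = self.partitions[self.partition_of(warehouse_id)]
        return txn(data)


def add_stock(item, qty):
    """Build a transaction that adds qty units of item and returns the new total."""
    def txn(data):
        data[item] = data.get(item, 0) + qty
        return data[item]
    return txn


store = PartitionedStore(num_partitions=4)
store.run(7, add_stock("widget", 5))
total = store.run(7, add_stock("widget", 3))
```

Transactions that touch several warehouses would need coordination across partitions, which this sketch deliberately leaves out.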

Midterm Review
- Schema design
- Relational Algebra
- DB Architecture
- Buffer Pools
- Indexing
- Join Algos
- Query optimization

Entity-Relationship Diagram

SSN | Name | Address  | Hobby  | Cost
123 | john | main st  | dolls  | $
123 | john | main st  | bugs   | $
345 | mary | lake st  | tennis | $$
456 | joe  | first st | dolls  | $

“Wide” schema: has redundancy and anomalies in the presence of updates, inserts, and deletes
Table key is (Hobby, SSN)

Entity-relationship diagram: Person and Hobby entities joined by an n:n relationship; attributes shown are SSN, Address, Name, and Cost

BCNFify
Start with one "universal relation"
While some relation R is not in BCNF:
    Find an FD F = X → Y that violates BCNF on R
    Split R into R1 = (X ∪ Y), R2 = R – Y
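The loop above can be sketched in Python as follows. The function names (`closure`, `find_violation`, `bcnf_decompose`) are made up for illustration, and the two functional dependencies used in the example (SSN → Name, Address and Hobby → Cost) are an assumption about the hobby schema, not something stated on the slide. Violations are found by brute force over attribute subsets, which is fine for small schemas only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def find_violation(rel, fds):
    """Return (X, Y) where X -> Y holds on rel, is nontrivial, and X is not a superkey."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            y = (closure(x, fds) & rel) - x
            if y and (x | y) != rel:
                return x, y
    return None


def bcnf_decompose(universal, fds):
    """Split relations until none has a BCNF violation."""
    todo = [frozenset(universal)]
    done = []
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)
        else:
            x, y = violation
            todo.append(x | y)    # R1 = X ∪ Y
            todo.append(rel - y)  # R2 = R − Y
    return done


# Assumed FDs for the hobby schema (illustrative only).
fds = [
    (frozenset({"SSN"}), frozenset({"Name", "Address"})),
    (frozenset({"Hobby"}), frozenset({"Cost"})),
]
result = bcnf_decompose({"SSN", "Name", "Address", "Hobby", "Cost"}, fds)
```

Under those assumed dependencies the universal relation splits into (SSN, Name, Address), (Hobby, Cost), and (SSN, Hobby).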

Relational Algebra
Projection: π(R, c1, …, cn) = π_{c1…cn} R
    select a subset c1 … cn of columns of R
Selection: σ(R, pred) = σ_pred R
    select a subset of rows that satisfy pred
Cross Product (||R|| = #attrs in R, |R| = #rows in R): R1 × R2 (aka Cartesian product)
    combine R1 and R2, producing a new relation with ||R1|| + ||R2|| attrs, |R1| * |R2| rows
Join: ⋈(R1, R2, pred) = R1 ⋈_pred R2 = σ_pred (R1 × R2)
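A minimal sketch of these four operators, representing a relation as a list of dictionaries. The function names and the sample `emp` and `dept` data are hypothetical, and the cross product assumes the two inputs have disjoint column names.

```python
def project(rel, *cols):
    """π: keep only cols, dropping duplicate rows (set semantics)."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(cols, key)))
    return out


def select(rel, pred):
    """σ: keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]


def cross(r1, r2):
    """×: pair every row of r1 with every row of r2 (column names assumed disjoint)."""
    return [{**a, **b} for a in r1 for b in r2]


def join(r1, r2, pred):
    """⋈: defined exactly as a selection over the cross product."""
    return select(cross(r1, r2), pred)


emp = [{"ename": "ann", "edept": 1}, {"ename": "bob", "edept": 2}]
dept = [{"deptNo": 1, "building": "Tech"}, {"deptNo": 2, "building": "Main"}]

product = cross(emp, dept)
joined = join(emp, dept, lambda row: row["edept"] == row["deptNo"])
in_tech = project(select(joined, lambda row: row["building"] == "Tech"), "ename")
```

Here `product` has |emp| * |dept| = 4 rows with 2 + 2 = 4 attributes each, matching the size rule for the cross product.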

Database Internals Overview
Front End
- Admission Control
- Connection Management
Query System
- (sql) Parser
- (parse tree) Rewriter
- (parse tree) Planner & Optimizer
- (query plan) Executor
Storage System
- Access Methods
- Lock Manager
- Buffer Pool
- Log Manager

Flattening Example
Select employees in a department located in the Tech building:

Nested query:
    SELECT emp.*
    FROM emp
    WHERE EXISTS (
        SELECT *
        FROM dept
        WHERE emp.deptNo = dept.deptNo
          AND dept.building = 'Tech');

Flattened query:
    SELECT emp.*
    FROM emp, dept
    WHERE emp.deptNo = dept.deptNo
      AND dept.building = 'Tech';
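One way to check that the two queries agree is to run both against a small in-memory SQLite database. The table definitions and sample rows below are invented for illustration. The rewrite is only safe because `deptNo` is assumed to be unique in `dept`; with duplicate department rows the flattened join would return an employee more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptNo INTEGER PRIMARY KEY, building TEXT);
    CREATE TABLE emp  (name TEXT, deptNo INTEGER);
    INSERT INTO dept VALUES (1, 'Tech'), (2, 'Main');
    INSERT INTO emp  VALUES ('ann', 1), ('bob', 2), ('cat', 1);
""")

nested = """
    SELECT emp.* FROM emp
    WHERE EXISTS (SELECT * FROM dept
                  WHERE emp.deptNo = dept.deptNo
                    AND dept.building = 'Tech')
"""
flat = """
    SELECT emp.* FROM emp, dept
    WHERE emp.deptNo = dept.deptNo
      AND dept.building = 'Tech'
"""

# Sort so the comparison does not depend on the order SQLite returns rows in.
nested_rows = sorted(conn.execute(nested).fetchall())
flat_rows = sorted(conn.execute(flat).fetchall())
```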

B+ Tree Indexes
- Balanced wide tree
- Fast value lookup and range scans
- Each node is a disk page (except root)
- Leaves point to tuple pages

Study Break: B+ Tree
- Build a B+ Tree with values (8, 17, 21, 2, 8, 21, 25, 3, 19, 7) and 4 node slots
- Insert 9 into the tree
- Insert 3 into the tree
- Delete 8 from the tree

Indexes Recap

           | Heap File | Bitmap | Hash File | B+Tree
Insert     | O(1)      |        | O(1)      | O(log_B n)
Delete     | O(P)      |        | O(1)      | O(log_B n)
Range Scan | O(P)      |        | -- / O(P) | O(log_B n + R)
Lookup     | O(P)      | O(C)   | O(1)      | O(log_B n)

n: number of tuples
P: number of pages in file
B: branching factor of B-Tree (keys / node)
R: number of pages in range
C: cardinality (#) of unique values on key

Buffer Pool Management
Eviction strategies
- Least recently used (LRU)
- Most recently used (MRU)
Manager policies
- Work by file instance: capture access pattern and table usage
- Memory allotment depends on access method
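The difference between the two eviction strategies shows up clearly on a repeated sequential scan that is slightly larger than the pool. The sketch below is a toy: the `BufferPool` class, its `read_page` callback, and the hit counters are hypothetical, and it ignores pinning and dirty pages entirely.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool; frames are kept ordered from least to most recently used."""

    def __init__(self, capacity, read_page, policy="LRU"):
        self.capacity = capacity
        self.read_page = read_page  # assumed callback that fetches a page from disk
        self.policy = policy        # "LRU" or "MRU"
        self.frames = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # mark as most recently used
            self.hits += 1
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            # last=False evicts the least recently used frame,
            # last=True evicts the most recently used one.
            self.frames.popitem(last=(self.policy == "MRU"))
        page = self.read_page(page_id)
        self.frames[page_id] = page
        self.misses += 1
        return page


def scan_three_times(policy):
    """Scan a 4-page file three times through a 3-frame pool."""
    pool = BufferPool(capacity=3, read_page=lambda pid: f"page-{pid}", policy=policy)
    for _ in range(3):
        for pid in range(4):
            pool.get(pid)
    return pool


lru = scan_three_times("LRU")
mru = scan_three_times("MRU")
```

With LRU every access misses, because each page is evicted just before the scan comes back to it; MRU keeps most of the file resident and hits on half of the accesses in this run. This is why the best policy depends on the access method.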

Join Algorithms Summary

     | Sort-Merge                  | Simple Hash     | Grace Hash
I/O  | 3 (|R| + |S|)               | P (|R| + |S|)   | 3 (|R| + |S|)
CPU  | O(P x {S}/P log {S}/P)      | O({R} + {S})    | O({R} + {S})

Notation: P partitions / passes over data; assuming hash is O(1)

Grace hash is generally a safe bet, unless memory is close to the size of the tables, in which case simple hash can be preferable.

The extra cost of sorting makes sort-merge unattractive unless there is a way to access the tables in sorted order (e.g., a clustered index), or a need to output data in sorted order (e.g., for a subsequent ORDER BY).
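The two phases of a Grace hash join can be sketched in memory as below. The function name, the tuple-based rows, and the choice of four partitions are illustrative assumptions; in a real system the partitions would be written to disk and read back, which is where the 3 (|R| + |S|) I/O cost comes from (read and write in the partitioning phase, one more read in the join phase).

```python
from collections import defaultdict


def grace_hash_join(r, s, r_key, s_key, num_partitions=4):
    """Equi-join rows of r and s (tuples) on column positions r_key and s_key."""
    # Phase 1: partition both inputs with the same hash function, so that
    # matching rows always land in the same partition number.
    r_parts = defaultdict(list)
    s_parts = defaultdict(list)
    for row in r:
        r_parts[hash(row[r_key]) % num_partitions].append(row)
    for row in s:
        s_parts[hash(row[s_key]) % num_partitions].append(row)

    # Phase 2: for each partition, build a hash table on R and probe it with S.
    out = []
    for p in range(num_partitions):
        table = defaultdict(list)
        for row in r_parts[p]:
            table[row[r_key]].append(row)
        for row in s_parts[p]:
            for match in table.get(row[s_key], []):
                out.append((match, row))
    return out


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
pairs = sorted(grace_hash_join(r, s, r_key=0, s_key=0))
```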

Query Planning: Cost Estimation
Use an analytical cost model to estimate the time needed for a query execution plan tree
Selectivity (fraction of tuples returned from input):
- col = value: 1/ICARD (one over the number of unique col values), or 1/10 if no index
- col > value: (max – value) / (max – min), or 1/3
- col1 = col2: 1/max(ICARD(c1), ICARD(c2)), or 1/10
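These rules translate directly into small functions. The names and the use of `None` to mean "no index statistics available" are assumptions made for this sketch; the defaults 1/10 and 1/3 are the fallbacks listed above. The range estimate uses (max - value) / (max - min), the fraction of the value range that lies above the constant, clamped to [0, 1].

```python
def sel_eq_value(icard=None):
    """col = value: 1/ICARD when the number of distinct values is known, else 1/10."""
    return 1.0 / icard if icard else 0.1


def sel_gt_value(value, lo=None, hi=None):
    """col > value: fraction of the [lo, hi] range above value, else 1/3."""
    if lo is None or hi is None or hi == lo:
        return 1.0 / 3
    frac = (hi - value) / (hi - lo)
    return min(1.0, max(0.0, frac))  # clamp constants outside the range


def sel_eq_cols(icard1=None, icard2=None):
    """col1 = col2: 1/max(ICARD(c1), ICARD(c2)) when both are known, else 1/10."""
    if icard1 and icard2:
        return 1.0 / max(icard1, icard2)
    return 0.1


# Example: a 10,000-tuple table with 50 distinct values in the filtered column.
estimated_rows = 10_000 * sel_eq_value(icard=50)
```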

Selinger Optimizer Algorithm
Algorithm: compute the optimal way to generate every sub-join of size 1, size 2, ..., n (in that order), e.g. {A}, {B}, {C}, {AB}, {AC}, {BC}, {ABC}

R ← set of relations to join
for i in {1...|R|}:
    for S in {all length-i subsets of R}:
        optjoin(S) = a join (S-a), where a is the relation that minimizes:
            cost(optjoin(S-a))   (precomputed in a previous iteration!)
            + min. cost to join optjoin(S-a) to a
            + min. access cost for a
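The dynamic program above can be sketched as follows. Everything about the cost model is a toy assumption made for illustration: the relation sizes, the fixed 1/100 join selectivity, and a join cost equal to the number of tuple pairs compared. The sketch also leaves out parts of the full Selinger optimizer, such as tracking interesting orders and avoiding cross products.

```python
from itertools import combinations


def selinger(relations, access_cost, join_cost):
    """Return (cost, left-deep plan) for joining all relations.

    access_cost(a): cheapest way to read relation a.
    join_cost(rest, a): cheapest way to join the result for the set rest with a.
    """
    best = {frozenset([a]): (access_cost(a), a) for a in relations}
    for size in range(2, len(relations) + 1):
        for combo in combinations(relations, size):
            s = frozenset(combo)
            candidates = []
            for a in combo:
                rest = s - {a}
                sub_cost, sub_plan = best[rest]  # computed in an earlier iteration
                cost = sub_cost + join_cost(rest, a) + access_cost(a)
                candidates.append((cost, (sub_plan, a)))
            best[s] = min(candidates, key=lambda c: c[0])
    return best[frozenset(relations)]


# Toy cost model (all numbers are made up).
sizes = {"A": 10, "B": 100, "C": 1000}


def result_size(rels):
    """Estimated output size: product of input sizes, divided by 100 per join."""
    total = 1
    for rel in rels:
        total *= sizes[rel]
    return total // (100 ** (len(rels) - 1))


best_cost, best_plan = selinger(
    ["A", "B", "C"],
    access_cost=lambda a: sizes[a],
    join_cost=lambda rest, a: result_size(rest) * sizes[a],
)
```

Under this model the cheapest plan joins the two small relations first and brings in the largest relation last, since that keeps the intermediate result small.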