April 20th – RDBMS Internals

Slides:



Advertisements
Similar presentations
Examples of Physical Query Plan Alternatives
Advertisements

Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
CS 540 Database Management Systems
Query Optimization Goal: Declarative SQL query
CS186 Final Review Query Optimization.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
Query Optimization R&G, Chapter 15 Lecture 16. Administrivia Homework 3 available today –Written exercise; will be posted on class website –Due date:
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
CSCE Database Systems Chapter 15: Query Execution 1.
CPS216: Advanced Database Systems Notes 07:Query Execution Shivnath Babu.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
CS 540 Database Management Systems
Lecture 15: Query Optimization. Very Big Picture Usually, there are many possible query execution plans. The optimizer is trying to chose a good one.
Chapter 13: Query Processing
Database Applications (15-415) DBMS Internals- Part IX Lecture 20, March 31, 2016 Mohammad Hammoud.
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
Database Applications (15-415) DBMS Internals- Part VIII Lecture 17, Oct 30, 2016 Mohammad Hammoud.
Query Optimization Heuristic Optimization
CS 540 Database Management Systems
UNIT 11 Query Optimization
CS 440 Database Management Systems
Database Applications (15-415) DBMS Internals- Part VII Lecture 16, October 25, 2016 Mohammad Hammoud.
Prepared by : Ankit Patel (226)
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Database Performance Tuning and Query Optimization
Introduction to Query Optimization
Relational Algebra Chapter 4, Part A
Evaluation of Relational Operations
Overview of Query Optimization
Cse 344 February 14th – Indexing.
Chapter 15 QUERY EXECUTION.
Database Management Systems (CS 564)
Evaluation of Relational Operations: Other Operations
Introduction to Database Systems
Examples of Physical Query Plan Alternatives
Cse 344 April 25th – Disk i/o.
February 12th – RDBMS Internals
April 6th – relational algebra
Database Applications (15-415) DBMS Internals- Part VII Lecture 19, March 27, 2018 Mohammad Hammoud.
Cse 344 APRIL 23RD – Indexing.
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
External Joins Query Optimization 10/4/2017
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation
Relational Algebra Chapter 4, Sections 4.1 – 4.2
Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud.
Query Optimization CS 157B Ch. 14 Mien Siao.
April 27th – Cost Estimation
Chapters 15 and 16b: Query Optimization
Relational Query Optimization
Overview of Query Evaluation
Query Execution Presented by Jiten Oswal CS 257 Chapter 15
Relational Algebra Friday, 11/14/2003.
Overview of Query Evaluation
Lecture 24: Query Execution
Chapter 11 Database Performance Tuning and Query Optimization
Relational Query Optimization
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Lecture 23: Query Execution
Query Optimization.
Lecture 22: Query Execution
Database Applications (15-415) DBMS Internals- Part X Lecture 22, April 3, 2018 Mohammad Hammoud.
Database Systems (資料庫系統)
CPS216: Data-Intensive Computing Systems Query Processing (contd.)
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Relational Query Optimization
Relational Query Optimization
Evaluation of Relational Operations: Other Techniques
Lecture 20: Query Execution
Presentation transcript:

April 20th – RDBMS Internals Cse 344 April 20th – RDBMS Internals

Administrivia OQ5 Out Datalog – Due next Wednesday HW4 Due next Wednesday Written portion (.pdf) Coding portion (one .dl file)

Today Back to RDBMS ”Query plans” and DBMS planning Management between SQL and execution Optimization techniques Indexing and data arrangement

Query Evaluation Steps SQL query Parse & Rewrite Query Logical plan (RA) Select Logical Plan Query optimization Select Physical Plan Physical plan Analogy with sorting: we have decided the ordering of operations, and now we need to choose their implementation. For instance, there are many ways to implement sort. Query Execution Disk

Logical vs Physical Plans Logical plans: Created by the parser from the input SQL text Expressed as a relational algebra tree Each SQL query has many possible logical plans Physical plans: Goal is to choose an efficient implementation for each operator in the RA tree Each logical plan has many possible physical plans

Review: Relational Algebra Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA’ πsname σscity=‘Seattle’ and sstate=‘WA’ and pno=2 sid = sid Relational algebra expression is also called the “logical query plan” Supplier Supply

Physical Query Plan 1 (On the fly) πsname (On the fly) Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 1 (On the fly) πsname A physical query plan is a logical query plan annotated with physical implementation details (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA’ (Nested loop) sid = sid Describe the “nested loop join”: we have not discussed physical operators in class yet, but it should be clear what a nested loop means. Supplier Supply (File scan) (File scan)

Physical Query Plan 2 (On the fly) πsname (On the fly) Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 2 (On the fly) πsname Same logical query plan Different physical plan (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA’ (Hash join) sid = sid Briefly mention that a hash join is a different implementation. Normally, most students in class should have no problem imagining what it might be, but no need to describe it in detail, we will cover it in two lectures. Supplier Supply (File scan) (File scan)

Physical Query Plan 3 (On the fly) πsname (d) (Sort-merge join) (c) Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 3 Different but equivalent logical query plan; different physical plan (On the fly) πsname (d) SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA’ (Sort-merge join) (c) sid = sid (Scan & write to T1) (a) σscity=‘Seattle’ and sstate=‘WA’ Same here, briefly mention that sort-merge is yet another way to implement join. (b) σpno=2 (Scan & write to T2) Supplier Supply (File scan) (File scan)

Query Optimization Problem For each SQL query… many logical plans For each logical plan… many physical plans Next: we will discuss physical operators; how exactly are query executed?

Physical Operators Each of the logical operators may have one or more implementations = physical operators Will discuss several basic physical operators, with a focus on join

Main Memory Algorithms Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Main Memory Algorithms Logical operator: Supplier ⨝sid=sid Supply Propose three physical operators for the join, assuming the tables are in main memory:

Main Memory Algorithms Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Main Memory Algorithms Logical operator: Supplier ⨝sid=sid Supply Propose three physical operators for the join, assuming the tables are in main memory: Nested Loop Join O(??) Merge join O(??) Hash join O(??)

Main Memory Algorithms Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Main Memory Algorithms Logical operator: Supplier ⨝sid=sid Supply Propose three physical operators for the join, assuming the tables are in main memory: Nested Loop Join O(n2) Merge join O(n log n) Hash join O(n) … O(n2)

BRIEF Review of Hash Tables Separate chaining: A (naïve) hash function: 1 2 3 4 5 6 7 8 9 Duplicates OK WHY ?? h(x) = x mod 10 503 103 503 Operations: 76 666 find(103) = ?? insert(488) = ?? 48

BRIEF Review of Hash Tables insert(k, v) = inserts a key k with value v Many values for one key Hence, duplicate k’s are OK find(k) = returns the list of all values v associated to the key k

Iterator Interface Each operator implements three methods: open() next() close()

Iterator Interface Example “on the fly” selection operator interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Iterator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; while (!found) { Tuple in = child.next(); if (in == EOF) return EOF; found = p(in); return in; void close () { child.close(); }

Iterator Interface Example “on the fly” selection operator interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; while (!found) { Tuple in = child.next(); if (in == EOF) return EOF; found = p(in); return in; void close () { child.close(); }

Iterator Interface Example “on the fly” selection operator interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () {

Iterator Interface Example “on the fly” selection operator interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; Tuple r = null; while (!found) { r = child.next(); if (r == null) break; found = p(r); return r; void close () { child.close(); }

Iterator Interface interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } Operator q = parse(“SELECT ...”); q = optimize(q); q.open(); while (true) { Tuple t = q.next(); if (t == null) break; else printOnScreen(t); } q.close(); Query plan execution

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join open() (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 open() (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 open() (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). open() Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 open() (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). open() open() Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 next() (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 next() (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. next() Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 next() (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. next() next() Suppliers Supplies (File scan) (File scan)

Discuss: open/next/close for nested loop join Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 next() (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. next() next() next() Suppliers Supplies (File scan) (File scan)

Tuples from here are pipelined Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Hash Join) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Tuples from here are pipelined Suppliers Supplies (File scan) (File scan)

Pipelining Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 Tuples from here are “blocked” (Hash Join) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Tuples from here are pipelined Suppliers Supplies (File scan) (File scan)

Blocked Execution Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Blocked Execution (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Merge Join) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Suppliers Supplies (File scan) (File scan)

Blocked Execution Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Blocked Execution (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Merge Join) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Blocked Blocked Suppliers Supplies (File scan) (File scan)

Pipelined Execution Tuples generated by an operator are immediately sent to the parent Benefits: No operator synchronization issues No need to buffer tuples between operators Saves cost of writing intermediate data to disk Saves cost of reading intermediate data from disk This approach is used whenever possible Mention that some engines process tuples in batches rather than one at a time. Others process one at a time.

Query Execution Bottom Line SQL query transformed into physical plan Access path selection for each relation Scan the relation or use an index (next lecture) Implementation choice for each operator Nested loop join, hash join, etc. Scheduling decisions for operators Pipelined execution or intermediate materialization Pipelined execution of physical plan

Recall: Physical Data Independence Applications are insulated from changes in physical storage details SQL and relational algebra facilitate physical data independence Both languages input and output relations Can choose different implementations for operators

Query Performance My database application is too slow… why? One of the queries is very slow… why? To understand performance, we need to understand: How is data organized on disk How to estimate query costs In this course we will focus on disk-based DBMSs