Lecture 11: Main Memory Databases; Midterm Review

1 Lecture 11 Main Memory Databases Midterm Review

2 Time breakdown for Shore DBMS
Source: “OLTP Under the Looking Glass”, SIGMOD 2008
[Figure: time breakdown as code for locking, logging, and disk I/O is systematically removed; what is left is the remaining useful work]

3 Cost Sources
Managing disk contents
– Logging
– Locking
Managing memory and short-term
– Latching
– Buffer pool manager
Solution: In-Memory, Single-Threaded Execution Model

4 What about concurrency?
– Need to make use of multi-core
– But want to minimize overhead
Solution: partition the data; most xactions are single-partition

5 TPC-C Most queries go to one warehouse

6 Midterm Review
– Schema design
– Relational Algebra
– DB Architecture
– Buffer Pools
– Indexing
– Join Algos
– Query optimization

7 Entity-Relationship Diagram
SSN   Name   Address    Hobby    Cost
123   john   main st    dolls    $
123   john   main st    bugs     $
345   mary   lake st    tennis   $$
456   joe    first st   dolls    $
“Wide” schema: has redundancy and anomalies in the presence of updates, inserts, and deletes
Table key is (Hobby, SSN)
[Entity-Relationship Diagram: entities Person and Hobby in an n:n relationship; attributes SSN, Address, Name, Cost]

8 BCNFify
Start with one "universal relation"
While some relation R is not in BCNF:
– Find a FD F = X → Y that violates BCNF on R
– Split R into R1 = (X ∪ Y), R2 = R – Y
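The loop above can be sketched in Python as follows. This is a minimal illustration, not the lecture's code: the helper names (`closure`, `find_violation`, `bcnfify`) are hypothetical, and the functional dependencies SSN → Name, Address and Hobby → Cost are an assumption read off the wide table on slide 7. Violations are found by brute force over attribute subsets, which is fine only for tiny schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def find_violation(rel, fds):
    """Return (X, Y) with X -> Y nontrivial on rel and X not a superkey, else None."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for x in combinations(attrs, size):
            x = frozenset(x)
            y = (closure(x, fds) & rel) - x   # everything X determines inside rel
            if y and (x | y) != rel:          # nontrivial, and X is not a superkey
                return x, y
    return None

def bcnfify(universal, fds):
    """Split relations until none has a BCNF violation."""
    todo, done = [frozenset(universal)], []
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)
        else:
            x, y = violation
            todo.append(x | y)     # R1 = X U Y
            todo.append(rel - y)   # R2 = R - Y
    return done

# Assumed FDs for the Person/Hobby example
fds = [
    (frozenset({"SSN"}), frozenset({"Name", "Address"})),
    (frozenset({"Hobby"}), frozenset({"Cost"})),
]
tables = bcnfify({"SSN", "Name", "Address", "Hobby", "Cost"}, fds)
for t in tables:
    print(sorted(t))
```

Under these assumed dependencies the wide table splits into a person table, a hobby table, and a table linking SSN to Hobby, which matches the n:n relationship in the diagram.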

9 Relational Algebra
Projection: π(R, c1, …, cn) = π_{c1…cn} R
– select a subset c1 … cn of the columns of R
Selection: σ(R, pred) = σ_pred R
– select the subset of rows that satisfy pred
Cross Product: R1 × R2 (aka Cartesian product), where ||R|| = #attrs in R and |R| = #rows in R
– combine R1 and R2, producing a new relation with ||R1|| + ||R2|| attrs and |R1| * |R2| rows
Join: ⋈(R1, R2, pred) = R1 ⋈_pred R2 = σ_pred (R1 × R2)
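A small Python sketch can make the four operators concrete. Relations are modelled here as lists of dicts, and the function names and sample `emp`/`dept` rows are illustrative assumptions; the cross product assumes the two inputs have disjoint column names.

```python
def project(rel, *cols):
    """pi: keep only the named columns, dropping duplicate rows (set semantics)."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(cols, key)))
    return out

def select(rel, pred):
    """sigma: keep only the rows that satisfy pred."""
    return [row for row in rel if pred(row)]

def cross(r1, r2):
    """Cartesian product; assumes r1 and r2 share no column names."""
    return [{**a, **b} for a in r1 for b in r2]

def join(r1, r2, pred):
    """Join defined exactly as on the slide: a selection over the cross product."""
    return select(cross(r1, r2), pred)

# Hypothetical sample data
emp = [{"name": "ann", "deptNo": 1}, {"name": "bob", "deptNo": 2}, {"name": "cy", "deptNo": 1}]
dept = [{"dno": 1, "building": "Tech"}, {"dno": 2, "building": "Main"}]

product = cross(emp, dept)
joined = join(emp, dept, lambda row: row["deptNo"] == row["dno"])
print(len(product), len(product[0]))   # |R1| * |R2| rows, ||R1|| + ||R2|| attrs
print(project(joined, "name", "building"))
```

Real systems never materialise the cross product to evaluate a join; the definition is useful for reasoning about equivalences, not as an execution strategy.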

10 Database Internals Overview
Front End
– Admission Control
– Connection Management
Query System
– (sql) Parser
– (parse tree) Rewriter
– (parse tree) Planner & Optimizer
– (query plan) Executor
Storage System
– Access Methods
– Lock Manager
– Buffer Pool
– Log Manager

11 Flattening Example
Select employees in a department located in the Tech building:
SELECT emp.* FROM emp WHERE EXISTS (SELECT * FROM dept WHERE emp.deptNo = dept.deptNo AND dept.building = 'Tech');
Flattened into a join:
SELECT emp.* FROM emp, dept WHERE emp.deptNo = dept.deptNo AND dept.building = 'Tech';
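The equivalence of the nested and flattened queries can be checked on toy data with Python's built-in `sqlite3` module. The table contents below are made up for illustration, and the sketch assumes `dept.deptNo` is a key; without that, the flattened join could return an employee once per matching department row, while EXISTS returns each employee at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptNo INTEGER PRIMARY KEY, building TEXT);
    CREATE TABLE emp  (name TEXT, deptNo INTEGER);
    INSERT INTO dept VALUES (1, 'Tech'), (2, 'Main');
    INSERT INTO emp  VALUES ('ann', 1), ('bob', 2), ('cy', 1);
""")

# Nested form: correlated EXISTS subquery
nested = conn.execute("""
    SELECT emp.* FROM emp
    WHERE EXISTS (SELECT * FROM dept
                  WHERE emp.deptNo = dept.deptNo AND dept.building = 'Tech')
    ORDER BY name
""").fetchall()

# Flattened form: plain join
flat = conn.execute("""
    SELECT emp.* FROM emp, dept
    WHERE emp.deptNo = dept.deptNo AND dept.building = 'Tech'
    ORDER BY name
""").fetchall()

print(nested)
print(flat)
```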

12 B+ Tree Indexes
– Balanced wide tree
– Fast value lookup and range scans
– Each node is a disk page (except root)
– Leaves point to tuple pages

13 Study Break: B+ Tree
Build a B+ Tree with values (8, 17, 21, 2, 8, 21, 25, 3, 19, 7) and 4 node slots
– Insert 9 into the tree
– Insert 3 into the tree
– Delete 8 from tree

14 Indexes Recap
             Heap File   Bitmap      Hash File   B+Tree
Insert       O(1)        O(1)        O(1)        O(log_B n)
Delete       O(P)        O(1)        O(1)        O(log_B n)
Range Scan   O(P)        -- / O(P)   -- / O(P)   O(log_B n + R)
Lookup       O(P)        O(C)        O(1)        O(log_B n)
n: number of tuples
P: number of pages in file
B: branching factor of B-Tree (keys / node)
R: number of pages in range
C: cardinality (#) of unique values on key

15 Buffer Pool Management
Eviction strategies
– Least recently used (LRU)
– Most recently used (MRU)
Manager policies
– Work by file instance
– Capture access pattern and table usage
– Memory allotment depends on access method
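To see why the choice between LRU and MRU depends on the access pattern, here is a minimal buffer pool sketch in Python. The `BufferPool` class, its `read_page` callback, and the `scan_misses` helper are hypothetical names for illustration; pinning, dirty pages, and write-back are left out.

```python
from collections import OrderedDict

class BufferPool:
    """Fixed-size page cache with LRU or MRU eviction (illustrative only)."""

    def __init__(self, capacity, read_page, policy="LRU"):
        self.capacity = capacity
        self.read_page = read_page       # callback that fetches a page from "disk"
        self.policy = policy
        self.frames = OrderedDict()      # page_id -> page, least recently used first
        self.misses = 0

    def get(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # mark as most recently used
            return self.frames[page_id]
        self.misses += 1
        if len(self.frames) >= self.capacity:
            # LRU evicts the front (oldest use); MRU evicts the back (newest use)
            self.frames.popitem(last=(self.policy == "MRU"))
        self.frames[page_id] = self.read_page(page_id)
        return self.frames[page_id]

def scan_misses(policy, num_pages=4, capacity=3, passes=2):
    """Count misses for repeated sequential scans of a file larger than the pool."""
    pool = BufferPool(capacity, read_page=lambda pid: f"page-{pid}", policy=policy)
    for _ in range(passes):
        for pid in range(num_pages):
            pool.get(pid)
    return pool.misses

print("LRU misses:", scan_misses("LRU"))
print("MRU misses:", scan_misses("MRU"))
```

With a repeated sequential scan over a file slightly larger than the pool, LRU evicts each page just before it is needed again, so MRU incurs fewer misses. This is one reason a manager might pick a policy per access method rather than globally.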

16 Join Algorithms Summary
Algorithm     I/O              CPU
Sort-Merge    3 (|R| + |S|)    O(P x {S}/P log {S}/P)
Simple Hash   P (|R| + |S|)    O({R} + {S})
Grace Hash    3 (|R| + |S|)    O({R} + {S})
Notation: P partitions / passes over data; assuming hash is O(1)
– Grace hash is generally a safe bet, unless memory is close to the size of the tables, in which case simple hash can be preferable
– Extra cost of sorting makes sort-merge unattractive unless there is a way to access tables in sorted order (e.g., a clustered index), or a need to output data in sorted order (e.g., for a subsequent ORDER BY)
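The two phases of a Grace hash join can be sketched in Python as below. This is an in-memory illustration only: a real implementation writes each partition to disk in phase 1 and reads it back in phase 2, which is where the 3 (|R| + |S|) I/O figure comes from. The function name, parameters, and sample rows are assumptions.

```python
from collections import defaultdict

def grace_hash_join(r, s, r_key, s_key, num_partitions=4):
    """Equi-join lists of dicts r and s on r[r_key] == s[s_key]."""
    # Phase 1: partition both inputs with the same hash function on the join key.
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for row in r:
        r_parts[hash(row[r_key]) % num_partitions].append(row)
    for row in s:
        s_parts[hash(row[s_key]) % num_partitions].append(row)

    # Phase 2: matching rows can only be in the same partition, so join partition by partition.
    out = []
    for p in range(num_partitions):
        table = defaultdict(list)            # build a hash table on the R partition
        for row in r_parts[p]:
            table[row[r_key]].append(row)
        for row in s_parts[p]:               # probe it with the S partition
            for match in table.get(row[s_key], []):
                out.append((match, row))
    return out

# Hypothetical sample data
emp = [{"name": "ann", "deptNo": 1}, {"name": "bob", "deptNo": 2}, {"name": "cy", "deptNo": 1}]
dept = [{"dno": 1, "building": "Tech"}, {"dno": 2, "building": "Main"}, {"dno": 3, "building": "Lab"}]

pairs = grace_hash_join(dept, emp, "dno", "deptNo")
print(sorted((d["building"], e["name"]) for d, e in pairs))
```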

17 Query Planning Cost Estimation
Use an analytical cost model to estimate the time needed for a query execution plan tree
Selectivity (fraction of tuples returned from input):
– col = value: 1/ICARD, where ICARD is the # of unique col values; 1/10 if no index
– col > value: (max – value) / (max – min), or 1/3
– col1 = col2: 1/max(ICARD(c1), ICARD(c2)), or 1/10
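The selectivity rules can be written down as small Python functions. The function names are hypothetical, and passing `None` stands in for "no index statistics available", which is an assumption about when the default constants apply. The range predicate uses (max − value) / (max − min), the fraction of the value range lying above `value`, clamped to [0, 1].

```python
def sel_eq_value(icard=None):
    """col = value: 1/ICARD if the number of distinct values is known, else 1/10."""
    return 1.0 / icard if icard else 0.1

def sel_gt_value(value, lo=None, hi=None):
    """col > value: fraction of [lo, hi] above value, or 1/3 without statistics."""
    if lo is None or hi is None or hi == lo:
        return 1.0 / 3
    frac = (hi - value) / (hi - lo)
    return min(1.0, max(0.0, frac))   # clamp values outside the known range

def sel_eq_cols(icard1=None, icard2=None):
    """col1 = col2: 1/max(ICARD(c1), ICARD(c2)), or 1/10 if either is unknown."""
    if icard1 and icard2:
        return 1.0 / max(icard1, icard2)
    return 0.1

# Hypothetical table of 10,000 tuples with a column ranging over [0, 100]
n_tuples = 10_000
print(n_tuples * sel_eq_value(icard=50))       # estimated rows for col = value
print(n_tuples * sel_gt_value(75, 0, 100))     # estimated rows for col > 75
```

Multiplying the input cardinality by the selectivity gives the estimated output size that the planner feeds into the cost of the operators above.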

18 Selinger Optimizer Algorithm
Algorithm: compute the optimal way to generate every sub-join of size 1, size 2, ..., n (in that order), e.g. {A}, {B}, {C}, {AB}, {AC}, {BC}, {ABC}
R ← set of relations to join
for i in {1...|R|}:
    for S in {all length-i subsets of R}:
        optjoin(S) = a join (S-a), where a is the relation that minimizes:
            cost(optjoin(S-a))   (precomputed in a previous iteration!)
            + min. cost to join optjoin(S-a) to a
            + min. access cost for a
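The dynamic program above can be sketched in Python. This is a simplified illustration over left-deep plans only: the `selinger` function and the toy cost model (cardinalities, a fixed 1/10 join selectivity, cost equal to rows touched) are assumptions, and interesting orders and the choice of join method are ignored.

```python
from itertools import combinations

def selinger(relations, access_cost, join_cost):
    """Return (cost, plan) for the cheapest left-deep join order."""
    best = {}                                   # frozenset of relations -> (cost, plan)
    for r in relations:                         # size-1 subsets: just access the relation
        best[frozenset([r])] = (access_cost(r), r)
    for size in range(2, len(relations) + 1):
        for subset in combinations(relations, size):
            s = frozenset(subset)
            candidates = []
            for a in subset:                    # try each relation as the last one joined
                rest_cost, rest_plan = best[s - {a}]   # precomputed in a previous iteration
                cost = rest_cost + join_cost(s - {a}, a) + access_cost(a)
                candidates.append((cost, (rest_plan, a)))
            best[s] = min(candidates, key=lambda c: c[0])
    return best[frozenset(relations)]

# Toy cost model (assumed for illustration)
card = {"A": 1000, "B": 10, "C": 100}

def est_size(rels):
    """Estimated rows in the join of rels, with a fixed 1/10 selectivity per join."""
    size = 1
    for r in rels:
        size *= card[r]
    return size // 10 ** (len(rels) - 1)

def access_cost(r):
    return card[r]

def join_cost(rest, a):
    return est_size(rest) * card[a]

cost, plan = selinger(["A", "B", "C"], access_cost, join_cost)
print(cost, plan)
```

Because each subset's best plan is stored once and reused, the search considers on the order of n * 2^n combinations rather than all n! left-deep orders.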

