Presentation is loading. Please wait.

Presentation is loading. Please wait.

Database Administration

Similar presentations


Presentation on theme: "Database Administration"— Presentation transcript:

1 Database Administration
Query Processing

2 Query Processing How to evaluate this query efficiently?
SELECT A.Name, B.Grade FROM A, B WHERE A.Id = B.Id  Name, Grade (Id=Id(A  B)) How to evaluate this query efficiently? What algorithms and access path to use?

3 Query Processing in Oracle
You can view the query execution plan used in Oracle and other DBMSs

4 Explain in MySQL Simple means no union or subqueries
All means do a full scan of the table Expected # rows to examine Filter rows using where clause sorting is needed for order by

5 Explain in MySQL Matching party_index to const to fetch the rows
Expected # rows to examine decreases

6 Use primary key to retrieve the row
Explain in MySQL Matches at most 1 row Use primary key to retrieve the row

7 Use primary index to fetch row with id = 300
Explain in MySQL SELECT in Outer Query Inner subquery Use primary index to fetch row with id = 300

8 Sorting Sorting is used to implement many relational operations (e.g., join, project, intersect, …) Ordering is specifically requested by users: SELECT … ORDER BY … Eliminate duplicate tuples: SELECT DISTINCT … To do sort-merge join (which is used for JOIN, UNION, and INTERSECTION operations) Problems Relations are typically large, do not fit in main memory So we cannot use traditional in-memory sorting algorithms (such as quicksort)

9 External Sorting Combines in-memory sorting with techniques for minimizing I/O Cost of sorting is often measured in terms of number of block transfers cost of in-memory sorting << I/O cost of block transfers 2-phase in external sorting: Partial sorting phase Merging phase

10 External Sorting nb = number of input buffer space in memory
b = number of disk blocks for the file to be sorted Example: nb = 2, b = 7 2 input buffers in main memory Unsorted file on external storage (disk) run

11 Partial Sorting Partial Sorting Phase: Example: nb=2, b=7
Fetch a segment of the unsorted file from disk into buffers Sort the data in buffers (e.g., using quicksort) Write the sorted file segment back to disk Example: nb=2, b=7 Number of runs = 7/2 = 4 Unsorted data file Partially sorted file Run 1 Run 2 11 20 Run 3 Run 4

12 Merging Merging Phase:
Merge all runs using k input buffers and 1 output buffer If number of runs > k (k: degree of merging) Divide runs into groups of size k and merge each group into a run Repeat until all runs are merged into 1 group k-way merging

13 Partially sorted Input
Merging Example Partially sorted Input Output 2-way merging 2 5 3 6 10 6 3 1 7 15 2 5 10 1 7 15 2 runs (of size 2) are merged into 1 run (of size 4) Input buffers Output buffer

14 Example of External Sorting
nb = 4, b = 10, k=3 (3-way merging): After partial sorting After 3-way merging

15 Example of External Sorting
nb = 4, b = 10, k=2 (2-way merging): After partial sorting After 2-way merging After 2-way merging

16 Cost of External Sorting
Suppose we have b = 1024 blocks and nB = 5 input buffers Partial sorting: number of runs, nR = 1024/5 = 205 Merging: assume degree of merging, k = 5 After pass 1: number of runs remaining = 205/5 = 41 After pass 2: number of runs remaining = 41/5 = 9 After pass 3: number of runs remaining = 9/5 = 2 After pass 4: number of runs remaining = 2/5 = 1 Number of merging passes needed = logk nR = logk b/nB

17 Cost of External Sorting
Partial sorting : Cost = 2b (each disk block is read once and written back once) Cost of merge phase (k is degree of merging) b/nB runs to be merged Number of passes needed = log k (b/nB) Each pass requires 2b page transfers Cost of merging =  2b log k(b/nB)  = 2b log k (nR) Overall cost of external sorting = cost of partial sorting + cost of merging = 2b + 2b log k (nR)

18 Query Processing SELECT AttribList FROM relations R1, …, Rk
WHERE condition Translated to: AttribList (condition (R1  R2  …  Rk)) Example: SELECT Dno FROM Employee WHERE SSN = ‘ ’ Dno (SSN=‘ ’ (Employee))

19 Algorithms for Query Processing
Relational Algebra operators SELECT JOIN PROJECT SET (UNION, INTERSECT, SET DIFFERENCE) AGGREGATE

20 Algorithms for SELECT Operations
EMPLOYEE(SSN, Fname, Minit, Lname, Sex, Address, Salary, Dno) DEPARTMENT(Dno, Dname, Mgrssn, MgrStartDate) WORKS_ON(ESSN, Pno, Hours) Examples of SELECT operations: Simple selection (OP1): s SSN=' ' (EMPLOYEE) (OP2): s DNUMBER>5(DEPARTMENT) (OP3): s DNO=5(EMPLOYEE) Complex selection: (OP4): s DNO=5 AND SALARY>30000 AND SEX=F(EMPLOYEE) (OP5): s ESSN= AND PNO=10(WORKS_ON)

21 Algorithms for SELECT Operation
Algorithms for Simple Selection: Linear search (brute force, full tablescan): Retrieve every record in the file, and test whether its attribute values satisfy the selection condition Binary search If the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search can be used Use primary/clustering/secondary index Use the index to find the record(s) satisfying the corresponding selection condition

22 Algorithms for SELECT Operation
Algorithms for Complex Selection:  DNO=5 AND SALARY>30000 AND SEX=F(EMPLOYEE) Linear search (brute-force; full tablescan) Use an individual index E.g.: use the Dno index to retrieve the records and then check whether each retrieved record satisfies the remaining conditions Use a composite index E.g.: use the (Dno, Sex) composite index to retrieve the records and check whether they satisfy the remaining condition Intersection of record pointers: This method is possible if secondary indexes are available on all (or some of) the fields involved E.g.: intersect the record pointers returned by the indexes for Dno, Salary, and Sex

23 Algorithms for JOIN Operations
R A=B S The cost of joining two relations makes the choice of a join algorithm crucial Examples EMPLOYEE DNO=DNUMBER DEPARTMENT DEPARTMENT MGRSSN=SSN EMPLOYEE

24 Computing Joins S R R A=B S
Suppose bR and bS are the number of blocks in R and S, rR and rS are the number of tuples in r and s rS rows bS blocks S rR rows bR blocks R

25 Algorithms for JOIN Operations
J1 Nested-loop join (brute force) R R.A=S.B S foreach tuple t  R do foreach tuple t’  S do if t.A = t’.B then output (t, t’) Cost = bR + rR  bS + cost of writing the final result Very expensive Order of the loop matters

26 Algorithms for JOIN Operations
J1(b) Nested-block join Instead of joining 1 tuple at a time, join one block at a time Cost = bR + bR/(nB – 2)  bS + cost of writing the result nB is the number of buffers available Number of blocks (bR) << Number of records (rR) R R1 R2 Memory buffer R1 R2 Output Buffer Join R1 & R2 with blocks in S S2 S3 S4 S1 S S1 S2 S3 S4

27 Algorithms for JOIN Operations
J2 Single-loop join (Using an access structure) R A=B S foreach tuple t in R do { use index to find all tuples t’ in S satisfying t.A = t’.B; output (t.t’) } Cost = bR + rR  cost of search + cost of writing output

28 Algorithms for JOIN Operations
J3 Sort-merge join R R,A=S.B S sort R on attribute A; sort S on attribute B; while !eof(R) and !eof(S) do { Scan R and S concurrently until t.A = t’.B = c; Output A=c(R)  B=c (S) } A=c(R) R S B=c (S)

29 Sort-Merge Join R S R A=B S D A B E 1 3 p p 0 9 q q 8 7 3 s s s 5 7
1 3 p p 0 9 q q s s s 5 7 u u 1 1 v v p p p p s s s u u u u u u B E p p 4 0 r 9 s 7 t t 2 5 u u u x R A=B S S

30 Algorithms for JOIN Operations
J4 Hash-join: Use the same hashing function on the join attributes A of R and B of S as hash keys Hash the file with fewer records (say, R) to the hash file buckets. Hash the other file (S) to the appropriate bucket, where the record is combined with all matching records from R.

31 Example EMPLOYEE DNO=DNUMBER DEPARTMENT Meta-data: DEPARTMENT:
Number of records, rD = 50 Number of disk blocks to store records, bD = 10 EMPLOYEE Number of records, rE = 6000 Number of disk blocks to store records, bE = 2000 Number of buffers available in memory, nB = 7 Size of each buffer is the same as size of each block on disk

32 Example: Nested Loop Join
foreach tuple t  R do foreach tuple t’  S do if t.A = t’.B then output (t, t’) Simple nested loop join Cost = bR + rR bs + cost of output Nested-block join Suppose there are nB buffers Use 1 buffer for output, 1 buffer for relation S, nB – 2 for relation R Cost = bR + bR/(nB – 2)  bs + cost of output

33 Example: Block Nested Loop Join
EMPLOYEE DNO=DNUMBER DEPARTMENT If EMPLOYEE is outer loop Cost = 2000/5  10 + cost of output = 6000 If DEPARTMENT is outer loop Cost = 10/5  cost of output = 4010 More efficient to use DEPARTMENT as the outer loop Order of the tables in the nested loop matters!

34 MySQL Example

35 MySQL Example department table chosen for outer loop of nested block join employee table chosen for inner loop of nested block join

36 MySQL Example straight_join forces MySQL to process the join in the order given (employee for outer loop and department for inner loop)

37 Example: Single Loop Join
DEPARTMENT MGRSSN=SSN EMPLOYEE Suppose there are multi-level indexes on SSN (for EMPLOYEE) : number of index levels, XSSN = 4 MGR_SSN (for DEPARTMENT): number of index levels, XMGR_SSN = 2 foreach tuple t in R do use index to find all tuples t’ in S satisfying t.A = t’.B; Use EMPLOYEE as outer loop Max Cost = bE + rE  (xMGR_SSN + 1) + cost of output =  3 + cost of output = cost of output Use DEPARTMENT as outer loop Max Cost = bD + rD  (xSSN + 1) + cost of output =  5 + cost of output = cost of output

38 Scan all the rows in department (outer loop of single loop join)
MySQL Example Scan all the rows in department (outer loop of single loop join) For each mgrId value, use the primary key index in employee table to fetch its corresponding row

39 Explaining EXPLAIN in MySQL
type – determines how table is accessed (most frequent) “ALL” - full table scan “eq_ref” - reference by primary or unique key (1 row) “ref” - reference by non-unique key (multiple rows) possible_keys - indexes MySQL could use for this table key – index MySQL sellected to use “ref” - The column or constant this key is matched against “rows” - How many rows will be looked up in this table “extra” - Extra Information “Using Filesort” external sort is used “Using where” - where clause will be resolved

40 Algorithms for PROJECT Operations
Algorithm for PROJECT operations  <attribute list>(R) If <attribute list> has a key of relation R, extract all tuples from R with only the values for the attributes in <attribute list>. If <attribute list> does NOT include a key of relation R, duplicated tuples are removed from the results. Methods to remove duplicate tuples Sorting Hashing

41 Algorithms for SET Operations
UNION, INTERSECTION, SET DIFFERENCE and CARTESIAN PRODUCT CARTESIAN PRODUCT of relations R and S Includes all possible combinations of records from R and S. The attributes of the result include all attributes of R and S. CARTESIAN PRODUCT operation is very expensive and should be avoided if possible. If R has n records and j attributes and S has m records and k attributes, the result relation will have n*m records and j+k attributes.

42 Algorithms for SET Operations
UNION (See Figure 15.3c) Sort the two relations on the same attributes. Scan and merge both sorted files concurrently, whenever the same tuple exists in both relations, only one is kept in the merged results. INTERSECTION (See Figure 15.3d) Scan and merge both sorted files concurrently, keep in the merged results only those tuples that appear in both relations. SET DIFFERENCE R-S (See Figure 15.3e) Scan and merge both sorted files concurrently, keep in the merged results only those tuples that appear in relation R but not in relation S.

43 Implementing Aggregate Operations
Aggregate operators: MIN, MAX, SUM, COUNT and AVG Options to implement aggregate operators: Table Scan Use Index Example: SELECT MAX (SALARY) FROM EMPLOYEE; If an (ascending) index on SALARY exists, then the optimizer could traverse the index for the largest value, which would entail following the right most pointer in each index node from the root to a leaf.


Download ppt "Database Administration"

Similar presentations


Ads by Google