1 Optimization Recap and examples

2 Optimization introduction
For every SQL expression there are many possible implementations, and the alternatives can differ hugely in run time.
Our aim is to review the underlying hardware and the basic principles of optimization.

3 Disk-Memory-CPU
[Diagram: the DISK (holding the Sailors and Reserves tables), the Main Memory, and the CPU. Example statement: DELETE FROM Sailors WHERE sid=90]

4 Hardware Recap
The DB is kept on the disk.
The disk is divided into BLOCKS.
All processing of information takes place in main memory. Therefore, a block that we want to access has to be brought from the disk into memory, and perhaps written back.
Blocks are read from and written to the disk as single units.
Reading or writing a block is an I/O operation, and it takes a long time.

5 Hardware Recap
We assume a constant time for each disk access, and that only disk accesses determine the run time.
We do not consider the cost of writing to the disk.
Every table in the DB is stored as a file (on the disk), which is a ‘bunch of blocks’.
We will deal with heap files, i.e., files whose tuples are stored in no particular order.
Every block contains many tuples, each of which has a Record ID (RID) that states its location: (number of block, number of tuple within the block).

6
SID   SNAME  rating  age  RID    Block
1923  Joe    8       32   (1,1)  Block 1
3321  Phil   9       41   (1,2)  Block 1
1332  Boe    7       33   (1,3)  Block 1
1226  Bill   6       23   (2,1)  Block 2
1444  Paul   1       21   (2,2)  Block 2
1112  Jim    3       33   (2,3)  Block 2
1445  Vicky  9       54   (3,1)  Block 3
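The RID layout in the figure can be sketched in a few lines of Python. This is only an illustration, assuming 3 tuples per block as drawn; the names TUPLES_PER_BLOCK and rid_of are made up for the example.

```python
# Illustrative sketch: map a tuple's position in the file to its RID.
# Assumption: 3 tuples per block, as drawn in the figure (hypothetical constant).
TUPLES_PER_BLOCK = 3

def rid_of(position):
    """Return (block number, slot within block), both 1-based, for a 0-based position."""
    return (position // TUPLES_PER_BLOCK + 1, position % TUPLES_PER_BLOCK + 1)

# Tuples in the order they are stored in the file.
snames = ["Joe", "Phil", "Boe", "Bill", "Paul", "Jim", "Vicky"]
rids = {name: rid_of(i) for i, name in enumerate(snames)}

print(rids["Bill"])   # (2, 1)
print(rids["Vicky"])  # (3, 1)
```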

7
SID   SNAME  rating  age
1923  Joe    8       32
3321  Phil   9       41
1332  Boe    7       33
1226  Bill   6       23
1444  Paul   1       21
1112  Jim    3       33
1445  Vicky  9       54
The file has B blocks and t tuples.
Q: What would be the cost of the following queries?
  Select * from sailors
  Select * from sailors where sname='Jim'
  Select * from sailors where rating>4
Answer: B (without an index, each query has to scan the whole heap file).
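A minimal sketch of why all three queries cost B: with no index on a heap file, every block has to be read once, whatever the condition is. The block layout (3 tuples per block) and the helper name scan are assumptions made for this example.

```python
# Illustrative sketch: a heap file as a list of blocks, scanned block by block.
# Assumption: 3 tuples per block, so this file has B = 3 blocks.
blocks = [
    [(1923, "Joe", 8, 32), (3321, "Phil", 9, 41), (1332, "Boe", 7, 33)],
    [(1226, "Bill", 6, 23), (1444, "Paul", 1, 21), (1112, "Jim", 3, 33)],
    [(1445, "Vicky", 9, 54)],
]

def scan(blocks, predicate):
    """Return (matching tuples, number of block reads) for a full file scan."""
    io = 0
    result = []
    for block in blocks:
        io += 1  # one I/O per block, regardless of how many tuples match
        result.extend(t for t in block if predicate(t))
    return result, io

all_rows, io_all = scan(blocks, lambda t: True)
jim_rows, io_jim = scan(blocks, lambda t: t[1] == "Jim")
high_rows, io_high = scan(blocks, lambda t: t[2] > 4)

print(io_all, io_jim, io_high)  # 3 3 3
```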

8 Indexes on files
An index on a table is an additional file that helps access the data quickly.
An index holds ‘data entries’ that point into the table file.
The index can have the structure of a B+ tree, or be based on a hash function.

9 Tree index on sname of sailors
Root block:     ‘A’ -> ‘M’: B1,  ‘N’ -> ‘Z’: B2
Branch blocks:  B1: ‘A’ -> ‘G’: L1,  ‘H’ -> ‘M’: L2
                B2: ‘N’ -> ‘T’: L3,  ‘U’ -> ‘Z’: L4
Leaf blocks:    L1: ‘Bill’ (2,1), ‘Boe’ (1,3)
                L2: ‘Jim’ (2,3), ‘Joe’ (1,1)
                L3: ‘Paul’ (2,2), ‘Phil’ (1,2)
                L4: ‘Vicky’ (3,1), …
Table file:
SID   SNAME  rating  age
1923  Joe    8       32
3321  Phil   9       41
1332  Boe    7       33
1226  Bill   6       23
1444  Paul   1       21
1112  Jim    3       33
1445  Vicky  9       54

10 Tree index
The tree is kept balanced.
The tree entries are always ordered.
The leaves point to the exact location of the tuples.
Getting to a leaf typically takes 2-3 I/Os.
Each leaf points to the next and previous leaf.
A clustered index means that the index and the table are ordered by the same attribute.

11 Tree index on sname of sailors
Leaf entries (in key order): ‘Bill’ (2,1), ‘Boe’ (1,3) | ‘Jim’ (2,3), ‘Joe’ (1,1) | ‘Joe’ (2,2), ‘Joe’ (3,1) | ‘Phil’ (1,2), …
Table file:
SID   SNAME  rating  age
1923  Joe    8       32
3321  Phil   9       41
1332  Boe    7       33
1226  Bill   6       23
1444  Joe    1       21
1112  Jim    3       33
1445  Joe    9       54
How would the following queries be processed?
  Select * from sailors where sname='Joe'
  Select * from sailors
  Select * from sailors where sname>'J'
Notice: the index is not clustered.

12 Tree index on sname of sailors
Leaf entries (in key order): ‘Bill’ (1,1), ‘Boe’ (1,2) | ‘Jim’ (1,3), ‘Joe’ (2,1) | ‘Joe’ (2,2), ‘Joe’ (2,3) | ‘Phil’ (3,1), …
Table file:
SID   SNAME  rating  age
1226  Bill   6       23
1332  Boe    7       33
1112  Jim    3       33
1923  Joe    8       32
1444  Joe    1       21
1445  Joe    9       54
3321  Phil   9       41
How would the following queries be processed?
  Select * from sailors where sname='Joe'
  Select * from sailors
  Select * from sailors where sname>'J'
Notice: the index is clustered.
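To see why clustering matters for sname='Joe', one can count how many distinct data blocks the matching RIDs touch in the unclustered and the clustered layouts shown on these slides. This is a rough sketch: it ignores the 2-3 I/Os needed to reach the leaf and assumes a block already in memory is not read again; the name data_block_reads is made up for the example.

```python
# Illustrative sketch: data-block reads needed to fetch all 'Joe' tuples.
# RIDs are (block number, slot) pairs taken from the leaf entries on the slides.
joe_rids_unclustered = [(1, 1), (2, 2), (3, 1)]
joe_rids_clustered = [(2, 1), (2, 2), (2, 3)]

def data_block_reads(rids):
    """Count distinct blocks, assuming each block is read at most once."""
    return len({block for block, _slot in rids})

unclustered_reads = data_block_reads(joe_rids_unclustered)
clustered_reads = data_block_reads(joe_rids_clustered)

print(unclustered_reads)  # 3: every match sits in a different block
print(clustered_reads)    # 1: all matches are adjacent in one block
```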

13 Hash index
Works in a similar way, but uses a hash function instead of a tree.
Works only for equality conditions.
Takes an average of 1.2 I/Os to get to the tuple location.

14 Natural Join
We want to compute:
  SELECT * FROM Reserves R, Sailors S WHERE R.sid = S.sid
Naïve algorithm:
  Foreach tuple r in R
    Foreach tuple s in S
      if r.sid = s.sid, add (r, s) to the result
Cost: B_R + t_R * B_S = 100 + 5000*200 = 1,000,100
Running example data: t_R = 5000, t_S = 10,000, 50 tuples per block (so B_R = 100 and B_S = 200), 12 buffer pages.
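The arithmetic of the running example can be checked with a short sketch. The variable names are chosen for this example only; the figures are the ones given on the slide.

```python
import math

# Running example from the slide.
tuples_per_block = 50
t_R, t_S = 5000, 10_000

# Number of blocks of each relation.
B_R = math.ceil(t_R / tuples_per_block)  # 100
B_S = math.ceil(t_S / tuples_per_block)  # 200

# Naive nested loops: read R once, and scan all of S for every tuple of R.
naive_cost = B_R + t_R * B_S

print(naive_cost)  # 1000100
```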

15 Natural Join
We want to compute:
  SELECT * FROM Reserves R, Sailors S WHERE R.sid = S.sid
We have 4 candidate algorithms:
1. Block Nested Loops Join
2. Index Nested Loops Join
3. Sort Merge Join
4. Hash Join
This assumes that main memory does not have room for the smaller of the 2 relations plus 2 more blocks.

16 Block Nested Loop Join
Suppose there are B available blocks in memory, B_R blocks of relation R, and B_S blocks of relation S, with B_R < B_S.
Until all blocks of R have been read:
– Read B-2 blocks of R
– Read all blocks of S (one by one), and write the result
Run time: B_R + B_S * ceil(B_R/(B-2)) = 100 + 200 * ceil(100/10) = 2,100
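The formula can be cross-checked against a small simulation that only counts block reads, following the two steps of the loop. This is a sketch under the slide's cost model (reads only, constant cost per block); the function names are made up for the example.

```python
import math

def bnl_cost(b_r, b_s, buffers):
    """Closed-form cost from the slide: B_R + B_S * ceil(B_R / (B - 2))."""
    return b_r + b_s * math.ceil(b_r / (buffers - 2))

def bnl_simulated_io(b_r, b_s, buffers):
    """Count block reads by walking through the algorithm chunk by chunk."""
    io = 0
    chunk = buffers - 2  # blocks of R held in memory at once
    for start in range(0, b_r, chunk):
        io += min(chunk, b_r - start)  # read the next chunk of R
        io += b_s                      # scan all of S against this chunk
    return io

print(bnl_cost(100, 200, 12))          # 2100
print(bnl_simulated_io(100, 200, 12))  # 2100
```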

17 Index Nested Loop
Suppose there is an index on sid of Sailors.
Until all blocks of R have been read:
– Read a block of R
– For each tuple in the block, use the index of S to locate the matching tuples in S
Let X denote the cost of reading the tuples of S that match a single tuple of R.
Run time: B_R + t_R * X = 100 + 5000*3 = 15,100 (taking X = 3)
If the index is clustered, X = 2-4.
If it is not clustered, X has to be estimated.
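A quick sketch of how the run time moves with X over the range quoted for a clustered index. The function name inl_cost is made up for the example; B_R = 100 and t_R = 5000 are the running example's values.

```python
def inl_cost(b_r, t_r, x):
    """Index nested loops cost from the slide: B_R + t_R * X."""
    return b_r + t_r * x

# X = 2..4 is the range the slide quotes for a clustered index.
costs = {x: inl_cost(100, 5000, x) for x in (2, 3, 4)}

print(costs)  # {2: 10100, 3: 15100, 4: 20100}
```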

18
Q: So when would we typically choose an index nested loop over a block nested loop?
A: Look at the inequality…
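One way to read the inequality: index nested loops wins when t_R * X < B_S * ceil(B_R/(B-2)), which in practice means the outer relation has few tuples. The sketch below compares the two formulas; the second call uses a hypothetical small outer relation (50 tuples in 1 block) that is not from the slides.

```python
import math

def bnl_cost(b_r, b_s, buffers):
    """Block nested loops: B_R + B_S * ceil(B_R / (B - 2))."""
    return b_r + b_s * math.ceil(b_r / (buffers - 2))

def inl_cost(b_r, t_r, x):
    """Index nested loops: B_R + t_R * X."""
    return b_r + t_r * x

def cheaper(b_r, t_r, b_s, buffers, x):
    """Name the cheaper of the two algorithms under the slides' cost model."""
    if inl_cost(b_r, t_r, x) < bnl_cost(b_r, b_s, buffers):
        return "index nested loops"
    return "block nested loops"

# Running example: 15,100 vs 2,100.
print(cheaper(b_r=100, t_r=5000, b_s=200, buffers=12, x=3))  # block nested loops

# Hypothetical tiny outer relation: 151 vs 201.
print(cheaper(b_r=1, t_r=50, b_s=200, buffers=12, x=3))      # index nested loops
```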

19 Sort-Merge Join
Sort both relations on the join column.
Then merge them on the join column, as illustrated on these two sorted tables:

sid  bid  day      agent
28   103  12/4/96  Joe
28   103  11/3/96  Frank
31   101  10/2/96  Joe
31   102  12/7/96  Sam
31   101  13/7/96  Sam
58   103  2/6/96   Frank

sid  sname   rating  age
22   dustin  7       45
28   yuppy   9       35
31   lubber  8       55
36   lubber  6       36
44   guppy   5       35
58   rusty   10      35
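The merge step can be sketched in memory on the two tables above. This is an illustration of the idea, not a disk-based implementation: it sorts with Python's built-in sort and joins each group of equal sid values; the function name sort_merge_join is made up for the example.

```python
def sort_merge_join(r_rows, s_rows, key_r=0, key_s=0):
    """Join two lists of tuples on the given key positions by sorting and merging."""
    r = sorted(r_rows, key=lambda t: t[key_r])
    s = sorted(s_rows, key=lambda t: t[key_s])
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][key_r] < s[j][key_s]:
            i += 1
        elif r[i][key_r] > s[j][key_s]:
            j += 1
        else:
            k = r[i][key_r]
            # Find the end of the partition with key k in each relation.
            i_end = i
            while i_end < len(r) and r[i_end][key_r] == k:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == k:
                j_end += 1
            # Output every pairing inside the two partitions.
            for a in r[i:i_end]:
                for b in s[j:j_end]:
                    out.append(a + b)
            i, j = i_end, j_end
    return out

reserves = [
    (28, 103, "12/4/96", "Joe"), (28, 103, "11/3/96", "Frank"),
    (31, 101, "10/2/96", "Joe"), (31, 102, "12/7/96", "Sam"),
    (31, 101, "13/7/96", "Sam"), (58, 103, "2/6/96", "Frank"),
]
sailors = [
    (22, "dustin", 7, 45), (28, "yuppy", 9, 35), (31, "lubber", 8, 55),
    (36, "lubber", 6, 36), (44, "guppy", 5, 35), (58, "rusty", 10, 35),
]

joined = sort_merge_join(reserves, sailors)
print(len(joined))  # 6
```

Sailors 22, 36 and 44 have no reservations, so they do not appear in the result.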

20 Run time of Sort-Merge
M, N: number of blocks of the two relations
Sorting: M log M + N log N (logs are base 2, rounded up)
Merging: N + M, if no partition is scanned twice
Total: M log M + N log N + N + M = 100*7 + 200*8 + 100 + 200 = 2,600
Especially good if one or both of the relations are already sorted.
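The total for the running example can be reproduced as follows. The sketch takes the slide's simplified formula at face value and assumes base-2 logarithms rounded up, which is what the figures 7 and 8 suggest; the function name is made up for the example.

```python
import math

def sort_merge_cost(m, n):
    """Slide's simplified cost: M log M + N log N + M + N (log base 2, rounded up)."""
    sorting = m * math.ceil(math.log2(m)) + n * math.ceil(math.log2(n))
    merging = m + n  # assumes no partition is scanned twice
    return sorting + merging

print(sort_merge_cost(100, 200))  # 2600
```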

21 Question
Suppose:
tuple size = 100 bytes
number of tuples (employees) = 3,000
page size = 1,000 bytes
You have an unclustered index on hobby. You know that 50 employees collect stamps. Would you use the index? And what if there were 1,000 stamp-lovers?
  SELECT E.dno FROM Employees E WHERE E.hobby='stamps'
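One possible way to reason about this question is to compare a full scan with an index lookup. The sketch assumes about 3 I/Os to reach the leaf (the slides say 2-3) and, because the index is unclustered, one data-page read per matching tuple; both are estimates, and the names are made up for the example.

```python
import math

TUPLE_SIZE = 100   # bytes
NUM_TUPLES = 3000
PAGE_SIZE = 1000   # bytes
INDEX_DESCENT = 3  # assumed I/Os to reach the leaf

tuples_per_page = PAGE_SIZE // TUPLE_SIZE              # 10
scan_cost = math.ceil(NUM_TUPLES / tuples_per_page)    # 300 pages

def index_cost(matches):
    """Unclustered index: assume each matching tuple costs one page read."""
    return INDEX_DESCENT + matches

def use_index(matches):
    return index_cost(matches) < scan_cost

print(scan_cost)                            # 300
print(index_cost(50), use_index(50))        # 53 True
print(index_cost(1000), use_index(1000))    # 1003 False
```

Under these assumptions the index pays off for 50 matches but not for 1,000, where scanning the 300 pages is cheaper.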

22 Question 2
Length of tuples, number of tuples:
– Emp: 20 bytes, 20,000 tuples
– Dept: 40 bytes, 5,000 tuples
Pages contain 4,000 bytes; there are 12 buffer pages.
Which algorithm would you use if there is an unclustered tree index on E.eid? And if the index is clustered?
  SELECT E.ename FROM Employees E, Departments D WHERE E.eid=D.eid
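A possible starting point for this question is to plug the numbers into the cost formulas from the earlier slides. The sketch takes Dept as the outer relation, assumes X = 3 for the index lookup in both the clustered and unclustered case (an estimate, not a given), and leaves out hash join because the slides give no cost formula for it.

```python
import math

PAGE_SIZE = 4000
BUFFERS = 12
X = 3  # assumed I/Os per index lookup on E.eid

def pages(num_tuples, tuple_size):
    """Number of pages needed, given whole tuples per page."""
    return math.ceil(num_tuples / (PAGE_SIZE // tuple_size))

emp_pages = pages(20_000, 20)   # 100
dept_pages = pages(5_000, 40)   # 50

# Block nested loops with the smaller relation (Dept) as the outer.
bnl = dept_pages + emp_pages * math.ceil(dept_pages / (BUFFERS - 2))

# Index nested loops: scan Dept, probe the index on E.eid for each Dept tuple.
inl = dept_pages + 5_000 * X

# Sort-merge with the slides' simplified formula (log base 2, rounded up).
smj = (dept_pages * math.ceil(math.log2(dept_pages))
       + emp_pages * math.ceil(math.log2(emp_pages))
       + dept_pages + emp_pages)

costs = {"block nested loops": bnl, "index nested loops": inl, "sort-merge": smj}
best = min(costs, key=costs.get)

print(costs)  # {'block nested loops': 550, 'index nested loops': 15050, 'sort-merge': 1150}
print(best)   # block nested loops
```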
