# Examples of Physical Query Plan Alternatives


Selections from Chapters 12, 14, 15. The slides for this text are organized into chapters. This lecture covers Chapter 12, providing an overview of query optimization and execution. This chapter is the first of a sequence (Chapters 12, 13, 14, 15) on query evaluation that might be covered in full in a course with a systems emphasis. It can also be used stand-alone, as a self-contained overview of these issues, in a course with an application emphasis. It covers the essential concepts in sufficient detail to support a discussion of physical database design and tuning in Chapter 20.

Query Optimization
Note: relational query languages provide a wide variety of ways in which a user can express a query. Hence, the system has many options for evaluating it, and the optimizer is important for query performance. The optimizer generates alternative plans and chooses the plan with the least estimated cost. Ideally, it finds the best plan; realistically, it consistently finds a quite good one.

A Query (Evaluation) Plan
An extended relational algebra tree. Annotations at each node indicate the access method to use for each table and the implementation method used for each relational operator.
[Figure: plan for π sname (σ bid=100 ∧ rating>5 (Reserves ⋈ sid=sid Sailors)), with the join annotated (Simple Nested Loops) and the operators above it annotated (On-the-fly)]
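A minimal Python sketch of such an annotated tree, using a hypothetical `PlanNode` class whose fields are the operator, its annotation, and its children. It encodes the plan in the figure; the "File scan" annotations on the base tables, and placing "On-the-fly" on both the selection and the projection, are assumptions not recoverable from the slide text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """A node of an annotated (extended) relational algebra tree."""
    op: str       # operator or base table, e.g. "join sid=sid"
    method: str   # annotation: access method or implementation method
    children: List["PlanNode"] = field(default_factory=list)

    def render(self, depth: int = 0) -> str:
        """Return an indented, one-node-per-line view of the subtree."""
        lines = ["  " * depth + f"{self.op}  [{self.method}]"]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


plan = PlanNode("project sname", "On-the-fly", [
    PlanNode("select bid=100 AND rating>5", "On-the-fly", [
        PlanNode("join sid=sid", "Simple Nested Loops", [
            PlanNode("Reserves", "File scan"),  # assumed access method
            PlanNode("Sailors", "File scan"),   # assumed access method
        ]),
    ]),
])

print(plan.render())
```

Keeping the annotation on every node mirrors the slide's point: the same algebra tree becomes a different physical plan as soon as any annotation changes.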

Query Optimization: Multi-operator Queries, Pipelined Evaluation
[Figure: a plan tree over relations A, B, and C]
On-the-fly (pipelined): the result of one operator is passed directly to another operator without creating a temporary table to hold the intermediate result. Materialized: otherwise, intermediate results must be materialized, i.e. written to a temporary table.
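The difference can be sketched with Python generators, assuming a generator is a fair stand-in for a pipelined operator and a fully built list for a temporary table. The `select` and `project` functions and the Reserves rows are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Selection as a generator: each qualifying row is passed on as soon as it is found."""
    for row in rows:
        if pred(row):
            yield row


def project(rows: Iterable[Row], cols: List[str]) -> Iterator[Row]:
    """Projection as a generator: keeps only the requested columns."""
    for row in rows:
        yield {c: row[c] for c in cols}


# Hypothetical Reserves data: 1,000 rows, boat ids 1..100.
reserves = [{"sid": i % 50, "bid": 1 + i % 100, "rname": "x"} for i in range(1000)]


def is_boat_100(row: Row) -> bool:
    return row["bid"] == 100


# On-the-fly: select feeds project directly; no temporary table is built.
pipelined = list(project(select(reserves, is_boat_100), ["sid"]))

# Materialized: the selection result is stored in full (standing in for a temp table),
# then read back by the projection.
temp_t1 = list(select(reserves, is_boat_100))
materialized = list(project(temp_t1, ["sid"]))

print(len(pipelined), pipelined == materialized)
```

Both strategies produce the same answer; they differ only in whether the intermediate result is ever stored.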

Alternative Plans: Schema Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: date, rname: string)
Reserves: each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
Sailors: each tuple is 50 bytes long, 80 tuples per page, 500 pages.
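A small worked check of these statistics, using a hypothetical `TableStats` class: tuples per page times pages gives each table's cardinality, and both tables hold 4,000 bytes of tuple data per page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Physical statistics for one table, as given on the slide."""
    name: str
    tuple_bytes: int
    tuples_per_page: int
    pages: int

    @property
    def cardinality(self) -> int:
        # Number of tuples = tuples per page * number of pages.
        return self.tuples_per_page * self.pages

    @property
    def tuple_bytes_per_page(self) -> int:
        # Bytes of tuple data stored on each page.
        return self.tuple_bytes * self.tuples_per_page


RESERVES = TableStats("Reserves", tuple_bytes=40, tuples_per_page=100, pages=1000)
SAILORS = TableStats("Sailors", tuple_bytes=50, tuples_per_page=80, pages=500)

for t in (RESERVES, SAILORS):
    print(t.name, t.cardinality, t.tuple_bytes_per_page)
# Reserves 100000 4000
# Sailors 40000 4000
```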

Alternative Plans: Motivating Example
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
RA tree: π sname (σ bid=100 ∧ rating>5 (Reserves ⋈ sid=sid Sailors))

Plan: the same RA tree, with (Simple Nested Loops) on the join and (On-the-fly) above it.
Costs:
1. Scan Sailors; for each page of Sailors, scan Reserves: 500 + 500*1000 I/Os.
2. Or, scan Reserves; for each page of Reserves, scan Sailors: 1000 + 1000*500 I/Os.
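Both costs follow from the page-oriented simple nested loops formula, cost = M + M*N for an outer relation of M pages and an inner relation of N pages. A minimal sketch (the function name is hypothetical):

```python
def simple_nested_loops_cost(outer_pages: int, inner_pages: int) -> int:
    """Page-oriented simple nested loops: read the outer once,
    then scan the entire inner relation once per outer page."""
    return outer_pages + outer_pages * inner_pages


sailors_outer = simple_nested_loops_cost(500, 1000)   # 500 + 500*1000
reserves_outer = simple_nested_loops_cost(1000, 500)  # 1000 + 1000*500
print(sailors_outer, reserves_outer)  # 500500 501000
```

Putting the smaller relation (Sailors) on the outside is slightly cheaper, but both orders cost roughly half a million I/Os.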

Alternative Plans: Motivating Example
Cost of the simple nested loops plan: 500 + 500*1000 I/Os.
This is almost the worst plan! Reasons: the selections could have been "pushed" earlier, and no use is made of indexes.
Goal of optimization: to find more efficient plans that compute the same answer.

Alternative Plans 1 (No Indexes)
[Figure: π sname (On-the-fly) over a Sort-Merge Join on sid=sid of σ bid=100 on Reserves (Scan; write to temp T1) and σ rating > 5 on Sailors (Scan; write to temp T2)]
Main difference: push selections down, to reduce the size of the tables to be joined.
With 5 buffers, the cost of this plan is:
Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats and a uniform distribution).
Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250).
Total: 4060 page I/Os.
Optimization 1: with a block nested loops join, the join cost is 10 + 4*250 and the total cost is 2770.
Optimization 2: "push" projections as well: T1 keeps only sid, and T2 only sid and sname. T1 then fits in 3 pages, the cost of BNL drops to under 250 pages, and the total is under 2000.
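The arithmetic for Alternative Plan 1 can be reproduced with a short sketch. It assumes the usual external merge sort model (pass 0 produces ceil(N/B) sorted runs, each later pass merges B-1 runs, and every pass reads and writes every page) and a block nested loops join that reads the outer in blocks of B-2 pages; under these assumptions the sketch yields the slide's 2*2*10, 2*4*250 and 4*250 terms. The function names are hypothetical.

```python
import math

B = 5  # buffer pages, as on the slide


def sort_passes(n_pages: int, buffers: int) -> int:
    """Passes of external merge sort: pass 0 builds ceil(N/B) runs,
    each later pass merges up to B-1 runs."""
    runs = math.ceil(n_pages / buffers)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes


def sort_cost(n_pages: int, buffers: int) -> int:
    # Every pass reads and writes every page once.
    return 2 * n_pages * sort_passes(n_pages, buffers)


def bnl_cost(outer_pages: int, inner_pages: int, buffers: int) -> int:
    # Outer read in blocks of B-2 pages; inner scanned once per block.
    blocks = math.ceil(outer_pages / (buffers - 2))
    return outer_pages + blocks * inner_pages


T1, T2 = 10, 250                            # pages left after bid=100 and rating>5
scan_and_write = (1000 + T1) + (500 + T2)   # scan each base table, write each temp

sort_merge_total = scan_and_write + sort_cost(T1, B) + sort_cost(T2, B) + (T1 + T2)
bnl_total = scan_and_write + bnl_cost(T1, T2, B)
print(sort_merge_total, bnl_total)  # 4060 2770
```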

Alternative Plans: Using Indexes?
Push selections down? Which indexes help here: an index on Reserves.bid? An index on Sailors.sid? An index on Sailors.rating?
[Figure: plan tree over Reserves and Sailors with join sid=sid, selections bid=100 and rating > 5, and projection sname]

Example Plan: With Index
With an index on Reserves.bid: assume 100 different bid values, 100,000 tuples, and 100 tuples per disk page. The selection bid=100 then returns 100,000/100 = 1,000 tuples, stored on 1000/100 = 10 disk pages. If the index is clustered, the cost is 10 I/Os.
[Figure: σ bid=100 on Reserves (Use hash index; do not write result to temp), joined with Sailors on sid=sid (Index Nested Loops, with pipelining), with rating > 5 and sname evaluated (On-the-fly) above the join]
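A sketch of this selection estimate, assuming uniformly distributed bid values and, as the slide does, not counting the cost of traversing the index itself; `clustered_index_selection` is a hypothetical helper.

```python
import math
from typing import Tuple


def clustered_index_selection(num_tuples: int, distinct_values: int,
                              tuples_per_page: int) -> Tuple[int, int]:
    """Estimate an equality selection through a clustered index.

    Assumes uniformly distributed values, so each value matches
    num_tuples / distinct_values tuples, stored contiguously on pages.
    Returns (matching tuples, pages read); index traversal is not counted.
    """
    matching = num_tuples // distinct_values
    pages = math.ceil(matching / tuples_per_page)
    return matching, pages


matching, pages = clustered_index_selection(100_000, 100, 100)
print(matching, pages)  # 1000 10
```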

Example Plan: Use Another Index
Index on Reserves.bid? The selection on bid reduces the number of tuples considered in the join. With INL and pipelining, the outer is not materialized, so projecting out unnecessary fields from the outer doesn't help.
[Figure: same plan as the previous slide (Use hash index; do not write result to temp), (Index Nested Loop Join, with pipelining), (On-the-fly)]

Example Plan Continued
Index on Sailors.sid:
- The join column sid is a key for Sailors.
- So there is at most one matching tuple, and an unclustered index on sid is OK.
Cost?
- For each of the 1,000 selected Reserves tuples, get the matching Sailors tuple (1.2 I/Os); adding the 10 I/Os for the selection, the total is 10 + 1000*1.2 = 1210 I/Os.
[Figure: same plan as the previous slide (Use hash index; do not write result to temp), (Index Nested Loops, with pipelining), (On-the-fly)]
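The 1210 figure combines the 10 I/Os of the clustered selection on Reserves.bid with one probe of the Sailors.sid index per selected Reserves tuple, taking the slide's 1.2 I/Os per probe as given. A minimal sketch (`inl_cost` is hypothetical):

```python
def inl_cost(outer_selection_io: float, outer_tuples: int,
             io_per_probe: float) -> float:
    """Index nested loops with a pipelined outer:
    cost of producing the outer + one inner-index probe per outer tuple."""
    return outer_selection_io + outer_tuples * io_per_probe


# 10 I/Os for the clustered selection bid=100, 1,000 outer tuples,
# 1.2 I/Os per lookup in the Sailors.sid index (figure taken from the slide).
total = inl_cost(10, 1000, 1.2)
print(round(total))  # 1210
```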

Alternative Plan: With Second Index
Push the selection down? Should rating > 5 be applied before the join? Answer: no, because of the availability of the sid index on Sailors. Reason: the result of the selection would have no index, so the join could not use index lookups into it; moreover, performing the selection first would require a scan of Sailors.
[Figure: same plan as the previous slide (Use hash index; do not write result to temp), (Index Nested Loops, with pipelining), (On-the-fly)]
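To illustrate why rating > 5 is better left above the join here, the sketch below runs a pipelined index nested loops join in Python. A dict stands in for the hash index on Sailors.sid, a generator stands in for the unmaterialized selection on Reserves, and the rating test is applied on the fly to each joined pair. All names and rows are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Dict[str, object]


def inl_join(outer: Iterable[Row], inner_index: Dict[object, Row],
             key: str) -> Iterator[Tuple[Row, Row]]:
    """Pipelined index nested loops join: probe the inner index once per
    outer row as it arrives; key is unique in the inner, so at most one match."""
    for r in outer:
        s = inner_index.get(r[key])
        if s is not None:
            yield r, s


# Hypothetical Sailors rows; the dict stands in for an existing hash index on sid.
sailors = [
    {"sid": 22, "sname": "alice", "rating": 7},
    {"sid": 31, "sname": "bob", "rating": 8},
    {"sid": 58, "sname": "carol", "rating": 3},
]
sid_index = {s["sid"]: s for s in sailors}

# Hypothetical Reserves rows.
reserves = [
    {"sid": 22, "bid": 100},
    {"sid": 58, "bid": 100},
    {"sid": 31, "bid": 101},
    {"sid": 31, "bid": 100},
    {"sid": 99, "bid": 100},  # no matching sailor
]

# bid=100 as a generator: the selection result is never materialized.
boat_100 = (r for r in reserves if r["bid"] == 100)

# rating>5 and the projection on sname are applied on the fly to each joined pair.
names = [s["sname"] for _, s in inl_join(boat_100, sid_index, "sid")
         if s["rating"] > 5]
print(names)  # ['alice', 'bob']
```

Filtering Sailors on rating first would produce a result with no sid index to probe, which is exactly the slide's objection.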

Summary
A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree. There are several alternative evaluation algorithms for each relational operator. Query evaluation must compare alternative plans based on their estimated costs. One must understand query optimization in order to fully understand the performance impact of a given database design on a query workload.
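As a closing illustration of choosing among alternatives by estimated cost, the sketch below picks the cheapest of the plans costed in these slides. The figures are the slides' own estimates (the sort-merge total being the sum of the terms listed for Alternative Plan 1). A real optimizer enumerates and costs plans itself rather than receiving a fixed list, and the projection variant is left out because only a bound (under 2000) is given.

```python
# Estimated page I/Os for the plans costed in these slides.
plan_costs = {
    "simple nested loops, no pushdown": 500 + 500 * 1000,
    "pushed selections, sort-merge join": 4060,
    "pushed selections, block nested loops": 2770,
    "clustered index on bid + INL via Sailors.sid index": 1210,
}

# Choose the plan with the least estimated cost.
best = min(plan_costs, key=lambda p: plan_costs[p])
print(best, plan_costs[best])
```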
