2 Example - hospital database
- Doctors: 100 tuples
- Patients: 2500 tuples
3 Query
- get the names of the doctors who treat patients suffering from the prk11 disease
    SELECT Doctors.name
    FROM Doctors, Patients
    WHERE Disease = 'prk11' AND
          D_name = Doctors.name
4 Evaluation #1
- restrict Patients to those who suffer from prk11
  - read: 2500 tuples; result: estimated 50 tuples; no need to write the intermediate result (sufficiently small)
- join the above result with Doctors
  - read: 100 tuples (Doctors); result: 50 tuples; no need to write the intermediate result to disk
- project the result over Doctors.name
  - the desired result is in memory
- estimated cost (read and write): 2600
5 Evaluation #2
- suppose the internal memory holds only some 350 tuples
- join Patients with Doctors
  - read Patients in batches of 250 tuples; therefore read Doctors 10 times
  - read: 2500 + 10 × 100 = 3500; write the intermediate result (too big for memory) to disk: 2500
- restrict the above result
  - read: 2500; result: estimated 50 tuples
- project
- cost (read and write): 3500 + 2500 + 2500 = 8500
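
The arithmetic behind the two estimates can be reproduced with a short Python sketch. The function names and the `batch_size` parameter are illustrative and do not come from the slides; costs are counted in tuples read or written, as in the example, and the join result is assumed to contain one tuple per patient (which is what the figure of 2500 implies).

```python
import math

DOCTORS = 100      # tuples in Doctors
PATIENTS = 2500    # tuples in Patients


def cost_restrict_first(patients=PATIENTS, doctors=DOCTORS):
    """Evaluation #1: restrict Patients, then join with Doctors in memory."""
    read_patients = patients   # one pass over Patients for the restriction
    read_doctors = doctors     # one pass over Doctors for the join
    return read_patients + read_doctors   # nothing is written to disk


def cost_join_first(patients=PATIENTS, doctors=DOCTORS, batch_size=250):
    """Evaluation #2: join first, write the result, then restrict it."""
    batches = math.ceil(patients / batch_size)   # Doctors is re-read once per batch
    read_join = patients + batches * doctors
    write_join = patients        # assumption: one joined tuple per patient
    read_restrict = write_join   # the intermediate result is read back from disk
    return read_join + write_join + read_restrict


print(cost_restrict_first())  # 2600
print(cost_join_first())      # 8500
```

Changing `batch_size` shows how strongly the second strategy depends on the available memory, while the first strategy is unaffected.
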
6 Intermediate conclusions
- the evaluation strategy (procedural aspect) can lead to very big differences in computation time for the same query
- computation time consists of:
  - reading from and writing to disk (the dominant component)
  - processor time
- actual evaluation procedures are far more complex than in the previous introductory example
7 Optimisation - what
- deciding upon the best strategy for evaluating a query
- it is performed automatically by the optimiser of the DBMS
- not just for data retrieval operations, but for updating operations as well (e.g. UPDATE)
- not guaranteed to give the best result
8 Optimisation - how
- based on statistical information about the specific database (not necessarily, though), perform:
  - expression transformation (cast the query in some internal form and convert it to the respective canonical form)
  - selection of candidate low-level procedures
  - generation and selection of query plans
- statistical information - could you think of examples?
  - cardinality of base relations, indexes, ...
9 Cast (transform) query in some internal form
- internal format
  - more suitable for automatic processing
  - trees (syntax tree or query tree)
- from a conceptual point of view, it is easier to assume that the internal format is relational algebra
10 Convert to canonical form
- the initial expression is transformed into an equivalent but more efficient form
  - "efficient form" = efficient when executed
- these transformations are performed independently of actual data values and access paths
11 Expression transformation examples
- (A WHERE condition#1) WHERE condition#2
  → (A WHERE condition#1 AND condition#2)
- (A [projection#1]) [projection#2]
  → A [projection#2]
- (A [projection]) WHERE condition
  → (A WHERE condition) [projection]
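
The three rewrites above can be checked on a toy relation. The following Python sketch is only an illustration: the helper names (`restrict`, `project`, `same_relation`) and the sample data are invented here, and relations are modelled as lists of dictionaries rather than in any real DBMS structure. The second rule assumes projection#2 uses only attributes kept by projection#1, and the third assumes the condition mentions only projected attributes.

```python
def restrict(rel, pred):
    """A WHERE pred: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    """A [attrs]: keep the listed attributes and drop duplicate tuples."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


def same_relation(r1, r2):
    """Compare two relations as sets of tuples (order does not matter)."""
    def canon(rel):
        return sorted(tuple(sorted(t.items())) for t in rel)
    return canon(r1) == canon(r2)


# Hypothetical sample data.
A = [
    {"name": "Ann", "disease": "prk11", "age": 40},
    {"name": "Bob", "disease": "flu", "age": 40},
    {"name": "Cy", "disease": "prk11", "age": 71},
]


def c1(t):
    return t["disease"] == "prk11"


def c2(t):
    return t["age"] > 50


# Two successive restrictions collapse into one.
rule1 = same_relation(restrict(restrict(A, c1), c2),
                      restrict(A, lambda t: c1(t) and c2(t)))

# Only the last of two successive projections matters.
rule2 = same_relation(project(project(A, ["name", "disease"]), ["disease"]),
                      project(A, ["disease"]))

# A restriction can be moved inside a projection.
rule3 = same_relation(restrict(project(A, ["name", "disease"]), c1),
                      project(restrict(A, c1), ["name", "disease"]))

print(rule1, rule2, rule3)
```

Restricting before projecting is usually the preferred direction because the restriction shrinks the input that the projection has to process.
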
12 Expression transformation
- distributivity
- commutativity and associativity
- idempotence
- scalar expressions
- conditional expressions
- semantic transformation
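
As a small illustration of the first three laws, the sketch below models relations as Python sets of tuples and uses set union as the example operator. The sample relations and the `restrict` helper are made up for this purpose; scalar, conditional and semantic transformations are not covered.

```python
def restrict(rel, pred):
    """Restriction over a relation modelled as a set of tuples."""
    return {t for t in rel if pred(t)}


# Hypothetical relations over (patient_id, disease).
A = {(1, "prk11"), (2, "flu")}
B = {(2, "flu"), (3, "prk11")}
C = {(4, "asthma")}


def has_prk11(t):
    return t[1] == "prk11"


# Distributivity: restriction distributes over union.
distributive = restrict(A | B, has_prk11) == restrict(A, has_prk11) | restrict(B, has_prk11)

# Commutativity and associativity of union.
commutative = (A | B) == (B | A)
associative = ((A | B) | C) == (A | (B | C))

# Idempotence: the union of a relation with itself is the relation.
idempotent = (A | A) == A

print(distributive, commutative, associative, idempotent)
```

Distributivity is the law that lets an optimiser push a restriction down to the operands, so that each operand is reduced before the more expensive operation runs.
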
13 Set level operations
- the operators of relational algebra are set level, i.e. they manipulate sets (relations) and not individual tuples
- however, these operators are implemented by internal (DBMS) procedures
- these procedures inherently need tuple access (in fact, they need access to scalar values)
14 Choose candidate low-level procedures
- the optimiser decides how to execute the query (expressed in canonical form)
- access paths are relevant at this stage
- in general, each basic operation (join, restriction, …) has a set of procedures that implement it
  - e.g. RESTRICTION - (1) on candidate key; (2) on indexed key; (3) on other attributes …
- each procedure has an associated cost function (usually based on the required disk I/O operations); these functions are used in the next stage
15 Implementing JOIN - examples
- R and P - two relations to be joined
- J - the attribute on which the (natural) join is performed
- R[i] and P[j] mean the i-th tuple of R and the j-th tuple of P, respectively
- R[i].J means the value of the attribute J for the i-th tuple of the relation R
- R has M tuples and P has N tuples
16 Implementing JOIN - brute force
    for i := 1 to M do
      for j := 1 to N do
        if R[i].J = P[j].J then
          add joined tuple R[i]*P[j] to result
      end
    end
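
The brute-force procedure translates almost line for line into Python. This is a minimal in-memory sketch, assuming relations are lists of dictionaries; the function name and the sample data are illustrative only, and a real DBMS would operate on disk pages rather than Python lists.

```python
def nested_loop_join(R, P, J):
    """Natural join of R and P on attribute J by comparing every pair of tuples."""
    result = []
    for r in R:                 # M iterations
        for p in P:             # N iterations for each tuple of R
            if r[J] == p[J]:
                # Joined tuple R[i]*P[j]: the attributes of both tuples merged.
                result.append({**r, **p})
    return result


# Hypothetical sample relations sharing the join attribute "doctor".
patients = [
    {"patient": "ann", "doctor": "smith"},
    {"patient": "bob", "doctor": "jones"},
    {"patient": "cy", "doctor": "smith"},
]
doctors = [
    {"doctor": "smith", "ward": 1},
    {"doctor": "jones", "ward": 2},
]

joined = nested_loop_join(patients, doctors, "doctor")
print(len(joined))  # 3
```

The inner comparison runs M × N times regardless of how many tuples actually match, which is why this procedure is only attractive for small relations.
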
18 Implementing JOIN - index lookup
    /* index X on P.J */
    for i := 1 to M do
      /* K[i] = number of entries in X for the value R[i].J */
      for j := 1 to K[i] do
        /* PK[j] represents the tuple of P that
           the j-th of these index entries points to */
        add joined tuple (R[i] * PK[j]) to result
      end
    end
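
A rough Python counterpart of the index-lookup procedure is sketched below. A dictionary mapping each value of J to the positions of the matching tuples of P stands in for the index X; this is an assumption made for illustration, since a real DBMS index is typically a B-tree or hash structure kept on disk. The function names and sample data are likewise invented.

```python
from collections import defaultdict


def build_index(P, J):
    """Stand-in for index X on P.J: value of J -> positions of matching tuples in P."""
    index = defaultdict(list)
    for pos, p in enumerate(P):
        index[p[J]].append(pos)
    return index


def index_lookup_join(R, P, J, index):
    """Natural join of R and P on J, probing the index once per tuple of R."""
    result = []
    for r in R:                              # M iterations
        for pos in index.get(r[J], []):      # only the matching entries
            result.append({**r, **P[pos]})
    return result


# Hypothetical sample relations sharing the join attribute "doctor".
patients = [
    {"patient": "ann", "doctor": "smith"},
    {"patient": "bob", "doctor": "jones"},
    {"patient": "dee", "doctor": "nobody"},
]
doctors = [
    {"doctor": "smith", "ward": 1},
    {"doctor": "jones", "ward": 2},
]

x = build_index(doctors, "doctor")
joined = index_lookup_join(patients, doctors, "doctor", x)
print(len(joined))  # 2
```

Compared with brute force, the inner loop visits only the tuples of P that actually join with the current tuple of R, so P is never scanned in full.
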
19 Choose the cheapest query plan
- construct query plans (query evaluation plans)
  - combine candidate low-level procedures
- choose the cheapest
  - total cost = the sum of the individual costs
  - individual costs depend on the actual data values; estimates are used instead, based on statistical data
- usually not all possible evaluation procedures are generated; the search space is reduced by applying heuristics
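
The selection step can be pictured as summing the estimated cost of each step of a plan and taking the minimum. The Python sketch below reuses the per-step estimates from the hospital example (slides 4 and 5); the plan names and the data layout are assumptions made for illustration, not how any particular optimiser represents plans.

```python
# Estimated cost of each step, in tuples read or written.
plans = {
    "restrict, then join": [2500, 100],        # read Patients, read Doctors
    "join, then restrict": [3500, 2500, 2500], # read for join, write result, read it back
}


def total_cost(step_costs):
    """Total cost of a plan = the sum of the individual step costs."""
    return sum(step_costs)


def cheapest(candidate_plans):
    """Return the name of the plan with the lowest estimated total cost."""
    return min(candidate_plans, key=lambda name: total_cost(candidate_plans[name]))


for name, steps in plans.items():
    print(name, total_cost(steps))
print("chosen:", cheapest(plans))
```

Because the step costs are estimates, the plan chosen this way is the cheapest only as far as the statistics are accurate.
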
20 Database statistics - in the data dictionary
- for each base table
  - cardinality
  - space occupied
  - etc.
- for each column of each base table
  - number of distinct values
  - maximum, minimum and average values
  - histogram of values
  - ...
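
One common use of such statistics is estimating how many tuples an equality restriction will return. The Python sketch below assumes values are spread uniformly over the distinct values of the column, which is a simplification; the class names are invented, and the figure of 50 distinct Disease values is hypothetical, chosen so that the estimate matches the 50 tuples used in the hospital example.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    distinct_values: int   # number of distinct values in the column


@dataclass
class TableStats:
    cardinality: int       # number of tuples in the base table
    columns: dict          # column name -> ColumnStats


def estimate_equality_rows(table, column):
    """Estimated result size of `column = constant`, assuming a uniform distribution."""
    return table.cardinality / table.columns[column].distinct_values


# Hypothetical dictionary entry for Patients.
patients = TableStats(cardinality=2500,
                      columns={"Disease": ColumnStats(distinct_values=50)})

print(estimate_equality_rows(patients, "Disease"))  # 50.0
```

When the values are skewed, a histogram gives a better estimate than this uniform rule, which is why histograms are listed among the statistics.
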
21 An optimiser is never perfect
- the following is a real-life example
- suppose a Postgres definition for the base relation Treatment(Patient, Drug, Disease, …)
- the query: get all the drugs that are taken by patients who suffer from prk11 (all the drugs, not only those for prk11)
    SELECT DISTINCT Drug FROM Treatment
    WHERE Patient IN
      (SELECT Patient FROM Treatment
       WHERE Disease = 'prk11');
- this query is far slower than the equivalent one (next) ...
22 An optimiser is never perfect
    /* this query is faster than the previous one,
       even though it seems to be performing more
       computations - Patient is not unique! */
    CREATE VIEW V_Treatment AS
      SELECT * FROM Treatment;
    SELECT DISTINCT Treatment.Drug
    FROM Treatment, V_Treatment
    WHERE Treatment.Patient = V_Treatment.Patient
      AND V_Treatment.Disease = 'prk11';
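
That the two formulations return the same drugs can be checked on a small data set. The sketch below uses Python's built-in `sqlite3` module with an in-memory database instead of Postgres, and the sample rows are invented; it demonstrates equivalence of the results only and says nothing about the relative speed observed in Postgres. The Disease condition in the join form is qualified with V_Treatment so that the column reference is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Treatment (Patient TEXT, Drug TEXT, Disease TEXT);
    INSERT INTO Treatment VALUES
        ('ann', 'drugA', 'prk11'),
        ('ann', 'drugB', 'flu'),
        ('bob', 'drugC', 'flu'),
        ('cy',  'drugA', 'prk11'),
        ('cy',  'drugD', 'asthma');
    CREATE VIEW V_Treatment AS SELECT * FROM Treatment;
""")

subquery_form = """
    SELECT DISTINCT Drug FROM Treatment
    WHERE Patient IN
      (SELECT Patient FROM Treatment WHERE Disease = 'prk11')
"""

join_form = """
    SELECT DISTINCT Treatment.Drug
    FROM Treatment, V_Treatment
    WHERE Treatment.Patient = V_Treatment.Patient
      AND V_Treatment.Disease = 'prk11'
"""

drugs_subquery = {row[0] for row in conn.execute(subquery_form)}
drugs_join = {row[0] for row in conn.execute(join_form)}

print(sorted(drugs_subquery))        # ['drugA', 'drugB', 'drugD']
print(drugs_subquery == drugs_join)  # True
```

DISTINCT matters in the join form: because Patient is not unique, a patient with several prk11 rows would otherwise produce the same drug more than once.
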