Presentation on theme: "D ATABASE S YSTEMS I R ELATIONAL A LGEBRA. 22 R ELATIONAL Q UERY L ANGUAGES Query languages (QL): Allow manipulation and retrieval of data from a database."— Presentation transcript:
D ATABASE S YSTEMS I R ELATIONAL A LGEBRA
22 R ELATIONAL Q UERY L ANGUAGES Query languages (QL): Allow manipulation and retrieval of data from a database. Relational model supports simple, powerful query languages: Strong formal foundation based on logic. High level, abstract formulation of queries. Easy to program. Allows the DBS to do much optimization. DBS can choose, e.g., most efficient sorting algorithm or the order of basic operations.
33 R ELATIONAL Q UERY L ANGUAGES Query Languages != programming languages! QLs not expected to be “Turing complete”. Able to answer any computable problem given enough storage space and time QLs not intended to be used for complex calculations. Applications on top of database can do that QLs support easy, efficient access to large data sets. E.g., in a QL cannot determine whether the number of tuples of a table is even or odd, create a visualization of the results of a query, ask the user for additional input.
44 F ORMAL Q UERY L ANGUAGES Two mathematical query languages form the basis for “real” languages (e.g. SQL), and for implementation: Relational Algebra (RA) More procedural, very useful for representing execution plans, relatively close to SQL. Composed of a collection of operators A step-by-step procedure for computing the answer Relational Calculus (RC) Lets users describe what they want, rather than how to compute it. (Non-procedural, declarative.) Describes the answer, not the steps. Understanding these formal query languages is important for understanding SQL and query processing.
55 P RELIMINARIES A query is applied to relation instances, and the result of a query is also a relation instance. Inputs and Outputs of Queries are relations Query evaluated on instances of input relations Different instance (DB?) as input = different answer Schemas of input relations for a query are fixed (but query will run regardless of instance!) The schema for the result of a given query is also fixed! Determined by definition of input relations and query language constructs.
66 P RELIMINARIES Positional vs. named-attribute notation: Positional notation Ex: function(1,2,3) easier for formal definitions Named-attribute notation Ex: function(var1=1, var2=2, var3=3) more readable Advantages/disadvantages of one over the other?
77 E XAMPLE I NSTANCES “Sailors” and “Reserves” relations for our examples. 2 instances of “Sailors” S1 & S2 We’ll use positional or named attribute notation Assume that names of attributes in query results are `inherited’ from names of attributes in query input relations R1 S1 S2
R ELATIONAL A LGEBRA
99 An algebra consists of operators and operands. Operands can be either variables or constants. In the algebra of arithmetic: atomic operands are variables such as x or y and constants such as 15. Operators are the usual arithmetic operators such as +, -, *. Expressions are formed by applying operators to atomic operands or other expressions. For example, 15 x + 15 (x + 15) * y
10 R ELATIONAL A LGEBRA Algebraic expressions can be re-ordered according to commutativity or associativity laws without changing their resulting value. E.g., 15 + 20 = 20 + 15 (x * y) * z = x * (y * z) Parentheses group operators and define precedence of operators, e.g. (x + 15) * y x + (15 *y) Describes step-by-step procedure for computing the query Just like algebra in math.
11 R ELATIONAL A LGEBRA In relational algebra, operands are relations / tables, and an expression evaluates to a relation / set of tuples. The relational algebra operators are set operations, operations removing rows (selection) or columns (projection) from a relation, operations combining two relations into a new one (Cartesian product, join), a renaming operation, which changes the name of the relation or of its attributes.
12 R ELATIONAL A LGEBRA Every operator takes one or two relation instances A relational algebra expression is recursively defined to be a relation A combination of relations is a relation Result is also a relation Can apply operator to Relation from database Relation as a result of another operator
13 R ELATIONAL A LGEBRA O PERATIONS Basic operations Selection ( ) Selects a subset of rows from relation. Projection ( ) Deletes unwanted columns from relation. Cartesian product ( ) Combine two relations. Set-difference ( ) Tuples in relation 1, but not in relation 2. Union ( ) Tuples in relation 1 or in relation 2.
14 R ELATIONAL A LGEBRA O PERATIONS Additional operations: Intersection, join, division. Renaming of relations / attributes. Not essential, can be implemented using the five basic operations. But (very!) useful. Since each operation returns a relation, operations can be composed, i.e. output of one operation can be input of the next operation.
15 Renames relations / attributes, without changing the relation instance. relation R is renamed to S, attributes are renamed A1,..., An Rename only some attributes using the positional notation to reference attributes No renaming of attributes, just the relation R ENAMING
16 P ROJECTION One input relation. Deletes attributes that are not in projection list. Schema of result contains exactly the attributes in the projection list, with the same names that they had in the (only) input relation. Projection operation has to eliminate duplicates, since relations are sets. Duplicate elimination is expensive. Commercial DBMS typically don’t do duplicate elimination unless the user explicitly asks for it.
17 P ROJECTION Similar in concept to VIEWs S2 Other fields are projected out
18 S ELECTION One input relation. Selects all tuples that satisfy selection condition. No duplicates in result! Schema of result identical to schema of (only) input relation. Selection conditions: simple conditions comparing attribute values (variables) and / or constants or complex conditions that combine simple conditions using logical connectives AND and OR.
19 S ELECTION S2
20 S ELECTION S2 But the result of an expression is another relation, we can substitute an expression wherever a relation is expected
21 U NION, I NTERSECTION, S ET -D IFFERENCE All of these set operations take two input relations, which must be union- compatible : Same sets of attributes. Corresponding attributes have same type. What is the schema of result?
22 U NION Concatenates S1 and S2 Result contains ALL tuples that occur in either S1 or S2 Schemas must be identical If they have the same number of fields Fields have same domains S1 S2 SELECT * FROM S1 UNION SELECT * FROM S2 SELECT * FROM S1 UNION SELECT * FROM S2
23 I NTERSECTION Result contains ALL tuples that occur in both S1 or S2 Schemas must be identical S1 S2 SELECT * FROM S1 INTERSECT SELECT * FROM S2 SELECT * FROM S1 INTERSECT SELECT * FROM S2
24 S ET -D IFFERENCE Result contains ALL tuples that occur in S1 but not in S2 Schemas must be identical S1 S2 SELECT * FROM S1 MINUS SELECT * FROM S2 SELECT * FROM S1 MINUS SELECT * FROM S2
25 C ARTESIAN P RODUCT Also referred to as cross-product or product. Two input relations. Each tuple of the one relation is paired with each tuple of the other relation. Result schema has one attribute per attribute of both input relations, with attribute names `inherited’ if possible. In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid. What to do??
26 C ARTESIAN P RODUCT Result contains all fields of S1 followed by fields of S2 Contains one tuple for each pair of tuples in S1 and S2. Schemas can be different If S1 and S2 contains fields of the same name, there is a naming conflict. Rename one or both fields. S1 S2 SELECT * FROM S1, S2
27 C ARTESIAN P RODUCT Field names in conflict become unnamed R1 S1
28 R ENAMING In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid. Then, apply the renaming operation, e.g. The field names in the output relation are the same fields, but some are renamed Field 1 is renamed to sid1, field 5 is renamed to sid5 For an expression to be well defined, no 2+ fields must have same name
29 J OIN Similar to Cartesian product with same result schema. Cartesian Product has no conditions Join has conditions Join can be defined as a Cartesian Product, followed by selections and projections Join however is much more efficient Why? R1 S1
30 J OIN Condition Join Each tuple of the one relation is paired with each tuple of the other relation if the two tuples satisfy the join condition. Condition c refers to attributes of both R and S.
31 J OIN Equi-Join : A special case of Conditional Join where the condition c contains only equalities. Result schema similar to Cartesian product, but only one copy of attributes for which equality is specified. R1.sid is redundant R1.sid is dropped by a applying a projection Natural Join : Equi-join on all common attributes.
32 D IVISION Not supported as a primitive operation, but useful for expressing queries like: Find sailors who have reserved all boats. Let A have 2 attributes, x and y ; B have only attribute y : A/B = i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple (reservation) in A. In general, x and y can be any lists of attributes; y is the list of attributes in B, and x y is the list of attributes of A.
33 D IVISION A B1 B2 B3 A/B1A/B2A/B3
34 D IVISION Division is not an essential operation; can be implemented using the five basic operations. Also true of joins, but joins are so common that systems implement joins specially. Idea : For A/B, compute all x values in A that are not ‘disqualified’ by some y value in B. x value in A is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A. Disqualified x values: A/B: all disqualified x values
35 D IVISION Division is not an essential operation; can be implemented using the five basic operations. SELECT name FROM Student WHERE NOT EXISTS ( SELECT CID FROM Course WHERE CID NOT IN ( SELECT CID FROM Register WHERE Student.SID = Register.SID)) returns those courses this student didn’t take If such courses don’t exist (NOT EXISTS), then student must’ve taken all courses
36 Find names of sailors who’ve reserved boat #103. Solution 1: Solution 2: Solution 3: Which is most efficient? Why? E XAMPLE Q UERIES
37 Find names of sailors who’ve reserved a red boat. Information about boat color only available in Boats; so need an extra join: A more efficient solution: A query optimizer can find the second solution given the first one. E XAMPLE Q UERIES
38 Find sailors who’ve reserved a red or a green boat. Can identify all red or green boats, then find sailors who’ve reserved one of these boats: Can also define Tempboats using union! (How?) What happens if OR is replaced by AND in this query? E XAMPLE Q UERIES
39 Find sailors who’ve reserved a red and a green boat. Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors): E XAMPLE Q UERIES
40 Find the names of sailors who’ve reserved all boats. Uses division; schemas of the input relations must be carefully chosen: To find sailors who’ve reserved all ‘Interlake’ boats: Book has lots of examples. E XAMPLE Q UERIES
41 S UMMARY The relational model has formal query languages that are easy to use and allow efficient optimization by the DBS. Relational algebra (RA) is more procedural; used as internal representation for SQL query evaluation plans. Five basic RA operations: selection, projection, Cartesian product, union, set-difference. Additional operations as shorthand for important cases: intersection, join, division. These operations can be implemented using the basic operations.