# Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014.

## Presentation on theme: "Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014."— Presentation transcript:

Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Why Do We Learn This? Querying the database: specify what we want from our database – Find all the people who earn more than \$1,000,000 and pay taxes in Tallahassee Could write in C++/Java, but a bad idea Instead use high-level query languages: – Theoretical: Relational Algebra, Datalog – Practical: SQL – Relational algebra: a basic set of operations on relations that provide the basic principles 1

What is an “Algebra”? Mathematical system consisting of: – Operands --- variables or values from which new values can be constructed – Operators --- symbols denoting procedures that construct new values from given values Examples – Arithmetic(Elementary) algebra, linear algebra, Boolean algebra …… What are operands? What are operators? 2

What is Relational Algebra? An algebra – Whose operands are relations or variables that represent relations – Whose operators are designed to do common things that we need to do with relations in a database relations as input, new relation as output – Can be used as a query language for relations 3

Relational Operators at a Glance Five basic RA operations: – Basic Set Operations union, difference (no intersection, no complement) – Selection:  – Projection:  – Cartesian Product: X When our relations have attribute names: – Renaming:  Derived operations: – Intersection, complement – Joins (natural join, equi-join, theta join, semi-join, ……) 4

Set Operations Union: all tuples in R1 or R2, denoted as R1 U R2 – R1, R2 must have the same schema – R1 U R2 has the same schema as R1, R2 – Example: Active-Employees U Retired-Employees – If any, is duplicate elimination required? Difference: all tuples in R1 and not in R2, denoted as R1 – R2 – R1, R2 must have the same schema – R1 - R2 has the same schema as R1, R2 – Example All-Employees - Retired-Employees 5

Selection Returns all tuples which satisfy a condition, denoted as  c (R) – c is a condition: =,, AND, OR, NOT – Output schema: same as input schema – Find all employees with salary more than \$40,000:  Salary > 40000 (Employee) 6 SSNNameDept-IDSalary 111060000Alex130K 754320032Bob132K 983210129Chris245K SSNNameDept-IDSalary 983210129Chris245K

Projection Unary operation: returns certain columns, denoted as  A1,…,An (R) – Eliminates duplicate tuples ! – Input schema R(B1, …, Bm) – Condition: {A1, …, An} {B1, …, Bm} – Output schema S(A1, …, An) Example: project social-security number and names: –  SSN, Name (Employee) 7 SSNNameDept-IDSalary 111060000Alex130K 754320032Bob132K 983210129Chris245K SSNName 111060000Alex 754320032Bob 983210129Chris

Selection vs. Projection Think of relation as a table – How are they similar? – How are they different? Horizontal vs. vertical? Duplicate elimination for both? What about in real systems? – Why do you need both? 8

Cartesian Product Each tuple in R1 with each tuple in R2, denoted as R1 x R2 – Input schemas R1(A1,…,An), R2(B1,…,Bm) – Output schema is S(A1, …, An, B1, …, Bm) Two relations are combined! – Very rare in practice; but joins are very common – Example: Employee x Dependent 9

Example 10 SSNName 111060000Alex 754320032Brandy Employee-SSNDependent-Name 111060000Chris 754320032David Employee Dependent SSNNameEmployee-SSNDependent-Name 111060000Alex111060000Chris 111060000Alex754320032David 754320032Brandy111060000Chris 754320032Brandy754320032David Employee x Dependent

Renaming Does not change the relational instance, denoted as Notation:  S(B1,…,Bn) (R) Changes the relational schema only – Input schema: R(A1, …, An) – Output schema: S(B1, …, Bn) Example:  Soc-sec-num, firstname (Employee) 11 SSNName 111060000Alex 754320032Bob 983210129Chris Soc-sec-numfirstname 111060000Alex 754320032Bob 983210129Chris

Set Operations: Intersection Intersection: all tuples both in R1 and in R2, denoted as R1 R2 – R1, R2 must have the same schema – R1 R2 has the same schema as R1, R2 – Example UnionizedEmployees RetiredEmployees Intersection is derived: – R1 R2 = R1 – (R1 – R2) why ? 12

Theta Join A join that involves a predicate  denoted as R1  R2 – Input schemas: R1(A1,…,An), R2(B1,…,Bm) – Output schema: S(A1,…,An,B1,…,Bm) – Derived operator: R1  R2 =   (R1 x R2) 1.Take the product R1 x R2 2.Then apply SELECT C to the result – As for SELECT, C can be any Boolean-valued condition 13

Theta Join: Example 14 NameAddress AJ's1800 Tennessee Michael's Pub513 Gaines BarBeerPrice AJ’sBud2.5 AJ’sMiller2.75 Michael’s PubBud2.5 Michael’s PubCorona3.0 Bar Sells BarInfo := Sells Sells.Bar=Bar.Name Bar BarBeerPriceNameAddress AJ’sBud2.5 AJ's1800 Tennessee AJ’sMiller2.75 AJ's1800 Tennessee Michael’s PubBud2.5 Michael's Pub513 Gaines Michael’s PubCorona3.0 Michael's Pub513 Gaines

Natural Join Notation: R1 R2 Input Schema: R1(A1, …, An), R2(B1, …, Bm) Output Schema: S(C1,…,Cp) – Where {C1, …, Cp} = {A1, …, An} U{B1, …, Bm} Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes: – {A1,…,An} {B1,…, Bm} (called the join attributes) 15

Natural Join: Examples 16 SSNName 111060000Alex 754320032Brandy SSNDependent-Name 111060000Chris 754320032David Employee Dependent SSNNameDependent-Name 111060000AlexChris 754320032BrandyDavid Employee Dependent =  SSN, Name, Dependent-Name (  Employee.SSN=Dependent.SSN (Employee x Dependent)

Natural Join: Examples 17 RS R S

Equi-join Special case of theta join: condition c contains only conjunction of equalities – Result schema is the same as that of Cartesian product – May have fewer tuples than Cartesian product – Most frequently used in practice: R1  R2 – Natural join is a particular case of equi-join – A lot of research on how to do it efficiently 18

A Joke About Join A join query walks up to two tables in a restaurant and asks : “Mind if I join you?” 19

Division A/B for A(x,y) and B(y) – Contains all tuples (x) such that for every y tuple in B, there is an xy tuple in A – Useful for expressing “for all” queries – For A/B, compute all x values that are not ‘disqualified’ by some y value in B x value is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A 20 1. Disqualified x values: 2. A/B: Disqualified x values

Division: Example 21 A B1 B2 B3 A/B1 A/B2 A/B3

Building Complex Expressions Algebras allow us to express sequences of operations in a natural way – Example In arithmetic algebra: (x + 4)*(y - 3) – Relational algebra allows the same Three notations, just as in arithmetic: 1.Sequences of assignment statements 2.Expressions with several operators 3.Expression trees 22

Sequences of Assignments Create temporary relation names Renaming can be implied by giving relations a list of attributes Example: R3 := R1 JOIN C R2 can be written: R4 := R1 x R2 R3 := SELECT C (R4) 23

Expressions with Several Operators Example: the theta-join R3 := R1 JOIN C R2 can be written: R3 := SELECT C (R1 x R2) Precedence of relational operators: 1.Unary operators --- select, project, rename --- have highest precedence, bind first 2.Then come products and joins 3.Then intersection 4.Finally, union and set difference bind last But you can always insert parentheses to force the order you desire 24

Expression Trees Leaves are operands – either variables standing for relations or particular constant relations Interior nodes are operators, applied to their child or children 25

Expression Tree: Examples Given Bars(name, addr), Sells(bar, beer, price), find the names of all the bars that are either on Tennessee St. or sell Bud for less than \$3 26 BarsSells SELECT addr = “Tennessee St.” SELECT price<3 AND beer=“Bud” PROJECT name RENAME R(name) PROJECT bar UNION

Summary of Relational Algebra Why bother ? Can write any RA expression directly in C++/Java, seems easy – Two reasons: Each operator admits sophisticated implementations (think of and  C ) Expressions in relational algebra can be rewritten: optimized  (age >= 30 AND age <= 35) (Employees) – Method 1: scan the file, test each employee – Method 2: use an index on age Employees Relatives – Iterate over Employees, then over Relatives? Or iterate over Relatives, then over Employees? – Sort Employees, Relatives, do “merge-join” – “hash-join” – etc. 27

Download ppt "Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014."

Similar presentations