Chapter 5 Notes

P. 189: Sets, Bags, and Lists
To understand the distinction between sets, bags, and lists, remember that a set has unordered elements and only one occurrence of each element. A bag allows more than one occurrence of an element, but the elements and their occurrences are unordered. A list allows more than one occurrence of an element, and the occurrences are ordered. Thus, {1, 2, 1} and {2, 1, 1} are the same bag, but (1, 2, 1) and (2, 1, 1) are not the same list.
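
The distinction can be checked directly in Python. This is only an illustrative sketch: it assumes a bag is modelled with collections.Counter (element mapped to multiplicity), which is a choice made here and not something the notes prescribe.

```python
from collections import Counter

# A set ignores both order and multiplicity: the repeated 1 collapses.
as_set = {1, 2, 1}

# A bag keeps multiplicity but ignores order (modelled here as a Counter).
bag_a = Counter([1, 2, 1])
bag_b = Counter([2, 1, 1])

# A list keeps both multiplicity and order.
list_a = [1, 2, 1]
list_b = [2, 1, 1]

same_bag = bag_a == bag_b      # True: same elements with same counts
same_list = list_a == list_b   # False: the order differs
```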

P. 206: Why bags?
Commercial DBMSs implement relations that are bags, rather than sets … Some relational operations are considerably more efficient if we use the bag model:
– To take the union of two relations as bags, we simply copy one relation and add to the copy all the tuples of the other relation. There is no need to eliminate duplicate copies of a tuple that happens to be in both relations.
– When we project relations as sets, we need to compare each projected tuple with all the other projected tuples, to make sure that each projection appears only once. If we can accept a bag as a result, then we simply project each tuple and add it to the result; no comparison with other projected tuples is necessary.

Union, Intersection, and Difference of Bags
Suppose that R and S are bags, and that tuple t appears n times in R and m times in S.
– In the bag union R ∪ S, tuple t appears n + m times.
– In the bag intersection R ∩ S, tuple t appears min(n, m) times.
– In the bag difference R – S, tuple t appears max(0, n – m) times.
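
The three multiplicity rules (n + m, min(n, m), max(0, n – m)) happen to match the +, & and - operators of Python's collections.Counter, so a small sketch can confirm them. The relations R and S below are made-up data, and representing a bag as a Counter is an assumption of this example.

```python
from collections import Counter

# Hypothetical bags of one-attribute tuples: tuple -> number of occurrences.
R = Counter({("a",): 3, ("b",): 1})
S = Counter({("a",): 2, ("c",): 1})

bag_union = R + S          # each tuple appears n + m times
bag_intersection = R & S   # each tuple appears min(n, m) times
bag_difference = R - S     # each tuple appears max(0, n - m) times
```

Counter drops entries whose count would be zero or negative, which is exactly the max(0, n – m) behaviour of bag difference.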

Projection and Selection on Bags
Each tuple is processed independently during a projection.
– If the elimination of one or more attributes during the projection causes the same tuple to be created from several tuples, these duplicate tuples are not eliminated from the result of a bag projection.
To apply a selection to a bag, we apply the selection condition to each tuple independently.
– Duplicate tuples are not eliminated in the result.
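
A minimal sketch of both operators on bags, assuming a bag is stored as a plain Python list of tuples. The relation R(A, B, C) and the helper names project and select are hypothetical, introduced only for this illustration.

```python
from collections import Counter

# Hypothetical bag R(A, B, C), stored as a list so duplicates survive.
R = [(1, 2, 5), (1, 2, 7), (3, 4, 5)]

def project(rows, indexes):
    """Bag projection: keep the chosen positions of every tuple, no deduplication."""
    return [tuple(row[i] for i in indexes) for row in rows]

def select(rows, condition):
    """Bag selection: test each tuple independently, no deduplication."""
    return [row for row in rows if condition(row)]

projected = project(R, [0, 1])                         # (1, 2) now occurs twice
selected = select(projected, lambda row: row[0] == 1)  # both copies are kept
```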

Product and Joins of Bags
The rule for the Cartesian product of bags is the expected one. Joining bags presents no surprises:
– Compare each tuple of one relation with each tuple of the other,
– decide whether or not this pair of tuples joins successfully,
– and if so, put the resulting tuple in the answer.
– Duplicate tuples are not eliminated in the answer.
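
The pairwise procedure can be sketched as a nested loop. The bags R(A, B) and S(B, C) below are made-up data, and the join shown is a natural join on B, chosen here only as an example.

```python
# Hypothetical bags: R(A, B) holds (1, 2) twice, S(B, C) holds (2, 3) twice.
R = [(1, 2), (1, 2)]
S = [(2, 3), (2, 3), (4, 5)]

# Compare every tuple of R with every tuple of S and keep each successful pair.
joined = [(a, b, c) for (a, b) in R for (b2, c) in S if b == b2]
```

Each of the two copies in R pairs with each of the two matching copies in S, so (1, 2, 3) appears 2 × 2 = 4 times in the answer.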

Section 5.2: The Extended Algebra
– δ = eliminate duplicates from bags.
– τ = sort tuples.
– γ = grouping and aggregation.
– Outerjoin: avoids “dangling tuples” = tuples that do not join with anything.

Duplicate Elimination
R1 := δ(R2).
R1 consists of one copy of each tuple that appears in R2 one or more times.

Example: Duplicate Elimination
R = (A, B)
δ(R) =
A B
1 2
3 4
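
A short sketch of δ in Python. The input rows below are an assumption: the tuples of R did not survive in these notes, so the bag is simply one input that is consistent with the result (1, 2), (3, 4). The function name delta is also hypothetical.

```python
def delta(rows):
    """Duplicate elimination: one copy of each tuple, in first-seen order."""
    # dict keys are unique and preserve insertion order.
    return list(dict.fromkeys(rows))

# Assumed input bag over (A, B); (1, 2) occurs twice.
R = [(1, 2), (3, 4), (1, 2)]
deduplicated = delta(R)
```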

5.2.2: Aggregation Operators
Aggregation operators are not operators of relational algebra. Rather, they apply to entire columns of a table and produce a single result. The most important examples are SUM, AVG, COUNT, MIN, and MAX.

Example: Aggregation
R = (A, B)
SUM(A) = 7
COUNT(A) = 3
MAX(B) = 4
AVG(B) = 3
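
The stated values can be reproduced with Python built-ins. The tuples of R are not preserved in these notes, so the rows below are an assumption: one relation over (A, B) that yields SUM(A) = 7, COUNT(A) = 3, MAX(B) = 4 and AVG(B) = 3.

```python
# Assumed contents of R(A, B), chosen to match the aggregate values above.
R = [(1, 3), (3, 4), (3, 2)]

column_a = [a for a, _ in R]
column_b = [b for _, b in R]

sum_a = sum(column_a)                    # SUM(A)
count_a = len(column_a)                  # COUNT(A) counts duplicates too
max_b = max(column_b)                    # MAX(B)
avg_b = sum(column_b) / len(column_b)    # AVG(B)
```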

5.2.4: Grouping Operator
R1 := γ_L(R2). L is a list of elements that are either:
1. Individual (grouping) attributes.
2. AGG(A), where AGG is one of the aggregation operators and A is an attribute. An arrow followed by a new attribute name renames the component.

Applying γ_L(R)
Group R according to all the grouping attributes on list L.
– That is: form one group for each distinct list of values for those attributes in R.
Within each group, compute AGG(A) for each aggregation on list L.
The result has one tuple for each group, consisting of:
1. The grouping attributes and
2. Their group’s aggregations.

Grouping/Aggregation
R = (A, B, C)
γ_{A,B,AVG(C)->X}(R) = ??
First, group R by A and B, keeping the columns (A, B, C).
Then, average C within groups, giving a result with columns (A, B, X).
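
The two steps (group by A and B, then average C within each group) can be sketched as follows. The rows of R are made up for illustration, since the example tuples are missing from these notes, and gamma_avg is a hypothetical helper name.

```python
from collections import defaultdict

def gamma_avg(rows):
    """Group (A, B, C) rows by (A, B) and emit (A, B, X) with X = AVG(C)."""
    groups = defaultdict(list)
    for a, b, c in rows:
        groups[(a, b)].append(c)   # step 1: one group per distinct (A, B)
    # step 2: one output tuple per group, carrying the group's average of C
    return [(a, b, sum(cs) / len(cs)) for (a, b), cs in groups.items()]

# Hypothetical contents of R(A, B, C).
R = [(1, 2, 3), (4, 5, 6), (1, 2, 5)]
grouped = gamma_avg(R)
```

With this data the group (1, 2) averages 3 and 5 to 4.0, and the group (4, 5) keeps its single value 6.0.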

5.2.6: Sorting
R1 := τ_L(R2).
– L is a list of some of the attributes of R2.
R1 is the list of tuples of R2 sorted first on the value of the first attribute on L, then on the second attribute of L, and so on.
– Break ties arbitrarily.
τ is the only operator whose result is neither a set nor a bag.

Example: Sorting
R = (A, B)
τ_B(R) = [(5,2), (1,2), (3,4)]
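
A sketch of τ using Python's sorted. The input order of R is an assumption, since only the sorted output appears in these notes, and tau is a hypothetical helper name.

```python
def tau(rows, key_indexes):
    """Sort tuples on the listed attribute positions, first position first."""
    return sorted(rows, key=lambda row: tuple(row[i] for i in key_indexes))

# Assumed contents of R(A, B).
R = [(1, 2), (3, 4), (5, 2)]
sorted_on_b = tau(R, [1])   # sort on B, the attribute at position 1
```

Python's sort is stable, so this returns [(1, 2), (5, 2), (3, 4)], while the example above lists (5,2) before (1,2). Both are valid results, because ties on B may be broken arbitrarily.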

5.2.7: Outerjoin
Suppose we join R ⋈_C S. A tuple of R that has no tuple of S with which it joins is said to be dangling.
– Similarly for a tuple of S.
Outerjoin preserves dangling tuples by padding them with NULL.

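Outerjoin
R = (A, B), S = (B, C)
(1,2) joins with (2,3), but the other two tuples are dangling.
R OUTERJOIN S has columns (A, B, C). It contains the joined tuple (1,2,3) plus each dangling tuple padded with NULL; for example, the dangling tuple (6,7) of S becomes (NULL,6,7).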
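
A minimal sketch of a full outerjoin on the shared attribute B, with None standing in for NULL. The tuples (1, 2), (2, 3) and (6, 7) come from the example above; the dangling tuple (4, 5) of R is an assumption, because the original value is not recoverable from these notes. The function name full_outerjoin is hypothetical.

```python
def full_outerjoin(r_rows, s_rows):
    """Natural full outerjoin of R(A, B) and S(B, C), padding with None."""
    result = []
    matched_s = set()                      # positions of S tuples that joined
    for a, b in r_rows:
        partners = [(j, c) for j, (b2, c) in enumerate(s_rows) if b2 == b]
        if partners:
            for j, c in partners:
                result.append((a, b, c))
                matched_s.add(j)
        else:
            result.append((a, b, None))    # dangling R tuple: pad C
    for j, (b, c) in enumerate(s_rows):
        if j not in matched_s:
            result.append((None, b, c))    # dangling S tuple: pad A
    return result

R = [(1, 2), (4, 5)]   # (4, 5) is assumed data
S = [(2, 3), (6, 7)]
outer = full_outerjoin(R, S)
```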

5.3 A Logic for Relations
The logical query language Datalog consists of if-then rules. Each of these rules expresses the idea that from certain combinations of tuples in certain relations, we may infer that some other tuple must be in some other relation, or in the answer to a query.
If-then logical rules have been used in many systems.
– Nonrecursive rules are equivalent to the core relational algebra.
– Recursive rules extend relational algebra and appear in SQL-99.

Integration Example
Goal: an integrated view of the menus at many bars, Sells(bar, beer, price). Joe has data JoeMenu(beer, price).
Approach 1: Describe Sells in terms of JoeMenu and other local data sources.
Sells('Joe''s Bar', b, p) <- JoeMenu(b, p)

Integration Example cont.
Approach 2: Describe how JoeMenu can be used as a view to help answer queries about Sells and other relations.
JoeMenu(b, p) <- Sells('Joe''s Bar', b, p)

A Logical Rule
Our first example of a rule uses the relations Frequents(drinker, bar), Likes(drinker, beer), and Sells(bar, beer, price). The rule is a query asking for “happy” drinkers: those that frequent a bar that serves a beer that they like.

Anatomy of a Rule
Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p)
– Head = consequent, a single subgoal: Happy(d).
– Body = antecedent = AND of subgoals: Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p).
– Read the symbol <- as “if”.

Subgoals Are Atoms
An atom is a predicate, or relation name, with variables or constants as arguments. In essence, a predicate is the name of a function that returns a boolean value.
– R(a, b, c) is true if (a, b, c) is a tuple of R.
The head is an atom; the body is the AND of one or more atoms.
Convention: predicates begin with a capital letter, variables begin with a lower-case letter.

Atom
Sells(bar, beer, p)
– The predicate, Sells, is the name of a relation.
– The arguments bar, beer, and p are variables; arguments may also be constants.

Interpreting Rules
A variable appearing in the head is distinguished; otherwise it is nondistinguished.
Rule meaning: The head is true for given values of the distinguished variables if there exist values of the nondistinguished variables that make all subgoals of the body true.

Interpretation
Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p)
– Distinguished variable: d.
– Nondistinguished variables: bar, beer, p.
Interpretation: drinker d is happy if there exist a bar, a beer, and a price p such that d frequents the bar, likes the beer, and the bar sells the beer at price p.

Arithmetic Atoms or Subgoals
In addition to relations as predicates, a predicate for a subgoal of the body (an atom) can be an arithmetic comparison.
– We write arithmetic atoms in the usual way, e.g., x < y.
The previously defined atoms are called relational atoms.

Arithmetic Atoms
A beer is “cheap” if there are at least two bars that sell it for under $2.
Cheap(beer) <- Sells(bar1,beer,p1) AND Sells(bar2,beer,p2) AND p1 < 2.00 AND p2 < 2.00 AND bar1 <> bar2
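
The “cheap” query can be sketched as a set comprehension over pairs of Sells tuples, reading the rule as: two different bars each sell the same beer for under $2. The contents of Sells below are made-up data.

```python
# Hypothetical EDB relation Sells(bar, beer, price).
sells = {
    ("Joe's Bar", "Bud", 1.50),
    ("Sue's Bar", "Bud", 1.75),
    ("Joe's Bar", "Miller", 1.90),
    ("Sue's Bar", "Miller", 2.50),
}

# Two relational subgoals over Sells plus three arithmetic comparisons.
cheap = {
    beer1
    for (bar1, beer1, p1) in sells
    for (bar2, beer2, p2) in sells
    if beer1 == beer2 and p1 < 2.00 and p2 < 2.00 and bar1 != bar2
}
```

Only Bud qualifies here: Miller is under $2 at just one bar.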

Negated Subgoals
NOT in front of a subgoal negates its meaning.
Example: Think of Arc(a,b) as arcs in a graph.
– S(x,y) says the graph is not transitive from x to y; i.e., there is a path of length 2 from x to y, but no arc from x to y.
S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)
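
A sketch of the rule for S on a small, made-up graph. The negated subgoal becomes a "not in" test, which is only meaningful because x and y are already bound by the two positive Arc subgoals.

```python
# Hypothetical EDB relation Arc(from, to).
arc = {(1, 2), (2, 3), (1, 3), (3, 4)}

# S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)
s = {
    (x, y)
    for (x, z) in arc
    for (z2, y) in arc
    if z == z2 and (x, y) not in arc
}
```

The path 1 → 2 → 3 is excluded because the arc (1, 3) exists; the paths 2 → 3 → 4 and 1 → 3 → 4 have no direct arc, so (2, 4) and (1, 4) are in S.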

Datalog Rules and Queries
Applying a rule, approach 1: consider all combinations of values of the variables. If all subgoals are true, then evaluate the head. The resulting head is a tuple in the result.
Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p)
FOR (each d, bar, beer, p)
    IF (Frequents(d,bar), Likes(d,beer), and Sells(bar,beer,p) are all true)
        add Happy(d) to the result
Note: set semantics, so add each tuple only once.

Applying a Rule (2)
Approach 2: for each subgoal, consider all tuples that make the subgoal true. If a selection of tuples defines a single value for each variable, then add the head to the result.
This leads to a finite search for P(x) <- Q(x), but P(x) <- Q(y) is problematic.
– We want rule evaluations to be finite and lead to finite results.
– “Unsafe” rules like P(x) <- Q(y) have infinite results, even if Q is finite.

Rule Evaluation (2)
Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p)
FOR (each f in Frequents, i in Likes, and s in Sells)
    IF (f[1]=i[1] and f[2]=s[1] and i[2]=s[2])
        add Happy(f[1]) to the result
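
The loop over tuples translates almost line for line into Python. The relation contents are made-up data, and note that Python tuples are indexed from 0 where the pseudocode counts components from 1.

```python
# Hypothetical EDB relations.
frequents = {("Ann", "Joe's Bar"), ("Bob", "Sue's Bar")}
likes = {("Ann", "Bud"), ("Bob", "Miller")}
sells = {("Joe's Bar", "Bud", 2.50), ("Sue's Bar", "Bud", 2.75)}

happy = set()   # a set, so each drinker is added only once
for f in frequents:
    for lk in likes:
        for s in sells:
            # same drinker, same bar, same beer across the three subgoals
            if f[0] == lk[0] and f[1] == s[0] and lk[1] == s[1]:
                happy.add(f[0])
```

Ann frequents a bar that sells a beer she likes; Bob's bar does not sell Miller, so he is not in the result.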

Safe Rules
A rule is safe if:
1. Each distinguished variable,
2. Each variable in an arithmetic subgoal, and
3. Each variable in a negated subgoal,
also appears in a nonnegated, relational subgoal.
Safe rules prevent infinite results.

Unsafe Rules
Each of the following is unsafe and not allowed:
1. S(x) <- R(y)
2. S(x) <- R(y) AND NOT R(x)
3. S(x) <- R(y) AND x < y
In each case, an infinity of x’s can satisfy the rule, even if R is a finite relation.
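
The safety condition is a subset test on variable names, which a few lines of Python can check. The function is_safe and its way of describing a rule (four collections of variable names) are hypothetical conveniences for this sketch, not part of Datalog itself.

```python
def is_safe(head_vars, positive_vars, negated_vars=(), arithmetic_vars=()):
    """Return True if every variable that must be bound occurs in some
    nonnegated relational subgoal (collected in positive_vars)."""
    must_be_bound = set(head_vars) | set(negated_vars) | set(arithmetic_vars)
    return must_be_bound <= set(positive_vars)

# The three unsafe rules listed above.
rule_1 = is_safe(["x"], ["y"])                               # S(x) <- R(y)
rule_2 = is_safe(["x"], ["y"], negated_vars=["x"])           # ... AND NOT R(x)
rule_3 = is_safe(["x"], ["y"], arithmetic_vars=["x", "y"])   # ... AND x < y

# Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p)
happy_rule = is_safe(["d"], ["d", "bar", "beer", "p"])
```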

Safe Rules Advantage
We can use “approach 2” to evaluation, where we select tuples from only the nonnegated, relational subgoals. The head, negated relational subgoals, and arithmetic subgoals thus have all their variables defined and can be evaluated.

5.4 Datalog Programs
Datalog program = collection of rules. In a program, predicates can be either:
1. EDB = Extensional Database = stored table.
2. IDB = Intensional Database = relation defined by rules.
Never both! No EDB in heads.

Evaluating Datalog Programs
As long as there is no recursion, we can pick an order to evaluate the IDB predicates, so that all the predicates in the bodies of a predicate's rules have already been evaluated before it is. If an IDB predicate has more than one rule, each rule contributes tuples to its relation.

Example Datalog Program
Using EDB Sells(bar, beer, price) and Beers(name, manf), find the manufacturers of beers Joe doesn’t sell.
JoeSells(b) <- Sells('Joe''s Bar', b, p)
Answer(m) <- Beers(b,m) AND NOT JoeSells(b)
Step 1: Examine all Sells tuples with first component 'Joe''s Bar'.
– Add the second component to JoeSells.
Step 2: Examine all Beers tuples (b,m).
– If b is not in JoeSells, add m to Answer.
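
The two steps can be sketched with Python sets, evaluating the IDB predicate JoeSells before Answer because Answer depends on it. The contents of Sells and Beers, including the manufacturer names, are made-up data.

```python
# Hypothetical EDB relations.
sells = {("Joe's Bar", "Bud", 2.50), ("Sue's Bar", "Miller", 3.00)}
beers = {("Bud", "ManfA"), ("Miller", "ManfB")}

# Step 1: JoeSells(b) <- Sells('Joe''s Bar', b, p)
joe_sells = {beer for (bar, beer, price) in sells if bar == "Joe's Bar"}

# Step 2: Answer(m) <- Beers(b,m) AND NOT JoeSells(b)
answer = {manf for (name, manf) in beers if name not in joe_sells}
```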

Relational Algebra and Datalog
Without recursion, Datalog can express all and only the queries of core relational algebra.
– The same as SQL select-from-where, without aggregation and grouping.
But with recursion, Datalog can express more than these languages. Yet it is still not Turing-complete.
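
As an illustration of what recursion adds, reachability in a graph is a standard query that needs it. The rules Path(x,y) <- Arc(x,y) and Path(x,y) <- Path(x,z) AND Arc(z,y) are an example chosen here, not taken from these notes, and the sketch below evaluates them by repeating the rules until no new tuples appear, on made-up data.

```python
# Hypothetical EDB relation Arc(from, to).
arc = {(1, 2), (2, 3), (3, 4)}

# Path(x,y) <- Arc(x,y)
path = set(arc)

# Path(x,y) <- Path(x,z) AND Arc(z,y), applied until a fixed point is reached.
while True:
    new_tuples = {
        (x, y) for (x, z) in path for (z2, y) in arc if z == z2
    } - path
    if not new_tuples:
        break
    path |= new_tuples
```

The loop terminates because Path can only grow and is bounded by the finite set of node pairs.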