1 Slides adapted from Rao (ASU) & Franklin (Berkeley)
Structure
How will search and querying on these three types of data differ?
– A generic web page containing text [English]
– A movie review [XML] (semi-structured)
– An employee record [SQL]

2 Structure helps querying
Expressive queries
– Give me all pages that have the key words “Get Rich Quick”
– Give me the social security numbers of all employees who have stayed with the company for more than 5 years and whose yearly salaries are three standard deviations away from the average salary
– Give me all mail from people at ASU written this year that is relevant to “get rich quick”
Efficient searching
– equality vs. “similarity”
– range-limited search

3 Functionality of a DBMS
Data dictionary management
Storage management
– Data storage and a Data Definition Language (DDL)
High-level query and data manipulation language
– SQL/XQuery etc.
– May tell us what we are missing in text-based search
Efficient query processing
– May change in the internet scenario
Transaction processing
Resiliency: recovery from crashes
Different views of the data, security
– May be useful to model a collection of databases together
Interface with programming languages

4 Building an Application with a Database System
Requirements modeling (conceptual, pictures)
– Decide what entities should be part of the application and how they should be linked.
Schema design and implementation
– Decide on a set of tables and attributes.
– Define the tables in the database system.
– Populate the database (insert tuples).
Write application programs using the DBMS
– Now much easier, with a data management API

5 Conceptual Modeling
[Entity-relationship diagram. Entities: Professor, Student, Course. Relationships: Advises, Takes, Teaches. Attribute labels: ssn, address, name, field, category, quarter.]

6 Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today.
– Main concept: relation, basically a table with rows and columns.
– Every relation has a schema, which describes the columns, or fields.

7 Levels of Abstraction
Views describe how users see the data.
The conceptual schema defines the logical structure.
The physical schema describes the files and indexes used.
[Figure: View 1, View 2 and View 3 sit on the Conceptual Schema, which sits on the Physical Schema, which sits on the DB.]

8 Example: University Database
Conceptual schema:
– Students(sid: string, name: string, login: string, age: integer, gpa: real)
– Courses(cid: string, cname: string, credits: integer)
External schema (view):
– Course_info(cid: string, enrollment: integer)
Physical schema:
– Relations stored as unordered files.
– Index on first column of Students.
If five people are asked to come up with a schema for the data, what are the odds that they will come up with the same schema?

9 Data Independence
Applications are insulated from how data is structured and stored.
Logical data independence: protection from changes in the logical structure of data.
Physical data independence: protection from changes in the physical structure of data.
Q: Why are these particularly important for a DBMS?
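
A minimal sketch of physical data independence, using Python's built-in sqlite3 module and the university schema from the previous slide. The Enrolled relation, the sample rows, and the index name are assumptions made for illustration; the slides only name Students, Courses, and the Course_info view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students(sid TEXT PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses(cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
-- Enrolled is an assumed relation; the slide does not list it.
CREATE TABLE Enrolled(sid TEXT, cid TEXT);
-- External schema: what one class of users sees.
CREATE VIEW Course_info AS
  SELECT C.cid AS cid, COUNT(E.sid) AS enrollment
  FROM Courses C LEFT JOIN Enrolled E ON E.cid = C.cid
  GROUP BY C.cid;
INSERT INTO Students VALUES ('s1', 'Ann', 'ann', 20, 3.5), ('s2', 'Bob', 'bob', 21, 3.1);
INSERT INTO Courses VALUES ('c1', 'Databases', 3), ('c2', 'Networks', 3);
INSERT INTO Enrolled VALUES ('s1', 'c1'), ('s2', 'c1');
""")

def course_info():
    # The application only ever talks to the view.
    return conn.execute(
        "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()

before = course_info()
# Physical change: add an index. Neither the view nor the query is edited.
conn.execute("CREATE INDEX idx_enrolled_cid ON Enrolled(cid)")
after = course_info()
```

The application code in course_info() is identical before and after the index is created, which is the point of the slide: only the physical schema changed.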

10 Schema Design & Implementation
[Figure: the Students table]
Separates the logical view from the physical view of the data.

11 Terminology
[Figure: the Students table (arity = 3), with its attribute names and tuples labeled]

12 Querying a Database
Find all the students taking CSE594 in Q1, 2004
S(tructured) Q(uery) L(anguage)
select E.name from Enroll E where E.course = 'CS490i' and E.quarter = 'Winter, 2000'
The query processor figures out how to answer the query efficiently.

13 Relational Algebra
Operators
– tuple sets as input, new set as output
Basic binary set operators
– Result is a table (set) with the same attributes
– Sets must be compatible: for R1(A1,A2,A3) and R2(B1,B2,B3), Domain(Ai) = Domain(Bi) for every i
– Union: all tuples in either R1 or R2
– Intersection: all tuples in both R1 and R2
– Difference: all tuples in R1 but not in R2
– Complement: all tuples not in R1
Selection, Projection, Cartesian Product, Join
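
The set operators above map directly onto Python sets of tuples. This is only an illustrative sketch: the relations R1 and R2, the two attribute domains, and the crude type-based compatibility check are assumptions, and the complement is taken relative to the finite product of those assumed domains.

```python
from itertools import product

def compatible(r1, r2):
    """Crude check: same arity and same Python type in every position."""
    signatures = {tuple(type(v) for v in t) for t in r1 | r2}
    return len(signatures) <= 1

R1 = {(1, "a"), (2, "b"), (3, "c")}
R2 = {(2, "b"), (4, "d")}
assert compatible(R1, R2)

union = R1 | R2          # all tuples in either R1 or R2
intersection = R1 & R2   # all tuples in both R1 and R2
difference = R1 - R2     # all tuples in R1 but not in R2

# Complement only makes sense against a finite universe of possible tuples.
universe = set(product({1, 2, 3, 4}, {"a", "b", "c", "d"}))
complement = universe - R1
```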

14 Selection σ
Grab the subset of the tuples in a relation that satisfy a given condition
– Use and, or, not, >, <, … to build the condition
Unary operation: returns a set with the same attributes, but “selects” rows

15 Selection Example
Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000
Select (Salary > 40000)
SSN        Name   DepartmentID  Salary
888888888  Alice  2             45,000

16 Projection π
Unary operation, selects columns
The returned schema is different
– So the returned tuples are not a subset of the original set
– Contrast with selection
Eliminates duplicate tuples
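
Selection and projection can be sketched over the Employee relation from the selection example. The helper names select and project and the field names are chosen here for illustration; modelling the relation as a Python set gives the duplicate elimination for free.

```python
from typing import NamedTuple

class Employee(NamedTuple):
    ssn: str
    name: str
    department_id: int
    salary: int

employees = {
    Employee("999999999", "John", 1, 30000),
    Employee("777777777", "Tony", 1, 32000),
    Employee("888888888", "Alice", 2, 45000),
}

def select(relation, predicate):
    """Selection: same attributes, fewer rows."""
    return {t for t in relation if predicate(t)}

def project(relation, *attrs):
    """Projection: fewer columns; the set drops duplicate tuples."""
    return {tuple(getattr(t, a) for a in attrs) for t in relation}

high_paid = select(employees, lambda e: e.salary > 40000)
departments = project(employees, "department_id")
```

Three employees project down to two department tuples because John and Tony share DepartmentID 1.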

18 Cartesian Product ×
Binary operation
Result is the set of tuples combining every element of R1 with every element of R2, for R1 × R2
Schema is the union of Schema(R1) and Schema(R2)
Notice we could do a selection on the result to get meaningful info!

19 Cartesian Product Example

20 Join
Most common (and exciting!) operator
Combines 2 relations
– Selecting only related tuples
Result has all attributes of the two relations
Equivalent to
– Cross product, followed by selection, followed by projection
Equijoin
– Join condition is equality between two attributes
Natural join
– Equijoin on attributes of the same name
– Result has only one copy of each join attribute
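
The "cross product, then selection, then projection" reading of a natural join can be sketched as follows. Rows are modelled as dicts; the Employee and Dependents rows (including the dname attribute and its values) are hypothetical sample data, not taken from the slides.

```python
from itertools import product

def natural_join(r1, r2):
    """Natural join as cross product + selection + projection."""
    result = []
    for a, b in product(r1, r2):                   # Cartesian product
        shared = a.keys() & b.keys()               # attributes of the same name
        if all(a[k] == b[k] for k in shared):      # selection: equijoin condition
            result.append({**a, **b})              # projection: one copy of shared attrs
    return result

employee = [
    {"ssn": "999999999", "name": "John"},
    {"ssn": "777777777", "name": "Tony"},
]
dependents = [
    {"ssn": "999999999", "dname": "Emily"},
    {"ssn": "999999999", "dname": "Sam"},
]

joined = natural_join(employee, dependents)
```

Tony has no matching Dependents tuple, so he does not appear in the result. A real system would not materialize the full cross product; this form is only meant to mirror the algebraic equivalence.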

21 Example: Natural Join
[Figure: the Employee and Dependents tables]

22 Complex Queries
Product(pname, price, category, maker)
Purchase(buyer, seller, store, prodname)
Company(cname, stock price, country)
Person(per-name, phone number, city)
Find phone numbers of people who bought gizmos from Fred.
Find telephony products that somebody bought.
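
One possible way to phrase the first query, run through sqlite3. The column names per_name, phone_number and prodname are SQL-safe renderings of the slide's attribute names, "gizmo" is treated as a product name, and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person(per_name TEXT, phone_number TEXT, city TEXT);
CREATE TABLE Purchase(buyer TEXT, seller TEXT, store TEXT, prodname TEXT);
INSERT INTO Person VALUES ('Ann', '555-0100', 'Seattle'), ('Bob', '555-0101', 'Tempe');
INSERT INTO Purchase VALUES
  ('Ann', 'Fred', 'Store A', 'gizmo'),
  ('Bob', 'Joe',  'Store B', 'gizmo'),
  ('Bob', 'Fred', 'Store A', 'widget');
""")

# Join Person with Purchase on the buyer, then select on seller and product.
phones = conn.execute("""
    SELECT DISTINCT P.phone_number
    FROM Person P JOIN Purchase U ON P.per_name = U.buyer
    WHERE U.seller = ? AND U.prodname = ?
""", ("Fred", "gizmo")).fetchall()
```

Bob bought a gizmo, and bought something from Fred, but never both in the same purchase, so only Ann's number qualifies.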

23 Exercises
Product(pname, price, category, maker)
Purchase(buyer, seller, store, prodname)
Company(cname, stock price, country)
Person(per-name, phone number, city)
Ex #1: Find people who bought telephony products.
Ex #2: Find names of people who bought American products.
Ex #3: Find names of people who bought American products and did not buy French products.
Ex #4: Find names of people who bought American products and live in Seattle.
Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock price is more than $50.

24 SQL Introduction
Standard language for querying and manipulating data
Structured Query Language
Many standards out there: SQL92, SQL2, SQL3, SQL99
Vendors support various subsets of these (but we'll only discuss a subset of what they support)
Basic form = syntax on relational algebra (but many other features too)
Select attributes
From relations (possibly multiple, joined)
Where conditions (selections)

25 Selections σ
SELECT * FROM Company WHERE country = 'USA' AND stockPrice > 50
You can use:
– Attribute names of the relation(s) used in the FROM clause.
– Comparison operators: =, <>, <, >, <=, >=
– Arithmetic operations: stockPrice*2
– Operations on strings (e.g., “||” for concatenation).
– Lexicographic order on strings.
– Pattern matching: s LIKE p
– Special stuff for comparing dates and times.
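
A small sqlite3 sketch combining several of the features listed above: comparison, arithmetic, string concatenation with ||, and LIKE. The Company rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company(name TEXT, stockPrice REAL, country TEXT);
INSERT INTO Company VALUES
  ('GizmoWorks', 60, 'USA'), ('Gadgetry', 40, 'USA'),
  ('Acme', 80, 'USA'), ('Kyoto Tools', 70, 'Japan');
""")

rows = conn.execute("""
    SELECT name || ' (' || country || ')', stockPrice * 2
    FROM Company
    WHERE country = 'USA' AND stockPrice > 50 AND name LIKE 'G%'
""").fetchall()
```

Gadgetry matches the pattern but fails the price test, and Acme passes the price test but fails the pattern, so one row remains.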

26 Projection π
Select only a subset of the attributes:
SELECT name, stockPrice FROM Company WHERE country = 'USA' AND stockPrice > 50
Rename the attributes in the resulting table:
SELECT name AS company, stockPrice AS price FROM Company WHERE country = 'USA' AND stockPrice > 50

27 Ordering the Results
SELECT name, stockPrice FROM Company WHERE country = 'USA' AND stockPrice > 50 ORDER BY country, name
Ordering is ascending, unless you specify the DESC keyword.
Ties are broken by the second attribute on the ORDER BY list, etc.
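
To see DESC and tie-breaking in action, here is a sketch with sqlite3. It orders by stockPrice DESC, name rather than by country, name as on the slide, so that both behaviours are visible; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company(name TEXT, stockPrice REAL, country TEXT);
INSERT INTO Company VALUES
  ('Zeta', 90, 'USA'), ('Bolt', 75, 'USA'), ('Acme', 75, 'USA'),
  ('Low', 10, 'USA'), ('Kyoto Tools', 95, 'Japan');
""")

ordered = conn.execute("""
    SELECT name, stockPrice
    FROM Company
    WHERE country = 'USA' AND stockPrice > 50
    ORDER BY stockPrice DESC, name
""").fetchall()
```

Acme and Bolt tie on price, so the second ORDER BY attribute (name, ascending by default) decides their order.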

28 Join
Product(pname, price, category, maker)
Purchase(buyer, seller, store, product)
Company(cname, stock price, country)
Person(per-name, phone number, city)
SELECT per-name, store FROM Person, Purchase WHERE per-name = buyer AND city = 'Seattle' AND product = 'gizmo'

29 Disambiguating Attributes
Product(name, price, category, maker)
Purchase(buyer, seller, store, product)
Person(name, phone number, city)
Find names of people buying telephony products:
SELECT Person.name FROM Person, Purchase, Product WHERE Person.name = buyer AND product = Product.name AND Product.category = 'telephony'

30 Tuple Variables
Product(name, price, category, maker)
Find pairs of companies making products in the same category:
SELECT product1.maker, product2.maker FROM Product AS product1, Product AS product2 WHERE product1.category = product2.category AND product1.maker <> product2.maker
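
A runnable variant of the self-join, using sqlite3 and invented Product rows. As a deliberate deviation from the slide, it compares makers with < instead of <> and adds DISTINCT, so each pair of companies is reported once rather than in both orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product(name TEXT, price REAL, category TEXT, maker TEXT);
INSERT INTO Product VALUES
  ('phone-1', 100, 'telephony', 'AlphaCo'),
  ('phone-2', 120, 'telephony', 'BetaCo'),
  ('shoe-1',   60, 'shoes',     'AlphaCo');
""")

# p1 and p2 are tuple variables ranging over the same relation.
pairs = conn.execute("""
    SELECT DISTINCT p1.maker, p2.maker
    FROM Product AS p1, Product AS p2
    WHERE p1.category = p2.category AND p1.maker < p2.maker
""").fetchall()
```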

31 Exercises
Product(pname, price, category, maker)
Purchase(buyer, seller, store, product)
Company(cname, stock-price, country)
Person(per-name, phone number, city)
Ex #1: Find people who bought telephony products.
Ex #2: Find names of people who bought American products.
Ex #3: Find names of people who bought American products and did not buy French products.
Ex #4: Find names of people who live in Seattle and who bought American products.
Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock price is more than $50.

32 Views

33 Defining Views
Views are relations, except that they are not physically stored (they are virtual).
They are used mostly to simplify complex queries and to define conceptually different views of the database for different classes of users.
View: purchases of telephony products:
CREATE VIEW telephony-purchases AS SELECT product, buyer, seller, store FROM Purchase, Product WHERE Purchase.product = Product.name AND Product.category = 'telephony'

34 A Different View
CREATE VIEW Seattle-view AS SELECT buyer, seller, product, store FROM Person, Purchase WHERE Person.city = 'Seattle' AND Person.name = Purchase.buyer
We can later use the views:
SELECT name, store FROM Seattle-view, Product WHERE Seattle-view.product = Product.name AND Product.category = 'shoes'
What's really happening when we query a view?
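
One way to think about the closing question: the query over the view behaves like the same query with the view's definition substituted in. The sketch below checks that equivalence with sqlite3. The view is named Seattle_view because a hyphen is not valid in an unquoted SQL identifier, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person(name TEXT, phone TEXT, city TEXT);
CREATE TABLE Purchase(buyer TEXT, seller TEXT, store TEXT, product TEXT);
CREATE TABLE Product(name TEXT, price REAL, category TEXT, maker TEXT);
CREATE VIEW Seattle_view AS
  SELECT buyer, seller, product, store
  FROM Person, Purchase
  WHERE Person.city = 'Seattle' AND Person.name = Purchase.buyer;
INSERT INTO Person VALUES ('Ann', '555-0100', 'Seattle'), ('Bob', '555-0101', 'Tempe');
INSERT INTO Purchase VALUES
  ('Ann', 'Joe',  'Nine West', 'runner'),
  ('Bob', 'Joe',  'Nine West', 'runner'),
  ('Ann', 'Fred', 'Store A',   'phone-1');
INSERT INTO Product VALUES
  ('runner', 60, 'shoes', 'AlphaCo'), ('phone-1', 100, 'telephony', 'BetaCo');
""")

via_view = conn.execute("""
    SELECT Product.name, store
    FROM Seattle_view, Product
    WHERE Seattle_view.product = Product.name AND Product.category = 'shoes'
""").fetchall()

# The same query with the view definition expanded by hand.
expanded = conn.execute("""
    SELECT Product.name, Purchase.store
    FROM Person, Purchase, Product
    WHERE Person.city = 'Seattle' AND Person.name = Purchase.buyer
      AND Purchase.product = Product.name AND Product.category = 'shoes'
""").fetchall()
```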

35 Updating Views
How can I insert a tuple into a table that doesn't exist?
CREATE VIEW bon-purchase AS SELECT store, seller, product FROM Purchase WHERE store = 'The Bon Marche'
If we make the following insertion:
INSERT INTO bon-purchase VALUES ('The Bon Marche', 'Joe', 'Denby Mug')
we can simply add a tuple to relation Purchase with store = 'The Bon Marche', seller = 'Joe', product = 'Denby Mug', and buyer = NULL.
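
SQLite treats views as read-only, so the translation described above has to be spelled out. The sketch below does that with an INSTEAD OF INSERT trigger; the trigger name and the underscore in bon_purchase are choices made here, and other systems may handle simple updatable views automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Purchase(buyer TEXT, seller TEXT, store TEXT, product TEXT);
CREATE VIEW bon_purchase AS
  SELECT store, seller, product FROM Purchase WHERE store = 'The Bon Marche';
-- Translate an insert on the view into an insert on the base table,
-- filling the attribute the view does not expose with NULL.
CREATE TRIGGER bon_purchase_insert INSTEAD OF INSERT ON bon_purchase
BEGIN
  INSERT INTO Purchase(buyer, seller, store, product)
  VALUES (NULL, NEW.seller, NEW.store, NEW.product);
END;
""")

conn.execute("INSERT INTO bon_purchase VALUES ('The Bon Marche', 'Joe', 'Denby Mug')")

base_rows = conn.execute("SELECT buyer, seller, store, product FROM Purchase").fetchall()
view_rows = conn.execute("SELECT store, seller, product FROM bon_purchase").fetchall()
```

The new tuple shows up in the view only because its store value satisfies the view's WHERE clause exactly.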

36 Non-Updatable Views
Given
Purchase(buyer, seller, store, product)
Person(name, phone-num, city)
CREATE VIEW Seattle-view AS SELECT seller, product, store FROM Person, Purchase WHERE Person.city = 'Seattle' AND Person.name = Purchase.buyer
How can we add the following tuple to the view? (Joe, “Shoe Model 12345”, “Nine West”)

37 Materialized Views
Views whose corresponding queries have been executed and whose data is stored in a separate database
– Uses: caching
Issues
– Using views in answering queries
  Normally, the views are available in addition to the database (so, views are local caches).
  In information integration, views may be the only things we have access to.
  An internet source that specializes in Woody Allen movies can be seen as a view on a database of all movies. Except, there is no database out there which contains all movies.
– Maintaining consistency of materialized views
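
The consistency issue can be illustrated by materializing a view by hand. SQLite has no built-in materialized views, so this sketch stores the query result in an ordinary table and refreshes it by full recomputation; the Movie table, the woody_allen_movies table, and the refresh helper are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie(title TEXT, director TEXT);
INSERT INTO Movie VALUES ('Annie Hall', 'Woody Allen'), ('Alien', 'Ridley Scott');
-- "Materialize" the view: run its query once and store the result.
CREATE TABLE woody_allen_movies AS
  SELECT title FROM Movie WHERE director = 'Woody Allen';
""")

def cached_titles():
    rows = conn.execute("SELECT title FROM woody_allen_movies ORDER BY title")
    return [r[0] for r in rows]

def refresh():
    # Simplest maintenance strategy: recompute the stored result from scratch.
    conn.execute("DELETE FROM woody_allen_movies")
    conn.execute("""INSERT INTO woody_allen_movies
                    SELECT title FROM Movie WHERE director = 'Woody Allen'""")

conn.execute("INSERT INTO Movie VALUES ('Manhattan', 'Woody Allen')")
stale = cached_titles()   # the stored copy has not seen the new movie
refresh()
fresh = cached_titles()
```

Full recomputation is the bluntest option; incremental maintenance is the harder problem the slide alludes to.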

38 Issues w.r.t. Databases on the Web
Information extraction (invert the tuple-to-text transformation)
Support lay user queries
– More flexible queries: exact (SQL) vs. approximate/similar (text search?)
– On “semi-structured” databases
Joins over text attributes?
– Exact (SQL) vs. approximate/similar
Support integration/aggregation of multiple databases
– Take a query from the user and send it to all relevant databases…
– TONS of challenges…

39 Imprecise Queries
Increasing number of Web-accessible databases
– E.g. bibliographies, reservation systems, department catalogs, etc.
– Support for precise queries only: exactly matching tuples
Difficulty in extracting desired information
– Limited query capabilities provided by form-based query interfaces
– Lack of schema/domain information
– Increasing complexity of types of data, e.g. hypertext, images, etc.
Often the user wants “about the same” instead of “exact”
– Bibliography search: find similar publications
Solution: provide answers closely matching the query constraints

40 Query Optimization

41 Query Optimization
Goal: turn a declarative SQL query into an imperative query execution plan.
Ideally: want to find the best plan. Practically: avoid the worst plans!
Declarative SQL query:
SELECT P.buyer FROM Purchase P, Person Q WHERE P.buyer = Q.name AND Q.city = 'seattle' AND Q.phone > '5430000'
Imperative query execution plan (top to bottom):
– π buyer
– σ city = 'seattle' AND phone > '5430000'
– join on buyer = name (Simple Nested Loops)
– inputs: Purchase (Table scan), Person (Index scan)
Inputs:
– the query
– statistics about the data (indexes, cardinalities, selectivity factors)
– available memory

42 SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
Three plans for this query:
– Plan 1: join Reserves and Sailors on sid = sid (Simple Nested Loops); then select bid = 100 and rating > 5 (on-the-fly); then project sname (on-the-fly).
– Plan 2: select bid = 100 on Reserves (scan; write to temp T1) and rating > 5 on Sailors (scan; write to temp T2); join on sid = sid (Sort-Merge Join); then project sname (on-the-fly).
– Plan 3: select bid = 100 on Reserves (use hash index; do not write result to temp); join with Sailors on sid = sid (with pipelining); then select rating > 5 (on-the-fly) and project sname (on-the-fly).
Goal of optimization: to find more efficient plans that compute the same answer.
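
A toy illustration of "same answer, different cost": two hand-written plans for the Reserves/Sailors query, with the number of tuple comparisons in the join loop used as a stand-in for cost. The generated data and the cost measure are assumptions; a real optimizer counts page I/Os and has many more plan shapes to consider.

```python
# (sid, sname, rating) and (sid, bid); synthetic data for illustration only.
sailors = [(i, f"s{i}", i % 10) for i in range(1, 21)]
reserves = [(i % 20 + 1, 100 + i % 5) for i in range(40)]

def join_then_filter():
    """Nested-loops join first, selections applied to each joined pair."""
    comparisons, names = 0, set()
    for r_sid, bid in reserves:
        for s_sid, sname, rating in sailors:
            comparisons += 1
            if r_sid == s_sid and bid == 100 and rating > 5:
                names.add(sname)
    return names, comparisons

def filter_then_join():
    """Push both selections below the join, then run the same join."""
    r = [t for t in reserves if t[1] == 100]
    s = [t for t in sailors if t[2] > 5]
    comparisons, names = 0, set()
    for r_sid, _ in r:
        for s_sid, sname, _ in s:
            comparisons += 1
            if r_sid == s_sid:
                names.add(sname)
    return names, comparisons

answer_a, cost_a = join_then_filter()
answer_b, cost_b = filter_then_join()
```

Both plans return the same set of sailor names, but pushing the selections down shrinks the join inputs from 40 × 20 pairs to 8 × 8.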

43 Optimizing Joins
Q(u,x) :- R(u,v), S(v,w), T(w,x)
– R ⋈ S ⋈ T
Many ways of doing a single join R ⋈ S
– Symmetric vs. asymmetric join operations: nested-loop join, hash join, double pipelined hash join, etc.
– Processing costs alone vs. processing + transfer costs: get R and S together vs. get R, then get just the tuples of S that will join with R (“semi-join”)
Many orders in which to do the join
– (R join S) join T
– (S join R) join T
– (T join S) join R, etc.
All with different costs

44 Determining Join Order
In principle, we need to consider all possible join orderings.
As the number of joins increases, the number of alternative plans grows rapidly; we need to restrict the search space.
System-R: consider only left-deep join trees.
– Left-deep trees allow us to generate all fully pipelined plans: intermediate results are not written to temporary files.
– Not all left-deep trees are fully pipelined (e.g., SM join).
[Figure: three join trees over relations A, B, C, D]
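
To get a feel for how quickly the search space grows, the sketch below counts join trees for n relations. It assumes ordered binary trees (left and right inputs are distinguished) and ignores the choice of join method and any pruning of cross products, so the numbers are only a rough indication.

```python
from math import factorial

def left_deep_orders(n):
    """Left-deep trees: one per permutation of the n relations."""
    return factorial(n)

def all_join_trees(n):
    """All ordered binary trees with n labelled leaves: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 4, 6, 8):
    print(n, left_deep_orders(n), all_join_trees(n))
```

For the four relations A, B, C, D in the figure this gives 24 left-deep trees out of 120 trees overall, and the gap widens quickly as n grows.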

45 Query Optimization Process (simplified a bit)
Parse the SQL query into a logical tree:
– Identify distinct blocks (corresponding to nested subqueries or views).
Query rewrite phase:
– Apply algebraic transformations to yield a cheaper plan.
– Merge blocks and move predicates between blocks.
Optimize each block: join ordering.
Complete the optimization: select scheduling (pipelining strategy).

46 Cost Estimation
For each plan considered, must estimate cost:
– Must estimate the cost of each operation in the plan tree. Depends on input cardinalities.
– Must estimate the size of the result for each operation in the tree! Use information about the input relations. For selections and joins, assume independence of predicates.
System R cost estimation approach.
– Very inexact, but works OK in practice.
– More sophisticated techniques known now.
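
A rough sketch of result-size estimation under the independence assumption. The formulas are the common textbook ones (multiply the selectivities of the predicates; divide the cross-product size by the larger number of distinct join values); the function names and the example statistics are assumptions, and the slides do not spell out which formulas System R used.

```python
def estimate_selection(cardinality, selectivities):
    """Independence assumption: selectivities of ANDed predicates multiply."""
    size = float(cardinality)
    for s in selectivities:
        size *= s
    return size

def estimate_equijoin(card_r, card_s, distinct_r, distinct_s):
    """Textbook estimate: |R| * |S| / max(distinct join values in R, in S)."""
    return card_r * card_s / max(distinct_r, distinct_s)

# 10,000 tuples, two predicates that each keep 25% and 50% of the input.
filtered = estimate_selection(10_000, [0.25, 0.5])
# 1,000 x 500 tuples joined on a column with 100 and 50 distinct values.
joined = estimate_equijoin(1_000, 500, 100, 50)
```

If the two predicates are correlated, the multiplied selectivity can be far off, which is one reason the slide calls the approach inexact.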

47 Key Lessons in Optimization
There are many approaches and many details to consider in query optimization
– Classic search/optimization problem!
– Not completely solved yet!
Main points to take away are:
– Algebraic rules and their use in transformations of queries.
– Deciding on join ordering: System-R style (Selinger style) optimization.
– Estimating cost of plans and sizes of intermediate results.

48 Concurrency Control
Concurrent execution of user programs: key to good DBMS performance.
– Disk accesses are frequent and pretty slow.
– Keep the CPU working on several programs concurrently.
Interleaving actions of different programs: trouble!
– E.g., account transfer and print statement at the same time.
The DBMS ensures such problems don't arise.
– Users/programmers can pretend they are using a single-user system (called “Isolation”).
– Thank goodness! Don't have to program “very, very carefully”.

49 Transactions: ACID Properties
Key concept is a transaction: a sequence of database actions (reads/writes).
Atomicity: the DBMS ensures the all-or-nothing property even if the system crashes in the middle of a Xact.
Consistency: each transaction, executed completely, must take the DB between consistent states or must not run at all.
Isolation: the DBMS ensures that concurrent transactions appear to run in isolation.
Durability: the DBMS ensures durability of committed Xacts even if the system crashes.
Note: can specify simple integrity constraints on the data. The DBMS enforces these.
– Beyond this, the DBMS does not understand the semantics of the data.
– Ensuring that a single transaction (run alone) preserves consistency is largely the user's responsibility!
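
Atomicity and a simple integrity constraint can be seen together in a small sqlite3 sketch. The Account table, the CHECK constraint, and the transfer helper are assumptions for illustration. The helper credits the destination first so that a failed debit leaves a partial change that must be undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account(id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money as one transaction: both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE Account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft

def balances():
    return dict(conn.execute("SELECT id, balance FROM Account"))

ok = transfer("A", "B", 30)
rejected = transfer("A", "B", 500)  # credit to B is rolled back with the debit
final = balances()
```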

50 Scheduling Concurrent Transactions
The DBMS ensures that execution of {T1, ..., Tn} is equivalent to some serial execution T1' ... Tn'.
– Before reading/writing an object, a transaction requests a lock on the object and waits until the DBMS gives it the lock. All locks are held until the end of the transaction. (Strict 2PL locking protocol.)
– Idea: if an action of Ti (say, writing X) affects Tj (which perhaps reads X), say Ti obtains the lock on X first, so Tj is forced to wait until Ti completes. This effectively orders the transactions.
– What if Tj already has a lock on Y, and Ti later requests a lock on Y? (Deadlock!) Ti or Tj is aborted and restarted!
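
The deadlock scenario above can be sketched with a toy lock manager. It assumes exclusive locks only and detects deadlock by looking for a cycle in a waits-for graph, which is one common approach; the slides do not say how the DBMS picks a victim, and the LockManager class and its methods are hypothetical names.

```python
class LockManager:
    """Toy strict-2PL lock table with exclusive locks only."""

    def __init__(self):
        self.owner = {}       # object -> transaction holding its lock
        self.waits_for = {}   # transaction -> set of transactions it waits on

    def request(self, txn, obj):
        holder = self.owner.setdefault(obj, txn)
        if holder == txn:
            return True                      # lock granted (or already held)
        self.waits_for.setdefault(txn, set()).add(holder)
        return False                         # txn must wait

    def release_all(self, txn):
        """Strict 2PL: every lock is released together, at commit or abort."""
        self.owner = {o: t for o, t in self.owner.items() if t != txn}
        self.waits_for.pop(txn, None)
        for waiting in self.waits_for.values():
            waiting.discard(txn)

    def has_deadlock(self):
        visiting, done = set(), set()

        def visit(t):
            if t in done:
                return False
            if t in visiting:
                return True                  # back edge: a waits-for cycle
            visiting.add(t)
            found = any(visit(u) for u in self.waits_for.get(t, ()))
            visiting.discard(t)
            done.add(t)
            return found

        return any(visit(t) for t in list(self.waits_for))

lm = LockManager()
lm.request("Ti", "X")                 # Ti locks X
lm.request("Tj", "Y")                 # Tj locks Y
lm.request("Tj", "X")                 # Tj waits for Ti
lm.request("Ti", "Y")                 # Ti waits for Tj
deadlocked = lm.has_deadlock()
lm.release_all("Tj")                  # abort Tj; it would be restarted later
resolved = not lm.has_deadlock()
granted = lm.request("Ti", "Y")       # Ti can now proceed
```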

51 Ensuring Transaction Properties
The DBMS ensures atomicity (all-or-nothing property) even if the system crashes in the middle of a Xact.
The DBMS ensures durability of committed Xacts even if the system crashes.
Idea: keep a log (history) of all actions carried out by the DBMS while executing a set of Xacts:
– Before a change is made to the database, the corresponding log entry is forced to a safe location. (WAL protocol; OS support for this is often inadequate.)
– After a crash, the effects of partially executed transactions are undone using the log. Effects of committed transactions are redone using the log.
– Trickier than it sounds!
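
A highly simplified sketch of log-based recovery. It replays every logged update in order, then rolls back the updates of transactions that never committed, scanning the log backwards. The record layout, the recover function, and the sample log are assumptions; real recovery also deals with checkpoints, log sequence numbers, and crashes during recovery itself.

```python
def recover(disk, log):
    """Rebuild a consistent state from the on-disk data and the log.

    Log records are ("update", txn, key, old_value, new_value)
    or ("commit", txn).
    """
    db = dict(disk)
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo pass: reapply every logged update in log order.
    for rec in log:
        if rec[0] == "update":
            _, _txn, key, _old, new = rec
            db[key] = new

    # Undo pass: walk backwards, restoring old values of uncommitted Xacts.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            db[key] = old
    return db

log = [
    ("update", "T1", "X", 0, 5),
    ("update", "T2", "Y", 0, 7),
    ("commit", "T1"),
]
# At the crash, T2's write had reached disk but T1's had not.
disk_at_crash = {"X": 0, "Y": 7}
recovered = recover(disk_at_crash, log)
```

T1 committed, so its update to X is redone; T2 never committed, so its update to Y is undone. This only works because each log record was written before the corresponding change reached disk.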

