CS848 Presentation Heng YU (Henry)


1 CS848 Presentation Heng YU (Henry) h3yu@hopper.uwaterloo.ca

2 Paper to present: "Answering queries using views: A survey" by A. Y. Halevy, VLDB Journal 10, pp. 270-294, 2001

3 Outline
  – Introduction with examples
  – Formal problem definitions
  – Conditions of view usability
  – Using materialized views in query optimization
  – Answering queries using views in data integration
  – Theoretical results
  – Extensions
  – Conclusion and challenges

4 Introduction

5 Problems (informal)
Given a query Q and a set of views V1, ..., Vn over a database schema:
  – Is it possible to answer Q using only the answers to V1, ..., Vn?
  – What is the maximal set of tuples in the answer of Q that we can get from V1, ..., Vn?
  – If we can access both the views and the database relations, what is the cheapest query execution plan for answering Q?

6 Fields of applications
  – Query optimization
  – Physical data independence
  – Data integration
  – More: e.g. semantic caching

7 Example: a university schema
  Prof(name, area)
  Course(c-number, title)
  Teaches(prof, c-number, quarter)
  Registered(student, c-number, quarter)
  Major(student, dept)
  Works(prof, dept)
  Advises(prof, student)
Keys: Prof(name), Course(c-number)
Graduate course: c-number ≥ 400
Ph.D. course: c-number ≥ 500

8 Query optimization
Suppose we have a view for graduate course registration info:
  create view Graduate as
  select Registered.student, Course.title, Course.c-number, Registered.quarter
  from Registered, Course
  where Registered.c-number = Course.c-number and Course.c-number ≥ 400

9 Query optimization (cont.)
We want to find the students registered in Ph.D.-level courses taught by a professor who is interested in the DB area:
  select Registered.student, Course.title
  from Teaches, Prof, Registered, Course
  where Prof.name = Teaches.prof and Teaches.c-number = Registered.c-number
    and Teaches.quarter = Registered.quarter and Registered.c-number = Course.c-number
    and Course.c-number ≥ 500 and Prof.area = 'DB'

10 Query:
  select Registered.student, Course.title
  from Teaches, Prof, Registered, Course
  where Prof.name = Teaches.prof and Teaches.c-number = Registered.c-number
    and Teaches.quarter = Registered.quarter and Registered.c-number = Course.c-number
    and Course.c-number ≥ 500 and Prof.area = 'DB'
View:
  create view Graduate as
  select Registered.student, Course.title, Course.c-number, Registered.quarter
  from Registered, Course
  where Registered.c-number = Course.c-number and Course.c-number ≥ 400

11 Query optimization (cont.)
Result of query rewriting:
  select Graduate.student, Graduate.title
  from Teaches, Prof, Graduate
  where Prof.name = Teaches.prof and Teaches.c-number = Graduate.c-number
    and Teaches.quarter = Graduate.quarter and Graduate.c-number ≥ 500
    and Prof.area = 'DB'
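
The equivalence of the original query and its rewriting over the Graduate view can be checked on a toy instance. The following is only a sketch using Python's built-in sqlite3 module: the sample rows are invented for illustration, and column names use c_number instead of c-number because hyphens are not valid in unquoted SQL identifiers. Agreement on one instance illustrates the rewriting; it does not prove equivalence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Prof(name TEXT PRIMARY KEY, area TEXT);
CREATE TABLE Course(c_number INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Teaches(prof TEXT, c_number INTEGER, quarter TEXT);
CREATE TABLE Registered(student TEXT, c_number INTEGER, quarter TEXT);

-- Invented sample data (assumption, not from the slides).
INSERT INTO Prof VALUES ('Ann', 'DB'), ('Bob', 'AI');
INSERT INTO Course VALUES (348, 'Intro DB'), (448, 'DB Impl'),
                          (848, 'Adv DB'), (886, 'Adv AI');
INSERT INTO Teaches VALUES ('Ann', 848, 'W02'), ('Ann', 448, 'W02'),
                           ('Bob', 886, 'W02');
INSERT INTO Registered VALUES ('s1', 848, 'W02'), ('s2', 448, 'W02'),
                              ('s3', 886, 'W02'), ('s4', 848, 'F01');

-- The Graduate view from the slides; explicit aliases fix the column names.
CREATE VIEW Graduate AS
SELECT Registered.student AS student, Course.title AS title,
       Course.c_number AS c_number, Registered.quarter AS quarter
FROM Registered, Course
WHERE Registered.c_number = Course.c_number AND Course.c_number >= 400;
""")

original = """
SELECT Registered.student, Course.title
FROM Teaches, Prof, Registered, Course
WHERE Prof.name = Teaches.prof
  AND Teaches.c_number = Registered.c_number
  AND Teaches.quarter = Registered.quarter
  AND Registered.c_number = Course.c_number
  AND Course.c_number >= 500 AND Prof.area = 'DB'
"""

rewritten = """
SELECT Graduate.student, Graduate.title
FROM Teaches, Prof, Graduate
WHERE Prof.name = Teaches.prof
  AND Teaches.c_number = Graduate.c_number
  AND Teaches.quarter = Graduate.quarter
  AND Graduate.c_number >= 500 AND Prof.area = 'DB'
"""

original_rows = sorted(cur.execute(original).fetchall())
rewritten_rows = sorted(cur.execute(rewritten).fetchall())
print(original_rows == rewritten_rows, original_rows)
```

The rewriting saves the Registered-Course join because the view has already computed it; the stronger predicate c_number >= 500 can still be applied because the view keeps c_number in its output.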

12 Maintaining physical data independence
  – Relational database systems rely on a 1-1 mapping between relations and files.
  – In object-oriented and semistructured databases, the logical model is more redundant and does not reflect the optimal physical design.
  – Physical storage can be described as views over the logical model, e.g. GMAP (Tsatalos et al. 96).

13 Maintaining physical data independence (cont.)
GMAP (generalized multi-level access paths)
  def.gmap G1 as b+-tree by given Student.name select Department where Student major Department
  def.gmap G2 as b+-tree by given Student.name select Course.c-number where Student registered Course
  def.gmap G3 as b+-tree by given Course.c-number select Department where Student registered Course and Student major Department

14 Maintaining physical data independence (cont.)
Query:
  select Student.name, Department
  where Student registered Course and Student major Department and Course.c-number ≥ 500
Plans:
  1. $\pi_{\text{Student.name},\,\text{Department}}(\sigma_{\text{Course.c-number} \ge 500}(G1 \bowtie_{\text{Student.name}} G2))$
  2. $\sigma_{\text{Course.c-number} \ge 500}(G3) \bowtie_{\text{Course.c-number}} G2$

15 Data integration Providing a uniform query interface to a multitude of autonomous heterogeneous data sources. Giving users a mediated schema. Local as View: specifying data source descriptions as a view over the mediated schema.

16 Data integration (cont.)
Example:
  Prof(name, area)
  Course(c-number, title, univ)
  Teaches(prof, c-number, quarter, univ)
  Registered(student, c-number, quarter)
  Major(student, dept)
  Works(prof, dept)
  Advises(prof, student)

17 Data integration (cont.)
Suppose we have only 2 views available:
  create view DB-courses as
  select Course.title, Teaches.prof, Course.c-number, Course.univ
  from Teaches, Course
  where Teaches.c-number = Course.c-number and Teaches.univ = Course.univ
    and Course.title = 'Database Systems'

  create view UW-phd-courses as
  select Course.title, Teaches.prof, Course.c-number, Course.univ
  from Teaches, Course
  where Teaches.c-number = Course.c-number and Course.univ = 'UW'
    and Teaches.univ = 'UW' and Course.c-number ≥ 500

18 Data integration (cont.)
Query: who teaches database courses at UW?
  select prof from DB-courses where univ = 'UW'
Query: all graduate courses at UW:
  select title, c-number from DB-courses where univ = 'UW' and c-number ≥ 400
  UNION
  select title, c-number from UW-phd-courses

19 Comparison for two applications
                       | Query optimization and physical design       | Data integration
  Output               | Query execution plan Q'                      | Query Q'
  Equivalence with Q   | Q' must be equivalent to Q                   | Q' can be equivalent to or contained in Q
  Data accessed        | Original relational data + materialized views | Only views
  # of views           | Modest                                       | Huge
  View completeness    | Yes                                          | No
  Rewriting reasoning  | Logical correctness + cost model             | Logical correctness

20 Formal Problem Definition

21 Query containment and equivalence
Definition: A query $Q_1$ is said to be contained in a query $Q_2$, denoted by $Q_1 \sqsubseteq Q_2$, if for all database instances $D$, the set of tuples computed for $Q_1$ is a subset of those computed for $Q_2$, i.e., $Q_1(D) \subseteq Q_2(D)$. The two queries are equivalent if $Q_1 \sqsubseteq Q_2$ and $Q_2 \sqsubseteq Q_1$.
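
For conjunctive queries without comparison predicates under set semantics, containment can be decided by searching for a containment mapping (a homomorphism) from the containing query to the contained one. The sketch below assumes that restricted setting; the tuple encoding of atoms, the convention that uppercase strings are variables, and the function names are all illustrative choices rather than anything from the slides.

```python
def is_var(term):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def extend(mapping, src, dst):
    """Extend `mapping` so that atom `src` maps onto atom `dst`, or return None."""
    if src[0] != dst[0] or len(src) != len(dst):
        return None
    new_mapping = dict(mapping)
    for s, d in zip(src[1:], dst[1:]):
        if is_var(s):
            if new_mapping.setdefault(s, d) != d:
                return None  # variable already mapped to a different term
        elif s != d:
            return None      # constants must match exactly
    return new_mapping


def contained_in(q1, q2):
    """True if q1 is contained in q2 (conjunctive queries, no comparisons).

    A query is a pair (head_atom, [body_atoms]); an atom is a tuple
    (predicate, term, ...). The test looks for a mapping from the variables
    of q2 to the terms of q1 that sends head to head and every body atom of
    q2 onto some body atom of q1.
    """
    head1, body1 = q1
    head2, body2 = q2
    start = extend({}, head2, head1)
    if start is None:
        return False

    def search(i, mapping):
        if i == len(body2):
            return True
        for atom in body1:
            candidate = extend(mapping, body2[i], atom)
            if candidate is not None and search(i + 1, candidate):
                return True
        return False

    return search(0, start)


# Students registered in a course that is also taught in the same quarter ...
q_taught = (("q", "S"), [("Registered", "S", "C", "Q"), ("Teaches", "P", "C", "Q")])
# ... versus all registered students.
q_all = (("q", "S"), [("Registered", "S", "C", "Q")])

print(contained_in(q_taught, q_all), contained_in(q_all, q_taught))
```

The backtracking search is exponential in the worst case, which is consistent with containment of conjunctive queries being NP-complete.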

22 Equivalent rewritings
Definition: Let $Q$ be a query and $V = \{V_1, V_2, \ldots, V_m\}$ be a set of view definitions. The query $Q'$ is an equivalent rewriting of $Q$ using $V$ if:
  – $Q'$ refers only to the views in $V$;
  – $Q'$ is equivalent to $Q$.

23 Maximally-contained rewritings
Definition: Let $Q$ be a query, $V = \{V_1, V_2, \ldots, V_m\}$ be a set of view definitions, and $L$ be a query language. The query $Q'$ is a maximally-contained rewriting of $Q$ using $V$ w.r.t. $L$ if:
  – $Q'$ is a query in $L$ that refers only to the views in $V$;
  – $Q'$ is contained in $Q$;
  – there is no rewriting $Q_1 \in L$ such that $Q' \sqsubseteq Q_1 \sqsubseteq Q$ and $Q_1$ is not equivalent to $Q'$.

24 Certain answers
  – Problem: finding all the answers to a query given a set of views.
  – Not equivalent to finding a maximally-contained rewriting, because maximal containment depends on the query language.
  – Formalized by certain answers (Abiteboul et al. 98).
A tuple $\alpha$ is a certain answer of $Q$ w.r.t. a set of view definitions $\{V_i\}$ and their extensions $\{v_i\}$ if $\alpha$ is in $Q(D)$ for every possible database instance $D$ such that $V_i(D) = v_i$ (closed-world assumption, CWA) or $V_i(D) \supseteq v_i$ (open-world assumption, OWA).

25 Conditions of view usability

26 View usability conditions
For an SPJ view V to be usable in an equivalent rewriting of an SPJ query Q under bag semantics:
  1. There is a mapping ψ from the occurrences of tables mentioned in the from clause of V to those mentioned in the from clause of Q, mapping every table name to itself. For bag semantics, ψ must be 1-1.
  2. V must either apply the join and selection predicates of Q on the attributes of the tables in the domain of ψ, or apply a logically weaker selection and select the attributes on which predicates still need to be applied.
  3. V must not project out any attributes of the tables in the domain of ψ that are needed in the selection of Q.

27 Using materialized views in query optimization

28 System-R style optimization
Single-table access paths
  – Traditional optimizer: access paths on all tables.
  – Optimizer using views: also considers usable materialized views.
Combining partial plans
  – Traditional optimizer: the predicates of the two partial plans are known, and the cheapest is considered.
  – Optimizer using views: considers joining partial plans with several alternative join predicates.
Pruning of plans
  – Traditional optimizer: saves the cheapest plan of each equivalence class.
  – Optimizer using views: compares every pair of plans and discards one if a cheaper plan dominates it.
Termination testing
  – Traditional optimizer: has the equivalence class that includes all relations in the query been considered?
  – Optimizer using views: have all partial plans been examined?

29 System-R style (cont.)

30 Queries with grouping and aggregation
Example:
View:
  create view V as
  select c-number, year, Max(evaluation) as maxeval, Count(*) as offerings
  from Teaches
  where c-number ≥ 400
  group by c-number, year
Query:
  select year, Count(*), Max(evaluation)
  from Teaches
  where c-number ≥ 500
  group by year

31 Queries with grouping and aggregation (cont.)
The query can be rewritten to:
  select year, Sum(offerings), Max(maxeval)
  from V
  where c-number ≥ 500
  group by year
Comment: there are more limitations when grouping and aggregation are involved.
  – The grouping in the view must be finer than that in the query.
  – The aggregations in the query must be recoverable from the output fields and aggregations in the view.
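
The aggregate rewriting can be tried out on a small instance. This is a sketch under assumptions: the Teaches rows are invented, the table carries year and evaluation columns as the example implies, and c_number replaces c-number to get a valid SQL identifier. The rewritten query aggregates the view's own output columns (Sum of offerings, Max of maxeval), since V does not expose the raw evaluation values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Teaches(prof TEXT, c_number INTEGER, year INTEGER, evaluation INTEGER);

-- Invented sample data (assumption, not from the slides).
INSERT INTO Teaches VALUES
  ('Ann', 848, 2000, 4), ('Ann', 848, 2000, 3), ('Bob', 886, 2000, 5),
  ('Bob', 886, 2001, 2), ('Ann', 448, 2001, 5), ('Ann', 348, 2000, 1);

-- View grouped more finely (c_number, year) than the query (year).
CREATE VIEW V AS
SELECT c_number, year, MAX(evaluation) AS maxeval, COUNT(*) AS offerings
FROM Teaches
WHERE c_number >= 400
GROUP BY c_number, year;
""")

direct = """
SELECT year, COUNT(*), MAX(evaluation)
FROM Teaches WHERE c_number >= 500
GROUP BY year ORDER BY year
"""

# COUNT(*) is recovered by summing the per-group counts;
# MAX is recovered by taking the max of the per-group maxima.
via_view = """
SELECT year, SUM(offerings), MAX(maxeval)
FROM V WHERE c_number >= 500
GROUP BY year ORDER BY year
"""

direct_rows = cur.execute(direct).fetchall()
view_rows = cur.execute(via_view).fetchall()
print(direct_rows == view_rows, direct_rows)
```

An aggregate such as Avg would not be recoverable from this view, because V keeps neither the sum of evaluations nor the individual values.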

32 Answering queries using views for data integration

33 Main approaches
Using a datalog query representation for both Q and V.
Algorithms:
  – Bucket algorithm (Levy et al. 96)
  – Inverse rules algorithm (Qian et al. 96)
  – MiniCon algorithm (Pottinger et al. 00)

34 Bucket algorithm
Step 1: Create a bucket for each non-comparison subgoal g in Q. For each subgoal g' of each view V, if there is a unifier θ for g and g' such that, after unification,
  1) the comparison predicates in Q and V are simultaneously satisfiable, and
  2) whenever a variable appears both in head(Q) and in g, the corresponding variable in g' also appears in head(V),
then add θ(head(V)) to the bucket of g.
Step 2: Enumerate candidate rewritings, each a conjunctive query that takes one conjunct from each bucket. A candidate is a conjunctive rewriting if either
  1) the conjunction is contained in Q, or
  2) it is possible to add comparison atoms such that the resulting conjunction is contained in Q.

35 Bucket algorithm example
  V1(student, c-number, quarter, title) :- Registered(student, c-number, quarter), Course(c-number, title), c-number ≥ 500, quarter ≥ Aut98.
  V2(student, prof, c-number, quarter) :- Registered(student, c-number, quarter), Teaches(prof, c-number, quarter).
  V3(student, c-number) :- Registered(student, c-number, quarter), quarter ≤ Aut94.
  V4(prof, c-number, title, quarter) :- Registered(student, c-number, quarter), Course(c-number, title), Teaches(prof, c-number, quarter), quarter ≤ Aut97.

36 Bucket algorithm example (cont.)
Query:
  Q(S,C,P) :- Teaches(P,C,Q), Registered(S,C,Q), Course(C,T), C ≥ 300, Q ≥ Aut95.
Buckets:
  Teaches(P,C,Q)  | Registered(S,C,Q) | Course(C,T)
  V2(S',P,C,Q)    | V1(S,C,Q,T')      | V1(S',C,Q',T)
  V4(P,C,T',Q)    | V2(S,P',C,Q)      | V4(P',C,T,Q')

37 Bucket algorithm example (cont.)
Result of rewriting:
  q'(S,C,P) :- V2(S',P,C,Q), V1(S,C,Q,T')
  q'(S,C,P) :- V4(P,C,T',Q), V1(S,C,Q,T'), V4(P',C,T,Q')
  q'(S,C,P) :- V2(S,P,C,Q), V4(P,C,T',Q)
The second query is empty, so the result is the union of the first and the third conjunctive queries.
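
The bucket-creation step of the example can be sketched in a few lines. This covers only the first phase (filling buckets), not the enumeration of candidate rewritings or the containment test. It assumes that all subgoal arguments are variables, that comparisons have the form variable ≥ constant or variable ≤ constant, and that quarters are encoded as two-digit years (Aut98 as 98); the data layout and function names are illustrative.

```python
def unify(q_atom, v_atom):
    """Map view variables to query variables position by position, or None."""
    if q_atom[0] != v_atom[0] or len(q_atom) != len(v_atom):
        return None
    theta = {}
    for q_var, v_var in zip(q_atom[1:], v_atom[1:]):
        if theta.setdefault(v_var, q_var) != q_var:
            return None
    return theta


def satisfiable(comparisons):
    """Check interval constraints: every lower bound must not exceed the upper."""
    lo, hi = {}, {}
    for var, op, const in comparisons:
        if op == ">=":
            lo[var] = max(lo.get(var, const), const)
        else:  # "<="
            hi[var] = min(hi.get(var, const), const)
    return all(lo[v] <= hi[v] for v in lo if v in hi)


def build_buckets(query, views):
    q_head, q_body, q_comps = query
    buckets = {}
    for g in q_body:
        entries = []
        for name, (v_head, v_body, v_comps) in views.items():
            for g2 in v_body:
                theta = unify(g, g2)
                if theta is None:
                    continue
                # Condition 2: head variables of Q in g must be exported by V.
                if any(q in q_head and v not in v_head for v, q in theta.items()):
                    continue
                # Condition 1: comparisons of Q and V are jointly satisfiable.
                mapped = [(theta.get(v, v + "'"), op, c) for v, op, c in v_comps]
                if not satisfiable(q_comps + mapped):
                    continue
                args = ", ".join(theta.get(v, v + "'") for v in v_head)
                entries.append(f"{name}({args})")
        buckets[g] = entries
    return buckets


REG = ("Registered", "student", "c_number", "quarter")
CRS = ("Course", "c_number", "title")
TCH = ("Teaches", "prof", "c_number", "quarter")
views = {
    "V1": (["student", "c_number", "quarter", "title"], [REG, CRS],
           [("c_number", ">=", 500), ("quarter", ">=", 98)]),
    "V2": (["student", "prof", "c_number", "quarter"], [REG, TCH], []),
    "V3": (["student", "c_number"], [REG], [("quarter", "<=", 94)]),
    "V4": (["prof", "c_number", "title", "quarter"], [REG, CRS, TCH],
           [("quarter", "<=", 97)]),
}
query = (["S", "C", "P"],
         [("Teaches", "P", "C", "Q"), ("Registered", "S", "C", "Q"), ("Course", "C", "T")],
         [("C", ">=", 300), ("Q", ">=", 95)])

buckets = build_buckets(query, views)
for subgoal, entries in buckets.items():
    print(subgoal, entries)
```

On this input V3 is rejected for Registered because quarter ≤ Aut94 contradicts Q ≥ Aut95, and V4 is rejected for Registered because it does not export the student variable.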

38 Bucket algorithm comments
Advantages
  – Prunes a significant number of query rewritings.
  – Returns a maximally-contained rewriting when the query has no comparison predicates.
Disadvantages
  – The Cartesian product of the buckets is still large.
  – Testing query containment is costly: it is $\Pi_2^p$-complete.

39 Inverse-rules algorithm
  – Construct a set of rules that invert the view definitions.
  – Idea: each tuple in the head of a view definition is a witness of tuples in the relations corresponding to the subgoals in the body.
  – Assign one Skolem function symbol to each existential variable in the view definition.

40 Inverse-rules algorithm example
View definition:
  V3(dept, c-number) :- Major(student, dept), Registered(student, c-number)
Inverse rules:
  Major(f1(dept, X), dept) :- V3(dept, X)
  Registered(f1(Y, c-number), c-number) :- V3(Y, c-number)

41 Inverse-rules algorithm example (cont.)
Query:
  q(dept) :- Major(student, dept), Registered(student, 444)
V3 has tuples: {(CS, 444), (EE, 444), (CS, 333)}
Applying the inverse rules:
  Major: {(f1(CS, 444), CS), (f1(EE, 444), EE), (f1(CS, 333), CS)}
  Registered: {(f1(CS, 444), 444), (f1(EE, 444), 444), (f1(CS, 333), 333)}
Answer: {EE, CS}
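
The inverse-rules example is small enough to replay directly. In this sketch a Skolem term f1(dept, c-number) is represented as a tagged Python tuple; that encoding and the variable names are assumptions made for illustration.

```python
# Extension of the view V3(dept, c_number).
v3 = {("CS", 444), ("EE", 444), ("CS", 333)}


def f1(dept, c_number):
    """Skolem term standing for the unknown student witnessing a V3 tuple."""
    return ("f1", dept, c_number)


# Inverse rules: each V3 tuple yields one Major tuple and one Registered tuple
# that share the same Skolem term in the student position.
major = {(f1(dept, c), dept) for dept, c in v3}
registered = {(f1(dept, c), c) for dept, c in v3}

# q(dept) :- Major(student, dept), Registered(student, 444)
answer = {
    dept
    for student, dept in major
    for student2, c_number in registered
    if student == student2 and c_number == 444
}
print(sorted(answer))
```

Because the query only returns dept, no Skolem term reaches the answer; answers that did contain a Skolem term would have to be discarded.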

42 Inverse-rules algorithm comments
Advantages
  – Simplicity and modularity.
  – Returns a maximally-contained rewriting.
Disadvantages
  – Keeps more views that do not contribute to the query than the bucket algorithm does.
  – Requires recomputing the relations from the views, so the benefit of using precomputed materialized views is lost.

43 MiniCon algorithm
  – An improvement on the bucket algorithm that aims to eliminate more views that are useless to the query.
  – When we find a unification between a subgoal g' in V and a subgoal g in Q, all other subgoals that join with g in Q are examined.
  – V must either have the join attribute in its head, or contain the corresponding joined subgoals in its body.
  – For each view, compute a MiniCon consisting of all the subgoals in the query that the view contributes to.

44 MiniCon example
  q(D) :- Major(S, D), Registered(S, 444, Q), Advises(P, S)
  V1(dept) :- Major(student, dept), Registered(student, 444, quarter).
  V2(prof, dept, area) :- Advises(prof, student), Prof(name, area).
  V3(dept, c-number) :- Major(student, dept), Registered(student, c-number, quarter), Advises(prof, student).
MiniCon(V1) = ∅, MiniCon(V2) = ∅, MiniCon(V3) = {Major, Registered, Advises}
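
The join-variable condition behind this example can be sketched as follows. This is a deliberately simplified illustration, not the full MiniCon algorithm: it assumes every predicate occurs at most once in the query and in each view, so the unifier is determined by the predicate name, and it returns the union of the subgoal sets a view can cover. Variables are strings and constants are non-strings; the function name is hypothetical.

```python
def covered_subgoals(q_head, q_body, v_head, v_body):
    """Query subgoals the view can cover, or an empty set if none qualify."""
    v_atoms = {atom[0]: atom for atom in v_body}

    def cover_from(seed):
        cover, todo, theta = set(), [seed], {}
        while todo:
            g = todo.pop()
            if g in cover:
                continue
            v_atom = v_atoms.get(g[0])
            if v_atom is None:
                return None  # a joined subgoal is missing from the view
            for q_t, v_t in zip(g[1:], v_atom[1:]):
                if not isinstance(q_t, str):  # constant in the query
                    # Either the view exports the column (select later)
                    # or it fixes the same constant.
                    ok = (v_t in v_head) if isinstance(v_t, str) else (v_t == q_t)
                    if not ok:
                        return None
                    continue
                if theta.setdefault(q_t, v_t) != v_t:
                    return None
                exported = not isinstance(v_t, str) or v_t in v_head
                if not exported:
                    if q_t in q_head:
                        return None  # head variable must be returned by the view
                    # Existential view variable: every query subgoal that
                    # joins on q_t has to be covered by this view as well.
                    todo.extend(h for h in q_body if q_t in h[1:] and h not in cover)
            cover.add(g)
        return cover

    result = set()
    for seed in q_body:
        if seed[0] in v_atoms:
            found = cover_from(seed)
            if found:
                result |= found
    return result


q_head = ["D"]
q_body = [("Major", "S", "D"), ("Registered", "S", 444, "Q"), ("Advises", "P", "S")]

v1 = (["dept"], [("Major", "student", "dept"), ("Registered", "student", 444, "quarter")])
v2 = (["prof", "dept", "area"], [("Advises", "prof", "student"), ("Prof", "name", "area")])
v3 = (["dept", "c_number"], [("Major", "student", "dept"),
                             ("Registered", "student", "c_number", "quarter"),
                             ("Advises", "prof", "student")])

for name, (head, body) in {"V1": v1, "V2": v2, "V3": v3}.items():
    print(name, sorted(a[0] for a in covered_subgoals(q_head, q_body, head, body)))
```

V1 and V2 are discarded because the student variable is not exported and the views lack some subgoal that joins on it; V3 contains all three joined subgoals and therefore covers the whole query.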

45 Theoretical results (very selective)

46 Completeness
  – Question: given a query Q and a set of views V, will the algorithm find an equivalent rewriting of Q using V when one exists?
  – When a conjunctive query (CQ) has no comparison predicates and has n subgoals, there exists an equivalent conjunctive rewriting of Q using V only if there is a rewriting with at most n subgoals.
  – The complexity is NP-hard. (Levy et al. 1995)

47 Recursive rewriting
Goal: when we apply a maximally-contained rewriting, we can also get the set of all certain answers.
Recursive query rewriting is necessary when:
  – The query is recursive.
  – Database relations have functional dependencies.
  – There exist access pattern limitations on the views.
  – Views have unions.
  – Additional semantic information about class hierarchies on objects is expressed in description logics (DL).

48 Recursive rewriting (example with FDs)
Relation: schedule(Airline, Flight_no, Date, Pilot, Aircraft)
FDs: Pilot -> Airline, Aircraft -> Airline
View:
  v(D, P, C) :- schedule(A, N, D, P, C)
Query:
  q(P) :- schedule(A, N, D, 'mike', C), schedule(A, N', D', P, C')
Rewriting:
  relevantPilot('mike')
  relevantAircraft(C) :- v(D, 'mike', C)
  relevantAircraft(C) :- v(D, P, C), relevantPilot(P)
  relevantPilot(P) :- relevantPilot(P1), relevantAircraft(C), v(D1, P1, C), v(D2, P, C)

49 Finding certain answers
Open-world assumption:
  – Polynomial in most practical cases.
  – NP-hard (in the size of the view extensions) if unions are allowed in view definitions or inequality predicates are allowed in the query language.
Closed-world assumption:
  – co-NP-hard even if both views and queries are CQs and have no comparison predicates.
  – cf. GAV: polynomial.
In case views can contain incorrect tuples (assuming no comparison predicates in views or query):
  – If all views are complete, or all views may have incorrect tuples: polynomial in the size of the view extensions.
  – Otherwise: co-NP-hard.
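
The recursive rewriting can be evaluated with a simple fixpoint loop. This is a minimal sketch of naive bottom-up datalog evaluation specialised to the four rules of the example; the view extension is invented, and the function name is hypothetical.

```python
def relevant_pilots(v, start="mike"):
    """Evaluate the recursive program over the view extension v(D, P, C)."""
    pilots, aircraft = {start}, set()          # relevantPilot('mike')
    while True:
        # relevantAircraft(C) :- v(D, P, C), relevantPilot(P)
        # (this also covers the rule with P = 'mike')
        new_aircraft = aircraft | {c for (_d, p, c) in v if p in pilots}
        # relevantPilot(P) :- relevantPilot(P1), relevantAircraft(C),
        #                     v(D1, P1, C), v(D2, P, C)
        new_pilots = pilots | {
            p
            for (_d1, p1, c1) in v
            if p1 in pilots and c1 in new_aircraft
            for (_d2, p, c2) in v
            if c2 == c1
        }
        if new_aircraft == aircraft and new_pilots == pilots:
            return pilots                      # fixpoint reached
        pilots, aircraft = new_pilots, new_aircraft


# Invented view extension: (date, pilot, aircraft).
v = {
    ("d1", "mike", "a1"),
    ("d2", "ann", "a1"),   # shares aircraft a1 with mike
    ("d3", "ann", "a2"),
    ("d4", "bob", "a2"),   # shares aircraft a2 with ann
    ("d5", "eve", "a9"),   # not connected to mike
}

pilots = relevant_pilots(v)
print(sorted(pilots))
```

The chain mike, a1, ann, a2, bob can be arbitrarily long, which is why no finite union of conjunctive queries over v captures all certain answers: the two functional dependencies let airline membership propagate through any number of shared pilots and aircraft.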

50 Extensions

51 Object query languages (OQL) (Florescu 96)
  – More semantic information about class hierarchies and attributes.
  – OQL does not clearly separate the select and where clauses; both can contain path navigation.
Access pattern limitations (Rajaraman 95)
  – Restricted parameterized queries on views, e.g. CitationDB^bf(X,Y) :- Cites(X,Y)
  – Finite rewriting requires recursion.

52 Conclusion and challenges

53 Answering queries using views plays a significant role in query optimization, physical data independence, and data integration.
New fields to explore:
  – Consider new query languages.
  – Consider integration constraints.
  – Bridge the gap between query optimization and data integration.
  – Facilitate data warehouse queries: query result reuse, incremental computation.
  – Decide which views to materialize first.

54 Thank you

