## View Usability and Safety for the Answering of Top-k Queries via Materialized Views

Eftychia Baikousi, Panos Vassiliadis
University of Ioannina, Dept. of Computer Science

Forecast
- The problem of answering a top-k query through materialized top-n views
- Theoretical guarantees for when a top-n materialized view can answer a top-k query
- Algorithmic techniques for answering a top-k query from a materialized view
- Properties of the safe areas of views

DOLAP 2009, Hong Kong, 6 Nov 2009

Contents
- Motivation & Problem Definition
- Overview of the Method (theoretical guarantees, strictness of theorem, safe area properties)
- Experiments
- Conclusions
- Future extensions

Top-k query: given a relation R(id, x1, x2, x3) and a query Q with scoring function sum(x1, x2, x3), find the k tuples with the highest grades according to Q. For example, tuple a = (0.3, 0.6, 0.7) has grade 1.6, and tuples b, c, and d have grades 0.9, 1.8, and 1.4, so the top-2 tuples are c (1.8) and a (1.6).
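
A top-k query over precomputed grades can be sketched with Python's `heapq.nlargest`. The helper `top_k` is a hypothetical name, not code from the talk; the usage ranks the four grades listed above.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, float]

def top_k(rows: Iterable[Row], k: int, score: Callable[[Row], float]) -> List[Hashable]:
    """Return the ids of the k rows with the highest score (hypothetical helper)."""
    return [row[0] for row in heapq.nlargest(k, rows, key=score)]

# Grades of the four tuples under Q = sum(x1, x2, x3)
grades = [("a", 1.6), ("b", 0.9), ("c", 1.8), ("d", 1.4)]
print(top_k(grades, 2, score=lambda row: row[1]))  # ['c', 'a']

# The grade of tuple a is the plain sum of its three attributes
print(round(sum((0.3, 0.6, 0.7)), 6))  # 1.6
```

`heapq.nlargest` keeps only k candidates in a heap, so it avoids fully sorting the relation when k is small.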

Motivating Example: a telecommunication company whose executives see sales reports on PDAs. Given a relation Region(id, name, today_traffic, yesterday_traffic, budget, ...) and a materialized view V of the top-2 regions according to the query Q: 0.6*difftraffic + 0.4*budget, where difftraffic = today_traffic - yesterday_traffic:

| id | name | today_traffic | yesterday_traffic | budget | V |
|---|---|---|---|---|---|
| 1 | LA | 18 | 20 | 21 | 7.2 |
| 2 | NY | 42 | 54 | 15 | -1.2 |
| 3 | Dallas | 26 | 22 | 8 | 5.6 |
| 4 | Chicago | 30 | 28 | 11 | 5.6 |

V contains LA (7.2) and Dallas (5.6, tied with Chicago). Can a new top-k query (e.g., 0.5*difftraffic + 0.3*budget) be answered from V?
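
Recomputing the score column from the attribute values makes the example concrete. This sketch assumes difftraffic = today_traffic - yesterday_traffic, the reading that reproduces the listed scores of LA, NY, and Chicago; `regions`, `score`, and `ranking` are illustrative names. Under V's weights (0.6, 0.4), Dallas and Chicago both come out at 5.6; under the new query's weights (0.5, 0.3), the top-2 are LA (5.3) and Dallas (4.4).

```python
# Region rows from the table: name -> (today_traffic, yesterday_traffic, budget)
regions = {
    "LA": (18, 20, 21),
    "NY": (42, 54, 15),
    "Dallas": (26, 22, 8),
    "Chicago": (30, 28, 11),
}

def score(w_diff, w_budget, row):
    today, yesterday, budget = row
    difftraffic = today - yesterday  # assumed: today minus yesterday
    # Round to hide floating-point noise such as 5.6000000000000005
    return round(w_diff * difftraffic + w_budget * budget, 6)

def ranking(w_diff, w_budget):
    """All regions sorted by score, best first (ties keep table order)."""
    scored = {name: score(w_diff, w_budget, row) for name, row in regions.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

print(ranking(0.6, 0.4))  # the query that defines V
print(ranking(0.5, 0.3))  # the new query
```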

Problem definition: given
- a base relation R(ID, X, Y);
- a materialized view V(ID, X, Y, s) that contains the top-n tuples of the form (id, s), where s = w(a·x + y) and w, a are positive parameters;
- a query Q(ID, X, Y, sQ) that requests the top k ≤ n tuples of the form (id, sQ), where sQ = wQ(aQ·x + y) and wQ, aQ are positive parameters,

introduce an algorithm that decides whether V by itself suffices to answer Q and, if so, computes Q's answer.
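
Because w is positive, it rescales every score by the same factor and cannot change the ranking, so for ranking purposes a view or query is characterized by a alone. A minimal check, with hypothetical points and helper names:

```python
def linear_score(w, a, x, y):
    """s = w * (a*x + y), the scoring form used for views and queries."""
    return w * (a * x + y)

def top_n(points, w, a, n):
    """Ids of the n points with the highest score; points: id -> (x, y)."""
    return sorted(points, key=lambda pid: linear_score(w, a, *points[pid]), reverse=True)[:n]

# Hypothetical points, for illustration only
points = {"p1": (0.9, 0.1), "p2": (0.5, 0.6), "p3": (0.2, 0.95), "p4": (0.3, 0.3)}

# For w > 0 the scale factor never changes the order, so only a matters
assert top_n(points, 1.0, 2.0, 2) == top_n(points, 7.5, 2.0, 2)
print(top_n(points, 1.0, 2.0, 2))  # ['p1', 'p2']: a = 2 favours x
print(top_n(points, 1.0, 0.5, 2))  # ['p3', 'p2']: a = 0.5 favours y
```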

Related Work: “Answering Top-k Queries Using Views”, VLDB ’06
Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis: “Answering Top-k Queries Using Views”, VLDB ’06. The approach answers a top-k query Q by making use of ranking views V. Its algorithm, LPTA, works in two steps:
1. SelectViews(V, Q) selects an efficient subset U of the views for answering Q; U contains the sorted lists over each attribute of the relation.
2. Q is answered from U with a linear-programming adaptation of the TA algorithm. Stopping condition: the solution of the linear program is ≤ min(top-k), i.e., the lowest score in the current top-k result.
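
LPTA adapts Fagin et al.'s threshold algorithm (TA). As a reference point, here is a sketch of plain TA over per-attribute sorted lists for a weighted-sum score; it is not LPTA (there are no views and no linear program), and the function name and data are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

def threshold_algorithm(rows: Dict[str, Tuple[float, ...]],
                        weights: Sequence[float], k: int) -> List[str]:
    """Plain TA (not LPTA) for the monotone score sum(w_i * x_i), w_i >= 0."""
    m = len(weights)

    def score(rid: str) -> float:
        # "Random access": fetch all attributes of rid and combine them.
        return sum(w * v for w, v in zip(weights, rows[rid]))

    # One list per attribute, sorted descending ("sorted access").
    lists = [sorted(rows, key=lambda rid, i=i: rows[rid][i], reverse=True)
             for i in range(m)]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap with the best k (score, id) pairs
    for depth in range(len(rows)):
        last = []  # attribute values at the current depth of each list
        for i in range(m):
            rid = lists[i][depth]
            last.append(rows[rid][i])
            if rid in seen:
                continue
            seen.add(rid)
            s = score(rid)
            if len(top) < k:
                heapq.heappush(top, (s, rid))
            elif s > top[0][0]:
                heapq.heapreplace(top, (s, rid))
        # No unseen tuple can score above this threshold.
        threshold = sum(w * v for w, v in zip(weights, last))
        if len(top) == k and top[0][0] >= threshold:
            break
    return [rid for _, rid in sorted(top, reverse=True)]

rows = {"a": (7, 4), "b": (2, 7), "c": (4, 2), "d": (1, 1)}
print(threshold_algorithm(rows, (2, 1), 1))  # ['a']
print(threshold_algorithm(rows, (1, 2), 2))  # ['b', 'a']
```

On this small relation both calls stop after two rounds of sorted access, before reading tuple d at all.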

Related Work – Geometric Representation (0): assume a relation R(ID, X, Y), two views Vu(id, Score1) and Vd(id, Score2), and a query Q(id, Score). Scoring functions have the form Score = w(a·x + y) and are depicted as the line $y = a^{-1} x$.
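
The geometric picture rests on one line of algebra: the line depicting a scoring function and that function's iso-score lines are perpendicular, which is why the slides describe lines as “perpendicular to” a view or query.

```latex
\begin{aligned}
w\,(a x + y) = c \;&\iff\; y = -a\,x + \tfrac{c}{w}
  && \text{iso-score line (slope } -a\text{)}\\
\text{score direction } (a, 1): \quad y &= a^{-1} x
  && \text{(slope } a^{-1}\text{)}\\
(-a)\cdot a^{-1} &= -1
  && \text{so the two lines are perpendicular}
\end{aligned}
```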

Related Work – Geometric Representation (1): M is the k-th tuple of Q. Stopping condition: the sweeping line crosses position A1B. Any point below line AB has a smaller score than M with respect to Q.

Related Work – Geometric Representation (2): stopping condition: the intersection point S of the two views' sweeping lines lies on line AB. Any point below line AB has a smaller score than M with respect to Q.
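
Both stopping tests are a small linear program: maximize Q's score a_Q·x + y over the region not yet swept, {x, y ≥ 0, a_i·x + y ≤ c_i for each view i}, and stop once that maximum is no larger than M's score. In two dimensions the optimum lies at a vertex of the region, so this sketch enumerates vertices instead of calling an LP solver. The function names and the (a_i, c_i) encoding of each view's current sweeping line are assumptions, and scores are taken in the normalized form a·x + y.

```python
from itertools import combinations
from typing import List, Tuple

def max_unswept_score(a_q: float, sweeps: List[Tuple[float, float]],
                      eps: float = 1e-9) -> float:
    """Max of a_q*x + y over {x >= 0, y >= 0, a_i*x + y <= c_i for each (a_i, c_i)}.

    Each (a_i, c_i) encodes the current sweeping line a_i*x + y = c_i of one view
    (hypothetical encoding). With every a_i > 0 the region is a bounded polygon,
    so the linear objective attains its maximum at one of its vertices.
    """
    # Half-planes p*x + q*y <= r: one per view, plus x >= 0 and y >= 0.
    cons = [(a, 1.0, c) for a, c in sweeps] + [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    best = float("-inf")
    for (p1, q1, r1), (p2, q2, r2) in combinations(cons, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < eps:
            continue  # parallel boundaries meet in no single point
        x = (r1 * q2 - r2 * q1) / det  # Cramer's rule
        y = (p1 * r2 - p2 * r1) / det
        if all(p * x + q * y <= r + eps for p, q, r in cons):  # feasible vertex
            best = max(best, a_q * x + y)
    return best

def can_stop(a_q: float, sweeps: List[Tuple[float, float]], kth_score: float) -> bool:
    """Stop when no unswept tuple can beat M, the current k-th tuple of Q."""
    return max_unswept_score(a_q, sweeps) <= kth_score

print(max_unswept_score(2.0, [(1.0, 8.0)]))              # one view: 16.0
two = [(2.0, 10.0), (0.5, 5.0)]
print(round(max_unswept_score(1.0, two), 4))             # two views, at S: 6.6667
print(can_stop(1.0, two, 7.0), can_stop(1.0, two, 6.0))  # True False
```

With two views whose slopes bracket Q's, the maximizing vertex is the intersection point S, which matches the picture of S reaching line AB.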

Related Work: SelectViews(V, Q) is data dependent. It is based on an estimate of the last tuple of Q derived from the data distribution, so there are no theoretically established guarantees that the selected set of views will answer Q.

Overview of the method
- Theoretical guarantees for answering a query Q via a view VU
- Theoretical guarantees are too strict
- Parallelism of safe areas

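Example: V is a top-3 view with score x + 2y, and Q is a top-1 query with score 2x + y.

| id | x | y | V | Q |
|---|---|---|---|---|
| a | 7 | 4 | 15 | 18 |
| b | 2 | 7 | 16 | 11 |
| c | 4 | 2 | 8 | 10 |
| d | 1 | 1 | 3 | 3 |

V holds b, a, and c; Q's top-1 tuple, a, is among them.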
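
The example's numbers can be checked directly, reading the tuples as a = (7, 4), b = (2, 7), c = (4, 2), d = (1, 1). This brute-force sketch (hypothetical helper `top`) ranks the whole relation, which is exactly what a view-based method must avoid; it only confirms the ground truth that V contains Q's answer.

```python
# The example relation: id -> (x, y)
R = {"a": (7, 4), "b": (2, 7), "c": (4, 2), "d": (1, 1)}

def top(rel, wx, wy, n):
    """Ids of the n tuples with the highest wx*x + wy*y."""
    return sorted(rel, key=lambda t: wx * rel[t][0] + wy * rel[t][1], reverse=True)[:n]

view = top(R, 1, 2, 3)    # V: top-3 by x + 2y
answer = top(R, 2, 1, 1)  # Q: top-1 by 2x + y
print(view, answer)       # ['b', 'a', 'c'] ['a']
print(set(answer) <= set(view))  # True
```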

Construction of safe area
- VU(ID, X, Y, sU) contains the top-n tuples with score sU = wU(aU·x + y), and tN is the n-th tuple in VU.
- LU: the line from xNU to yNU, perpendicular to VU, passing through tN and meeting the X and Y axes at xNU and yNU.
- LQ: the line from xNU to yQ, perpendicular to Q, passing through xNU.
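
In the normalized form s = a·x + y (w dropped, since it does not change rankings), LU is aU·x + y = cU, where cU is the score of tN, so it meets the axes at xNU = cU / aU and yNU = cU. Every tuple outside VU lies in the triangle below LU, and Q's score over that triangle peaks at one of the two intercepts; LQ is Q's iso-score line through that peak, which is xNU whenever aQ ≥ aU, as on the slide. A sketch with hypothetical helper names, assuming non-negative attribute values:

```python
def safe_area(a_u, a_q, t_n):
    """Return (x_NU, y_NU, tau) for view slope a_u, query slope a_q, n-th tuple t_n.

    A tuple (x, y) lies in the safe area when a_q*x + y > tau.
    """
    x_n, y_n = t_n
    c_u = a_u * x_n + y_n         # view score of t_N: L_U is a_u*x + y = c_u
    x_nu = c_u / a_u              # where L_U meets the X axis
    y_nu = c_u                    # where L_U meets the Y axis
    tau = max(a_q * x_nu, y_nu)   # best Q-score below L_U: L_Q is a_q*x + y = tau
    return x_nu, y_nu, tau

def in_safe_area(point, a_q, tau):
    x, y = point
    return a_q * x + y > tau  # strict: ties on L_Q are conservatively excluded

# View x + 2y (a_U = 0.5), query 2x + y (a_Q = 2), t_N = (4, 2)
x_nu, y_nu, tau = safe_area(0.5, 2.0, (4, 2))
print(x_nu, y_nu, tau)                 # 8.0 4.0 16.0
print(in_safe_area((7, 4), 2.0, tau))  # True: 2*7 + 4 = 18 > 16
print(in_safe_area((2, 7), 2.0, tau))  # False: 2*2 + 7 = 11
```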

Safe area: the area “above” line LQ (shaded in the figure).
Observations:
- Any tuple in the safe area has a higher score, with respect to Q, than any tuple outside the safe area.
- Tuples in the safe area belong to both VU and Q.

Answering Q from VU, Theorem 1: VU can answer Q if the safe area contains at least k tuples. The converse does not always hold.
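
The reason Theorem 1 holds can be written out in the normalized form s = a·x + y, with cU the score of tN, xNU = cU / aU, yNU = cU, and τ the Q-score along LQ; attribute values are assumed non-negative. This is a reconstruction of the argument, not the paper's own proof.

```latex
\begin{aligned}
&t = (x, y) \notin V_U \;\Rightarrow\; a_U x + y \le c_U,\quad x, y \ge 0
  && \text{(}t\text{ lies in the triangle below } L_U\text{)}\\
&\Rightarrow\; a_Q x + y \;\le\; \max\bigl(0,\; a_Q x_{NU},\; y_{NU}\bigr)
  = \max\Bigl(a_Q \tfrac{c_U}{a_U},\; c_U\Bigr) = \tau
  && \text{(a linear function peaks at a vertex)}\\
&a_Q x + y > \tau \;\Rightarrow\; t \in V_U
  && \text{(tuples in the safe area are in } V_U\text{)}\\
&\bigl|\{t \in V_U : a_Q x + y > \tau\}\bigr| \ge k \;\Rightarrow\; \text{top-}k(Q) \subseteq V_U
  && \text{(Theorem 1)}
\end{aligned}
```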

Theorem 2: VU may still be able to answer Q when the safe area contains fewer than k tuples. This holds when the area (yellow triangle in the figure) bounded by line LU, the X-axis, and line L1, which corresponds to the lowest possible Q-score produced by tuples of VU, is void of tuples.

Algorithm TestViewSuitability, in three main steps:
1. Compute the safe area (Q, V).
2. Count the tuples in V that belong to the safe area.
3. If there are at least k, return true; otherwise return false.
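
The three steps map onto a short function. This is a sketch under stated assumptions (non-negative x and y; scores normalized to a·x + y, so a view scored by x + 2y has aU = 0.5; Step 3 using “at least k”, as Theorem 1 requires); `view_suitability` and its arguments are hypothetical names, not the paper's code.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

def view_suitability(view: Dict[str, Point], a_u: float, a_q: float,
                     k: int) -> Tuple[bool, Optional[List[str]]]:
    """Decide from the view alone whether it answers Q's top-k (Theorem 1).

    view holds the n tuples of V_U as id -> (x, y); the view ranks by a_u*x + y
    and the query by a_q*x + y.
    """
    # Step 1: safe area. t_N is the view tuple with the lowest view score.
    c_u = min(a_u * x + y for x, y in view.values())
    tau = max(a_q * c_u / a_u, c_u)  # Q-score along L_Q
    # Step 2: count the view tuples strictly above L_Q.
    safe = [tid for tid, (x, y) in view.items() if a_q * x + y > tau]
    # Step 3: Theorem 1 needs at least k tuples in the safe area.
    if len(safe) < k:
        return False, None
    # Q's answer: the k best safe tuples under Q.
    safe.sort(key=lambda tid: a_q * view[tid][0] + view[tid][1], reverse=True)
    return True, safe[:k]

view = {"b": (2, 7), "a": (7, 4), "c": (4, 2)}  # a top-3 view under x + 2y
print(view_suitability(view, a_u=0.5, a_q=2.0, k=1))  # (True, ['a'])
print(view_suitability(view, a_u=0.5, a_q=2.0, k=2))  # (False, None)
```

Returning `False` only means the safe-area test is inconclusive; by Theorem 2 the view might still hold the answer.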

Combining two views: lines LQU and LQD characterize the safe areas of VU and VD with respect to Q. Both are perpendicular to Q, so LQU ∥ LQD, and the safe area of one view (VU) is encompassed in the safe area of the other (VD).

Combining two views, Theorem 3: if neither of two views can safely answer Q by itself, then their combination cannot safely guarantee the answer to Q either, as far as safe areas are concerned.
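
The parallelism argument behind Theorem 3 can be checked numerically. Both safe areas are half-planes a_Q·x + y > τ with the same slope, so their union is simply the one with the smaller τ, and every tuple in it already belongs to that view. This sketch uses hypothetical helpers, a small relation with non-negative coordinates, and two top-2 views scored by 0.5x + y and 3x + y.

```python
# A small relation: id -> (x, y); all scores are normalized to a*x + y
R = {"a": (7, 4), "b": (2, 7), "c": (4, 2), "d": (1, 1)}

def score(a, tid):
    x, y = R[tid]
    return a * x + y

def top(a, n):
    """The n best tuple ids under a*x + y (a hypothetical top-n view)."""
    return sorted(R, key=lambda tid: score(a, tid), reverse=True)[:n]

def tau(view, a_v, a_q):
    """Q-score of the line L_Q that bounds the view's safe area."""
    c = min(score(a_v, tid) for tid in view)  # score of t_N under the view
    return max(a_q * c / a_v, c)

def safe_count(ids, a_q, t):
    return sum(1 for tid in ids if score(a_q, tid) > t)

a_q, k = 2.0, 2
v_u, v_d = top(0.5, 2), top(3.0, 2)
tau_u, tau_d = tau(v_u, 0.5, a_q), tau(v_d, 3.0, a_q)
alone = (safe_count(v_u, a_q, tau_u), safe_count(v_d, a_q, tau_d))
# Parallel safe areas: their union is the one with the smaller tau.
combined = safe_count(set(v_u) | set(v_d), a_q, min(tau_u, tau_d))
print(tau_u, tau_d, alone, combined)  # 30.0 14.0 (0, 1) 1
print(combined >= k)                  # False: the pair is no safer than VD alone
```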

Experimental methodology: methods tested
- our algorithm;
- the TA algorithm (it can guarantee view usability correctness).

Goals:
- effectiveness: the number of queries answered by views;
- efficiency: the time savings from answering queries via views.

Experimental methodology: parameters

| Parameter | Values |
|---|---|
| Size of source table R, \|R\| (tuples) | 10,000, 50,000, 100,000 |
| Max size of materialized view, k (tuples) | 10, 50, 100, 500, 1000 |
| Number of queries asked, \|Q\| | 100, 1000 |

Synthetic data sets: random data sets of different sizes for a relation of the form R(ID, X, Y), and sequences of queries with random coefficients and result size k.
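
A workload matching this parameter grid can be generated in a few lines. The distributions are assumptions, since the slides only say “random”: uniform [0, 1) attribute values and query coefficients drawn uniformly from [0.1, 10]. `make_relation` and `make_queries` are hypothetical names.

```python
import itertools
import random

SIZES = [10_000, 50_000, 100_000]        # |R|
VIEW_SIZES = [10, 50, 100, 500, 1000]    # k
QUERY_COUNTS = [100, 1000]               # |Q|

def make_relation(size, rng):
    """R(ID, X, Y) with uniform [0, 1) attributes (assumed distribution)."""
    return {i: (rng.random(), rng.random()) for i in range(size)}

def make_queries(count, k, rng):
    """Queries as (w, a, k) for score w*(a*x + y); coefficient range is assumed."""
    return [(rng.uniform(0.1, 10.0), rng.uniform(0.1, 10.0), k) for _ in range(count)]

rng = random.Random(2009)  # fixed seed for repeatable runs
grid = list(itertools.product(SIZES, VIEW_SIZES, QUERY_COUNTS))
print(len(grid))  # 30 configurations
R = make_relation(SIZES[0], rng)
queries = make_queries(QUERY_COUNTS[0], VIEW_SIZES[0], rng)
```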

Effectiveness: percentage of views used for 100 queries (figure).

Effectiveness: percentage of views used for different time spans (figure).

Efficiency: time savings from answering queries via views, for different database sizes and requested result sizes.
- Conflicting case: as the number of stored results rises, the savings drop. This is due to the size of the memory used, since memory allocation becomes slow; probably a single view is able to answer many of the queries anyway.
- Savings increase for reasonable values of k, around 0.1% of the relation size.

Conclusions: we have provided theoretical and algorithmic results for the problem of answering top-k queries via materialized views:
- Theorem 1: theoretical guarantees for a view to answer a top-k query;
- Theorem 2: strictness of Theorem 1;
- parallelism of safe areas.

Future Work
- Optimization under time and storage constraints
- View caching
- Hierarchical structures for the set of views
- Sorting techniques