# PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries Vagelis Hristidis University of California, San Diego Nick Koudas AT&T.

PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries

Vagelis Hristidis, University of California, San Diego
Nick Koudas, AT&T Research
Yannis Papakonstantinou, University of California, San Diego

Example

ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price

Problem: Retrieve WHOLE relation

PREFER retrieves only part of the relation

Applications

Such preference queries are used in Web sites like:
- www.Zagat.com (restaurants)
- www.personallogic.com (online retailer)

Definitions - Problem statement

A preference query orders the tuples of a relation according to a function of the attribute values, e.g. 0.01·Mileage + 0.6·Year + 0.03·Price.

The goal is to produce the top-K answers of a preference query while retrieving the minimum number of tuples.
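A minimal sketch of what such a preference query computes, and of the baseline that scans the whole relation. The function names and the three sample cars are hypothetical and exist only for illustration; only the weights come from the slide.

```python
import heapq

def preference_score(weights, row):
    """Linear preference function: weighted sum of the named attributes."""
    return sum(w * row[attr] for attr, w in weights.items())

def naive_top_k(relation, weights, k):
    """Baseline: scores EVERY tuple of the relation, then keeps the k best."""
    return heapq.nlargest(k, relation, key=lambda row: preference_score(weights, row))

# Weights from the slide; the rows below are made-up sample data.
weights = {"Mileage": 0.01, "Year": 0.6, "Price": 0.03}
cars = [
    {"id": 1, "Mileage": 50, "Year": 10, "Price": 100},
    {"id": 2, "Mileage": 20, "Year": 20, "Price": 50},
    {"id": 3, "Mileage": 80, "Year": 5, "Price": 200},
]
top2 = naive_top_k(cars, weights, 2)
```

The baseline touches all tuples regardless of K, which is the cost PREFER aims to avoid.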

Our Approach

PREFER materializes a number of ranked views of the relation and uses them to efficiently answer preference queries.

Our Approach

[Figure: Price and Year axes showing two ranked views, 0.08*Price + 0.2*Year and 0.075*Price + 0.8*Year, and the preference query 0.07*Price + 0.35*Year.]

PREFER Architecture

Preprocessing stage (Relation, Space constraints, Views Creation, Mat. Views, index of mat. views):
- Discretization of ranked views' vectors.
- Which ranked views should we materialize?

Runtime Process (Query, View Selection, Ranked View id, Query Pipelining Algorithm, Output results):
- Which ranked view should we use to answer a specific preference query?
- How to use a ranked view to answer a preference query

[Figure: table with columns Car ID, ..., Doors, f_q. Ranked View, ordered by 0.02*Mileage+0.4*Year+0.04*Price; Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price. Labels in the figure: t1, Watermark = 14.26, last tuple.]

Calculating the Watermark
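One way to state the watermark formally, offered as a hedged reading rather than the slide's own notation: for the first unprocessed tuple $t_1$ of the view, it is the smallest view score $f_v$ that any point of the attribute domain with query score at least $f_q(t_1)$ can have. The bounds $l_i, u_i$ on attribute $i$ and the symbol $T_{v,q}$ are assumptions introduced here.

```latex
T_{v,q}(t_1) \;=\; \min_{t \,\in\, [l_1,u_1]\times\cdots\times[l_n,u_n]}
  \bigl\{\, f_v(t) \;:\; f_q(t) \ge f_q(t_1) \,\bigr\}
\qquad\text{so that}\qquad
f_v(t) < T_{v,q}(t_1) \;\Longrightarrow\; f_q(t) < f_q(t_1).
```

Under this reading, every tuple that can outrank $t_1$ in the query order lies in the view prefix with $f_v \ge T_{v,q}(t_1)$, so the scan of the view can stop at the watermark.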

How to use a ranked view to answer a preference query (cont'd)

PipelineResults Algorithm

Ranked View, ordered by f_v = 0.02*Mileage+0.4*Year+0.04*Price
Result, ordered by f_q = 0.01*Mileage+0.6*Year+0.03*Price

1. Calculate the watermark for t1.
2. Find the prefix of the view with f_v greater than the watermark value and sort its tuples by f_q.
3. Output tuples up to t1.
4. Repeat, using the first unprocessed tuple as t1.

In the running example the watermark is 14.26 in the first iteration, 13.1 in the second, and 8.3 in the third. The result (Car ID) grows from 2, 1 to 2, 1, 3 and finally to 2, 1, 3, 5, 4.
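The four steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: attributes are assumed to lie in known bounds `lo`/`hi`, all weights are assumed non-negative, and the watermark is computed with a greedy continuous-knapsack argument chosen here for simplicity. All function and variable names are hypothetical.

```python
EPS = 1e-12  # tolerance for floating-point comparisons

def score(weights, row):
    """Linear preference function: weighted sum of attribute values."""
    return sum(w * x for w, x in zip(weights, row))

def watermark(v, q, target, lo, hi):
    """Smallest f_v over the box [lo, hi] among points with f_q >= target.

    Greedy continuous knapsack; assumes non-negative weights and that
    target is reachable inside the box.
    """
    value = score(v, lo)
    need = target - score(q, lo)
    # Raise first the attributes that buy f_q most cheaply in terms of f_v.
    cheapest = sorted((i for i in range(len(q)) if q[i] > 0),
                      key=lambda i: v[i] / q[i])
    for i in cheapest:
        if need <= 0:
            break
        step = min(hi[i] - lo[i], need / q[i])
        value += v[i] * step
        need -= q[i] * step
    return value

def top_k(view, v, q, k, lo, hi):
    """view: tuples sorted by f_v descending. Returns (top-k by f_q, tuples read)."""
    results, pending, pos = [], [], 0
    while len(results) < k and (pending or pos < len(view)):
        if not pending:
            pending.append(view[pos])
            pos += 1
        t1 = pending[0]                        # first unprocessed tuple
        target = score(q, t1)
        w = watermark(v, q, target, lo, hi)    # step 1
        # Step 2: read the view prefix down to the watermark (>= to be safe).
        while pos < len(view) and score(v, view[pos]) >= w - EPS:
            pending.append(view[pos])
            pos += 1
        # Step 3: everything scoring at least f_q(t1) is safe to output.
        ready = [t for t in pending if score(q, t) >= target]
        ready.sort(key=lambda t: score(q, t), reverse=True)
        pending = [t for t in pending if score(q, t) < target]
        results.extend(ready)                  # step 4: loop with the next t1
    return results[:k], pos
```

The second return value is the number of view tuples read, which is what the approach tries to keep well below the size of the relation.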

Define coverage

[Figure: Price and Year axes; ranked view V1 = 0.8*Price + 0.2*Year; preference query q1 = 0.7*Price + 0.35*Year.]

V1 covers q1: at most k tuples are retrieved from V1 in order to output the first result of q1.

Which ranked view should we use to answer a specific preference query?

[Figure: Price and Year axes; ranked views 0.8*Price + 0.2*Year (V1) and 0.75*Price + 0.8*Year; preference query q1 = 0.7*Price + 0.35*Year. V1 covers q1.]
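A rough sketch of view selection at query time, assuming coverage is checked by measuring how many tuples of each materialized view must be read before the first result of the query can be output. The helper names, the attribute bounds `lo`/`hi`, and the greedy watermark computation are assumptions made for illustration and may differ from what PREFER actually does.

```python
def score(weights, row):
    """Linear preference function: weighted sum of attribute values."""
    return sum(w * x for w, x in zip(weights, row))

def watermark(v, q, target, lo, hi):
    """Smallest f_v over the box [lo, hi] among points with f_q >= target.

    Greedy continuous knapsack; assumes non-negative weights.
    """
    value = score(v, lo)
    need = target - score(q, lo)
    for i in sorted((i for i in range(len(q)) if q[i] > 0),
                    key=lambda i: v[i] / q[i]):
        if need <= 0:
            break
        step = min(hi[i] - lo[i], need / q[i])
        value += v[i] * step
        need -= q[i] * step
    return value

def first_result_cost(rows, v, q, lo, hi):
    """Number of tuples read from the view before q's first result is known."""
    w = watermark(v, q, score(q, rows[0]), lo, hi)
    return sum(1 for t in rows if score(v, t) >= w - 1e-12)

def pick_view(views, q, k, lo, hi):
    """views: dict mapping a weight vector to its rows sorted by f_v descending.

    Returns the cheapest view that covers q (cost <= k), or None.
    """
    costs = {v: first_result_cost(rows, v, q, lo, hi) for v, rows in views.items()}
    best = min(costs, key=costs.get)
    return best if costs[best] <= k else None
```

With the vectors from the slide, `views` would hold entries for `(0.8, 0.2)` and `(0.75, 0.8)`, and the query would be `(0.7, 0.35)`.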

Which ranked views should we materialize?

ViewSelection Algorithm

    while (not all preference vectors in [0,1]^n covered)
        Randomly pick v ∈ [0,1]^n and add it to the list of views L
    VIEWS ← ∅
    for i = 1 to C do
        select v ∈ L that covers the maximum number of uncovered vectors in [0,1]^n
        VIEWS ← VIEWS ∪ {v}

[Figure: animated example of the selection with C = 3.]

Constraint on # of views

- The maximum coverage problem using the minimum # of materialized views is NP-hard.
- The greedy heuristic is an approximation for maximum coverage.
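The two phases of ViewSelection (random candidate generation, then greedy choice of at most C views) can be sketched as below. The coverage predicate is passed in as a function. The `covers_by_angle` test used in the demo is a stand-in chosen only to make the sketch runnable and is not PREFER's coverage test; all names are hypothetical.

```python
import itertools
import math
import random

def grid(n, step):
    """Discretized preference vectors in [0,1]^n, without the all-zero vector."""
    m = round(1 / step)
    values = [i / m for i in range(m + 1)]
    return [p for p in itertools.product(values, repeat=n) if any(p)]

def covers_by_angle(v, q, min_cos=0.95):
    """Stand-in coverage test (assumption): v covers q if their directions are close."""
    dot = sum(a * b for a, b in zip(v, q))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in q))
    return norm > 0 and dot >= min_cos * norm

def select_views(queries, covers, c, rng):
    """Returns (chosen views, fraction of query vectors they cover)."""
    n = len(queries[0])
    # Phase 1: add random candidate views until every query vector is covered.
    candidates, cover_sets = [], []
    uncovered = set(queries)
    while uncovered:
        v = tuple(rng.random() for _ in range(n))
        candidates.append(v)
        cover_sets.append({q for q in queries if covers(v, q)})
        uncovered -= cover_sets[-1]
    # Phase 2: greedily keep the candidate covering the most uncovered vectors.
    chosen, remaining = [], set(queries)
    for _ in range(c):
        best = max(range(len(candidates)),
                   key=lambda i: len(cover_sets[i] & remaining))
        if not cover_sets[best] & remaining:
            break  # nothing left to gain
        chosen.append(candidates[best])
        remaining -= cover_sets[best]
    return chosen, 1 - len(remaining) / len(queries)

views, covered = select_views(grid(2, 0.25), covers_by_angle, 3, random.Random(0))
```

With a budget of C views the greedy choice may leave some preference vectors uncovered; the returned fraction makes that trade-off visible.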

Related Work

- Preference Query Framework [AW00]
- Top-k queries
  - Joins: Fagin [F99, F96, F01], equijoins of ordered data
  - Selections [reduce top-k selection to range query]
    - Histograms to estimate cutoff [Chaudhuri & Gravano 99]
    - Probabilistic model [Donjerkovic & Ramakrishnan 99]
    - Partitioning [Carey & Kossmann 97, 98]

Related Work

The Onion Technique (SIGMOD 2000). Main observation: the points of interest lie on the convex hull of the tuple space.

Drawbacks of Onion:
- Does not scale
- Computing the convex hull is very computationally intensive
- Not efficient if the domain of an attribute has a small cardinality
- Not efficient for more than the top-1 result

Experiments

Measured parameters:
- # attributes
- size of relation
- # views
- constraint on max # tuples retrieved

Parameters of Experiments

- synthetic datasets
- 3 to 5 attributes
- 10,000 to 500,000 tuples
- random & correlated data
- discretization of 0.1 or 0.05

Experiments (cont’d) Dual PII CPU, 512MB RAM, 4 attr, 50,000 tuples, 34 Views

Experiments (cont’d) 4 attr, constraint = 500 tuples, discretization = 0.1

Experiments (cont’d) 500,000 tuples, constraint = 500 tuples, discretization = 0.05...0.1

Experiments (cont’d) 4 attr, discretization = 0.1

Experiments (cont’d) 4 attr, discretization = 0.1

Experiments (cont’d) 50,000 tuples, 3 attr, discretization = 0.05

More Resources

- www.db.ucsd.edu/PREFER
- PREFER demo
- PREFER Application
  - Construct Materialized Views
  - Issue preference queries
- MS Windows, on top of Oracle DBMS

Conclusions

- Methodology to efficiently answer top-K linearly weighted queries
- Algorithm that uses a ranked view to answer a preference query
- Ranked materialized views were used
- Experimental evaluation
