Query Specific Ranking


1 Query Specific Ranking
CSE 6392 02/27/2006 Database Exploration

2 Content
- Comparison of the FA and TA algorithms
- Representing the ranking problem as a geometric problem
- Query-specific ranking

3 Comparison between the FA and TA algorithms
- TA is typically faster than FA: it stops as soon as the score of the hypothetical tuple is less than the scores of the tuples in the top-k buffer.
- TA is a bounded-buffer algorithm: it maintains only a top-k buffer.
- FA keeps, for each sorted list, the set of all tuples read so far, and continues until ‘k’ objects appear in all of these candidate sets.

4 Comparison between FA and TA
- TA computes scores eagerly: as soon as it reads a tuple, it immediately looks up that tuple's remaining attributes to find its score.
- FA calculates scores in 2 phases: - sort phase - scan phase
- Both TA and FA require the scoring function to be monotonic.

5 Why does TA work?
The stopping condition for TA is: Score(hypothetical tuple) < Score(k-th tuple in top-k buffer)
The hypothetical tuple is made up of the last value read from each sorted list. Because the scoring function is monotonic, the score of any unseen tuple cannot exceed the score of the hypothetical tuple, so once the condition holds no unseen tuple can enter the top-k buffer.
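The stopping rule can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: it assumes higher scores are better, that every tuple appears in every list, and that in-memory dictionaries stand in for random access. The attribute names `by_quality` and `by_service` and their values are made up for the example.

```python
import heapq

def threshold_algorithm(lists, score, k):
    """Sketch of TA.

    lists: one list per attribute of (tid, value) pairs, sorted by value in
           descending order (assumption: higher is better, every tid is in
           every list).
    score: monotonic function of a tuple of attribute values.
    Returns the top-k (score, tid) pairs, best first.
    """
    lookup = [dict(lst) for lst in lists]   # stands in for random access
    top_k = []                              # min-heap of (score, tid), at most k entries
    seen = set()
    for depth in range(len(lists[0])):
        for lst in lists:
            tid = lst[depth][0]
            if tid in seen:
                continue
            seen.add(tid)
            # Eager scoring: fetch all attributes of the tuple right away.
            s = score(tuple(col[tid] for col in lookup))
            if len(top_k) < k:
                heapq.heappush(top_k, (s, tid))
            elif s > top_k[0][0]:
                heapq.heapreplace(top_k, (s, tid))
        # Hypothetical tuple: the last value read from each sorted list.
        threshold = score(tuple(lst[depth][1] for lst in lists))
        if len(top_k) == k and threshold < top_k[0][0]:
            break   # no unseen tuple can beat the k-th buffered tuple
    return sorted(top_k, reverse=True)

# Hypothetical data: two attributes, four tuples, score = sum of attributes.
by_quality = [("a", 9), ("b", 8), ("c", 3), ("d", 1)]
by_service = [("b", 9), ("a", 7), ("d", 5), ("c", 2)]
ta_result = threshold_algorithm([by_quality, by_service], sum, k=2)
```

On this data the loop stops after reading two rows of each list, because the hypothetical tuple (8, 7) scores 15, which is below the score 16 of the second-best buffered tuple.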

6 Closing points on TA and FA
- FA stops only when ‘k’ common objects (intersections) are found in the candidate sets.
- TA bounds the scores of unseen tuples by the score of the hypothetical tuple, and uses that bound to stop.
- Therefore, FA can never stop earlier than TA. TA is instance optimal.
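For comparison, FA's stopping rule can be sketched the same way. This is an illustrative sketch under the same assumptions as before (higher is better, every tuple appears in every list, dictionaries stand in for the scan phase); the data is hypothetical.

```python
def fagin_algorithm(lists, score, k):
    """Sketch of FA.

    lists: one list per attribute of (tid, value) pairs, sorted by value in
           descending order (assumption: every tid is in every list).
    score: monotonic function of a tuple of attribute values.
    Returns the top-k (score, tid) pairs, best first.
    """
    lookup = [dict(lst) for lst in lists]
    seen_in = [set() for _ in lists]   # tuples read so far from each list
    candidates = set()
    # Phase 1: read the lists in parallel until k tuples are common to all.
    for depth in range(len(lists[0])):
        for i, lst in enumerate(lists):
            tid = lst[depth][0]
            seen_in[i].add(tid)
            candidates.add(tid)
        if len(set.intersection(*seen_in)) >= k:
            break
    # Phase 2: score every candidate and keep the best k.
    scored = [(score(tuple(col[t] for col in lookup)), t) for t in candidates]
    return sorted(scored, reverse=True)[:k]

# Hypothetical data: two attributes, four tuples, score = sum of attributes.
fa_lists = [
    [("a", 9), ("b", 8), ("c", 3), ("d", 1)],
    [("b", 9), ("a", 7), ("d", 5), ("c", 2)],
]
fa_result = fagin_algorithm(fa_lists, sum, k=2)
```

Unlike TA, this version has to hold every tuple it has read until the first phase ends, which is why its buffer is not bounded by k.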

7 Query Specific Ranking
The ranking functions we have discussed so far assume a total ordering on each attribute. E.g. total ordering of price: - high price is bad - low price is good
In reality, this is not always true.

8 Query Specific Ranking
Different people will have a different ideal price in mind. E.g. for one person, an ideal restaurant will have price = $20 and capacity = 100.
In this case, the ranking function can be: Score(<P, C>) = 5*|20 - P| + 10*|100 - C|

9 Query Specific Ranking
The above ranking function is more realistic than one based on a total ordering of attributes. But it is not monotonic in price or capacity.
How can we find the top-k restaurants in this case without looking at the whole data set?
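A quick numeric check makes the loss of monotonicity concrete. The sketch below evaluates the restaurant ranking function at a few prices with capacity held at 100; the sample prices are arbitrary.

```python
def score(price, capacity):
    # Ranking function from the restaurant example: weighted distance
    # from the ideal point (price = 20, capacity = 100).
    return 5 * abs(20 - price) + 10 * abs(100 - capacity)

sample_prices = [10, 20, 30]
sample_scores = [score(p, 100) for p in sample_prices]

# The score first falls and then rises as price grows, so it is neither
# non-decreasing nor non-increasing in price.
non_decreasing = sample_scores == sorted(sample_scores)
non_increasing = sample_scores == sorted(sample_scores, reverse=True)
```

Here `sample_scores` is `[50, 0, 50]`, so both monotonicity checks fail, and a list sorted by price gives no bound on the score of unseen tuples.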

10 Solution
Assume the data set is sorted on all the attributes of interest.
First, create transformed attributes from the original attributes involved in the ranking function, such that the ranking function is monotonic in the transformed attributes.
Secondly, simulate sorted access on the transformed attributes.

11 Transformed attributes
Consider the restaurant example where: Score(<P, C>) = 5*|20 - P| + 10*|100 - C|
The transformed attributes are:
∆p = |20 - P|, the deviation of the price from the ideal price
∆c = |100 - C|, the deviation of the capacity from the ideal capacity
Suppose tid1 = <$30, 120>, then <∆p, ∆c> = <10, 20>
tid2 = <$15, 85>, then <∆p, ∆c> = <5, 15>
In terms of the transformed attributes, Score = 5*∆p + 10*∆c, which is monotonic in ∆p and ∆c.
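The transformation can be written out directly. This is a small sketch of the example above; the function and constant names are chosen here for illustration.

```python
IDEAL_PRICE = 20
IDEAL_CAPACITY = 100

def transform(price, capacity):
    # Map original attributes to their absolute deviations from the ideal point.
    return abs(IDEAL_PRICE - price), abs(IDEAL_CAPACITY - capacity)

def score_from_deltas(delta_p, delta_c):
    # Same ranking function, expressed on the transformed attributes.
    # It never decreases when delta_p or delta_c grows, i.e. it is monotonic.
    return 5 * delta_p + 10 * delta_c

tid1 = transform(30, 120)
tid2 = transform(15, 85)
score1 = score_from_deltas(*tid1)
score2 = score_from_deltas(*tid2)
```

For the two tuples in the example this gives `tid1 == (10, 20)` with score 250 and `tid2 == (5, 15)` with score 175, so tid2 is the closer match to the ideal restaurant.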

12 Simulating sorted access
Achieving monotonicity is just part of the problem. We also need sorted access on the transformed attributes (∆p and ∆c).
Suppose the data is presorted on the ‘price’ attribute. Without re-sorting the whole dataset, we can go directly to the ‘sweet spot’ (i.e. price = $20 & capacity = 100) using a B+ tree index.
From this point, do 2 walks in opposite directions and merge them; this yields ∆p (and likewise ∆c) in sorted order.
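The two-walk merge can be sketched for a single attribute as below. This is an illustration only: a sorted Python list with `bisect` stands in for the B+ tree index, and the price values are hypothetical.

```python
import bisect

def walk_from_ideal(sorted_values, ideal):
    """Yield (delta, value) pairs in non-decreasing order of |ideal - value|.

    sorted_values: attribute values in ascending order (stands in for the
                   leaf level of a B+ tree; an assumption of this sketch).
    """
    right = bisect.bisect_left(sorted_values, ideal)   # first value >= ideal
    left = right - 1                                   # last value < ideal
    n = len(sorted_values)
    while left >= 0 or right < n:
        d_left = ideal - sorted_values[left] if left >= 0 else None
        d_right = sorted_values[right] - ideal if right < n else None
        # Take whichever walk is currently closer to the ideal value.
        if d_right is None or (d_left is not None and d_left <= d_right):
            yield d_left, sorted_values[left]
            left -= 1
        else:
            yield d_right, sorted_values[right]
            right += 1

prices = [5, 12, 15, 18, 22, 30, 45]       # hypothetical, presorted on price
delta_p_order = list(walk_from_ideal(prices, 20))
```

Because it is a generator, the walk produces ∆p values lazily, so an algorithm such as TA can consume only as many entries as it needs before its stopping condition holds.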

13 Adding Selection
This explains how hard conditions are handled or added to a ranking function.
E.g. look for restaurants in Arlington: location = “Arlington” is a hard condition.

14 Handling hard conditions
The query will look like this:
    Select top[10]
    From restaurants
    Where location = “Arlington”
    Order by 5*abs(120 - price)
How do we solve this query?

15 Handling hard conditions
The first method is to do selection first, then do ranking. This is not the best method, for the following reasons:
- If selection produces a big result, it defeats the purpose of doing ranking.
- If selection produces a small result, then doing ranking on it is overkill.
- The raw data is presorted, and doing a selection first on this raw data destroys the order of the tuples. TA requires the data to be presorted.
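The select-then-rank method can be sketched as below. This is an illustrative sketch: the rows are hypothetical, the ranking expression 5*abs(120 - price) is taken from the query, and the sketch assumes that a smaller value ranks first.

```python
import heapq

def select_then_rank(rows, location, k):
    """rows: (name, location, price) triples. Returns the k best names."""
    # Step 1: selection. Every row is examined, whatever the value of k.
    selected = [row for row in rows if row[1] == location]
    # Step 2: ranking. Every selected row is scored before the top k are
    # known, so no early stopping in the style of TA is possible.
    best = heapq.nsmallest(k, selected, key=lambda row: 5 * abs(120 - row[2]))
    return [name for name, _, _ in best]

# Hypothetical rows.
restaurants = [
    ("r1", "Arlington", 100),
    ("r2", "Dallas", 120),
    ("r3", "Arlington", 125),
    ("r4", "Arlington", 150),
]
top_two = select_then_rank(restaurants, "Arlington", k=2)
```

The cost of both steps grows with the size of the table and of the selection result, not with k, which is the drawback listed above.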

16 Handling hard conditions
The second method is to integrate selection into the ranking function:
Score(<L, P, C>) = if L = “Arlington” then 5*|20 - P| + 10*|100 - C| else 0

17 Handling hard conditions
Now we are no longer dealing with numeric values alone. Because of the condition location = “Arlington”, the ranking function is defined not only on numeric data but also on categorical data.
How do we deal with ranking functions that involve categorical data?

