Reaching the Top-k of the Skyline: An Efficient Indexed Algorithm for Top-k Skyline Queries. Marlene Goncalves and María-Esther Vidal, Universidad Simón Bolívar

1 Reaching the Top-k of the Skyline: An Efficient Indexed Algorithm for Top-k Skyline Queries
Marlene Goncalves and María-Esther Vidal
Universidad Simón Bolívar, Caracas, Venezuela
{mgoncalves,mvidal}@usb.ve

2 Motivating Example
«There are two open faculty positions.»
«Candidates will be evaluated in terms of degree, publications, and experience.»
«Criteria for selecting the best candidates: highest academic degree, maximum number of publications, and maximum years of experience.»
«Ties will be broken using the GPA.»
Solutions: Skyline and Top-k

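3 Motivation
Query: Candidates with the best academic degree, number of publications, and experience.

| Id | Degree  | Publications | Experience | GPA  |
|----|---------|--------------|------------|------|
| 1  | Post Dr | 9            | 2          | 3.75 |
| 2  | Post Dr | 10           | 1          | 4    |
| 3  | PhD     | 12           | 2          | 3.75 |
| 4  | MsC     | 13           | 4          | 3.6  |
| 5  | BEng    | 7            | 3          | 4.5  |
| 6  | BEng    | 6            | 3          | 3.5  |
| 7  | BEng    | 5            | 1          | 4    |

Answer: No candidate is better than all the others in every criterion simultaneously.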

4 Skyline
Query: Select the candidates with the best degree, number of publications, and experience.
User criteria (equally important), combined in a multicriteria function:
- Degree: maximum
- Publications: maximum
- Experience: maximum

| Id | Degree  | Publications | Experience | GPA  |
|----|---------|--------------|------------|------|
| 1  | Post Dr | 9            | 2          | 3.75 |
| 2  | Post Dr | 10           | 1          | 4    |
| 3  | PhD     | 12           | 2          | 3.75 |
| 4  | MsC     | 13           | 4          | 3.6  |
| 5  | BEng    | 7            | 3          | 4.5  |
| 6  | BEng    | 6            | 3          | 3.5  |
| 7  | BEng    | 5            | 1          | 4    |

Skyline selects candidates 1, 2, 3 and 4; i.e., the multiple criteria induce a partial order, and ties need to be broken.
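The dominance test behind the Skyline can be sketched in a few lines of Python. This is only an illustration on the slide's table: the numeric ranking of degrees (`DEGREE_RANK`) and the helper names `criteria`, `dominates` and `skyline` are assumptions introduced here, not part of the presentation.

```python
# Assumed ordering of degrees, from lowest to highest (not stated on the slide).
DEGREE_RANK = {"BEng": 0, "MsC": 1, "PhD": 2, "Post Dr": 3}

# (id, degree, publications, experience, gpa), copied from the slide table.
CANDIDATES = [
    (1, "Post Dr", 9, 2, 3.75),
    (2, "Post Dr", 10, 1, 4.0),
    (3, "PhD", 12, 2, 3.75),
    (4, "MsC", 13, 4, 3.6),
    (5, "BEng", 7, 3, 4.5),
    (6, "BEng", 6, 3, 3.5),
    (7, "BEng", 5, 1, 4.0),
]


def criteria(c):
    """Map a candidate to (degree, publications, experience); higher is better."""
    _, degree, pubs, exp, _ = c
    return (DEGREE_RANK[degree], pubs, exp)


def dominates(a, b):
    """a dominates b: at least as good in every criterion, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(cands):
    """Naive quadratic Skyline: keep every candidate that nobody dominates."""
    return [c for c in cands
            if not any(dominates(criteria(o), criteria(c)) for o in cands)]


print([c[0] for c in skyline(CANDIDATES)])  # [1, 2, 3, 4]
```

Candidates 5, 6 and 7 are dropped because candidate 4 (or candidate 1, for candidate 7) is at least as good in all three criteria.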

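5 Top-k
Query: Select the two candidates with the best GPA.
User criteria (score function):
- GPA: maximum

| Id | Degree  | Publications | Experience | GPA  |
|----|---------|--------------|------------|------|
| 5  | BEng    | 7            | 3          | 4.5  |
| 2  | Post Dr | 10           | 1          | 4    |
| 7  | BEng    | 5            | 1          | 4    |
| 1  | Post Dr | 9            | 2          | 3.75 |
| 3  | PhD     | 12           | 2          | 3.75 |
| 4  | MsC     | 13           | 4          | 3.6  |
| 6  | BEng    | 6            | 3          | 3.5  |

Top-k identifies candidates 5 and 2, but these candidates do not necessarily have the best academic merit.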
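A plain Top-k over the score function is a sort on GPA. The sketch below assumes that ties (candidates 2 and 7 both have a GPA of 4) are broken by the smaller Id, which the slide does not specify; `top_k` is a name introduced here for illustration.

```python
# (id, degree, publications, experience, gpa), copied from the slide table.
CANDIDATES = [
    (1, "Post Dr", 9, 2, 3.75),
    (2, "Post Dr", 10, 1, 4.0),
    (3, "PhD", 12, 2, 3.75),
    (4, "MsC", 13, 4, 3.6),
    (5, "BEng", 7, 3, 4.5),
    (6, "BEng", 6, 3, 3.5),
    (7, "BEng", 5, 1, 4.0),
]


def top_k(cands, k):
    """Return the k candidates with the highest GPA.

    Ties are broken by the smaller id (an assumption made for this sketch).
    """
    return sorted(cands, key=lambda c: (-c[4], c[0]))[:k]


print([c[0] for c in top_k(CANDIDATES, 2)])  # [5, 2]
```

Candidate 5 ranks first on GPA although it holds the lowest degree, which is the kind of answer the slide warns about.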

6 Preference-based Queries
- Select the two candidates with the highest GPA among the candidates with the best degree, number of publications, and experience.
- Cases:
  – Skyline produces the candidates with the best degree, number of publications, and experience. The Skyline may be very large, and post-processing over the Skyline is required to select k.
  – Top-k identifies the two candidates with the best GPA. This leads to false answers and loss of results.
- Top-k selects two candidates with a good GPA; Skyline selects four candidates on equal terms.
- So a combined approach is required!

7 Top-k Skyline
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
Answer: The two candidates with the highest score-function value among the candidates preselected by the multicriteria function.

| Id | Degree  | Publications | Experience | GPA  |
|----|---------|--------------|------------|------|
| 1  | Post Dr | 9            | 2          | 3.75 |
| 2  | Post Dr | 10           | 1          | 4    |
| 3  | PhD     | 12           | 2          | 3.75 |
| 4  | MsC     | 13           | 4          | 3.6  |
| 5  | BEng    | 7            | 3          | 4.5  |
| 6  | BEng    | 6            | 3          | 3.5  |
| 7  | BEng    | 5            | 1          | 4    |

Top-k Skyline selects candidates 1 and 2, which have the highest GPAs among the candidates with comparable academic records.
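The semantics of the Top-k Skyline query can be written as the naive two-step plan: build the whole Skyline, then rank it by GPA. The sketch below does exactly that. The degree ranking, the tie-break by smaller Id, and all function names are assumptions made for illustration, and this is the post-processing baseline, not an efficient evaluation strategy.

```python
# Assumed ordering of degrees, from lowest to highest (not stated on the slide).
DEGREE_RANK = {"BEng": 0, "MsC": 1, "PhD": 2, "Post Dr": 3}

# (id, degree, publications, experience, gpa), copied from the slide table.
CANDIDATES = [
    (1, "Post Dr", 9, 2, 3.75),
    (2, "Post Dr", 10, 1, 4.0),
    (3, "PhD", 12, 2, 3.75),
    (4, "MsC", 13, 4, 3.6),
    (5, "BEng", 7, 3, 4.5),
    (6, "BEng", 6, 3, 3.5),
    (7, "BEng", 5, 1, 4.0),
]


def criteria(c):
    """Multicriteria vector (degree, publications, experience); higher is better."""
    return (DEGREE_RANK[c[1]], c[2], c[3])


def dominates(a, b):
    """a dominates b: at least as good everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def top_k_skyline_naive(cands, k):
    """Step 1: full Skyline. Step 2: top-k of the Skyline by GPA (ties: smaller id)."""
    sky = [c for c in cands
           if not any(dominates(criteria(o), criteria(c)) for o in cands)]
    return sorted(sky, key=lambda c: (-c[4], c[0]))[:k]


print([c[0] for c in top_k_skyline_naive(CANDIDATES, 2)])  # [2, 1]
```

Candidates 1 and 3 tie on GPA (3.75); the slide reports candidate 1, which matches the tie-break assumed here.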

8 Outline
- Related Work
- Our Approach
- Top-k Skyline Evaluation
- Experimental Study
- Conclusions and Future Work

9 Related Work
- Multi-criteria-based approaches (Skyline): BNL, SFS, LESS. Poor ranking capabilities; answers can be huge!
- Score-based approaches (Top-k): MPro, Upper, TA, FA, NRA. High ranking capabilities; answers may be incomplete.
- Combined approaches (Top-k Skyline): BMORTKS, BDTKS. Metrics: Skyline Frequency.
Neither Skyline nor Top-k provides both high expressivity and high ranking capabilities. Existing Top-k Skyline techniques build the Skyline completely. Techniques to efficiently evaluate ranking approaches are required.

10 Our Challenge
Efficient implementation of the Top-k Skyline operator: build the Top-k Skyline set while minimizing the non-necessary probes.
- A probe p of the functions m or f is necessary if and only if p is evaluated on an object o that belongs to the Top-k Skyline.
- Non-necessary probes are evaluations of the multi-criteria function or the score function on any other object.

| Id | Degree  | Publications | Experience | GPA  |
|----|---------|--------------|------------|------|
| 1  | Post Dr | 9            | 2          | 3.75 |
| 2  | Post Dr | 10           | 1          | 4    |
| 3  | PhD     | 12           | 2          | 3.75 |
| 4  | MsC     | 13           | 4          | 3.6  |
| 5  | BEng    | 7            | 3          | 4.5  |
| 6  | BEng    | 6            | 3          | 3.5  |
| 7  | BEng    | 5            | 1          | 4    |

Goal: Identify only the elements of the Skyline that belong to the answer.

11 Top-k Skyline Evaluation
- Indexed solutions
  – BDTKS (Basic Distributed Top-k Skyline)
  – BMORTKS (Basic Multi-Objective Retrieval for Top-k Skyline)
  – TKSI (Top-K SkyIndex)

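12 Top-k Skyline Evaluation
- BDTKS
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
Three indexes (Index 1, Index 2, Index 3), one per criterion:

| Id | Degree  |   | Id | Publications |   | Id | Experience |
|----|---------|---|----|--------------|---|----|------------|
| 1  | Post Dr |   | 4  | 13           |   | 4  | 4          |
| 2  | Post Dr |   | 3  | 12           |   | 5  | 3          |
| 3  | PhD     |   | 2  | 10           |   | 6  | 3          |
| 4  | MsC     |   | 1  | 9            |   | 1  | 2          |
| 5  | BEng    |   | 5  | 7            |   | 3  | 2          |
| 6  | BEng    |   | 6  | 6            |   | 2  | 1          |
| 7  | BEng    |   | 7  | 5            |   | 7  | 1          |

The scan of the indexes stops when the final object is found.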

13 Top-k Skyline Evaluation
- BDTKS
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.

| Id | Degree  | Publications | Experience | GPA  |
|----|---------|--------------|------------|------|
| 1  | Post Dr | 9            | 2          | 3.75 |
| 2  | Post Dr | 10           | 1          | 4    |
| 3  | PhD     | 12           | 2          | 3.75 |
| 4  | MsC     | 13           | 4          | 3.6  |

Partial scan of the database (until the final object is found). However, BDTKS builds the Skyline completely.
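A simplified sketch in the spirit of the BDTKS scan described on these slides: read the sorted indexes round-robin until some object has been seen in every index (taken here to be the "final object", which is an assumption about the slide's term), then build the whole Skyline of the seen objects and rank it by GPA. The degree ranking, the tie handling and all names are assumptions; the authors' actual algorithm may differ.

```python
DEGREE_RANK = {"BEng": 0, "MsC": 1, "PhD": 2, "Post Dr": 3}  # assumed ordering

# (id, degree, publications, experience, gpa), copied from the slide table.
CANDIDATES = [
    (1, "Post Dr", 9, 2, 3.75), (2, "Post Dr", 10, 1, 4.0),
    (3, "PhD", 12, 2, 3.75), (4, "MsC", 13, 4, 3.6),
    (5, "BEng", 7, 3, 4.5), (6, "BEng", 6, 3, 3.5), (7, "BEng", 5, 1, 4.0),
]


def criteria(c):
    """Multicriteria vector (degree, publications, experience); higher is better."""
    return (DEGREE_RANK[c[1]], c[2], c[3])


def dominates(a, b):
    """a dominates b: at least as good everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def bdtks_sketch(cands, k):
    by_id = {c[0]: c for c in cands}
    crit = {i: criteria(c) for i, c in by_id.items()}
    dims = 3
    # One sorted index per criterion: ids ordered from best to worst value.
    indexes = [sorted(crit, key=lambda i, d=d: (-crit[i][d], i)) for d in range(dims)]
    pos = [0] * dims   # next unread position in each index
    seen = {}          # id -> set of indexes in which the object has appeared
    final = None
    # Phase 1: round-robin sorted access until one object appears in every index.
    while final is None and any(p < len(ix) for p, ix in zip(pos, indexes)):
        for d in range(dims):
            if pos[d] == len(indexes[d]):
                continue
            oid = indexes[d][pos[d]]
            pos[d] += 1
            seen.setdefault(oid, set()).add(d)
            if len(seen[oid]) == dims:
                final = oid
                break
    # Phase 2: also read entries tied with the last value read in each index,
    # so no unseen object can equal the final object on every criterion.
    for d in range(dims):
        while (0 < pos[d] < len(indexes[d])
               and crit[indexes[d][pos[d]]][d] == crit[indexes[d][pos[d] - 1]][d]):
            seen.setdefault(indexes[d][pos[d]], set()).add(d)
            pos[d] += 1
    # Phase 3: build the whole Skyline of the seen objects, then rank it by GPA.
    sky = [i for i in seen if not any(dominates(crit[o], crit[i]) for o in seen)]
    top = sorted(sky, key=lambda i: (-by_id[i][4], i))[:k]
    return top, sorted(seen)


print(bdtks_sketch(CANDIDATES, 2))  # ([2, 1], [1, 2, 3, 4, 5, 6])
```

On this data the scan never reads candidate 7, yet all four Skyline members are materialized before the top two are chosen, which is the cost the slide points out.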

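14 Top-k Skyline Evaluation
- BMORTKS
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.
Three indexes (Index 1, Index 2, Index 3), one per criterion:

| Id | Degree  |   | Id | Publications |   | Id | Experience |
|----|---------|---|----|--------------|---|----|------------|
| 1  | Post Dr |   | 4  | 13           |   | 4  | 4          |
| 2  | Post Dr |   | 3  | 12           |   | 5  | 3          |
| 3  | PhD     |   | 2  | 10           |   | 6  | 3          |
| 4  | MsC     |   | 1  | 9            |   | 1  | 2          |
| 5  | BEng    |   | 5  | 7            |   | 3  | 2          |
| 6  | BEng    |   | 6  | 6            |   | 2  | 1          |
| 7  | BEng    |   | 7  | 5            |   | 7  | 1          |

Virtual object (last score seen in each index) as the scan advances: (PostDr, ?, ?), (PostDr, 13, ?), (PostDr, 13, 4), (PostDr, 12, 4), (PostDr, 12, 3), (PhD, 12, 3), (PhD, 10, 3), (MsC, 10, 3), (MsC, 9, 3).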

15 Top-k Skyline Evaluation
- BMORTKS
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications, and experience.

| Id | Degree  | Publications | Experience | GPA  |
|----|---------|--------------|------------|------|
| 1  | Post Dr | 9            | 2          | 3.75 |
| 2  | Post Dr | 10           | 1          | 4    |
| 3  | PhD     | 12           | 2          | 3.75 |
| 4  | MsC     | 13           | 4          | 3.6  |

Partial scan of the database (until a seen object dominates the final object). However, BMORTKS also builds the Skyline completely.
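A simplified sketch in the spirit of the BMORTKS scan: keep a virtual object made of the last value read from each index, and stop as soon as some seen object dominates it, since every unseen object is then dominated as well. Treating the stop condition as "a seen object dominates the virtual object" is an assumption, as are the degree ranking, the check after every single access, and all names; the authors' actual algorithm may differ.

```python
DEGREE_RANK = {"BEng": 0, "MsC": 1, "PhD": 2, "Post Dr": 3}  # assumed ordering

# (id, degree, publications, experience, gpa), copied from the slide table.
CANDIDATES = [
    (1, "Post Dr", 9, 2, 3.75), (2, "Post Dr", 10, 1, 4.0),
    (3, "PhD", 12, 2, 3.75), (4, "MsC", 13, 4, 3.6),
    (5, "BEng", 7, 3, 4.5), (6, "BEng", 6, 3, 3.5), (7, "BEng", 5, 1, 4.0),
]


def criteria(c):
    """Multicriteria vector (degree, publications, experience); higher is better."""
    return (DEGREE_RANK[c[1]], c[2], c[3])


def dominates(a, b):
    """a dominates b: at least as good everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def bmortks_sketch(cands, k):
    by_id = {c[0]: c for c in cands}
    crit = {i: criteria(c) for i, c in by_id.items()}
    dims = 3
    # One sorted index per criterion: ids ordered from best to worst value.
    indexes = [sorted(crit, key=lambda i, d=d: (-crit[i][d], i)) for d in range(dims)]
    pos = [0] * dims
    virtual = [None] * dims   # last value read from each index
    seen = set()
    probes = 0                # dominance checks against the virtual object
    done = False
    while not done and any(p < len(ix) for p, ix in zip(pos, indexes)):
        for d in range(dims):
            if pos[d] == len(indexes[d]):
                continue
            oid = indexes[d][pos[d]]
            pos[d] += 1
            seen.add(oid)
            virtual[d] = crit[oid][d]
            if None in virtual:
                continue      # virtual object not complete yet
            for s in seen:
                probes += 1
                if dominates(crit[s], virtual):
                    done = True
                    break
            if done:
                break
    # As in the BDTKS sketch, the whole Skyline of the seen objects is built.
    sky = [i for i in seen if not any(dominates(crit[o], crit[i]) for o in seen)]
    top = sorted(sky, key=lambda i: (-by_id[i][4], i))[:k]
    return top, sorted(seen), probes


top, seen, probes = bmortks_sketch(CANDIDATES, 2)
print(top, seen)  # [2, 1] [1, 2, 3, 4, 5, 6]
```

The `probes` counter only tallies comparisons against the virtual object; it is meant to show where the extra multi-criteria evaluations come from, not to reproduce the measurements reported in the experiments.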

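16 Top-k Skyline Evaluation
- TKSI (Top-K SkyIndex)
Four indexes (Index 1 to Index 4): one on the GPA and one per criterion:

| Id | GPA  |   | Id | Degree  |   | Id | Publications |   | Id | Experience |
|----|------|---|----|---------|---|----|--------------|---|----|------------|
| 5  | 4.5  |   | 1  | Post Dr |   | 4  | 13           |   | 4  | 4          |
| 2  | 4    |   | 2  | Post Dr |   | 3  | 12           |   | 5  | 3          |
| 7  | 4    |   | 3  | PhD     |   | 2  | 10           |   | 6  | 3          |
| 1  | 3.75 |   | 4  | MsC     |   | 1  | 9            |   | 1  | 2          |
| 3  | 3.75 |   | 5  | BEng    |   | 5  | 7            |   | 3  | 2          |
| 4  | 3.6  |   | 6  | BEng    |   | 6  | 6            |   | 2  | 1          |
| 6  | 3.5  |   | 7  | BEng    |   | 7  | 5            |   | 7  | 1          |

Partial scan of the database (until k incomparable objects are found). TKSI builds the Skyline only partially and minimizes the non-necessary probes.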
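One way to read the TKSI idea is sketched below: walk the GPA index from the best score down, test each object for Skyline membership, and stop once k members are found. To test an object, the sketch looks only at the objects that precede or tie with it in the criterion index where that prefix is shortest, because any dominator must appear there. This dominance check, the degree ranking, the tie-breaks and all names are assumptions, not the authors' published procedure.

```python
from itertools import takewhile

DEGREE_RANK = {"BEng": 0, "MsC": 1, "PhD": 2, "Post Dr": 3}  # assumed ordering

# (id, degree, publications, experience, gpa), copied from the slide table.
CANDIDATES = [
    (1, "Post Dr", 9, 2, 3.75), (2, "Post Dr", 10, 1, 4.0),
    (3, "PhD", 12, 2, 3.75), (4, "MsC", 13, 4, 3.6),
    (5, "BEng", 7, 3, 4.5), (6, "BEng", 6, 3, 3.5), (7, "BEng", 5, 1, 4.0),
]


def criteria(c):
    """Multicriteria vector (degree, publications, experience); higher is better."""
    return (DEGREE_RANK[c[1]], c[2], c[3])


def dominates(a, b):
    """a dominates b: at least as good everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def tksi_sketch(cands, k):
    by_id = {c[0]: c for c in cands}
    crit = {i: criteria(c) for i, c in by_id.items()}
    dims = 3
    # Criterion indexes: ids ordered from best to worst value.
    indexes = [sorted(crit, key=lambda i, d=d: (-crit[i][d], i)) for d in range(dims)]
    # Score index: ids ordered by GPA, best first (ties: smaller id).
    score_index = sorted(by_id, key=lambda i: (-by_id[i][4], i))
    answer, examined = [], []
    for oid in score_index:
        if len(answer) == k:
            break  # k Skyline members found: the rest of the index is never read
        examined.append(oid)
        # Any dominator of oid is at least as good as oid in every criterion,
        # so it lies in the prefix of each index down to oid's value.
        prefixes = [
            list(takewhile(lambda i, d=d: crit[i][d] >= crit[oid][d], indexes[d]))
            for d in range(dims)
        ]
        shortest = min(prefixes, key=len)
        if not any(dominates(crit[p], crit[oid]) for p in shortest):
            answer.append(oid)
    return answer, examined


print(tksi_sketch(CANDIDATES, 2))  # ([2, 1], [5, 2, 7, 1])
```

With k = 2 the sketch examines only candidates 5, 2, 7 and 1; candidates 3 and 4, although in the Skyline, are never tested, which illustrates what "partially builds the Skyline" means.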

17 Experimental Study
- Dataset and queries
  – 100,000 random data items. Value domain: floats between 0 and 1. Data distributions: Uniform, Gaussian and Mixed.
  – Sixty random queries. The number of multi-criteria dimensions ranges from 2 to 6.
- Platform
  – SunFire V440, SunOS 5.10, two SPARCv9 processors at 1,281 MHz, 16 GB of RAM, and four 73 GB Ultra320 SCSI disks.
  – Java 1.5 and Oracle 9i.

18 Experimental Study
- Average Skyline Size & Probes

| Data Distribution | Average Skyline Size (60 queries) |
|-------------------|-----------------------------------|
| Uniform           | 2405                              |
| Gaussian          | 2477                              |
| Mixed             | 2539                              |

Skyline size can be up to 2.6% of the input data!

| Algorithm | Probes     |
|-----------|------------|
| BDTKS     | 23,749,796 |
| BMORTKS   | 27,201,877 |

Probes on the virtual object increase the number of probes of the multi-criteria function!

19 Experimental Study
- BDTKS and TKSI
BDTKS executes fewer probes and requires less evaluation time than BMORTKS. For small k, TKSI outperforms BDTKS!

20 Conclusions and Future Work
- TKSI builds the Skyline only until it has computed the k objects.
- Our experimental results show that TKSI executed fewer probes and consumed less evaluation time.
- In the future, we plan to extend TKSI to Web data sources and to incorporate TKSI into an existing DBMS.

21 Thanks! Q&A

