Efficient Common Items Extraction from Multiple Sorted Lists Wei Lu, Cuitian Rong, Jinchuan Chen, Xiaoyong Du, Gabriel Fung, Xiaofang Zhou Renmin University.


1 Efficient Common Items Extraction from Multiple Sorted Lists
Wei Lu (1), Cuitian Rong (1), Jinchuan Chen (1), Xiaoyong Du (1), Gabriel Fung (2), Xiaofang Zhou (2)
(1) Renmin University of China
(2) The University of Queensland

2 Outline
–Problem Statement & Motivation
–MergeSkip & MergeESkip
–Experiments

3 Problem Statement
Given a set of sorted lists, each containing no duplicates, the objective is to efficiently identify the items that appear in every list.

4 Motivation
Index Join
–R1(X, Y1) ⋈ R2(X, Y2) ⋈ … ⋈ Rn(X, Yn)
–An index is created on X in each relation
Information Retrieval
–Identify documents that contain a given set of words
–Documents are pre-tokenized into words
–An inverted list maps each word to a list of document identifiers
Existing Approach
–ScanAll

5 Example
2 5 8 12 50 80 100 400
3 6 9 12 80 100 300 350
80 100 150 200 320
5 20 34 56 100 300 800

6 Example
2 5 8 12 50 80 100 400
3 6 9 12 80 100 300 350
80 100 150 200 320
5 20 34 56 100 300 800
Common item: 100

7 Limitation
–Every item in the lists has to be accessed until one of the lists is exhausted
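
The slides name ScanAll but do not spell it out. The sketch below is one plausible reading, assuming a linear multi-way scan that moves each cursor by a single item; the function name scan_all is hypothetical. It illustrates the limitation above: no item can be skipped without being read.

```python
def scan_all(lists):
    """Linear scan over sorted, duplicate-free lists (assumed reading of ScanAll)."""
    if not lists:
        return []
    pos = [0] * len(lists)  # one cursor per list
    common = []
    # Stop as soon as any list is exhausted.
    while all(p < len(lst) for p, lst in zip(pos, lists)):
        heads = [lst[p] for p, lst in zip(pos, lists)]
        largest = max(heads)
        if min(heads) == largest:
            # Every cursor points at the same value: a common item.
            common.append(largest)
            pos = [p + 1 for p in pos]
        else:
            # Move each lagging cursor forward by exactly one item.
            pos = [p + 1 if h < largest else p for p, h in zip(pos, heads)]
    return common


example = [
    [2, 5, 8, 12, 50, 80, 100, 400],
    [3, 6, 9, 12, 80, 100, 300, 350],
    [80, 100, 150, 200, 320],
    [5, 20, 34, 56, 100, 300, 800],
]
print(scan_all(example))  # [100]
```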

8 MergeSkip
Observation
–Let minValue be the minimum value of each list
–Let maxMinValue be the maximum among the minValues of the lists
–In every list, items with values less than maxMinValue cannot be common items

9 Example
2 5 8 12 50 80 100 400
3 6 9 12 80 100 300 350
80 100 150 200 320
5 20 34 56 100 300 800
maxMinValue: 80

10 Example
2 5 8 12 50 80 100 400
3 6 9 12 80 100 300 350
80 100 150 200 320
5 20 34 56 100 300 800
maxMinValue: 80
How can we jump to the right position in each list?
–Using binary search
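
A minimal sketch of the skipping idea, assuming maxMinValue is recomputed from the current heads once per pass and that each list then jumps to its first item not smaller than maxMinValue. Python's bisect_left plays the role of the binary search; the function name merge_skip and the loop structure are assumptions, not taken from the slides.

```python
from bisect import bisect_left


def merge_skip(lists):
    """Intersect sorted, duplicate-free lists by skipping below maxMinValue."""
    if not lists:
        return []
    pos = [0] * len(lists)  # one cursor per list
    common = []
    while all(p < len(lst) for p, lst in zip(pos, lists)):
        heads = [lst[p] for p, lst in zip(pos, lists)]
        max_min_value = max(heads)  # fixed for the whole pass
        if min(heads) == max_min_value:
            # All heads agree: report the item and move every cursor on.
            common.append(max_min_value)
            pos = [p + 1 for p in pos]
            continue
        # Binary search from the current cursor to the end of each list
        # for the first item >= max_min_value.
        pos = [bisect_left(lst, max_min_value, p) for p, lst in zip(pos, lists)]
    return common


example = [
    [2, 5, 8, 12, 50, 80, 100, 400],
    [3, 6, 9, 12, 80, 100, 300, 350],
    [80, 100, 150, 200, 320],
    [5, 20, 34, 56, 100, 300, 800],
]
print(merge_skip(example))  # [100]
```

On the example lists the first pass uses maxMinValue = 80 and the second uses 100, so most of the small items are never read.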

11 What will happen if the lists are similar?
1 3 5 7 9 11 13 15
2 4 6 8 11 13 15 17
Can binary search bring any benefit?
–No
9
5 13
3 7 11 15

12 Modified Binary Search
Time complexity
–O(log k), where k is the number of searched items in the list
Motivation of Modified Binary Search
–Bound the search by the number of searched items rather than by the length from the current position to the end of the list
–Iteratively check the item at position (current position + 2^i)

13 Example
1 3 5 7 9 11 15 17 19 23
–Check the item at position (current position + 2^1).
–Else, if the value of the item is less than maxMinValue, then the item with value 3 is accessed;
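
A minimal sketch of the probing scheme, assuming offsets 2^0, 2^1, 2^2, ... from the current position until a probe reaches maxMinValue or runs past the end, followed by an ordinary binary search inside the last bracket. The function name gallop_search and the starting exponent 0 are assumptions, not taken from the slides.

```python
from bisect import bisect_left


def gallop_search(lst, target, start=0):
    """Return the smallest index i >= start with lst[i] >= target.

    Returns len(lst) when no such item exists. The number of probes grows
    with the distance to the answer, not with the remaining list length.
    """
    n = len(lst)
    if start >= n:
        return n
    if lst[start] >= target:
        return start
    lo = start        # invariant: lst[lo] < target
    step = 1          # current offset 2^i from the start position
    hi = start + step
    while hi < n and lst[hi] < target:
        lo = hi
        step *= 2
        hi = start + step
    # The answer lies in (lo, min(hi, n)]; finish with a plain binary search.
    return bisect_left(lst, target, lo + 1, min(hi, n))


items = [1, 3, 5, 7, 9, 11, 13, 15]
print(gallop_search(items, 9))    # 4
print(gallop_search(items, 100))  # 8
```

When the lists are similar, the target is usually only a few positions ahead, so the search ends after a handful of probes.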

14 Limitation of MergeSkip
–At each iteration, maxMinValue is not refined.

15 MergeESkip
Motivation
–maxMinValue will be refined at each step

16 Example
2 5 8 12 50 80 100 400
3 6 9 12 80 100 300 350
80 100 150 200 320
5 20 34 56 100 300 800

17 Example
2 5 8 12 50 80 100 400
3 6 9 12 80 100 300 350
80 100 150 200 320
5 20 34 56 100 300 800
End

18 Further Discussion of MergeESkip
Which list should be selected next?
–The choice can affect performance
Several strategies
–Selection in token-ring order
–Random selection
–Selection by size of each list
–Selection by statistical information
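
A minimal sketch of the refinement idea with token-ring selection, assuming lists are visited in a fixed cyclic order and maxMinValue is raised as soon as a list lands on a larger item. The function name merge_eskip and the matched counter are assumptions, and bisect_left stands in for whichever search routine is used to skip.

```python
from bisect import bisect_left


def merge_eskip(lists):
    """Intersect sorted, duplicate-free lists, refining maxMinValue at every step."""
    n = len(lists)
    if n == 0 or any(len(lst) == 0 for lst in lists):
        return []
    pos = [0] * n
    common = []
    max_min_value = max(lst[0] for lst in lists)
    matched = 0  # consecutive lists whose current item equals max_min_value
    i = 0        # list selected next (token-ring order)
    while True:
        lst = lists[i]
        # Skip to the first item >= max_min_value in the selected list.
        pos[i] = bisect_left(lst, max_min_value, pos[i])
        if pos[i] == len(lst):
            return common  # one list is exhausted: no more common items
        if lst[pos[i]] > max_min_value:
            # Refine immediately; only this list is known to hold the new value.
            max_min_value = lst[pos[i]]
            matched = 1
        else:
            matched += 1
        if matched == n:
            # All lists agree on max_min_value.
            common.append(max_min_value)
            pos[i] += 1
            if pos[i] == len(lst):
                return common
            max_min_value = lst[pos[i]]
            matched = 0
            continue  # revisit the same list so it is counted first
        i = (i + 1) % n


example = [
    [2, 5, 8, 12, 50, 80, 100, 400],
    [3, 6, 9, 12, 80, 100, 300, 350],
    [80, 100, 150, 200, 320],
    [5, 20, 34, 56, 100, 300, 800],
]
print(merge_eskip(example))  # [100]
```

The other strategies listed above would only change how i is chosen after each step; with an arbitrary order, the matched counter would have to track which lists currently hold maxMinValue rather than a consecutive run.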

19 Experimental Evaluation
Synthetic datasets
–Normal distribution
–Different mean, same variance
–Same mean, different variance
DBLP dataset
–10 lists
–Length of each list is from 81,000 to 150,000
Algorithms
–MergeAll algorithm
–MergeSkip algorithm
–MergeESkip algorithm

20 Synthetic dataset

21 DBLP dataset
List lengths: 144288, 144235, 136497, 106298, 95418, 93409, 88455, 83122, 81632, 81623

22 Effect of Different Data Distribution
Parameters: the number of lists = 4; the length of each list = 1M

24 Effect of the Number of Lists
Parameters: mean = 0; variance = 100; the length of each list = 1M

26 Effect of Size of Lists
Parameters: mean = 0; variance = 100; the number of lists = 4

27 Thanks!

