Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici


1 OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage
Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici
Ben-Gurion University of the Negev, Faculty of Engineering Sciences, Department of Information Systems Engineering

2 Definitions

3 Definitions

4 Definitions
TA: a table with attributes a1, a2, a3, a4, …, an. TB: a table with attributes b1, b2, b3, b4, …, bm.
A = {a1, a2, a3, …, an}, |A| = n, |TA| = number of records in TA, r(a) = a record from TA
B = {b1, b2, b3, …, bm}, |B| = m, |TB| = number of records in TB, r(b) = a record from TB

5 Definitions
TA x TB: the Cartesian product of TA (attributes a1, …, an) and TB (attributes b1, …, bm). Each record of TA x TB is a pair r = (r(a), r(b)).
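The pairing r = (r(a), r(b)) can be illustrated with a minimal sketch. The two toy tables below are hypothetical and are not taken from the presentation; only the idea of pairing every record of TA with every record of TB comes from the slide.

```python
from itertools import product

# Hypothetical toy tables: each record is a dict of attribute -> value.
T_A = [{"a1": "x"}, {"a1": "y"}]
T_B = [{"b1": 1}, {"b1": 2}, {"b1": 3}]

# TA x TB: every record r(a) of TA paired with every record r(b) of TB.
pairs = list(product(T_A, T_B))

# The product has |TA| * |TB| records, each of the form r = (r(a), r(b)).
print(len(pairs))
print(pairs[0])
```

With |TA| = 2 and |TB| = 3 the product holds 6 pairs, which is why the size of TA x TB grows quickly for real tables.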

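6 Definitions
[Figure: the table TA x TB (attributes a1, a2, a3, a4, …, an, b1, b2, b3, b4, …, bm) with a Target column taking the values match and no-match; the resulting subsets are labelled TAB.]

7 Definitions
[Figure: the table TA x TB (attributes a1, a2, a3, a4, …, an, b1, b2, b3, b4, …, bm) with a Target column taking the values match and no-match; the resulting subsets are labelled TAB.]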

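8 Definitions
[Figure: a node d is split into child nodes d1 and d2; each child holds records over the attributes a1, a2, …, an, b1, …, bm, and the two branches are labelled v1 and v2.]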

9 Definitions
Ad ⊆ A: the subset of attributes of TA that were already selected as splitting attributes on the path from the root of the tree to node d.
Example (tree with nodes d1, d2, d3, d4, d5): Ad2 = {a1}, Ad4 = {a1, a2}
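The definition of Ad can be sketched by walking from a node back up to the root and collecting the splitting attributes on the way. The tree shape below is an assumption chosen to agree with the slide's example (d1 is the root and splits on a1; d2 splits on a2 and is the parent of d4 and d5); `Node` and `path_attributes` are hypothetical names, not part of OCCT.

```python
class Node:
    def __init__(self, name, parent=None, split_attr=None):
        self.name = name
        self.parent = parent
        self.split_attr = split_attr  # attribute this node splits on, if any


def path_attributes(node):
    """Return A_d: attributes already used as splits above `node`."""
    attrs = set()
    current = node.parent
    while current is not None:
        attrs.add(current.split_attr)
        current = current.parent
    return attrs


# Assumed tree shape, consistent with Ad2 = {a1} and Ad4 = {a1, a2}.
d1 = Node("d1", split_attr="a1")
d2 = Node("d2", parent=d1, split_attr="a2")
d3 = Node("d3", parent=d1)
d4 = Node("d4", parent=d2)
d5 = Node("d5", parent=d2)

print(sorted(path_attributes(d2)))
print(sorted(path_attributes(d4)))
```

Under this assumed shape the root has an empty Ad, since no attribute has been used above it.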

10 Running Examples

11 The data set
Columns: Customer Type, Customer City, Request Location, Request Day Of Week, Request Part Of Day, Request ID
[Table: requests 1 to 19. Values shown: private, business; Berlin, Hamburg, Bonn; Monday, Wednesday, Thursday, Friday, Saturday; Morning, Afternoon.]

12 The data set – cont.
Columns: Customer Type, Customer City, Request Location, Request Day Of Week, Request Part Of Day, Request ID
[Table: requests 20 to 31. Values shown: private, business; Berlin, Hamburg, Bonn; Monday, Wednesday, Thursday, Friday; Morning, Afternoon.]

13 Coarse Grained Jaccard

14 Coarse Grained Jaccard – Splitting the root of the tree
Three candidates for split: Request location, Request day of week, Request part of day

15 CGJ – Splitting the root of the tree
Split of node d on reqLocation, one binary split per value:
reqLocation = Berlin vs. reqLocation != Berlin: Score1 = 1/23, W1 = 16/31
reqLocation = Hamburg vs. reqLocation != Hamburg: Score2 = 2/23, W2 = 9/31
reqLocation = Bonn vs. reqLocation != Bonn: Score3 = 1/23, W3 = 6/31
Score(Split reqLocation) = Score1 * W1 + Score2 * W2 + Score3 * W3 = 0.0561
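The slide's numbers are consistent with a weighted sum of the per-value scores, Score(Split) = Σ Score_i * W_i. A minimal sketch that reproduces the arithmetic with exact fractions follows; the function name `weighted_split_score` is a hypothetical helper, and the sketch takes the Score_i and W_i values from the slide as given rather than deriving them from the data set.

```python
from fractions import Fraction


def weighted_split_score(branches):
    """Sum Score_i * W_i over (score, weight) pairs."""
    return sum(score * weight for score, weight in branches)


# (Score_i, W_i) pairs as printed on the slide for reqLocation.
req_location = [
    (Fraction(1, 23), Fraction(16, 31)),  # Berlin
    (Fraction(2, 23), Fraction(9, 31)),   # Hamburg
    (Fraction(1, 23), Fraction(6, 31)),   # Bonn
]

total = weighted_split_score(req_location)
print(total, round(float(total), 4))
```

The exact value is 40/713, which rounds to the 0.0561 shown on the slide. The weights 16/31, 9/31 and 6/31 sum to 1, matching the 31 requests in the data set.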

16 CGJ – Splitting the root of the tree
Split of node d on dayOfWeek, one binary split per value:
dayOfWeek = Monday vs. dayOfWeek != Monday: Score1 = 3/15, W1 = 7/31
dayOfWeek = Wednesday vs. dayOfWeek != Wednesday: Score2 = 5/15, W2 = 5/31
dayOfWeek = Thursday vs. dayOfWeek != Thursday: Score3 = 3/15, W3 = 3/31
dayOfWeek = Friday vs. dayOfWeek != Friday: Score4 = 5/15, W4 = 9/31
dayOfWeek = Saturday vs. dayOfWeek != Saturday: Score5 = 3/15, W5 = 7/31
Score(Split dayOfWeek) = Score1 * W1 + Score2 * W2 + Score3 * W3 + Score4 * W4 + Score5 * W5 = 0.260
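As a check on the figures above, the five score and weight pairs combine into the reported total when summed as Score_i * W_i. The worked arithmetic, using only the values printed on the slide:

```latex
\[
\mathrm{Score}(\mathrm{Split}_{dayOfWeek})
  = \sum_{i=1}^{5} \mathrm{Score}_i \, W_i
  = \frac{3}{15}\cdot\frac{7}{31}
  + \frac{5}{15}\cdot\frac{5}{31}
  + \frac{3}{15}\cdot\frac{3}{31}
  + \frac{5}{15}\cdot\frac{9}{31}
  + \frac{3}{15}\cdot\frac{7}{31}
  = \frac{21 + 25 + 9 + 45 + 21}{465}
  = \frac{121}{465} \approx 0.260
\]
```

The five weights add up to 31/31, so every request in the data set falls under exactly one day value.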

17 CGJ – Splitting the root of the tree
Split of node d on partOfDay: partOfDay = Afternoon vs. partOfDay = Morning, Score1 = 4/23
Score(Split partOfDay) = 0.173

18 Coarse Grained Jaccard – Splitting the root of the tree
Three candidates for split: Request location (0.0561), Request day of week (0.260), Request part of day (0.173)
The split in the root

19 Fine Grained Jaccard

20 Fine Grained Jaccard – Splitting the root of the tree
Req. Location = Berlin d Req. Location != Berlin

21 Least Probable Intersections

22 LPI – Splitting the root of the tree
Req. Location != Berlin Req. Location = Berlin d

23 Req. Location = Berlin Req. Location != Berlin

24 LPI – Splitting the root of the tree
Req. Location = Berlin d Req. Location != Berlin

25 Maximum Likelihood Estimation

26 MLE – Splitting the root of the tree
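[Figure: a tree rooted at Request Location with branches Berlin, Bonn and Hamburg; the nodes below refer to Cust. City and Cust. Type, with the probabilities p(Cust. City|Cust. Type) and p(Cust. Type|Cust. City).]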

