# Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici

OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage
Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici
Department of Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev

Definitions

TA: a table with attributes A = {a1, a2, a3, …, an}, |A| = n
|TA| = number of records in TA
r(a) = a record from TA
TB: a table with attributes B = {b1, b2, b3, …, bm}, |B| = m
|TB| = number of records in TB
r(b) = a record from TB

TA x TB: the Cartesian product of TA and TB, with attributes a1, a2, …, an, b1, b2, …, bm
r = (r(a), r(b)): a record of TA x TB, pairing a record from TA with a record from TB
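
The pairing r = (r(a), r(b)) can be illustrated with a minimal Python sketch. The two toy tables and their values are hypothetical and only serve to show that TA x TB holds |TA| * |TB| record pairs.

```python
from itertools import product

# Hypothetical toy tables: each record maps attribute name -> value.
T_A = [
    {"a1": "Berlin", "a2": "Friday"},
    {"a1": "Bonn", "a2": "Monday"},
]
T_B = [
    {"b1": "private"},
    {"b1": "business"},
    {"b1": "private"},
]


def cartesian(table_a, table_b):
    """Return every pair r = (r(a), r(b)) of T_A x T_B."""
    return [(ra, rb) for ra, rb in product(table_a, table_b)]


pairs = cartesian(T_A, T_B)
print(len(pairs))  # |T_A| * |T_B| = 2 * 3 = 6
```

Each element of `pairs` is a candidate link between one record of TA and one record of TB.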

TA x TB with columns a1, a2, a3, a4, …, an, b1, b2, b3, b4, …, bm and Target
Target: match | no-match
TAB: the match records; TAB (complement): the no-match records

d: a tree node; d1, d2: its child nodes, reached through the split values v1 and v2, each holding a subset of the records over a1, a2, …, an, b1, …, bm

Ad ⊆ A: the subset of attributes of TA that were already selected as splitting attributes on the path from the root of the tree to node d.
Example (tree with nodes d1, d2, d3, d4, d5): Ad2 = {a1}, Ad4 = {a1, a2}
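
As a rough illustration of Ad, the sketch below collects the splitting attributes on the path from the root to a node. The `Node` class and the tree shape (d1 splits on a1, d2 splits on a2, d4 and d5 below d2) are assumptions chosen to reproduce Ad2 = {a1} and Ad4 = {a1, a2}; the slide does not show which node splits on which attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    split_attr: Optional[str] = None  # attribute this node splits on; None for a leaf
    parent: Optional["Node"] = None

    def used_attributes(self):
        """A_d: attributes already used as splitting attributes from the root down to this node."""
        used = set()
        node = self.parent
        while node is not None:
            if node.split_attr is not None:
                used.add(node.split_attr)
            node = node.parent
        return used


# Assumed tree shape, consistent with the example values above.
d1 = Node("d1", split_attr="a1")
d2 = Node("d2", split_attr="a2", parent=d1)
d3 = Node("d3", parent=d1)
d4 = Node("d4", parent=d2)
d5 = Node("d5", parent=d2)

print(sorted(d2.used_attributes()))  # ['a1']
print(sorted(d4.used_attributes()))  # ['a1', 'a2']
```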

Running Examples

The data set
Columns: Customer Type, Customer City, Request Location, Request Day Of Week, Request Part Of Day, Request ID
Values as extracted (request IDs 1 to 19): private Berlin Friday Afternoon 1 Hamburg Wednesday 2 business Morning 3 Wednesday 4 Saturday 5 Thursday 6 7 8 9 10 Monday 11 12 13 Bonn 14 15 16 17 18 19

The data set – cont.
Columns: Customer Type, Customer City, Request Location, Request Day Of Week, Request Part Of Day, Request ID
Values as extracted (request IDs 20 to 31): private Bonn Hamburg Friday Afternoon 20 Berlin Morning 21 business 22 23 Wednesday 24 Thursday 25 26 Monday 27 28 29 30 31

Coarse Grained Jaccard

Coarse Grained Jaccard – Splitting the root of the tree
Three candidates for split: request location, request day of week, request part of day
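
The slides do not spell out how each candidate score is obtained. As a minimal sketch, assuming the criterion compares the sets of TB records that reach the two children of a split using the standard Jaccard coefficient |X ∩ Y| / |X ∪ Y|, the computation could look as follows. The function name and the example records are hypothetical.

```python
def jaccard(left, right):
    """Standard Jaccard coefficient |X ∩ Y| / |X ∪ Y| of two collections."""
    left, right = set(left), set(right)
    union = left | right
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(left & right) / len(union)


# Hypothetical (customer city, customer type) records reaching each child.
left_child = {("Berlin", "private"), ("Bonn", "private")}
right_child = {("Bonn", "private"), ("Hamburg", "business"), ("Berlin", "business")}

print(jaccard(left_child, right_child))  # 1 shared record out of 4 distinct -> 0.25
```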

CGJ – Splitting the root of the tree
Candidate: reqLocation, one binary split of node d per value
reqLocation = Berlin vs. reqLocation != Berlin: Score1 = 1/23, W1 = 16/31
reqLocation = Hamburg vs. reqLocation != Hamburg: Score2 = 2/23, W2 = 9/31
reqLocation = Bonn vs. reqLocation != Bonn: Score3 = 1/23, W3 = 6/31
Score(Split reqLocation) = Score1 * W1 + Score2 * W2 + Score3 * W3 = 0.0561
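
The reqLocation score can be checked with exact fractions. This is a small sketch: the helper name `weighted_split_score` is hypothetical, while the (Score, W) pairs are the ones given on the slide.

```python
from fractions import Fraction


def weighted_split_score(parts):
    """Sum of Score_i * W_i over the binary splits of one candidate attribute."""
    return sum(score * weight for score, weight in parts)


# (Score_i, W_i) pairs for reqLocation, as given on the slide.
req_location = [
    (Fraction(1, 23), Fraction(16, 31)),  # Berlin
    (Fraction(2, 23), Fraction(9, 31)),   # Hamburg
    (Fraction(1, 23), Fraction(6, 31)),   # Bonn
]

total = weighted_split_score(req_location)
print(total, round(float(total), 4))  # 40/713 0.0561
```

The weights 16/31, 9/31 and 6/31 sum to 1, so the result is a weighted average of the three per-value scores.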

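CGJ – Splitting the root of the tree
Candidate: dayOfWeek, one binary split of node d per value
dayOfWeek = Monday vs. dayOfWeek != Monday: Score1 = 3/15, W1 = 7/31
dayOfWeek = Wednesday vs. dayOfWeek != Wednesday: Score2 = 5/15, W2 = 5/31
dayOfWeek = Thursday vs. dayOfWeek != Thursday: Score3 = 3/15, W3 = 3/31
dayOfWeek = Friday vs. dayOfWeek != Friday: Score4 = 5/15, W4 = 9/31
dayOfWeek = Friday vs. dayOfWeek != Friday: Score5 = 3/15, W5 = 7/31 (the slide repeats Friday here; Saturday, the remaining day in the data set, is presumably meant)
Score(Split dayOfWeek) = Score1 * W1 + Score2 * W2 + Score3 * W3 + Score4 * W4 + Score5 * W5 = 0.260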
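
Written out, the dayOfWeek score is the weighted sum of the five per-value scores, using the Score and W values from the slide:

```latex
\[
\mathrm{Score}(\mathrm{Split}_{dayOfWeek})
  = \sum_{i=1}^{5} \mathrm{Score}_i \, W_i
  = \frac{3\cdot 7 + 5\cdot 5 + 3\cdot 3 + 5\cdot 9 + 3\cdot 7}{15\cdot 31}
  = \frac{121}{465}
  \approx 0.260
\]
```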

CGJ – Splitting the root of the tree
Candidate: partOfDay, a single binary split of node d
partOfDay = Afternoon vs. partOfDay = Morning: Score1 = 4/23
Score(Split partOfDay) = 0.173

Coarse Grained Jaccard – Splitting the root of the tree
Three candidates for split:
Request location: 0.0561
Request day of week: 0.260
Request part of day: 0.173
The split in the root

Fine Grained Jaccard

Fine Grained Jaccard – Splitting the root of the tree
Node d is split into Req. Location = Berlin and Req. Location != Berlin

Least Probable Intersections

LPI – Splitting the root of the tree
Node d is split into Req. Location = Berlin and Req. Location != Berlin

Maximum Likelihood Estimation

MLE – Splitting the root of the tree
Root split on Request Location: Berlin, Bonn, Hamburg
Cust. City, Cust. Type: p(Cust. City | Cust. Type), p(Cust. Type | Cust. City)