Published by Alvin Halsted. Modified over 9 years ago.
1
OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage
Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici
Ben-Gurion University of the Negev, Faculty of Engineering Sciences, Department of Information Systems Engineering
2
Definitions
3
Definitions
4
Definitions
TA: a table with attributes A = {a1, a2, a3, …, an}; |A| = n; |TA| = number of records in TA; r(a) = a record from TA
TB: a table with attributes B = {b1, b2, b3, …, bm}; |B| = m; |TB| = number of records in TB; r(b) = a record from TB
5
Definitions
TA x TB: the table that pairs every record of TA (attributes a1 … an) with every record of TB (attributes b1 … bm)
r = (r(a), r(b)) = a record from TA x TB
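The pairing r = (r(a), r(b)) can be illustrated with a minimal sketch. The two toy tables and their values below are hypothetical and only follow the slide's attribute naming; they are not data from the presentation.

```python
from itertools import product

# Hypothetical toy tables; attribute names a1, a2, b1 follow the slide's notation.
T_A = [{"a1": "x", "a2": 1}, {"a1": "y", "a2": 2}]
T_B = [{"b1": "p"}, {"b1": "q"}, {"b1": "r"}]

# Each element r = (r(a), r(b)) pairs one record from T_A with one from T_B.
T_AxB = list(product(T_A, T_B))

# The product holds |TA| * |TB| pairs.
print(len(T_AxB), len(T_A) * len(T_B))
```

The product grows with |TA| * |TB|, so even small tables yield many candidate pairs.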
6
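Definitions
TA x TB with a Target attribute: each record over a1 … an, b1 … bm is labeled match or no-match
[Figure: the match records and the no-match records are each marked with a TAB label]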
8
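Definitions
[Figure: a tree node d with two child nodes d1 and d2, reached through the values v1 and v2; each child holds a table over the attributes a1 … an, b1 … bm]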
9
Definitions
Ad ⊆ A: the subset of attributes of TA that were already selected as splitting attributes in the path from the root of the tree to node d.
Example (tree with nodes d1 … d5): Ad2 = {a1}, Ad4 = {a1, a2}
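The definition of Ad can be sketched as a walk from a node up to the root. The tree shape below (d1 as root splitting on a1, d2 splitting on a2, d4 and d5 under d2, d3 under d1) is an assumption chosen to agree with the slide's example values; the names `parent`, `split_attr` and `used_attributes` are illustrative.

```python
# Assumed tree shape: d1 is the root; d2 and d3 are its children; d4 and d5 are children of d2.
parent = {"d1": None, "d2": "d1", "d3": "d1", "d4": "d2", "d5": "d2"}

# Assumed splitting attribute of each internal node.
split_attr = {"d1": "a1", "d2": "a2"}

def used_attributes(node):
    """Return A_d: attributes already used as splits by the ancestors of `node`."""
    attrs = set()
    current = parent[node]
    while current is not None:
        attrs.add(split_attr[current])
        current = parent[current]
    return attrs

print(used_attributes("d2"))  # matches the slide's Ad2 = {a1}
print(used_attributes("d4"))  # matches the slide's Ad4 = {a1, a2}
```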
10
Running Examples
11
The data set
Columns: Customer Type, Customer City, Request Location, Request Day Of Week, Request Part Of Day, Request ID
[Table: requests 1–19. Values shown: private, business; Berlin, Hamburg, Bonn; Monday, Wednesday, Thursday, Friday, Saturday; Morning, Afternoon. The merged cells of the slide are not recoverable row by row.]
12
The data set – cont.
Columns: Customer Type, Customer City, Request Location, Request Day Of Week, Request Part Of Day, Request ID
[Table: requests 20–31. Values shown: private, business; Bonn, Hamburg, Berlin; Monday, Wednesday, Thursday, Friday; Morning, Afternoon. The merged cells of the slide are not recoverable row by row.]
13
Coarse Grained Jaccard
14
Coarse Grained Jaccard – Splitting the root of the tree
Three candidates for split:
Request location
Request day of week
Request part of day
15
CGJ – Splitting the root of the tree
Candidate: Request location (node d is split once per value)
reqLocation = Berlin vs. reqLocation != Berlin: Score1 = 1/23, W1 = 16/31
reqLocation = Hamburg vs. reqLocation != Hamburg: Score2 = 2/23, W2 = 9/31
reqLocation = Bonn vs. reqLocation != Bonn: Score3 = 1/23, W3 = 6/31
Score(SplitreqLocation) = Score1 * W1 + Score2 * W2 + Score3 * W3 = 0.0561
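The arithmetic on this slide can be checked with a short sketch. It assumes the split score is the weighted sum of the per-branch scores, which is consistent with the slide's total of 0.0561; the function name `split_score` is illustrative and not taken from the presentation.

```python
from fractions import Fraction

def split_score(branches):
    """Weighted sum of per-branch scores: sum(Score_i * W_i)."""
    return sum(score * weight for score, weight in branches)

# (Score_i, W_i) pairs as read from the slide for the Request Location candidate.
req_location = [
    (Fraction(1, 23), Fraction(16, 31)),  # reqLocation = Berlin
    (Fraction(2, 23), Fraction(9, 31)),   # reqLocation = Hamburg
    (Fraction(1, 23), Fraction(6, 31)),   # reqLocation = Bonn
]

total = split_score(req_location)
print(total, round(float(total), 4))
```

Exact fractions are used so the result can be compared with the slide's rounded value without floating-point noise. Note that the three weights sum to 31/31.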
16
CGJ – Splitting the root of the tree
Candidate: Request day of week (node d is split once per value)
dayOfWeek = Monday vs. dayOfWeek != Monday: Score1 = 3/15, W1 = 7/31
dayOfWeek = Wednesday vs. dayOfWeek != Wednesday: Score2 = 5/15, W2 = 5/31
dayOfWeek = Thursday vs. dayOfWeek != Thursday: Score3 = 3/15, W3 = 3/31
dayOfWeek = Friday vs. dayOfWeek != Friday: Score4 = 5/15, W4 = 9/31
dayOfWeek = Saturday vs. dayOfWeek != Saturday: Score5 = 3/15, W5 = 7/31
Score(SplitdayOfWeek) = Score1 * W1 + Score2 * W2 + Score3 * W3 + Score4 * W4 + Score5 * W5 = 0.260
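Written out term by term, and assuming the split score is the weighted sum of the five branch scores (an assumption that reproduces the slide's total), the calculation is:

```latex
\[
\mathrm{Score}(\mathrm{Split}_{dayOfWeek})
  = \sum_{i=1}^{5} \mathrm{Score}_i \, W_i
  = \frac{3}{15}\cdot\frac{7}{31}
  + \frac{5}{15}\cdot\frac{5}{31}
  + \frac{3}{15}\cdot\frac{3}{31}
  + \frac{5}{15}\cdot\frac{9}{31}
  + \frac{3}{15}\cdot\frac{7}{31}
  = \frac{21 + 25 + 9 + 45 + 21}{465}
  = \frac{121}{465}
  \approx 0.260
\]
```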
17
CGJ – Splitting the root of the tree
Candidate: Request part of day (node d is split once)
partOfDay = Afternoon vs. partOfDay = Morning: Score1 = 4/23
Score(SplitpartOfDay) = 0.173
18
Coarse Grained Jaccard – Splitting the root of the tree
Three candidates for split:
Request location: 0.0561
Request day of week: 0.260
Request part of day: 0.173
The split in the root
19
Fine Grained Jaccard
20
Fine Grained Jaccard – Splitting the root of the tree
[Figure: node d split into Req. Location = Berlin and Req. Location != Berlin]
21
Least Probable Intersections
22
LPI – Splitting the root of the tree
[Figure: node d split into Req. Location = Berlin and Req. Location != Berlin]
23
[Figure: split into Req. Location = Berlin and Req. Location != Berlin]
24
LPI – Splitting the root of the tree
[Figure: node d split into Req. Location = Berlin and Req. Location != Berlin]
25
Maximum Likelihood Estimation
26
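MLE – Splitting the root of the tree
[Figure: a split on Request Location with branches Berlin, Bonn and Hamburg; the attributes Cust. City and Cust. Type are shown with the probabilities p(Cust. City | Cust. Type) and p(Cust. Type | Cust. City)]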