OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage


1 OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage
Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici
Ben-Gurion University of the Negev, Faculty of Engineering Sciences, Department of Information Systems Engineering

2 Definitions

3 Definitions

4 T_A: A = {a_1, a_2, a_3, …, a_n}, |A| = n, |T_A| = number of records in T_A, r^(a) = a record from T_A
T_B: B = {b_1, b_2, b_3, …, b_m}, |B| = m, |T_B| = number of records in T_B, r^(b) = a record from T_B

5 Definitions
T_A x T_B: a table with columns a_1, a_2, a_3, a_4, …, a_n, b_1, b_2, b_3, b_4, …, b_m; each record is r = (r^(a), r^(b))
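The pairing r = (r^(a), r^(b)) can be illustrated with a minimal sketch. The two toy tables, their attribute names and their values are hypothetical and are not taken from the slides.

```python
from itertools import product

# Hypothetical toy tables (illustrative only, not from the slides).
T_A = [{"a1": "x", "a2": 1}, {"a1": "y", "a2": 2}]
T_B = [{"b1": "p"}, {"b1": "q"}, {"b1": "r"}]

# Each record of T_A x T_B pairs one record of T_A with one record of T_B.
pairs = list(product(T_A, T_B))

# The product holds |T_A| * |T_B| records.
print(len(pairs) == len(T_A) * len(T_B))
```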

6 Definitions
T_A x T_B: columns a_1, a_2, a_3, a_4, …, a_n, b_1, b_2, b_3, b_4, …, b_m plus a Target column with the values match / no-match; T_AB


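8 Definitions
Node d is split on attribute a: the branch a = v1 leads to child d1 and the branch a = v2 leads to child d2. The table at d1 (columns a_1, a_2, …, a_n, b_1, …, b_m) holds the records with value v1; the table at d2 holds the records with value v2.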
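As a rough illustration of such a split, the sketch below groups the records of a node by the value of one attribute. The function name `split_node` and the sample records are hypothetical.

```python
from collections import defaultdict

def split_node(records, attribute):
    """Group the records of a node by their value of `attribute`."""
    children = defaultdict(list)
    for record in records:
        children[record[attribute]].append(record)
    return dict(children)

# Hypothetical records held by node d (illustrative only).
d = [
    {"a": "v1", "b1": 10},
    {"a": "v2", "b1": 20},
    {"a": "v1", "b1": 30},
]

children = split_node(d, "a")
d1, d2 = children["v1"], children["v2"]
```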

9 Definitions
A_d ⊆ A: the subset of attributes of T_A that were already selected as splitting attributes in the path from the root of the tree to node d.
Examples: A_d2 = {a_1}, A_d4 = {a_1, a_2}
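A small sketch of how A_d could be collected by walking from a node up to the root. The `Node` class and the tree shape are assumptions chosen only so that the two examples come out as on the slide: here the root d1 splits on a1, its child d2 splits on a2, and d4 is a child of d2.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    split_attribute: Optional[str] = None  # attribute this node splits on; None for a leaf
    parent: Optional["Node"] = None

def selected_attributes(node):
    """Return A_d: the split attributes chosen at the ancestors of `node`."""
    attrs = set()
    current = node.parent
    while current is not None:
        attrs.add(current.split_attribute)
        current = current.parent
    return attrs

# Assumed tree shape (hypothetical): d1 -> d2 -> d4.
d1 = Node("d1", split_attribute="a1")
d2 = Node("d2", split_attribute="a2", parent=d1)
d4 = Node("d4", parent=d2)
```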

10 Running Examples

11 The data set
Customer Type | Customer City | Request Location | Request Day Of Week | Request Part Of Day | Request ID
private       | Berlin        |                  | Friday              | Afternoon           | 1
private       | Hamburg       |                  | Wednesday           | Afternoon           | 2
business      | Berlin        |                  | Wednesday           | Morning             | 3
private       | Berlin        |                  | Wednesday           | Morning             | 4
private       | Berlin        |                  | Saturday            | Afternoon           | 5
private       | Berlin        |                  | Thursday            | Morning             | 6
private       | Berlin        |                  | Friday              | Afternoon           | 7
business      | Berlin        |                  | Saturday            | Afternoon           | 8
private       | Berlin        |                  | Saturday            | Afternoon           | 9
business      | Hamburg       |                  | Friday              | Afternoon           | 10
business      | Hamburg       |                  | Monday              | Afternoon           | 11
private       | Hamburg       |                  | Saturday            | Afternoon           | 12
private       | Berlin        |                  | Monday              | Afternoon           | 13
private       | Bonn          | Berlin           | Monday              | Afternoon           | 14
private       | Berlin        |                  | Monday              | Afternoon           | 15
private       | Bonn          |                  | Saturday            | Morning             | 16
private       | Hamburg       |                  | Saturday            | Morning             | 17
private       | Hamburg       |                  | Saturday            | Morning             | 18
private       | Hamburg       |                  | Friday              | Afternoon           | 19

12 The data set – cont.
Customer Type | Customer City | Request Location | Request Day Of Week | Request Part Of Day | Request ID
private       | Bonn          | Hamburg          | Friday              | Afternoon           | 20
private       | Berlin        | Hamburg          | Friday              | Morning             | 21
business      | Berlin        |                  | Friday              | Morning             | 22
private       | Berlin        |                  | Friday              | Morning             | 23
private       | Berlin        |                  | Wednesday           | Afternoon           | 24
private       | Berlin        |                  | Thursday            | Afternoon           | 25
business      | Berlin        |                  | Thursday            | Afternoon           | 26
business      | Bonn          |                  | Monday              | Afternoon           | 27
private       | Hamburg       | Bonn             | Monday              | Afternoon           | 28
business      | Berlin        | Bonn             | Monday              | Afternoon           | 29
business      | Bonn          |                  | Wednesday           | Afternoon           | 30
private       | Bonn          |                  | Friday              | Afternoon           | 31
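The weights W on the slides that follow appear to be the share of the 31 requests that carry each attribute value. The sketch below tallies those shares. It rests on one assumption that the transcript does not state: where no Request Location survives in a row, the request location is taken to equal the Customer City. Under that assumption the tally gives 16/31, 9/31 and 6/31 for Berlin, Hamburg and Bonn, which are the weights shown for the request-location split.

```python
from collections import Counter
from fractions import Fraction

# One line per Request ID 1..31: customer city, request location, day, part of day.
# "-" marks rows where the transcript shows no request location.
ROWS = """
Berlin - Friday Afternoon
Hamburg - Wednesday Afternoon
Berlin - Wednesday Morning
Berlin - Wednesday Morning
Berlin - Saturday Afternoon
Berlin - Thursday Morning
Berlin - Friday Afternoon
Berlin - Saturday Afternoon
Berlin - Saturday Afternoon
Hamburg - Friday Afternoon
Hamburg - Monday Afternoon
Hamburg - Saturday Afternoon
Berlin - Monday Afternoon
Bonn Berlin Monday Afternoon
Berlin - Monday Afternoon
Bonn - Saturday Morning
Hamburg - Saturday Morning
Hamburg - Saturday Morning
Hamburg - Friday Afternoon
Bonn Hamburg Friday Afternoon
Berlin Hamburg Friday Morning
Berlin - Friday Morning
Berlin - Friday Morning
Berlin - Wednesday Afternoon
Berlin - Thursday Afternoon
Berlin - Thursday Afternoon
Bonn - Monday Afternoon
Hamburg Bonn Monday Afternoon
Berlin Bonn Monday Afternoon
Bonn - Wednesday Afternoon
Bonn - Friday Afternoon
"""

records = []
for line in ROWS.strip().splitlines():
    city, location, day, part = line.split()
    if location == "-":
        location = city  # ASSUMPTION: a missing request location equals the customer city
    records.append({"location": location, "day": day, "part": part})

def weights(attribute):
    """Share of all requests that carry each value of `attribute`."""
    counts = Counter(record[attribute] for record in records)
    return {value: Fraction(count, len(records)) for value, count in counts.items()}

location_weights = weights("location")
day_weights = weights("day")
part_weights = weights("part")

for name, w in [("location", location_weights), ("day", day_weights), ("part", part_weights)]:
    print(name, {value: str(share) for value, share in w.items()})
```

The day-of-week tally needs no assumption, since every row in the transcript carries a day.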

13 Coarse Grained Jaccard

14 Coarse Grained Jaccard – Splitting the root of the tree
Three candidates for split:
Request location
Request day of week
Request part of day

15 CGJ – Splitting the root of the tree
Candidate: Request location, one binary split of node d per value:
reqLocation = Berlin vs. reqLocation != Berlin: W1 = 16/31, Score1 = 1/23
reqLocation = Hamburg vs. reqLocation != Hamburg: W2 = 9/31, Score2 = 2/23
reqLocation = Bonn vs. reqLocation != Bonn: W3 = 6/31, Score3 = 1/23
Score(Split reqLocation) = W1 * Score1 + W2 * Score2 + W3 * Score3 = 0.0561
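Written out, the weighted sum for the request-location candidate reproduces the value on the slide. The summation notation is added here for illustration; the weights and scores are the ones given above.

```latex
\[
\mathrm{Score}(\mathrm{Split}_{reqLocation})
  = \sum_{i=1}^{3} W_i \cdot \mathrm{Score}_i
  = \frac{16}{31}\cdot\frac{1}{23} + \frac{9}{31}\cdot\frac{2}{23} + \frac{6}{31}\cdot\frac{1}{23}
  = \frac{40}{713} \approx 0.0561
\]
```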

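16 CGJ – Splitting the root of the tree
Candidate: Request day of week, one binary split of node d per value:
dayOfWeek = Monday vs. dayOfWeek != Monday: W1 = 7/31, Score1 = 3/15
dayOfWeek = Wednesday vs. dayOfWeek != Wednesday: W2 = 5/31, Score2 = 5/15
dayOfWeek = Thursday vs. dayOfWeek != Thursday: W3 = 3/31, Score3 = 3/15
dayOfWeek = Friday vs. dayOfWeek != Friday: W4 = 9/31, Score4 = 5/15
dayOfWeek = Saturday vs. dayOfWeek != Saturday: W5 = 7/31, Score5 = 3/15 (the slide labels this split Friday a second time; the data set has 9 Friday and 7 Saturday requests)
Score(Split dayOfWeek) = W1 * Score1 + W2 * Score2 + W3 * Score3 + W4 * Score4 + W5 * Score5 = 0.260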

17 CGJ – Splitting the root of the tree
Candidate: Request part of day: node d is split into partOfDay = Afternoon and partOfDay = Morning
Score1 = 4/23
Score(Split partOfDay) = 0.173

18 Coarse Grained Jaccard – Splitting the root of the tree
Three candidates for split:
Request location: 0.0561
Request day of week: 0.260
Request part of day: 0.173
The split in the root
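The three candidate scores can be recomputed from the weights and per-value scores on slides 15 to 17, as in the sketch below. The selection rule is an assumption: the transcript does not say which candidate wins, and the sketch picks the lowest similarity score, which is consistent with the later slides splitting the root on Req. Location. The helper name `split_score` is hypothetical.

```python
from fractions import Fraction as F

def split_score(parts):
    """Weighted sum of per-value scores: sum of W_i * Score_i."""
    return sum(w * s for w, s in parts)

# (W_i, Score_i) pairs as given on the slides.
candidates = {
    "Request location": [
        (F(16, 31), F(1, 23)), (F(9, 31), F(2, 23)), (F(6, 31), F(1, 23)),
    ],
    "Request day of week": [
        (F(7, 31), F(3, 15)), (F(5, 31), F(5, 15)), (F(3, 31), F(3, 15)),
        (F(9, 31), F(5, 15)), (F(7, 31), F(3, 15)),
    ],
}

scores = {name: split_score(parts) for name, parts in candidates.items()}
# The part-of-day slide shows a single score and no weights.
scores["Request part of day"] = F(4, 23)

# ASSUMPTION: the candidate with the lowest similarity score is chosen.
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score} ~ {float(score):.4f}")
print("chosen:", best)
```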

19 Fine Grained Jaccard

20 Fine Grained Jaccard – Splitting the root of the tree
Node d is split into Req. Location = Berlin and Req. Location != Berlin

21 Least Probable Intersections

22 LPI – Splitting the root of the tree
Node d is split into Req. Location = Berlin and Req. Location != Berlin

23 Req. Location = Berlin; Req. Location != Berlin

24 LPI – Splitting the root of the tree
Node d is split into Req. Location = Berlin and Req. Location != Berlin

25 Maximum Likelihood Estimation

26 MLE – Splitting the root of the tree
Attributes shown: Cust. City, Cust. Type
p(Cust. City | Cust. Type), p(Cust. Type | Cust. City)
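For categorical attributes, the maximum likelihood estimate of a conditional probability is a relative frequency, count(x, y) / count(x). The sketch below estimates both conditionals named on the slide over all 31 rows of the running example. How the slides turn these estimates into a split score is not recoverable from the transcript, so the sketch stops at the estimates; the helper name `conditional_mle` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# (Customer Type, Customer City) for Request IDs 1..31, from the data set slides.
CUSTOMERS = """
private Berlin    private Hamburg   business Berlin   private Berlin
private Berlin    private Berlin    private Berlin    business Berlin
private Berlin    business Hamburg  business Hamburg  private Hamburg
private Berlin    private Bonn      private Berlin    private Bonn
private Hamburg   private Hamburg   private Hamburg   private Bonn
private Berlin    business Berlin   private Berlin    private Berlin
private Berlin    business Berlin   business Bonn     private Hamburg
business Berlin   business Bonn     private Bonn
""".split()

pairs = list(zip(CUSTOMERS[0::2], CUSTOMERS[1::2]))  # (type, city)

def conditional_mle(xy_pairs):
    """Return {(y, x): p(y | x)} estimated as count(x, y) / count(x)."""
    joint = Counter(xy_pairs)
    marginal = Counter(x for x, _ in xy_pairs)
    return {(y, x): Fraction(n, marginal[x]) for (x, y), n in joint.items()}

p_city_given_type = conditional_mle(pairs)
p_type_given_city = conditional_mle([(city, ctype) for ctype, city in pairs])

print(p_city_given_type[("Berlin", "private")])
print(p_type_given_city[("private", "Berlin")])
```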

