Classification Adriano Joaquim de O Cruz ©2002 NCE/UFRJ


1 Classification Adriano Joaquim de O Cruz ©2002 NCE/UFRJ adriano@nce.ufrj.br

2 Classification
- Technique that assigns samples to previously known classes.
- May be crisp or fuzzy.
- Supervised (e.g. the MLP): trained.
- Unsupervised (e.g. K-NN and fuzzy K-NN): not trained.

3 K-NN and Fuzzy K-NN
- Classification methods.
- Classes are identified by patterns.
- A sample is classified by its k nearest neighbours.
- Require previous knowledge about the problem classes.
- Not restricted to a specific distribution of the samples.

4 Classification Crisp K-NN

5 Crisp K-NN
- Supervised clustering method (a classification method).
- Classes are defined beforehand.
- Classes are characterized by sets of elements.
- The number of elements may differ among classes.
- The main idea is to assign the sample to the class containing the most neighbours.

6 Crisp K-NN
[figure: fourteen labelled patterns w1-w14 spread over classes 1-5, with an unclassified sample s]
- With the 3 nearest neighbours, sample s is closest to pattern w6 in class 5.

7 Crisp K-NN
- Consider W = {w1, w2, ..., wt} a set of t labelled data.
- Each object wi is defined by l characteristics: wi = (wi1, wi2, ..., wil).
- Input: y, an unclassified element.
- k is the number of closest neighbours of y.
- E is the set of k nearest neighbours (NN).

8 Crisp K-NN
- Let t be the number of elements that identify the classes.
- Let c be the number of classes.
- Let W be the set that contains the t elements.
- Each cluster is represented by a subset of elements from W.

9 Crisp K-NN algorithm
set k
{calculating the NN}
for i = 1 to t
    calculate the distance from y to xi
    if i <= k then
        add xi to E
    else if xi is closer to y than any previous NN then
        delete the farthest neighbour and include xi in the set E

10 Crisp K-NN algorithm (cont.)
Determine the majority class represented in the set E and include y in that class.
if there is a draw then
    calculate the sum of distances from y to all neighbours in each class in the draw
    if the sums are different then
        add y to the class with the smallest sum
    else
        add y to the class where the last minimum was found
(A Python sketch follows.)
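A minimal Python sketch of the crisp K-NN rule above, including the draw handling (numpy assumed; the function and variable names are ours, not from the slides; labels is a numpy array of class identifiers, one per row of W):

import numpy as np

def crisp_knn(y, W, labels, k):
    # distances from y to every labelled pattern (one row of W per pattern)
    d = np.linalg.norm(W - y, axis=1)
    nn = np.argsort(d)[:k]                    # indexes of the k nearest neighbours
    classes, votes = np.unique(labels[nn], return_counts=True)
    tied = classes[votes == votes.max()]      # the majority class, or the classes in a draw
    if len(tied) == 1:
        return tied[0]
    # draw: choose the tied class whose neighbours have the smallest distance sum
    sums = [d[nn[labels[nn] == c]].sum() for c in tied]
    return tied[int(np.argmin(sums))]

For instance, crisp_knn(s, W, labels, 3) reproduces the 3-neighbour case of slide 6.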

11 Classification Fuzzy K-NN

12 Fuzzy K-NN
- The basis of the algorithm is to assign membership as a function of the object's distance from its k nearest neighbours and of the memberships in the possible classes.
- J. Keller, M. Gray, J. Givens, "A Fuzzy K-Nearest Neighbor Algorithm," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-15, no. 4, July/August 1985.

13 Fuzzy K-NN
[figure: fourteen labelled patterns w1-w14 spread over classes 1-5, with an unclassified sample]

14 Fuzzy K-NN
- Consider W = {w1, w2, ..., wt} a set of t labelled data.
- Each object wi is defined by l characteristics: wi = (wi1, wi2, ..., wil).
- Input: y, an unclassified element.
- k is the number of closest neighbours of y.
- E is the set of k nearest neighbours (NN).
- μi(y) is the membership of y in class i.
- μij is the membership in the ith class of the jth vector of the labelled set (labelled wj in class i).

15 Fuzzy K-NN
- Let t be the number of elements that identify the classes.
- Let c be the number of classes.
- Let W be the set that contains the t elements.
- Each cluster is represented by a subset of elements from W.

16 Fuzzy K-NN algorithm
set k
{calculating the NN}
for i = 1 to t
    calculate the distance from y to xi
    if i <= k then
        add xi to E
    else if xi is closer to y than any previous NN then
        delete the farthest neighbour and include xi in the set E

17 Fuzzy K-NN algorithm
for i = 1 to c (the number of classes), calculate μi(y) using the membership formula below.
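The slide's formula image did not survive extraction; the membership computation given in the Keller, Gray and Givens paper cited on slide 12 is

\mu_i(y) = \frac{\sum_{j=1}^{k} \mu_{ij}\left( 1 / \lVert y - x_j \rVert^{2/(m-1)} \right)}
                {\sum_{j=1}^{k}\left( 1 / \lVert y - x_j \rVert^{2/(m-1)} \right)}

where the sums run over the k nearest neighbours in E and m is the fuzzy constant.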

18 Computing μij
- The memberships μij of the labelled samples can be assigned in several ways:
- they can be given complete membership in their known class and no membership in all others;
- membership can be assigned based on the distance from their class mean;
- membership can be assigned based on the distance from labelled samples of their own class and those of other classes (see the sketch below).
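As an illustration of the last option, a short Python sketch of the neighbour-count labelling scheme from the same Keller et al. paper (the 0.51/0.49 constants follow that paper; names are ours; labels holds integer class indexes 0..n_classes-1):

import numpy as np

def init_memberships(W, labels, K, n_classes):
    # soft-label each training pattern from its K nearest labelled neighbours:
    # 0.51 goes to its own class, the remaining 0.49 is split by neighbour counts
    t = len(W)
    U = np.zeros((n_classes, t))
    for j in range(t):
        d = np.linalg.norm(W - W[j], axis=1)
        d[j] = np.inf                         # a pattern is not its own neighbour
        nn = np.argsort(d)[:K]
        for i in range(n_classes):
            n_i = np.count_nonzero(labels[nn] == i)
            U[i, j] = 0.49 * n_i / K + (0.51 if labels[j] == i else 0.0)
    return U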

19 Classification ICC-KNN System

20 ICC-KNN System
- Non-parametric statistical pattern recognition system.
- Combines FCM, fuzzy K-NN and ICC.
- Handles data arranged in classes of several shapes.

21 ICC-KNN System
- Divided in two modules.
- First module (training):
  - chooses the best patterns to use with K-NN;
  - chooses the best fuzzy constant (m) and the best number of neighbours (k).
- Second module (classification):
  - uses fuzzy K-NN to classify.

22 ICC-KNN First Module
- Classification module: finds structure in the data sample.
- Divided into two phases.
- First phase of training: finds the best patterns for fuzzy K-NN.
  - FCM: applied to each class using several numbers of categories.
  - ICC: finds the best number of categories to represent each class.

23 ICC-KNN First Phase
- Results of applying FCM and ICC:
  - the patterns for K-NN, which are the centres of the chosen run of FCM;
  - the number of centres, given by the number of categories selected after applying ICC to all FCM runs.

24 ICC-KNN Second Phase
- Second phase of training: evaluates the best fuzzy constant and the best number of neighbours to achieve the best K-NN performance.
  - tests several m and k values;
  - finds the m and k with the maximum rate of crisp hits.

25 ICC-KNN
- Pattern recognition module:
  - assigns each datum to its class;
  - uses the chosen patterns, m and k, to classify the data.

26 ICC-KNN block diagram
[block diagram: in the classification module, each class 1..s goes through FCM (producing memberships U for c = cmin..cmax) and ICC, which selects the centre sets w1..ws that form W; in the pattern recognition module, fuzzy K-NN takes W, m, k and the unclassified data and outputs the classification.]

27 ICC-KNN
- Let R = {r1, r2, ..., rn} be the set of samples.
- Each sample ri belongs to one of s known classes.
- Let Uic be the membership (inclusion) matrix for class i with c categories.
- Let Vic be the centre matrix for class i with c categories.
- Let wi be the best Vic of each class.
- Let W be the set of sets of centres wi.

28 ICC-KNN algorithm
- Classification module, first phase of training:
Step 1. Set m.
Step 2. Set cmin and cmax.
Step 3. For each known class s:
    generate the set Rs with the points from R belonging to class s
    for each number of categories c in the interval [cmin, cmax]:
        run FCM for c and the set Rs, generating Usc and Vsc
        calculate ICC for Rs and Usc
    define the patterns ws of class s as the matrix Vsc that maximizes ICC
Step 4. Generate the set W = {w1, ..., ws}. (A sketch follows.)
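A Python sketch of this first phase; fcm and icc are passed in as stand-ins, since the slides do not fix a particular FCM implementation or the ICC formula:

import numpy as np

def first_training_phase(R, labels, m, cmin, cmax, fcm, icc):
    # per class, run FCM for every candidate number of categories and
    # keep the centres of the run that maximizes the ICC validity score
    W = []
    for s in np.unique(labels):
        Rs = R[labels == s]                   # points of class s only (the set Rs)
        best_score, best_V = -np.inf, None
        for c in range(cmin, cmax + 1):
            V, U = fcm(Rs, c, m)              # centres Vsc and memberships Usc
            score = icc(Rs, U)
            if score > best_score:
                best_score, best_V = score, V
        W.append(best_V)                      # ws: the chosen patterns of class s
    return W                                  # W = {w1, ..., ws}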

29 ICC-KNN algorithm
- Second phase of training:
Step 5. Set mmin and mmax.
Step 6. Set kmin and kmax.
    for each m in [mmin, mmax]:
        for each k in [kmin, kmax]:
            run fuzzy K-NN for the patterns of the set W, generating Umk
            calculate the number of crisp hits for Umk
Step 7. Choose the m and k that yield the best crisp-hit figures.
Step 8. If there is a draw: if the k's are different, choose the smaller k; else choose the smaller m. (A sketch follows.)
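A matching sketch of the grid search; fuzzy_knn stands in for the fuzzy K-NN of slides 16-17 and is assumed to return the membership matrix Umk (classes × samples), with labels holding 0-based class indexes matching the rows of Umk:

import numpy as np
from itertools import product

def second_training_phase(R, labels, W, m_values, k_values, fuzzy_knn):
    best = None                               # (hits, k, m)
    # k-major order plus a strict '>' implements the draw rule of step 8:
    # the smaller k wins first, then the smaller m
    for k, m in product(sorted(k_values), sorted(m_values)):
        U = fuzzy_knn(R, W, m, k)             # memberships Umk of every sample
        hits = np.count_nonzero(U.argmax(axis=0) == labels)
        if best is None or hits > best[0]:
            best = (hits, k, m)
    return best[2], best[1]                   # the chosen m and k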

30 ICC-KNN algorithm
- Pattern recognition module:
Step 9. Apply fuzzy K-NN, using the patterns from the set W and the chosen parameters m and k, to the data to be classified.

31 ICC-KNN results

32 ICC-KNN results
- 2000 samples, 4 classes, 500 samples in each class.
- Classes 1 and 4: concave classes.
- Classes 2 and 3: convex classes with an elliptic shape.

33 ICC-KNN results: first phase of training
- FCM applied to each class.
- Training data: 80%, i.e. 400 samples from each class.
- c = 3..7 and m = 1.25.
- ICC applied to the results:
  - classes 1 and 4: 4 categories;
  - classes 2 and 3: 3 categories.

34 ICC-KNN results: second phase of training
- Running fuzzy K-NN with:
  - the patterns from the first phase;
  - random patterns;
  - k = 3 to 7 neighbours;
  - m ∈ {1.1, 1.25, 1.5, 2}.

35 ICC-KNN results
- Conclusion: K-NN is more stable with respect to the value of m when using the patterns from the first training phase (PFT).

36 ICC-KNN results
- Training data; lines = true classes, columns = assigned classes.
- PFT patterns: m = 1.5 and k = 3 → 96.25%.
- Random patterns: m = 1.1 and k = 3 → 79.13%.

PFT patterns (96.25%):
      1    2    3    4
1   388   10    0    2
2    14  379    0    7
3     0    0  376   24
4     0    1    2  397

Random patterns (79.13%):
      1    2    3    4
1   213   66    0  121
2    19  380    0    1
3     3    0  324   73
4     4   46    1  349

37 ICC-KNN results
- Test data; lines = true classes, columns = assigned classes.
- PFT patterns: 94.75%. Random patterns: 79%.

PFT patterns (94.75%):
      1    2    3    4
1    97    2    0    1
2     4   93    0    3
3     0    0   90   10
4     0    0    1   99

Random patterns (79%):
      1    2    3    4
1    53   27    0   20
2     4   96    0    0
3     0    0   82   18
4     0   15    0   85

38 ICC-KNN x Others
- FCM, FKCN, GG and GK.
- Training phase (FTr):
  - training data;
  - c = 4 and m ∈ {1.1, 1.25, 1.5, 2};
  - categories are associated to classes by the sum-of-membership-degrees criterion:
    - compute the sum of the membership degrees of the points of each class in each category;
    - a class may be represented by more than one category.

39 ICC-KNN x Others
- Test phase:
  - test data;
  - methods initialized with the centres from the training phase (FTr);
  - compute the membership degree of the points in each category;
  - when a class is represented by more than one category, its membership degree is the sum of the membership degrees of the points in the categories that represent the class.

40 ICC-KNN x Others
- GK for m = 2 → 84%.
- FCM and FKCN → 66% for m = 1.1 and m = 1.25.
- GG-FCM → 69% for m = 1.1 and 1.25.
- Random GG → 57.75% for m = 1.1 and 25% for m = 1.5.

     ICC-KNN   Random KNN   FCM      FKCN     GG       GK
R    94.75%    79%          66%      66%      69%      84%
N    95.75%    83%          70.75%   70.75%   69%      89.5%
T    36.5s     23.11s       2.91s    2.59s    22.66s   18.14s

41 GK

42 GK
Classes (lines) × GK categories (columns):
      1    2    3    4
1    77    6    0   17
2     6   94    0    0
3     0    0   97    3
4     0    0   32   68

43 Classification KNN+Fuzzy C-Means System

44 KNN+Fuzzy C-Means algorithm
- The idea is a two-layer clustering algorithm.
- First, an unsupervised tracking of cluster centres is made using K-NN rules.
- The second layer involves one iteration of the fuzzy c-means to compute the membership degrees and the new fuzzy centres.
- Ref.: N. Zahit et al., Fuzzy Sets and Systems 120 (2001) 239-247.

45 First Layer (K-NN)
- Let X = {x1, ..., xn} be a set of n unlabelled objects.
- c is the number of clusters.
- The first layer partitions X into c cells using the first part of K-NN.
- Each cell i (1 <= i <= c) is represented as Ei(yi, K-NN of yi, Gi).
- Gi is the centre of cell Ei, defined as shown below.
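The slide's formula image is missing; since Ei holds yi plus its K nearest neighbours, Gi is presumably the centroid of the cell's points:

G_i = \frac{1}{K+1} \sum_{x \in E_i} x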

46 KNN-1FCMA settings
- Let X = {x1, ..., xn} be a set of n unlabelled objects.
- Fix c, the number of clusters.
- Choose m > 1 (the nebulisation factor).
- Set k = Integer(n/c - 1).
- Let I = {1, 2, ..., n} be the set of all indexes of X.

47 KNN-1FCMA algorithm, step 1
Calculate G0
for i = 1 to c
    search in I for the index of the object yi farthest from Gi-1
    for j = 1 to n
        calculate the distance from yi to xj
        if j <= k then
            add xj to Ei
        else if xj is closer to yi than any previous NN then
            delete the farthest neighbour and include xj in the set Ei

48 KNN-1FCMA algorithm, step 1 (cont.)
Include yi in the set Ei and calculate Gi.
Delete the index of yi and the indexes of the K-NN of yi from I.
if I ≠ ∅ then
    for each remaining object x, determine the minimum distance to any centre Gi of Ei
    classify x to the nearest centre
    update all centres
(A sketch of this first layer follows.)
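A Python sketch of the whole first layer (our reading of the slides: neighbours are taken among the still-unassigned points, since their indexes are removed from I; names are ours):

import numpy as np

def knn_first_layer(X, c):
    n = len(X)
    k = n // c - 1                            # k = Integer(n/c - 1), as in slide 46
    I = set(range(n))                         # indexes not yet assigned to a cell
    G = X.mean(axis=0)                        # G0: centre of all the data
    centres, cells = [], []
    for _ in range(c):
        idx = sorted(I)
        # yi: the unassigned object farthest from the previous centre
        yi = idx[int(np.argmax(np.linalg.norm(X[idx] - G, axis=1)))]
        rest = [j for j in idx if j != yi]
        d = np.linalg.norm(X[rest] - X[yi], axis=1)
        nn = [rest[p] for p in np.argsort(d)[:k]]
        cell = [yi] + nn                      # Ei = yi plus its k nearest neighbours
        G = X[cell].mean(axis=0)              # Gi: centroid of the new cell
        centres.append(G)
        cells.append(cell)
        I -= set(cell)
    # leftover points go to the nearest centre, then all centres are updated
    for j in sorted(I):
        i = int(np.argmin([np.linalg.norm(X[j] - g) for g in centres]))
        cells[i].append(j)
    centres = [X[cell].mean(axis=0) for cell in cells]
    return np.array(centres), cells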

49 KNN-1FCMA algorithm, step 2
- Compute the matrix U and then all fuzzy centres according to the formulas below.
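Both formula images were dropped in extraction; since step 2 is the single fuzzy c-means iteration announced on slide 44, these are presumably the standard FCM updates, with the Gi of step 1 as the initial centres:

u_{ij} = \left[ \sum_{l=1}^{c} \left( \frac{\lVert x_j - G_i \rVert}{\lVert x_j - G_l \rVert} \right)^{2/(m-1)} \right]^{-1}

v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}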

50 Results KNN-1FCMA

Data     Elem   c   Misclassifications      Avg. iterations
                    FCMA    KNN-1FCMA       FCMA
S1         20   2      0        0              11
S2         60   3      1        0               8
S3         80   4      2        0              10
S4        120   6      3        1              19
IRIS23    100   2     14       13              10
IRIS      150   3     16       17              12

