
1 Study of Topographic and Equiprobable Mapping with Clustering for Fault Classification
Ashish Babbar, EE645 Final Project

2 Introduction
 A typical control system consists of four basic elements:
Dynamic plant
Controllers
Actuators
Sensors
 Any malfunction in these components can result in an unacceptable anomaly in overall system performance; such malfunctions are referred to as faults in a control system. The objective of fault detection and identification is to detect, isolate and identify these faults so that system performance can be recovered.
 Condition Based Maintenance (CBM) is the process of executing repairs when objective evidence indicates the need for such actions, in other words when anomalies or faults are detected in a control system.

3 Motivation
 Model-based CBM can be applied when we have a mathematical model of the system to be monitored.
 When CBM must be performed using only the data available from sensors, data-driven methodologies are used.
 The SOM is widely used in data mining as a tool for exploration and analysis of large amounts of data.
 It can be used for data reduction or vector quantization, so that the system data can be analyzed for anomalies using only the data clusters formed from the trained map instead of the large initial data sets.

4 Competitive Learning
[Figure: input layer v feeding an output layer of neurons i, i']
 Assume a sequence of input samples v(t) in a d-dimensional input space and a lattice of N neurons, labeled i = 1, 2, ..., N, with corresponding weight vectors w_i(t) = [w_ij(t)].
 If v(t) can be compared simultaneously with each weight vector of the lattice, then the best matching weight, say w_i*, can be determined and updated to match the current input even better.
 As a result of the competitive learning, different weights become tuned to different regions of the input space.
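To make the competition concrete, here is a minimal winner-take-all sketch in Python/NumPy; the function name ucl_step, the learning rate lr and the array shapes are illustrative assumptions, not from the slides.

```python
import numpy as np

def ucl_step(weights, v, lr=0.05):
    """One winner-take-all competitive learning step: the weight vector
    closest to the input sample v (the winner) is moved toward v."""
    dists = np.linalg.norm(weights - v, axis=1)    # distance of v to every w_i
    winner = int(np.argmin(dists))                 # best-matching unit i*
    weights[winner] += lr * (v - weights[winner])  # tune the winner toward v
    return winner
```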

5 Self Organizing Maps
 The SOM is an unsupervised neural network technique which finds application in:
Density estimation, e.g. for clustering or classification purposes
Blind source separation
Visualization of data sets
 It projects the input space onto prototypes on a low-dimensional regular grid that can be effectively used to visualize and explore properties of the data.
 The SOM consists of a regular, two-dimensional grid of map units (neurons).
 Each unit i is represented by a prototype vector w_i(t) = [w_i1(t), ..., w_id(t)], where d is the input vector dimension.

6 Self Organizing Maps (Algorithm)
 Given a data set, the number of map units (neurons) is chosen first.
 The number of map units can be selected to be approximately √N to 5√N, where N is the number of data samples in the given data set.
 The SOM is trained iteratively. At each training step, a sample vector v is randomly chosen from the input data set.
 Distances between v and all prototype vectors are computed. The Best Matching Unit (BMU), or winner, denoted here by b, is the map unit whose prototype is closest to v:
||v - w_b|| = min_i {||v - w_i||}
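A small sketch of the BMU search and the map-size heuristic from this slide, assuming NumPy arrays; names such as bmu_index and n_units are hypothetical.

```python
import numpy as np

def bmu_index(prototypes, v):
    """Best Matching Unit b: ||v - w_b|| = min_i ||v - w_i||."""
    return int(np.argmin(np.linalg.norm(prototypes - v, axis=1)))

# heuristic for the number of map units, between sqrt(N) and 5*sqrt(N)
N = 10_000                            # number of data samples (example value)
n_units = int(round(5 * np.sqrt(N)))  # upper end of the suggested range
```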

7 Self Organizing Maps
 The BMU (winner) and its topological neighbors are moved closer to the input vector in the input space:
w_i(t+1) = w_i(t) + α(t) * h_bi(t) * [v - w_i(t)]
t: time
α(t): adaptation coefficient
h_bi(t): neighborhood kernel centered on the winner unit, e.g. the Gaussian kernel
h_bi(t) = exp(-||r_b - r_i||² / (2σ²(t)))
where r_b and r_i are the positions of neurons b and i on the SOM grid. Both α(t) and σ(t) decrease monotonically with time.
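Putting the BMU search and the neighborhood update together, a single SOM training step might look like the sketch below; the Gaussian form of h_bi(t) and the names som_step, grid_pos, alpha and sigma are assumptions for illustration.

```python
import numpy as np

def som_step(prototypes, grid_pos, v, alpha, sigma):
    """One SOM training step: find the winner b, then pull it and its
    grid neighbors toward v, weighted by a Gaussian kernel h_bi(t)."""
    b = np.argmin(np.linalg.norm(prototypes - v, axis=1))    # BMU
    grid_d2 = np.sum((grid_pos - grid_pos[b]) ** 2, axis=1)  # ||r_b - r_i||^2
    h = np.exp(-grid_d2 / (2.0 * sigma ** 2))                # neighborhood kernel
    prototypes += alpha * h[:, None] * (v - prototypes)      # move all units
    return prototypes
```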

8 Clustering: The Two-Level Approach
 Group the input data into clusters, where data points are placed in the same cluster if they are similar to one another.
 A widely adopted definition of clustering is a partitioning that minimizes the distances within clusters and maximizes the distances between clusters.
 Once the neurons are trained, the next step is clustering of the SOM. For this, a two-level approach is followed (sketched below):
First, a set of neurons much larger than the expected number of clusters is formed using the SOM.
The neurons are then combined in a second step to form the actual clusters using the k-means clustering technique.
 With K the number of clusters, M the number of neurons and N the number of samples: K << M << N.
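A sketch of level 2, assuming the M trained SOM prototypes (level 1) are available as a NumPy array and using scikit-learn's KMeans; the function name cluster_prototypes is hypothetical.

```python
from sklearn.cluster import KMeans

def cluster_prototypes(prototypes, k):
    """Level 2 of the two-level approach: group the M trained SOM
    prototypes into K clusters with k-means, where K << M << N."""
    km = KMeans(n_clusters=k, n_init=10).fit(prototypes)
    return km.cluster_centers_, km.labels_
```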

9 Two-Level Approach (diagram): N data samples → Level 1: M prototypes of the SOM → Level 2: K clusters, with K < M < N.

10 Advantages of the two-level approach
 The primary benefit is the reduction of computational cost: even with a relatively small number of data samples, many clustering algorithms become intractably heavy.
 For example, with the two-level approach the computational load is reduced by a factor of about √N/15, i.e. about six-fold for N = 10,000, compared with clustering the data directly with k-means.
 Another benefit is noise reduction: the prototypes are local averages of the data and therefore less sensitive to random variations than the original data.

11 k-means: deciding the number of clusters
 The k-means algorithm was used at level 2 for clustering of the trained SOM neurons.
 k-means clusters the given data into k clusters, where k must be specified.
 One way to decide the value of k is to run the algorithm for k = 2 to k = √N, where N is the number of data samples.
 k-means minimizes the error function
E = Σ_{k=1..C} Σ_{v in cluster k} ||v - c_k||²
where C is the number of clusters and c_k is the center of cluster k.
 The approach followed in this project was to pick as the number of clusters the value of k that makes the error E fall to 0.10·E' to 0.15·E' (10 to 15% of E'), where E' is the error for k = 2.
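A hedged sketch of this selection rule, again using scikit-learn's KMeans (its inertia_ attribute is the summed squared error E); choose_k and ratio are illustrative names.

```python
from sklearn.cluster import KMeans

def choose_k(prototypes, k_max, ratio=0.15):
    """Pick the smallest k for which the k-means error E(k) drops to
    10-15% of E', the error obtained with k = 2."""
    e_ref = KMeans(n_clusters=2, n_init=10).fit(prototypes).inertia_
    for k in range(2, k_max + 1):
        e_k = KMeans(n_clusters=k, n_init=10).fit(prototypes).inertia_
        if e_k <= ratio * e_ref:
            return k
    return k_max
```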

12 Selection of number of clusters

13 Reference Distance Calculation
 The clusters formed using the training/nominal data sets are used to calculate a reference distance (dRef).
 Knowing the cluster centers calculated by the k-means algorithm and the prototypes/neurons belonging to each cluster, we calculate a reference distance for each cluster.
 The reference distance of a cluster is the distance between the cluster center and the prototype/neuron belonging to this cluster that lies farthest from the cluster center.
 A reference distance is calculated in this way for each of the clusters formed from the nominal data set and serves as the baseline for fault detection.
 This underlying structure of the initial, known-nominal data set is used to classify a given data cluster as nominal or faulty.
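The per-cluster reference distance described above can be computed as in this sketch; reference_distances, labels and centers are assumed names, and every cluster is assumed to be non-empty.

```python
import numpy as np

def reference_distances(prototypes, labels, centers):
    """dRef per cluster: distance from the cluster center to the member
    prototype that lies farthest from it (computed on nominal data)."""
    d_ref = np.zeros(len(centers))
    for k, c in enumerate(centers):
        members = prototypes[labels == k]                     # prototypes of cluster k
        d_ref[k] = np.linalg.norm(members - c, axis=1).max()  # farthest member
    return d_ref
```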

14 Fault Identification
 The assumption made here is that the available nominal data sets used to form the underlying cluster structure span the space of all non-faulty behavior.
 The same procedure is then repeated for the unknown data sets (not known to be nominal or faulty): the N data points are first reduced to a mapping of M neurons and then clustered using the k-means algorithm.
 Taking the training-data cluster centers and the reference distance of each cluster, we check whether each cluster from the unknown data set lies within the region spanned by a radius equal to the reference distance of some training cluster.
 Any unknown-data cluster that does not fall within the region centered at a training-data cluster with radius equal to that cluster's reference distance is termed faulty.
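A sketch of this membership test, assuming each unknown-data cluster is represented by its center; classify_clusters and the "nominal"/"faulty" labels are illustrative choices.

```python
import numpy as np

def classify_clusters(unknown_centers, nominal_centers, d_ref):
    """Label each unknown-data cluster: nominal if it falls inside the
    reference radius of at least one nominal cluster, faulty otherwise."""
    labels = []
    for u in unknown_centers:
        dists = np.linalg.norm(nominal_centers - u, axis=1)
        labels.append("nominal" if np.any(dists <= d_ref) else "faulty")
    return labels
```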

15 Block Diagram: training data (N samples) → mapping algorithm (M neurons) → clustering (K clusters) → reference distance; unknown data (N samples) → mapping algorithm (M neurons) → clustering → distance deviation → fault identification; K << M << N.

16 SOM using nominal data

17 Clustering of nominal data

18 SOM using unknown data

19 Clustering of unknown data

20 Fault identification

21 SOM performance

22 Equiprobable Maps
 For self-organizing feature maps, the weight density at convergence is not a linear function of the input density p(v); hence the neurons of the map are not active with equal probabilities (i.e. the map is not equiprobabilistic).
 For a discrete lattice of neurons, the weight density is proportional to a power of p(v) with exponent smaller than one.
 Regardless of the type of neighborhood function used, the SOM tends to undersample the high-probability regions and oversample the low-probability regions.

23 Avoiding Dead Units
 The SOM algorithm can converge to a mapping that contains neurons which are never active ("dead units").
 These units do not contribute to the minimization of the overall (MSE) distortion of the map.
 To produce maps in which the neurons have an equal probability of being active (equiprobabilistic maps), the idea of adding a conscience to the winning neuron was introduced.
 Two techniques for generating equiprobabilistic maps are discussed:
Conscience Learning
Frequency Sensitive Competitive Learning (FSCL)

24 Conscience Learning
 When a neural network is trained with unsupervised competitive learning on a set of input vectors that are clustered into K groups, a given input vector v activates the neuron i* that has been sensitized to the cluster containing that input vector.
 However, if some region of the input space is sampled more frequently than others, a single unit begins to win all competitions for that region.
 To counter this defect, one records for each neuron i the frequency c_i with which it has won the competition in the past, and adds this quantity to the Euclidean distance between the weight vector w_i and the current input v.

25 Conscience Learning
 In conscience learning, two stages are distinguished:
First, the winning neuron is determined out of the N units by the minimum Euclidean distance rule: i* = argmin_i ||v - w_i||
Second, the winning neuron i* is not necessarily the one whose weight vector will be updated.
 Which neuron is updated depends on an additional term for each unit, related to the number of times the unit has won the competition in the recent past.
 The rule is that the number of times each neuron has won the competition is recorded, and a scaled version of this quantity is added to the distance metric used in the minimum Euclidean distance rule.

26 Conscience Learning
 Update rule: the unit selected for updating is the one minimizing the conscience-scaled distance
i* = argmin_i { ||v - w_i|| + C·c_i }
with c_i the number of times neuron i has won the competition, and C the scaling factor ("conscience factor").
 After determining the winning neuron i*, its conscience is incremented: c_i* ← c_i* + 1.
 The weight of the winning neuron is updated using
w_i*(t+1) = w_i*(t) + η·[v - w_i*(t)]
where η is the learning rate, a small positive constant.
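A sketch of one conscience-learning step as described on slides 25-26; the exact scaling of the conscience term is reconstructed from the slide text, and conscience_step, lr=0.05 and C=0.1 are assumed names and values.

```python
import numpy as np

def conscience_step(weights, wins, v, lr=0.05, C=0.1):
    """Conscience-learning step: the win count c_i, scaled by the
    conscience factor C, is added to the Euclidean distance before the
    unit to update is chosen; that unit's conscience is then incremented
    and its weight moved toward v."""
    scores = np.linalg.norm(weights - v, axis=1) + C * wins
    i = int(np.argmin(scores))
    wins[i] += 1                         # increment conscience c_i*
    weights[i] += lr * (v - weights[i])  # UCL-style weight update
    return i
```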

27 Conscience learning using nominal data

28 Clusters shown with neurons

29 Clustering of nominal data

30 Conscience learning on new data set

31 Clustering of unknown data set

32 Clusters represented on data set

33 Fault identification

34 Conscience learning performance

35 Frequency Sensitive Competitive Learning
 Another competitive learning scheme used is Frequency Sensitive Competitive Learning (FSCL).
 This learning scheme keeps a record of the total number of times each neuron has won the competition during training.
 The distance metric in the Euclidean distance rule is then scaled by this win count before the winner is selected.
 After selection of the winning neuron, its conscience is incremented and its weight vector is updated using the UCL rule:
w_i*(t+1) = w_i*(t) + η·[v - w_i*(t)]
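A sketch of one FSCL step under the common convention that the distance is multiplied by the unit's win count (the precise scaling used in the project is an assumption here); fscl_step and lr are assumed names.

```python
import numpy as np

def fscl_step(weights, wins, v, lr=0.05):
    """FSCL step: each unit's Euclidean distance is scaled by its win
    count (initialize `wins` to ones), penalizing frequent winners; the
    winner's count is incremented and its weight updated (UCL rule)."""
    scores = wins * np.linalg.norm(weights - v, axis=1)
    i = int(np.argmin(scores))
    wins[i] += 1
    weights[i] += lr * (v - weights[i])
    return i
```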

36 FSCL using nominal data

37 Clustering of nominal data

38 Clusters shown with neurons

39 FSCL using the unknown data set

40 Clustering using unknown data

41 Clusters of unknown data

42 Fault identification

43 FSCL performance

44 Conclusions
 As shown in the results, the performance of the SOM algorithm was not as good as that of the CLT and FSCL approaches.
 Because the SOM produces dead units even when the neighborhood function converges slowly, it was not able to train the neurons well on the available data sets.
 Due to undersampling of the high-probability regions, the SOM detected only two of the four faulty clusters, so its performance was poor.
 Using the CLT and FSCL approaches, all four faulty clusters were detected using the reference distance as the distance measure.
 Thus the equiprobable maps perform much better than the SOM, by avoiding dead units and by training the neurons with a conscience assigned to the winning neuron.

45 References
 Marc M. Van Hulle, Faithful Representations and Topographic Maps: From Distortion- to Information-Based Self-Organization, John Wiley & Sons, 2000.
 T. Kohonen, Self-Organizing Maps, Springer, 1997.
 Anil K. Jain and Richard C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
 Juha Vesanto and Esa Alhoniemi, "Clustering of the Self-Organizing Map", IEEE Transactions on Neural Networks, vol. 11, no. 3, May 2000.

46 Questions/Comments?

