
1 Study of Topographic and Equiprobable Mapping with Clustering for Fault Classification Ashish Babbar EE645 Final Project

2 Introduction  A typical control system consists of four basic elements: dynamic plant, controllers, actuators, and sensors.  Any kind of malfunction in these components can result in unacceptable anomalies in overall system performance. Such malfunctions are referred to as faults in a control system. The objective of fault detection and identification is to detect, isolate, and identify these faults so that system performance can be recovered.  Condition Based Maintenance (CBM) is the process of executing repairs when objective evidence indicates the need for such actions, in other words when anomalies or faults are detected in a control system.

3 Motivation  Model-based CBM can be applied when we have a mathematical model of the system to be monitored.  When CBM must be performed based only on the data available from sensors, data-driven methodologies are used instead.  SOM is widely used in data mining as a tool for exploration and analysis of large amounts of data.  It can be used for data reduction or vector quantization, so that the system data can be analyzed for anomalies using only the data clusters formed from the trained map instead of the large initial data sets.

4 Competitive Learning (figure: input layer v connected to an output layer of units i, i')  Assume a sequence of input samples v(t) in a d-dimensional input space and a lattice of N neurons, labeled i = 1, 2, ..., N, with corresponding weight vectors w_i(t) = [w_ij(t)].  If v(t) can be compared simultaneously with each weight vector of the lattice, then the best-matching weight, say w_i*, can be determined and updated to match the current input even better.  As a result of the competitive learning, different weights become tuned to different regions of the input space.
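A minimal sketch of this winner-take-all step, assuming plain NumPy; the function name, learning rate, and array shapes are illustrative, not taken from the project:

```python
import numpy as np

def competitive_update(weights, v, lr=0.1):
    """One winner-take-all step: find the best-matching weight and pull it toward v.

    weights : (N, d) array of weight vectors w_i
    v       : (d,) input sample
    lr      : learning rate (illustrative value)
    """
    # Compare v with every weight vector and pick the closest one (the winner i*).
    distances = np.linalg.norm(weights - v, axis=1)
    winner = int(np.argmin(distances))
    # Only the winner is updated, so it becomes tuned to this region of input space.
    weights[winner] += lr * (v - weights[winner])
    return winner
```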

5 Self Organizing Maps  SOM is an unsupervised neural network technique which finds application in: density estimation, e.g. for clustering or classification; blind source separation; and visualization of data sets.  It projects the input space onto prototypes on a low-dimensional regular grid, which can be used effectively to visualize and explore properties of the data.  The SOM consists of a regular, two-dimensional grid of map units (neurons).  Each unit i is represented by a prototype vector w_i(t) = [w_i1(t), ..., w_id(t)], where d is the input vector dimension.

6 Self Organizing Maps (Algorithm)  Given a data set, the number of map units (neurons) is chosen first.  The number of map units can be selected to be approximately √N to 5√N, where N is the number of data samples in the given data set.  The SOM is trained iteratively. At each training step, a sample vector v is randomly chosen from the input data set.  Distances between v and all prototype vectors are computed. The Best Matching Unit (BMU), or winner, denoted here by b, is the map unit whose prototype is closest to v: ||v – w_b|| = min_i {||v – w_i||}
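The map-size heuristic and the BMU search might be coded as in this sketch (the 5√N factor is just one point in the range suggested above; names are illustrative):

```python
import numpy as np

def choose_map_size(n_samples, factor=5):
    # Heuristic from the slide: roughly sqrt(N) to 5*sqrt(N) map units.
    return int(factor * np.sqrt(n_samples))

def best_matching_unit(prototypes, v):
    # BMU b satisfies ||v - w_b|| = min_i ||v - w_i||.
    return int(np.argmin(np.linalg.norm(prototypes - v, axis=1)))
```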

7 Self Organizing Maps  The BMU (winner) and its topological neighbors are moved closer to the input vector in the input space: w_i(t+1) = w_i(t) + α(t) h_bi(t) [v − w_i(t)], where t is time, α(t) is the adaptation coefficient, and h_bi(t) is the neighborhood kernel centered on the winner unit, e.g. the Gaussian kernel h_bi(t) = exp(−||r_b − r_i||² / (2σ²(t))), where r_b and r_i are the positions of neurons b and i on the SOM grid. Both α(t) and σ(t) decrease monotonically with time.
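One SOM training step with a Gaussian neighborhood kernel could then look like the following sketch; the grid positions and parameter schedules are assumptions for illustration, not the project's actual code:

```python
import numpy as np

def som_step(prototypes, grid_pos, v, alpha, sigma):
    """prototypes : (M, d) prototype vectors w_i
       grid_pos   : (M, 2) positions r_i of the units on the 2-D map grid
       alpha      : adaptation coefficient alpha(t), decreasing with time
       sigma      : neighborhood radius sigma(t), decreasing with time
    """
    # Best Matching Unit b: prototype closest to the input v.
    b = int(np.argmin(np.linalg.norm(prototypes - v, axis=1)))
    # Gaussian neighborhood centered on the winner: h_bi = exp(-||r_b - r_i||^2 / (2 sigma^2))
    grid_dist2 = np.sum((grid_pos - grid_pos[b]) ** 2, axis=1)
    h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
    # Move the BMU and its topological neighbors toward the input.
    prototypes += alpha * h[:, None] * (v - prototypes)
    return b
```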

8 Clustering: The Two-Level Approach  Group the input data into clusters, where data points are placed in the same cluster if they are similar to one another.  A widely adopted definition of clustering is a partitioning that minimizes distances within clusters and maximizes distances between clusters.  Once the neurons are trained, the next step is clustering of the SOM, for which a two-level approach is followed (see the sketch below): first, a set of neurons much larger than the expected number of clusters is formed using the SOM; then these neurons are combined into the actual clusters using the k-means clustering technique.  The number of clusters is K and the number of neurons is M, with K << M << N.
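A compact sketch of the two-level pipeline, reusing the som_step sketch above and leaning on scikit-learn's KMeans for level 2 (an assumed dependency; the decay schedules and epoch count are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans  # any k-means implementation would do

def two_level_clustering(data, prototypes, grid_pos, n_clusters, n_epochs=20):
    """Level 1: train the M SOM prototypes on the N samples.
       Level 2: cluster the M prototypes (not the raw data) into K clusters, K << M << N."""
    n_steps = n_epochs * len(data)
    for t in range(n_steps):
        v = data[np.random.randint(len(data))]
        alpha = 0.5 * (1.0 - t / n_steps)               # illustrative decay schedule
        sigma = max(0.5, 3.0 * (1.0 - t / n_steps))     # illustrative decay schedule
        som_step(prototypes, grid_pos, v, alpha, sigma)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(prototypes)
    return km.cluster_centers_, km.labels_
```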

9 Two-Level Approach (diagram): N data samples → M prototypes of the SOM (level 1) → K clusters (level 2), with K < M < N.

10 Advantages of the two-level approach  The primary benefit is the reduction of computational cost. Even with a relatively small number of data samples, many clustering algorithms become intractably heavy.  For example, the two-level approach reduces the computational load by a factor of about √N/15, i.e. about six-fold for N = 10,000, compared with clustering the data directly with k-means.  Another benefit is noise reduction. The prototypes are local averages of the data and are therefore less sensitive to random variations than the original data.

11 K-means: deciding the number of clusters  The k-means algorithm was used at level 2 for clustering the trained SOM neurons.  The k-means algorithm clusters the given data into k clusters, where we define k.  To decide the value of k, one method is to run the algorithm from k = 2 to k = √N, where N is the number of data samples.  The k-means algorithm minimizes the error function E = Σ_{k=1..C} Σ_{v ∈ Q_k} ||v − c_k||², where C is the number of clusters, Q_k is the set of samples in cluster k, and c_k is the center of cluster k.  The approach followed in this project was to pick the number of clusters as the value of k that makes the error E fall to 0.10E′–0.15E′ (10 to 15% of E′), where E′ is the error when k = 2.
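The rule of thumb above could be coded roughly as follows; the within-cluster sum of squared errors is taken from scikit-learn's inertia_, and the 15% default threshold reflects the 10–15% range on the slide (everything else is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_error(X, k):
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    # inertia_ is E: the sum over clusters of squared distances to the cluster center c_k.
    return km.inertia_, km

def choose_k(prototypes, ratio=0.15):
    e_ref, _ = kmeans_error(prototypes, 2)             # E' : error at k = 2
    k_max = max(2, int(np.sqrt(len(prototypes))))
    best_k, best_km = k_max, None
    for k in range(2, k_max + 1):
        e_k, km = kmeans_error(prototypes, k)
        best_k, best_km = k, km
        if e_k <= ratio * e_ref:                       # stop once E drops to 10-15% of E'
            break
    return best_k, best_km
```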

12 Selection of number of clusters

13 Reference Distance Calculation  The clusters formed from the training/nominal data sets are used to calculate a reference distance (dRef).  Knowing the cluster centers calculated by the k-means algorithm and the prototypes/neurons assigned to a particular cluster, we calculate the reference distance for each cluster.  The reference distance of a particular cluster is the distance between the cluster center and the prototype/neuron belonging to that cluster that is farthest from the cluster center.  The reference distance is calculated in this way for each of the clusters formed from the nominal data set and serves as a baseline for fault detection.  This underlying structure of the initially known nominal data set is used to classify a given data cluster as nominal or faulty.
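A sketch of the reference-distance computation described above, assuming the prototypes, their cluster labels, and the cluster centers from the previous step (names are illustrative):

```python
import numpy as np

def reference_distances(prototypes, labels, centers):
    """For each cluster, the reference distance d_ref is the distance from the cluster
    center to its farthest member prototype."""
    d_ref = np.zeros(len(centers))
    for k, center in enumerate(centers):
        members = prototypes[labels == k]
        if len(members):
            d_ref[k] = np.max(np.linalg.norm(members - center, axis=1))
    return d_ref
```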

14 Fault Identification  The assumption made here is that the available nominal data sets, which are used to form the underlying cluster structure, span the space of all non-faulty behavior.  The same procedure is then repeated for the unknown data sets (not known to be nominal or faulty): the N data points are first reduced to a mapping of M neurons and then clustered using the k-means algorithm.  Taking the training data cluster centers as centers and knowing the reference distance of each cluster, we check whether each cluster from the unknown data set lies within the region spanned by a radius equal to that training cluster's reference distance.  Any unknown data set cluster that does not lie within the region spanned by a training data cluster center and a radius equal to that cluster's reference distance is declared faulty.
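The membership test might then look like this sketch: an unknown-data cluster center is flagged as faulty if it lies outside every nominal cluster's reference-distance ball. This is a simplification of the procedure described above; the project may match clusters differently:

```python
import numpy as np

def classify_clusters(unknown_centers, nominal_centers, d_ref):
    """Return True (faulty) for each unknown cluster center that lies outside the
    region spanned by every nominal cluster center and its reference distance."""
    faulty = []
    for u in unknown_centers:
        dists = np.linalg.norm(nominal_centers - u, axis=1)
        faulty.append(bool(np.all(dists > d_ref)))   # inside no nominal ball -> fault
    return faulty
```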

15 Block Diagram: training data (N samples) → mapping algorithm (M neurons) → clustering (K clusters) → reference distance; unknown data → mapping algorithm → clustering → distance deviation → fault identification; with K << M << N.

16 SOM Using Nominal data

17 Clustering of nominal data

18 SOM using Unknown data

19 Clustering of unknown data

20 Fault Identification

21 SOM Performance

22 Equiprobable Maps  For self-organizing feature maps, the weight density at convergence is not a linear function of the input density p(v), and hence the neurons of the map are not active with equal probabilities (i.e. the map is not equiprobabilistic).  For a discrete lattice of neurons, the weight density is proportional to a sub-linear power of p(v) (the classical one-dimensional analysis gives an exponent of roughly 2/3, with the exact value depending on the neighborhood range).  Regardless of the type of neighborhood function used, the SOM tends to undersample high-probability regions and oversample low-probability regions.

23 Avoiding Dead Units  The SOM algorithm can converge to a mapping that yields neurons that are never active ("dead units").  These units do not contribute to the minimization of the overall (MSE) distortion of the map.  To produce maps in which the neurons have an equal probability of being active (equiprobabilistic maps), the idea of adding a conscience to the winning neuron was introduced.  The techniques for generating equiprobabilistic maps discussed here are: conscience learning, and frequency-sensitive competitive learning (FSCL).

24 Conscience Learning  When a neural network is trained with unsupervised competitive learning on a set of input vectors that are clustered into K groups, a given input vector v will activate the neuron i* that has been sensitized to the cluster containing that input vector.  However, if some region of the input space is sampled more frequently than others, a single unit begins to win all competitions for that region.  To counter this defect, one records for each neuron i the frequency c_i with which it has won the competition in the past, and adds this quantity to the Euclidean distance between the weight vector w_i and the current input v.

25 Conscience Learning  In conscience learning, two stages are distinguished. First, the winning neuron is determined out of the N units by the minimum Euclidean distance rule, i* = arg min_i ||v − w_i||. Second, the winning neuron i* is not necessarily the one that will have its weight vector updated.  Which neuron is updated depends on an additional term for each unit, related to the number of times the unit has won the competition in the recent past.  The rule is that, for each neuron, the number of times it has won the competition is recorded and a scaled version of this quantity is added to the distance metric used in the minimum Euclidean distance rule.

26 Conscience Learning  Update rule: the unit to be updated is the one that minimizes the biased distance ||v − w_i|| + C·c_i, with c_i the number of times neuron i has won the competition and C the scaling factor ("conscience factor").  After determining the winning neuron i*, its conscience is incremented: c_i* ← c_i* + 1.  The weight of the winning neuron is updated using w_i*(t+1) = w_i*(t) + η [v − w_i*(t)], where η is the learning rate, a small positive constant.
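A sketch of one conscience-learning step as described on these slides; the learning rate and conscience factor values are illustrative:

```python
import numpy as np

def conscience_step(weights, wins, v, lr=0.05, C=1e-3):
    """weights : (N, d) weight vectors w_i;  wins : (N,) win counts c_i."""
    # Distance metric biased by a scaled win count: ||v - w_i|| + C * c_i
    scores = np.linalg.norm(weights - v, axis=1) + C * wins
    i_star = int(np.argmin(scores))
    wins[i_star] += 1                                # increment the winner's conscience
    weights[i_star] += lr * (v - weights[i_star])    # update only the winner's weights
    return i_star
```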

27 Conscience learning using nominal data

28 Clusters shown with neurons

29 Clustering of nominal data

30 Conscience learning on new data set

31 Clustering of unknown data set

32 Clusters represented on data set

33 Fault Identification

34 Conscience learning Performance

35 Frequency Sensitive Competitive Learning  Another competitive learning scheme used here is Frequency Sensitive Competitive Learning (FSCL).  This learning scheme keeps a record of the total number of times each neuron has won the competition during training.  The distance metric in the minimum Euclidean distance rule is then scaled by this count: i* = arg min_i { c_i ||v − w_i|| }.  After selection of the winning neuron, its conscience is incremented, c_i* ← c_i* + 1, and the weight vector is updated using the unsupervised competitive learning (UCL) rule: w_i*(t+1) = w_i*(t) + η [v − w_i*(t)].
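A corresponding sketch of one FSCL step, where each unit's Euclidean distance is scaled by its win count before the winner is selected (values illustrative):

```python
import numpy as np

def fscl_step(weights, wins, v, lr=0.05):
    """Frequency-sensitive step; wins should be initialized to ones so the
    scaled distances are not all zero at the start."""
    scores = wins * np.linalg.norm(weights - v, axis=1)   # c_i * ||v - w_i||
    i_star = int(np.argmin(scores))
    wins[i_star] += 1                                     # frequency / conscience counter
    weights[i_star] += lr * (v - weights[i_star])         # standard UCL weight update
    return i_star
```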

36 FSCL using Nominal data

37 Clustering of nominal data

38 Clusters shown with neurons

39 FSCL using the unknown data set

40 Clustering using unknown data

41 Clusters of unknown data

42 Fault Identification

43 FSCL performance

44 Conclusions  As shown in the results, the performance of the SOM algorithm was not as good as that of the conscience learning (CLT) and FSCL approaches.  Because SOM produces dead units even when the neighborhood range decreases slowly, it was not able to train the neurons well on the available data sets.  Due to undersampling of high-probability regions, SOM detected only two of the four faulty clusters, so its performance was poor.  Using the CLT and FSCL approaches, all four faulty clusters were detected using the reference distance as the distance measure.  Thus the equiprobable maps perform much better than the SOM by avoiding dead units and by training the neurons with a conscience assigned to the winning neuron.

45 References  Marc M. Van Hulle, Faithful Representations and Topographic Maps: From Distortion- to Information-Based Self-Organization, John Wiley & Sons, 2000.  T. Kohonen, Self-Organizing Maps, Springer, 1997.  Anil K. Jain and Richard C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.  Juha Vesanto and Esa Alhoniemi, "Clustering of the Self-Organizing Map", IEEE Transactions on Neural Networks, vol. 11, no. 3, May 2000.

46 Questions/Comments ?