CZ5225: Modeling and Simulation in Biology
Lecture 5: Clustering Analysis for Microarray Data III
Prof. Chen Yu Zong, Tel: 6874-6877
Room 07-24, Level 7, SOC1, NUS

2 Self-Organizing Maps
- Based on the work of Kohonen on learning and memory in the human brain.
- As with k-means, the number of clusters must be specified.
- In addition, a topology must be specified: a 2D grid that gives the geometric relationships between the clusters (i.e., which clusters should be near to or distant from each other).
- The algorithm learns a mapping from the high-dimensional space of the data points onto the points of the 2D grid (there is one grid point for each cluster).

3 Self Organizing Maps
- Creates a map in which similar patterns are plotted next to each other.
- A data visualization technique that reduces n dimensions and displays similarities.
- More complex than k-means or hierarchical clustering, but more meaningful.
- A neural network technique, inspired by the brain.

4 Self Organizing Maps (SOM)
- Each unit of the SOM has a weighted connection to all inputs.
- As the algorithm progresses, neighboring units are grouped by similarity.
[Figure: an input layer fully connected to an output layer]

5 Biological Motivation
Nearby areas of the cortex correspond to related brain functions.

6 Brain's Self-Organization
- The brain maps the external multidimensional representation of the world into a similar 1- or 2-dimensional internal representation. That is, the brain processes external signals in a topology-preserving way.
- Mimicking the way the brain learns, our system should be able to do the same thing.

7 A Self-Organized Map
- Data: vectors X^T = (X_1, ..., X_d) from a d-dimensional space.
- Grid of nodes, with a local processor (called a neuron) in each node.
- Local processor j has d adaptive parameters W(j).
- Goal: change the W(j) parameters to recover the data clusters in X space.
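This structure (a grid of nodes, each holding d adaptive parameters W(j)) can be sketched in NumPy; the grid size, the dimensionality, and the random initialization below are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4              # dimensionality of the data vectors X = (X_1, ..., X_d)
rows, cols = 3, 4  # a small 2D grid of nodes (one local processor per node)

# Local processor j holds d adaptive parameters W(j); here they are
# stored as one weight vector per grid node, initialized at random.
weights = rng.random((rows, cols, d))
print(weights.shape)  # -> (3, 4, 4)
```

Training then amounts to adjusting `weights` so that each node becomes sensitive to one region of the data space.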

8 SOM Network
- Unsupervised learning neural network.
- Projects high-dimensional input data onto a two-dimensional output map.
- Preserves the topology of the input data.
- Visualizes structures and clusters of the data.

9 SOM Algorithm
- The input vector is represented by scalar signals x_1 to x_n: X = (x_1, ..., x_n).
- Every unit i in the competitive layer has a weight vector associated with it, represented by variable parameters w_i1 to w_in: w_i = (w_i1, ..., w_in).
- We compute the total input to each neurode by taking the weighted sum of the input signals:

    s_i = sum_{j=1}^{n} w_ij x_j

- Every weight vector may be regarded as a kind of image that is matched or compared against a corresponding input vector; our aim is to devise adaptive processes in which the weights of all units converge to such values that every unit i becomes sensitive to a particular region of the input domain.

10 SOM Algorithm
Geometrically, the weighted sum is simply a dot (scalar) product of the input vector and the weight vector:

    s_i = x . w_i = x_1 w_i1 + ... + x_n w_in
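As a sketch in NumPy (not from the slides): the matching step below uses the smallest Euclidean distance, which for unit-normalized vectors selects the same winner as the largest dot product s_i; the tiny grid and test vectors are made up for illustration.

```python
import numpy as np

def find_winner(x, weights):
    """Grid index of the node whose weight vector best matches x.

    Matches by smallest Euclidean distance ||x - w_i||; for
    unit-normalized vectors this picks the same winner as the
    largest dot product s_i = x . w_i on the slide.
    """
    dists = np.linalg.norm(weights - x, axis=-1)  # one distance per node
    i, j = np.unravel_index(np.argmin(dists), dists.shape)
    return int(i), int(j)

weights = np.zeros((2, 2, 3))        # tiny 2x2 grid of 3-D weight vectors
weights[1, 0] = [1.0, 1.0, 1.0]
print(find_winner(np.array([0.9, 1.1, 1.0]), weights))  # -> (1, 0)
```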

11 SOM Algorithm
[Figure: a 2-D map of nodes (a 3x4 SOM). Each input vector x_k from the data array is compared with the node weights m_i to find the winner, and the weights are then updated.]

12 SOM Learning Algorithm
1. Initialize the weights w_j.
2. Find the winning node: i(x) = argmin_j || x(n) - w_j(n) ||
3. Update the weights of the winner and its neighbors: w_j(n+1) = w_j(n) + η(n) h_{j,i(x)}(n) [ x(n) - w_j(n) ]
4. Reduce the neighborhood size and the learning rate η.
5. Go to step 2.
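The five steps above can be sketched as a minimal NumPy implementation. The Gaussian neighborhood function and the linear decay schedules for η and the neighborhood width σ are common choices, not specified on the slide; the default parameter values are illustrative.

```python
import numpy as np

def train_som(data, rows, cols, n_iters=2000, eta0=0.5, sigma0=None, seed=0):
    """Minimal SOM training loop following steps 1-5 above."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    w = rng.random((rows * cols, d))                  # 1. initialize w's
    grid = np.array([(r, c) for r in range(rows)
                     for c in range(cols)], dtype=float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    for step in range(n_iters):
        x = data[rng.integers(n)]                     # present one input x(n)
        i = np.argmin(np.linalg.norm(w - x, axis=1))  # 2. winner i(x)
        frac = step / n_iters                         # 4. decay eta and the
        eta = eta0 * (1.0 - frac)                     #    neighborhood width
        sigma = sigma0 * (1.0 - frac) + 1e-3
        # 3. Gaussian neighborhood h_{j,i(x)} measured on the 2D grid
        h = np.exp(-np.sum((grid - grid[i]) ** 2, axis=1)
                   / (2.0 * sigma ** 2))
        w += eta * h[:, None] * (x - w)               # move winner + neighbors
    return w.reshape(rows, cols, d)
```

For example, training a 2x2 map on two well-separated point clouds pulls the node weights toward the data while keeping nearby nodes similar.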

13 SOM Training Process
Nearby input vectors are clustered into the same node.

14 Concept of SOM
[Figure: the input layer (input space) maps to the map layer (reduced feature space). The cluster centers (code vectors) are clustered and ordered in a two-dimensional grid, showing the place of these code vectors in the reduced space.]

15 Concept of SOM
The map can be used for visualization, for classification, or for clustering.

16 SOM Architecture
- The input is connected with each neuron of a lattice.
- The topology of the lattice allows one to define a neighborhood structure on the neurons, like those illustrated below.
[Figure: a 2D topology with two possible neighborhoods, and a 1D topology with a small neighborhood]

17 Self-Organizing Maps (SOMs)
Idea: Place genes onto a grid so that genes with similar patterns of expression are placed on nearby squares.
[Figure: genes a, b, c, d mapped onto grid squares A, B, C, D]

19 Self-organizing Maps (SOMs)

20 Self-organizing Maps (SOMs)

21 Self-Organizing Maps
- Suppose we have an r x s grid, with each grid point associated with a cluster mean μ_{1,1}, ..., μ_{r,s}.
- The SOM algorithm moves the cluster means around in the high-dimensional space while maintaining the topology specified by the 2D grid (think of a rubber sheet).
- A data point is put into the cluster with the closest mean.
- The effect is that nearby data points tend to map to nearby clusters (grid points).

22 A Simple Example of a Self-Organizing Map
[Figure: a 4 x 3 SOM with the mean of each cluster displayed]

23 SOM Applied to Microarray Analysis
- Consider clustering 10,000 genes, each measured in 4 experiments:
  - Input vectors are 4-dimensional.
  - The initial pattern set consists of 10,000 genes, each described by a 4D vector.
- Each of the 10,000 genes is chosen one at a time to train the SOM.

24 SOM Applied to Microarray Analysis
- The pattern found to be closest to the current gene (determined by the weight vectors) is selected as the winner.
- The winner's weights are then modified to become more similar to the current gene, based on the learning rate (η in the earlier update rule).
- The winner then pulls its neighbors closer to the current gene by causing a lesser change in their weights.
- This process continues for all 10,000 genes.
- The process is repeated until, over time, the learning rate is reduced to zero.
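One update of this "winner pulls its neighbors" step might look like the following sketch; the 6x5 grid echoes the yeast example that follows, while the random profiles, the learning rate, and the Gaussian falloff are illustrative assumptions.

```python
import numpy as np

rows, cols, d = 6, 5, 4              # a 6x5 map; 4 experiments per gene
rng = np.random.default_rng(0)
w = rng.random((rows, cols, d))      # current node weights
gene = rng.random(d)                 # one 4-dimensional expression profile

# Winner: the node whose weight vector is closest to the current gene
dist = np.linalg.norm(w - gene, axis=-1)
wi = np.unravel_index(np.argmin(dist), dist.shape)

# Neighbors are pulled toward the gene too, but less so the farther
# they sit from the winner on the 2D grid (a Gaussian falloff here).
rr, cc = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
grid_d2 = (rr - wi[0]) ** 2 + (cc - wi[1]) ** 2
h = np.exp(-grid_d2 / 2.0)
eta = 0.3                            # learning rate (illustrative value)
w += eta * h[..., None] * (gene - w) # winner receives the largest pull
```

After this single update the winning node's weight vector is strictly closer to the gene than before, and its grid neighbors have moved by smaller amounts.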

25 SOM Applied to Microarray Analysis of Yeast
Yeast cell cycle SOM. (a) 6 x 5 SOM. The 828 genes that passed the variation filter were grouped into 30 clusters. Each cluster is represented by the centroid (average pattern) of the genes in the cluster. The expression level of each gene was normalized to have mean = 0 and SD = 1 across time points. Expression levels are shown on the y-axis and time points on the x-axis. Error bars indicate the SD of average expression; n indicates the number of genes within each cluster. Note that multiple clusters exhibit periodic behavior and that adjacent clusters have similar behavior. (b) Cluster 29 detail. Cluster 29 contains 76 genes exhibiting periodic behavior with peak expression in late G1. The normalized expression patterns of the 30 genes nearest the centroid are shown. (c) Centroids for SOM-derived clusters 29, 14, 1, and 5, corresponding to the G1, S, G2, and M phases of the cell cycle.

26 SOM Applied to Microarray Analysis of Yeast
- The data set was reduced to 828 genes.
- The data were clustered into 30 clusters using a SOFM.
- Each cluster is represented by its average (centroid) pattern.
- Genes within a cluster show similar behavior, and neighboring clusters exhibit similar behavior.

27 A SOFM Example With Yeast

28 Benefits of SOM
- The SOM contains a set of features extracted from the input patterns (it reduces dimensions).
- The SOM yields a set of clusters.
- A gene will always be more similar to the genes in its immediate neighborhood than to genes further away.

29 Problems of SOM
- The algorithm is complicated and has many parameters (such as the "learning rate"); these settings will affect the results.
- The idea of a topology in high-dimensional gene expression spaces is not exactly obvious:
  - How do we know what topologies are appropriate?
  - In practice, people often choose nearly square grids for no particularly good reason.
- As with k-means, we still have to worry about how many clusters to specify.

30 Comparison of SOM and K-means
- K-means is a simple yet effective algorithm for clustering data.
- Self-organizing maps are somewhat more computationally expensive than k-means, but they additionally capture the spatial relationships between clusters.

31 Other Clustering Algorithms
- Clustering is a very popular method of microarray analysis and a well-established statistical technique; there is a huge amount of literature out there.
- There are many variations on k-means, including algorithms in which clusters can be split and merged, or that allow soft assignments (multiple clusters can contribute to a point).
- There are also semi-supervised clustering methods, in which some examples are assigned to clusters by hand and the remaining membership information is then inferred.