Introduction to Machine Learning for Microarray Analysis Jennifer Listgarten October 2001 (for the non-computer scientist)

Outline
Introduction to Machine Learning
Clustering (Thoroughly)
Principal Components Analysis (Briefly)
Self-Organizing Maps (Briefly)
Perhaps another time: ‘Supervised’ vs. ‘Unsupervised’ Learning; Neural Networks and Support Vector Machines

Outline
Introduction to Machine Learning
Clustering (Thoroughly)
Principal Components Analysis (Briefly)
Self-Organizing Maps (Briefly)

What is Machine Learning? Definition: The ability of a computer to recognize patterns that have occurred repeatedly and to improve its performance based on past experience.

Questions for Machine Learning
Which genes are co-regulated?
Which genes have similar functional roles?
Do certain gene profiles correlate with diseased patients?
(Which genes are upregulated/downregulated?)

The Data: How to Think About It In Machine Learning, each data point is a vector. Example: Patient_X = (gene_1, gene_2, gene_3, …, gene_N), where each entry (e.g. gene_3) is the expression ratio for that gene.

Patient_X = (gene_1, gene_2, gene_3, …, gene_N) Each vector ‘lives’ in a high-dimensional space. N is normally much larger than 2, so we can’t visualize the data directly.

Our Goal Tease out the structure of our data from the high-dimensional space in which it lives, Patient_X = (gene_1, gene_2, gene_3, …, gene_N). (figure: the patients separating into two groups, breast cancer patients and healthy patients)

Two ways to think of data for Microarray Experiments 1. All genes for one patient make a vector: Patient_X = (gene_1, gene_2, gene_3, …, gene_N) 2. All experiments for one gene make a vector: Gene_X = (experiment_1, experiment_2, …, experiment_N)
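To make the two views concrete, here is a minimal sketch using a hypothetical numpy matrix of expression ratios (the numbers and sizes are made up for illustration, not taken from the talk):

```python
import numpy as np

# Rows = patients (samples), columns = genes: a hypothetical 4-patient x 6-gene
# matrix of log expression ratios.
X = np.array([
    [ 0.2, -1.1,  0.8,  0.0,  1.5, -0.3],
    [ 0.1, -0.9,  1.0,  0.2,  1.4, -0.2],
    [-1.2,  0.7, -0.5,  1.1, -0.8,  0.9],
    [-1.0,  0.8, -0.6,  0.9, -1.0,  1.1],
])

patient_x = X[0, :]   # view 1: all genes for one patient -> one vector per patient
gene_3    = X[:, 2]   # view 2: all experiments (patients) for one gene -> one vector per gene

print(patient_x.shape)  # (6,)  N = number of genes
print(gene_3.shape)     # (4,)  here N = number of experiments
```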

Outline
Introduction to Machine Learning
Clustering
–Hierarchical Clustering
–K-means Clustering
–Stability of Clusters
Principal Components Analysis
Self-Organizing Maps

Clustering Data Want ‘similar’ data to group together. Problems:
Don’t know which definition of ‘similar’ to use in order to extract useful information.
Without external validation, difficult to tell if clustering is meaningful.
How many clusters?

Similarity
Metric: formal name for ‘measure of similarity’ between 2 points.
Every clustering algorithm (hierarchical, k-means, etc.) needs to decide on a metric.
Can argue in favour of various metrics, but no correct answer.

Some Metrics
Euclidean distance
Correlation based
Mutual Information based
X and Y here are two vectors (e.g. two patients)
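The formulas behind these three choices, sketched in numpy under common conventions (Euclidean distance, one minus the Pearson correlation, and a binned estimate of mutual information; the 10-bin histogram is an illustrative choice, not something the slides prescribe):

```python
import numpy as np

def euclidean(x, y):
    # d(X, Y) = sqrt( sum_i (x_i - y_i)^2 )
    return np.sqrt(np.sum((x - y) ** 2))

def correlation_distance(x, y):
    # 1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for anti-correlated.
    r = np.corrcoef(x, y)[0, 1]
    return 1.0 - r

def mutual_information(x, y, bins=10):
    # Crude histogram estimate of I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ).
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
```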

Outline Introduction to general Machine LearningIntroduction to general Machine Learning ClusteringClustering –Hierarchical Clustering –K-means Clustering –Stability of Clusters Principal Components AnalysisPrincipal Components Analysis Self-Organizing MapsSelf-Organizing Maps

Outline of Hierarchical Clustering 1. Start by making every data point its own cluster. 2. Repeatedly combine ‘closest’ clusters until only one is left. (example to follow)

Simple Example - Step 1 Initially each datum is its own cluster. (figure: seven data points A-G, each a high-dimensional point, e.g. a patient; each point is its own cluster)

Simple Example - Step 2 Combine the two closest clusters. (figure: E and F merge; the dendrogram records this first merge)

Simple Example - Step 3 Again: combine the next two closest clusters. (figure: B and C merge; the dendrogram now shows E-F and B-C)

Simple Example - Step 4 And again... (figure: D joins the E-F cluster; dendrogram leaves D E F B C)

Simple Example - Step 5 And again... (figure: A joins the B-C cluster; dendrogram leaves D E F B C A)

Simple Example - Step 6 And again... (figure: G joins the D-E-F cluster; dendrogram leaves G D E F B C A)

Simple Example - Step 7 And again... (figure: the last two clusters merge; only one cluster remains)

Simple Example - Step 7 (figure: the completed dendrogram, leaves G D E F B C A, with a metric scale showing the distance at which each merge occurred)
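The merging process illustrated above can be reproduced in a few lines; a minimal sketch using SciPy, assuming a (patients x genes) numpy array (the random data, Euclidean metric and average linkage here are illustrative choices, not something the slides prescribe):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

X = np.random.rand(7, 50)                # 7 hypothetical patients, 50 genes
labels = list("ABCDEFG")

dists = pdist(X, metric="euclidean")     # first choose a metric
Z = linkage(dists, method="average")     # repeatedly merge the two closest clusters
dendrogram(Z, labels=labels)             # the dendrogram records each merge and its height
plt.show()
```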

Digression: ‘Distance’ between clusters. 3 common ways: 1. Single-Linkage 2. Complete-Linkage 3. Average-Linkage
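For concreteness, a small sketch of what the three definitions compute, assuming each cluster is given as a list of row indices into a numpy data matrix X (these helper functions are illustrative; SciPy's linkage accepts method='single', 'complete' or 'average' directly):

```python
import numpy as np

def pairwise(X, a, b):
    # all point-to-point Euclidean distances between cluster a and cluster b
    return np.linalg.norm(X[a][:, None, :] - X[b][None, :, :], axis=-1)

def single_linkage(X, a, b):    # distance between the two closest members
    return pairwise(X, a, b).min()

def complete_linkage(X, a, b):  # distance between the two farthest members
    return pairwise(X, a, b).max()

def average_linkage(X, a, b):   # mean of all pairwise distances
    return pairwise(X, a, b).mean()
```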

What we get out of HC
A hierarchical set of clusters.
A dendrogram showing which data points are most closely related, as defined by the metric.
(dendrogram leaves: A B C D E F G)

What we get out of HC
Can we tell how data points are related by looking at the horizontal positions of the data points...?
Must be careful about interpretation of the dendrogram - example to follow.
(dendrogram leaves: A B C D E F G)

Notice that we can swap branches while maintaining the tree structure. (dendrogram leaves: G D E F B C A)

Notice that we can swap branches while maintaining the tree structure. (after swapping E and F: G D F E B C A)

Again... (dendrogram leaves: G D F E B C A)

Again... (dendrogram leaves: B C A G D F E)

Again... (dendrogram leaves: B C A G D F E) How many ways to swap branches if there are N data points?

Again... (dendrogram leaves: B C A G D F E) How many ways to swap branches if there are N data points? 2^(N-1), since each of the N-1 internal branch points can be flipped. For N = 100, 2^99 ≈ 6.3 x 10^29.

Two data points that were close together in one tree may be far apart in another. (dendrogram 1, leaves G D F E B C A: G and A far apart; dendrogram 2, leaves B C A G D F E: G and A close) There is a way to help overcome the arbitrariness of the branches: Self-Organizing Maps (SOMs) - discussed later.

Lesson Learned Be careful not to overinterpret the results of hierarchical clustering (along the horizontal axis).

What is HC used for?
Typically, grouping genes that are co-regulated. (Could be used for grouping patients too.)
While useful, it is a relatively simple, unsophisticated tool.
It is more of a visualization tool than a mathematical model.

Outline
Introduction to Machine Learning
Clustering
–Hierarchical Clustering
–K-means Clustering
Principal Components Analysis
Self-Organizing Maps

K-Means Clustering Goal: Given a desired number of clusters,
find out the cluster centres
find out which data point belongs to each cluster

Must specify that we want 3 clusters. (figure: data points (patients) and 3 cluster centres)

Outline of K-Means Clustering Step 1 - Decide how many clusters (let’s say 3). Step 2 - Randomly choose 3 cluster centres.

Outline of K-Means Clustering Step 3 - Choose a metric. Step 4 - Assign each point to the cluster that it is ‘closest’ to (according to metric).

Outline of K-Means Clustering Step 5 - Recalculate cluster centres using the means of the points that belong to each cluster. Step 6 - Repeat until convergence (or a fixed number of steps, etc.). (figure: newly calculated cluster centres)

Outline of K-Means Clustering Another step... assign points to clusters. (figure: points reassigned)

Outline of K-Means Clustering And the final step... reposition the means. (figure: newly calculated cluster centres)

Outline of K-Means Clustering And the final step... reassign points. (figure: points reassigned)
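Putting the walkthrough together, here is a minimal from-scratch sketch of the K-means loop (assuming numpy, Euclidean distance as the metric, and a fixed random seed; a real analysis would also add the multiple restarts discussed below):

```python
import numpy as np

def kmeans(X, k=3, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]   # step 2: random initial centres
    for _ in range(n_iter):
        # step 4: assign each point to its closest centre
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # step 5: recompute each centre as the mean of its points (keep old centre if empty)
        new_centres = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                else centres[j] for j in range(k)])
        if np.allclose(new_centres, centres):                # step 6: stop at convergence
            break
        centres = new_centres
    return centres, assign
```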

Variations of K-Means: K-Median Clustering (uses the median instead of the mean to reposition each cluster centre).

Related to K-Means Clustering: Mixture of Gaussians (now clusters have a width as well) - a Gaussian probability distribution takes the place of the metric. Other differences too, e.g. a soft partition (vs. hard).
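As a sketch of the ‘soft’ version, using scikit-learn's GaussianMixture (the choice of library and the random toy data are assumptions of convenience, not part of the original slides):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(100, 5)                       # hypothetical data
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
soft = gmm.predict_proba(X)                      # each point gets a probability per cluster
hard = soft.argmax(axis=1)                       # collapse to a hard partition if needed
```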

Comments on K-Means Clustering
May not converge nicely, need multiple random restarts.
Results are straightforward (unlike hierarchical clustering).
Still a relatively simple tool - not much mathematical modelling.

Comments on K-Means Clustering
Earlier: showed random initialization.
Can run hierarchical clustering to initialize the K-Means Clustering algorithm, as sketched below.
This can help with convergence problems as well as speed up the algorithm.
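One way this initialization could be wired up, sketched with SciPy and scikit-learn (the cut-the-tree-and-average recipe below is an illustrative assumption, not the only option):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

X = np.random.rand(200, 50)                               # hypothetical data
k = 3

Z = linkage(X, method="average")
labels = fcluster(Z, t=k, criterion="maxclust")           # cut the tree into k clusters
init = np.array([X[labels == j].mean(axis=0) for j in range(1, k + 1)])

km = KMeans(n_clusters=k, init=init, n_init=1).fit(X)     # start K-means from those centres
```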

Outline
Introduction to Machine Learning
Clustering
–Hierarchical Clustering
–K-means Clustering
–Stability of Clusters
Principal Components Analysis
Self-Organizing Maps

Stability of Clusters
Ideally, want different (good) clustering techniques to provide similar results.
Otherwise the clustering is likely arbitrary, not modelling any true structure in the data set.

Outline
Introduction to Machine Learning
Clustering (Thoroughly)
Principal Components Analysis (Briefly)
Self-Organizing Maps (Briefly)

Principal Components Analysis (PCA) Mathematical technique to reduce the dimensionality of the data.

PCA - Dimension Reduction (figure: a 3D object projected to 2D in two different ways) Some projections are more informative than others. While the projections differ, the object remains unchanged in the original space.

Why?
Instead of clustering 20,000-dimensional data - cluster 100-dimensional data.
Typically some dimensions are redundant.
Might eliminate noise, get more meaningful clusters (?)
Fewer dimensions also means it is easier to initialize ML algorithms and get good results.

Simple 2D Example (figure: points plotted against X and Y) One could cluster these 2D points in 2 dimensions. But... what if...

Simple 2D Example (figure: the same points projected onto the X axis) We squashed them into 1D. ‘squash’ → geometric projection

Simple 2D Example The Y dimension was redundant. Only the X variable was needed to cluster nicely.

Another 2D Example (figure: points plotted against X and Y) It is not obvious which dimension to keep now.

Another 2D Example There is no axis onto which we can project to get good separation of the data...

Another 2D Example But if we project the data onto a combination (linear combination) of the two dimensions... it works out nicely.

That was the Intuition Behind PCA Outline of PCA: Step 1 - Find the direction that accounts for the largest amount of variation in the data set, call this E1. (figure: E1, the first principal component, drawn through the data)

That was the Intuition Behind PCA Outline of PCA: Step 1 - Find the direction that accounts for the largest amount of variation in the data set, call this E1. Step 2 - Find the direction which is perpendicular (orthogonal/uncorrelated) to E1 and accounts for the next largest amount of variation in the data set, call this E2.

That was the Intuition Behind PCA Outline of PCA: Step 3 - Find the 3rd-best direction, which is orthogonal to the other 2 directions - call this E3. Step N - Find the Nth such direction (if there were N dimensions to begin with).
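Written as equations, the steps amount to the following (a reconstruction assuming the data points x_1, …, x_M have been mean-centred, so that the variance along a unit direction w is the average squared projection):

```latex
\begin{align*}
E_1 &= \arg\max_{\lVert w\rVert = 1} \; \frac{1}{M}\sum_{i=1}^{M} \left(w^{\top} x_i\right)^2 \\
E_k &= \arg\max_{\substack{\lVert w\rVert = 1 \\ w \,\perp\, E_1,\dots,E_{k-1}}} \; \frac{1}{M}\sum_{i=1}^{M} \left(w^{\top} x_i\right)^2,
\qquad k = 2,\dots,N
\end{align*}
```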

PCA - Some Linear Algebra Form the covariance matrix of the original (centred) data; its eigenvectors are the principal components (E1 the 1st, E2 the 2nd, and so on) and its eigenvalues give the variance in each component. In practice this is computed with the Singular Value Decomposition (SVD).
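A minimal numpy sketch of that computation via the SVD (the random data and the 95%-of-variance cut-off, mentioned on a later slide, are illustrative assumptions):

```python
import numpy as np

X = np.random.rand(50, 500)                      # hypothetical 50 patients x 500 genes
Xc = X - X.mean(axis=0)                          # centre each dimension

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                                  # rows are E1, E2, ... (principal directions)
variance = s**2 / (len(X) - 1)                   # variance captured by each component
explained = variance / variance.sum()

k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1   # smallest k retaining ~95%
scores = Xc @ components[:k].T                   # data re-expressed in the first k directions
```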

Principal Components Analysis
A typical dimensionality reduction might be: 1 million → 200,000, which might retain 95% of the original information.
The reduction achievable depends on the data set.

PCA - Useful for clustering?
It turns out that PCA can lead to worse clustering than simply using the original data (though not necessarily).
PCA is often used in conjunction with other techniques, such as Artificial Neural Networks, Support Vector Machines, etc.

PCA - Interesting Side Note
PCA is the basis for most of the Face Recognition systems currently in use, e.g. security in airports.
(figure: the principal ‘face’ directions)

Outline
Introduction to Machine Learning
Clustering (Thoroughly)
Principal Components Analysis (Briefly)
Self-Organizing Maps (Briefly)

Self-Organizing Maps (SOMs)
A way to visualize high-dimensional data in a low-dimensional space.
Commonly view data in 1 or 2 dimensions with SOMs.

Self-Organizing Maps (SOMs) Can think of SOMs as a cross between K-Means Clustering and PCA:
K-Means: find cluster centres.
PCA: reduce dimensionality of the cluster centres (i.e. impose a ‘structure’ on the relationship between cluster centres).
(At the same time!)

Self-Organizing Maps Example: Impose a 1D structure on the cluster centres. (figure: cluster centres in the 5000-dimensional data space, connected along a 1-dimensional line)

Self-Organizing Maps This imposes an ordering on the cluster centres.

Self-Organizing Maps This imposes an ordering on the data points. Data points from Cluster 1 come before points in Cluster 2, etc. Then order based on proximity to neighbouring clusters.

Self-Organizing Maps What is important to know for our immediate interest: a SOM imposes a unique ordering on the data points.
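For the curious, a from-scratch sketch of a 1-D SOM in the spirit of these slides (assuming numpy; the node count and the shrinking learning-rate and neighbourhood schedules are illustrative choices, not the canonical ones):

```python
import numpy as np

def som_1d(X, n_nodes=10, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    # cluster centres ('nodes') initialized from the data; assumes len(X) >= n_nodes
    nodes = X[rng.choice(len(X), size=n_nodes, replace=False)].copy()
    positions = np.arange(n_nodes)                 # their coordinates on the 1-D line
    for epoch in range(n_epochs):
        lr = 0.5 * (1 - epoch / n_epochs)                           # shrinking learning rate
        sigma = max(n_nodes / 2 * (1 - epoch / n_epochs), 0.5)      # shrinking neighbourhood
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(nodes - x, axis=1))
            # pull the winner *and its neighbours on the line* toward the data point
            influence = np.exp(-((positions - winner) ** 2) / (2 * sigma ** 2))
            nodes += lr * influence[:, None] * (x - nodes)
    # each point's position along the 1-D map = index of its nearest node
    order = np.argmin(np.linalg.norm(X[:, None, :] - nodes[None, :, :], axis=-1), axis=1)
    return nodes, order
```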

SOMs and Hierarchical Clustering Recall the problem of arbitrariness in the order of the branches in Hierarchical Clustering? Can use SOMs to help. (figure: a hierarchical clustering dendrogram with leaves G D F E B C A)

SOMs, Eisen, Cluster Eisen’s Cluster can use the ordering from the SOM to do hierarchical clustering: 1) Run a 1D SOM on the data set. 2) Build the dendrogram using the ordering from the SOM. (figure: the resulting dendrogram with leaves G D F E B C A)

Self-Organizing Maps
Not normally what SOMs are used for (i.e. hierarchical clustering).
Mainly used for visualization, and as a first step before further analysis.
Can also use 2D, 3D SOMs.

Concluding Remarks I hope that I managed to: 1) Give some idea of what machine learning is about (‘structure’ in high-dimensional spaces). 2) Convey the intuition behind several techniques, and some familiarity with their names.

Main Ideas of PCA
For N-dimensional data, find N directions.
Can represent any one of our original data points using a linear combination of these N directions E1, …, EN.

Main Ideas of PCA
Key Idea: Can represent extremely well any one of our original data points using fewer than N directions.
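The two expansions these slides refer to can be written out as follows (a reconstruction assuming mean-centred data, with a_i denoting the point's coordinate along direction E_i):

```latex
\begin{align*}
\text{exact:} \quad & x = a_1 E_1 + a_2 E_2 + \dots + a_N E_N \\
\text{approximate:} \quad & x \approx a_1 E_1 + a_2 E_2 + \dots + a_K E_K, \qquad K \ll N
\end{align*}
```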

Concluding Remarks Understand the algorithms behind your analysis. Blind application can lead to erroneous conclusions.