Introduction to Bioinformatics - Tutorial no. 12

Expression Data Analysis:
- Clustering
- GEO
- EPClust

Application of Microarrays
- We only know the function of about 20% of the 30,000 genes in the human genome
- Gene exploration: faster and better
- Applications: evolution, behavior, cancer research

Microarray Analysis
Unsupervised grouping: clustering
- Pattern discovery via grouping similarly expressed genes together
- Three techniques most often used:
  - k-means clustering
  - Hierarchical clustering
  - Kohonen self-organizing feature maps

Hierarchical Agglomerative Clustering
Michael Eisen, 1998: Cluster (algorithm) and TreeView (visualization)
- Step 1: Compute a similarity score between all pairs of genes (Pearson correlation, or a distance such as Euclidean distance).
- Step 2: Find the two most similar genes (or clusters) and replace them with a node that contains their average profile. This builds a tree of genes.
- Step 3: Repeat until all genes are joined in a single tree.
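The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not Eisen's Cluster program: it reads Step 2 literally, so a merged node's profile is the plain mean of the two merged profiles, regardless of how many genes each side contains. The function names (`pearson`, `neg_euclidean`, `agglomerate`) and the toy profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (a zero standard deviation
    would cause a division by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def neg_euclidean(x, y):
    """Euclidean distance turned into a similarity (larger = more similar)."""
    return -math.dist(x, y)


def agglomerate(profiles, similarity=pearson):
    """Build a gene tree by repeatedly merging the two most similar nodes.

    profiles: dict mapping gene name -> list of expression values.
    Returns the tree as nested tuples of gene names.
    """
    # Every node is (subtree, profile); leaves are single genes.
    nodes = [(name, list(p)) for name, p in profiles.items()]
    while len(nodes) > 1:
        # Steps 1-2: score all pairs and pick the most similar one.
        i, j = max(combinations(range(len(nodes)), 2),
                   key=lambda ij: similarity(nodes[ij[0]][1], nodes[ij[1]][1]))
        (tree_i, prof_i), (tree_j, prof_j) = nodes[i], nodes[j]
        # Replace the pair with one node holding their average profile.
        merged = ((tree_i, tree_j), [(a + b) / 2 for a, b in zip(prof_i, prof_j)])
        # Step 3: repeat with the merged node in place of the pair.
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [3.9, 3.2, 1.8, 1.1],
}
print(agglomerate(profiles))  # (('g1', 'g2'), ('g3', 'g4'))
```

Passing `similarity=neg_euclidean` switches Step 1 from Pearson correlation to Euclidean distance; on this toy data both give the same tree.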

Agglomerative Hierarchical Clustering
We need to define the distance between a newly joined cluster and the other clusters:
- Single linkage: distance between the closest pair of points.
- Complete linkage: distance between the farthest pair of points.
- Average linkage: average distance over all pairs of points (the related centroid method uses the distance between cluster centers instead).
[Figure: five data points labeled 1 to 5 and the dendrogram built from them; the vertical axis is the distance between joined clusters.]
The dendrogram induces a linear ordering of the data points.
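The linkage rules above differ only in how the cluster-to-cluster distance is computed. The sketch below (hypothetical function names, Euclidean distance between points assumed, Python 3.8+ for `math.dist`) implements each rule and a naive agglomeration loop that records the join heights a dendrogram would display.

```python
import math
from itertools import product


def single_linkage(a, b):
    """Distance between the closest pair of points (one from each cluster)."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Distance between the farthest pair of points."""
    return max(math.dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean distance over all cross-cluster pairs of points."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_linkage(a, b):
    """Distance between the two cluster centers (mean points)."""
    ca = [sum(coord) / len(a) for coord in zip(*a)]
    cb = [sum(coord) / len(b) for coord in zip(*b)]
    return math.dist(ca, cb)


def dendrogram_merges(points, linkage=single_linkage):
    """Agglomerate points; return (cluster_a, cluster_b, height) per merge.

    Clusters are lists of point indices; height is the linkage distance
    between the two clusters at the moment they are joined.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                h = linkage([points[k] for k in clusters[i]],
                            [points[k] for k in clusters[j]])
                if best is None or h < best[0]:  # ties keep the first pair
                    best = (h, i, j)
        h, i, j = best
        merges.append((clusters[i], clusters[j], h))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
for a, b, h in dendrogram_merges(points):
    print(a, b, h)
# [0] [1] 1.0
# [2] [3] 1.0
# [0, 1] [2, 3] 4.0
# [4] [0, 1, 2, 3] 14.0
```

Concatenating the two index lists of the final merge (`[4, 0, 1, 2, 3]` here) gives the left-to-right leaf order, i.e. the linear ordering the dendrogram induces. Swapping in `complete_linkage` on the same points raises the last two join heights to 6.0 and 20.0.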

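Results of Clustering Gene Expression
- CLUSTER is simple and easy to use.
- It is the de facto standard for microarray analysis.
- Limitations:
  - Hierarchical clustering in general is not robust (small changes in the data can change the tree).
  - Genes may belong to more than one cluster, but hierarchical clustering places each gene in exactly one branch of the tree.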

K-Means Clustering Algorithm
- Randomly initialize k cluster means.
- Iterate:
  - Assign each gene to the nearest cluster mean.
  - Recompute the cluster means.
- Stop when the clustering converges (no gene changes cluster).
Notes:
- Really fast.
- Genes are partitioned into clusters (each gene belongs to exactly one cluster).
- How do we select k?
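The k-means loop above can be sketched as follows. This is a minimal Lloyd-style version under stated assumptions: Euclidean distance, initial means taken as k distinct randomly chosen data points, and an empty cluster simply keeping its previous mean. The function name `kmeans` and its parameters are hypothetical; k must still be chosen by the user.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=None):
    """Lloyd-style k-means; returns (means, labels).

    points: list of equal-length tuples (e.g. gene expression profiles).
    Initial means are k distinct data points picked at random.
    """
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each gene to the nearest cluster mean (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, means[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no gene changed cluster
        labels = new_labels
        # Recompute each cluster mean from the genes assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return means, labels


genes = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
means, labels = kmeans(genes, k=2, seed=0)
print(labels)
```

The cluster numbers themselves are arbitrary (they depend on the random initialization); only the partition matters. Here the first three and the last three profiles end up in separate clusters.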

K-Means Algorithm (figure sequence)
1. Randomly initialize clusters.
2. Assign data points to the nearest clusters.
3. Recalculate clusters.
4. Repeat steps 2 and 3 until convergence.

EPClust Input (1)
- Expression data matrix
- Extra annotation for gene rows
- Method of tabulation
- Name for further analysis

EPClust Input (2)
- Method of measuring distance between gene rows
- Cluster hierarchically
- Number k of means
- Cluster into k means
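The two EPClust input screens boil down to a delimited gene-by-sample matrix plus a choice of distance between gene rows. The sketch below mimics that preparation locally; it does not reproduce EPClust's own input specification. The assumed layout (header row of sample names, gene ID in the first column), the `DISTANCES` menu, and all function names are hypothetical.

```python
import csv
import io
import math


def read_expression_matrix(text, delimiter="\t"):
    """Parse a gene-by-sample table (layout assumed, not EPClust's spec).

    First row: a corner label, then sample names.
    Other rows: gene ID, then one numeric value per sample.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    samples = rows[0][1:]
    genes = [r[0] for r in rows[1:]]
    values = [[float(v) for v in r[1:]] for r in rows[1:]]
    return samples, genes, values


def pearson_distance(x, y):
    """1 - Pearson correlation (0 for perfectly correlated rows)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)


# Hypothetical menu of distance measures between gene rows.
DISTANCES = {"euclidean": math.dist, "pearson": pearson_distance}


def row_distances(values, method="euclidean"):
    """Pairwise distance matrix between all gene rows."""
    dist = DISTANCES[method]
    return [[dist(a, b) for b in values] for a in values]


table = "gene\tt0\tt1\tt2\ng1\t1.0\t2.0\t3.0\ng2\t2.0\t4.0\t6.0\n"
samples, genes, values = read_expression_matrix(table)
print(genes, samples)  # ['g1', 'g2'] ['t0', 't1', 't2']
```

The `delimiter` argument plays the role of the "method of tabulation" choice (tab- versus comma-separated). Note how the two rows are far apart by Euclidean distance (about 3.74) yet at Pearson distance 0, because one is a scaled copy of the other.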

GEO: Gene Expression Omnibus
- NCBI database for gene expression data
- Founded at the end of 2000

Querying GEO
- Browse records
- Search for entries containing a gene
- Search for experiments
- Search with Entrez
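Searching GEO with Entrez can also be done programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL and does not contact the server. It assumes the Entrez database names `gds` (GEO DataSets, experiment-level records) and `geoprofiles` (GEO Profiles, per-gene expression records); the helper name `geo_search_url` and the search terms are illustrative.

```python
from urllib.parse import urlencode

# NCBI Entrez E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term, db="gds", retmax=20):
    """Build an ESearch URL for a GEO query.

    db="gds" searches GEO DataSets (experiment-level records);
    db="geoprofiles" searches GEO Profiles (per-gene expression records).
    """
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


print(geo_search_url("APO1"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gds&term=APO1&retmax=20
print(geo_search_url("APO1", db="geoprofiles"))
```

Fetching such a URL (for example with `urllib.request.urlopen`) returns an XML document listing the IDs of matching records, which can then be retrieved with the other E-utilities.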

SGD – Expression database
http://db.yeastgenome.org/cgi-bin/expression/expressionConnection.pl

Gene Grouping, Relative Values
Exercise: Two labs are running experiments on the APO1 gene. Suggest a method that would allow them to compare their results.
Hints: gene grouping; relative values.
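One way to act on the two hints is sketched below, under stated assumptions: each lab expresses its intensities relative to its own reference measurement (as log2 ratios), which removes lab-specific scale; and each lab reports which genes are co-expressed with APO1 (Pearson correlation above a threshold), which can then be compared directly between labs. All numbers, gene names other than APO1, the 0.8 threshold, and the function names are hypothetical.

```python
import math


def log2_ratios(sample, reference):
    """Relative values: log2(sample / reference), measurement by measurement."""
    return [math.log2(s / r) for s, r in zip(sample, reference)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpressed_with(target, profiles, min_r=0.8):
    """Genes whose profile correlates with the target's at r >= min_r."""
    t = profiles[target]
    return {g for g, p in profiles.items()
            if g != target and pearson(t, p) >= min_r}


# Hypothetical raw intensities: lab B's measurements are 10x lab A's.
print(log2_ratios([200.0, 400.0, 50.0], [100.0, 100.0, 100.0]))      # [1.0, 2.0, -1.0]
print(log2_ratios([2000.0, 4000.0, 500.0], [1000.0, 1000.0, 1000.0]))  # [1.0, 2.0, -1.0]

# Hypothetical profiles across four conditions in each lab.
lab_a = {"APO1": [1.0, 2.0, 3.0, 4.0],
         "geneX": [2.0, 4.0, 6.0, 8.5],
         "geneY": [4.0, 3.0, 2.0, 1.0]}
lab_b = {g: [10 * v for v in p] for g, p in lab_a.items()}
print(coexpressed_with("APO1", lab_a), coexpressed_with("APO1", lab_b))
```

Both steps are insensitive to a lab-wide scale factor, which is why the raw intensities differ tenfold while the relative values and the APO1 gene group agree.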

Exercise: Explain how microarrays can be used as a basis for diagnostics.
[Table: genes Gen1 to Gen5 (rows) across Sample 1 to Sample 5 (columns), with entries marked + or -.]
[The same table with the sample columns reordered: Sample 1, Sample 2, Sample 4, Sample 3, Sample 5.]
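Reordering the sample columns so that samples with matching +/- patterns sit side by side is what turns the table into a diagnostic tool. A minimal sketch of that idea follows. It groups samples by identical patterns, then assigns a new sample to the known class whose signature it matches at the most genes (a simple nearest-signature rule, one of many possible classifiers). The patterns, class names, and function names are invented for illustration and are not the slide's table.

```python
def group_samples(table):
    """Group samples that share the same +/- pattern across genes."""
    groups = {}
    for sample, pattern in table.items():
        groups.setdefault(pattern, []).append(sample)
    return list(groups.values())


def diagnose(pattern, signatures):
    """Return the class whose +/- signature agrees with the pattern at most genes."""
    return max(signatures,
               key=lambda cls: sum(a == b for a, b in zip(pattern, signatures[cls])))


# Made-up data: one character per gene, Gen1..Gen5.
table = {
    "Sample 1": "++--+",
    "Sample 2": "--++-",
    "Sample 3": "++--+",
    "Sample 4": "--++-",
}
print(group_samples(table))
# [['Sample 1', 'Sample 3'], ['Sample 2', 'Sample 4']]

signatures = {"class A": "++--+", "class B": "--++-"}
print(diagnose("++-++", signatures))  # class A
```

A new patient whose profile agrees with the "class A" signature at four of five genes is assigned to class A, even though it matches neither signature exactly.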