Course Work Project. Project title: “Data Analysis Methods for Microarray Based Gene Expression Analysis”. Sushil Kumar Singh (batch 2002-03), IBAB, Bangalore.

Done at Siri Technologies Pvt. Ltd., Bangalore.

Outline
- Introduction
- Overview of Data Analysis
- Normalization
- Clustering Algorithms
- Future Work
- Acknowledgements
- Questions

Introduction

Overview of Data Analysis

Normalization
An attempt to remove systematic variation from the data.
Sources of variation:
- Biological: influenced by genetic or environmental factors, age, sex, etc.
- Technical: introduced during extraction, labelling and hybridization of samples; print-tip problems.
- Measurement: differences in DNA concentration; scanner problems.

Why normalize data?
- To recognize the biological information in the data.
- To compare data from one array with another.
In practice we do not fully understand the data, so some biological signal will inevitably be removed too.

Normalization methods
Methods of selecting the elements used for normalization:
- Housekeeping genes
- All elements
- Spiked-in controls
Methods to calculate the normalization factor:
- Log ratio
- Lowess
- Ratio statistics
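
A minimal sketch of two of these methods in plain Python, assuming two-channel (red/green) spot intensities. The M = log2(R/G) and A = mean log2 intensity notation is a common convention, not taken from the slides. "Log ratio" is read here as median-centring the log ratios over the chosen reference elements (all spots, or a hypothetical list of housekeeping or spiked-in control indices). The lowess version is deliberately simplified: one pass of tricube-weighted local linear regression with no robustness iterations, quadratic in the number of spots.

```python
import math
from statistics import median


def log_ratios(red, green):
    """Per-spot M = log2(R/G) and A = mean log2 intensity (assumed M/A convention)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def global_normalize(m, reference=None):
    """Median-centre log ratios on the reference spots (all spots if reference is None).

    `reference` is a hypothetical list of spot indices, e.g. housekeeping genes
    or spiked-in controls.
    """
    idx = range(len(m)) if reference is None else reference
    factor = median(m[i] for i in idx)            # normalization factor in log space
    return [x - factor for x in m]


def lowess_fit(x, y, frac=0.5):
    """Simplified lowess: one pass of tricube-weighted local linear regression."""
    n = len(x)
    k = max(2, math.ceil(frac * n))               # neighbourhood size
    fitted = []
    for xi in x:
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12   # local bandwidth
        w = [(1 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def lowess_normalize(m, a, frac=0.5):
    """Remove the intensity-dependent trend of M against A."""
    return [mi - fi for mi, fi in zip(m, lowess_fit(a, m, frac))]


red = [1200.0, 800.0, 3000.0, 450.0, 2200.0]     # hypothetical red-channel intensities
green = [1000.0, 900.0, 2000.0, 500.0, 2100.0]   # hypothetical green-channel intensities
m, a = log_ratios(red, green)
print(global_normalize(m))
print(lowess_normalize(m, a, frac=0.8))
```

Global centring removes a constant bias shared by all spots; the lowess variant also removes biases that change with spot intensity, at the cost of more computation.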

Clustering
For a sample of size n described in a d-dimensional feature space, clustering is a procedure that divides the n data points into K disjoint groups such that the data points within each group are more similar to each other than to the data points in other groups.
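
One way to make this definition concrete, sketched under the assumption that similarity is measured by Euclidean distance: for a given partition, compare the mean distance between points in the same group with the mean distance between points in different groups. The helper name `within_between` and the toy points are hypothetical.

```python
import math
from itertools import combinations


def within_between(points, labels):
    """Mean pairwise Euclidean distance within groups and between groups."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        (within if labels[i] == labels[j] else between).append(d)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")   # nan if no such pairs

    return mean(within), mean(between)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]   # hypothetical data
print(within_between(points, [0, 0, 1, 1]))   # good partition: within < between
print(within_between(points, [0, 1, 0, 1]))   # poor partition: within > between
```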

Clustering algorithms
- Unsupervised (no a priori biological information):
  - Agglomerative: hierarchical clustering
  - Partitional: K-means, SOM
- Supervised (uses a priori biological knowledge): support vector machine (SVM)

Hierarchical clustering (HC)
Agglomerative technique. Steps:
1. Calculate the pairwise distance between all genes.
2. Group the two genes with the shortest distance together to form a cluster.
3. Merge the two closest clusters to form a new cluster.
4. Calculate the distances between this new cluster and all other clusters.
5. Repeat steps 2 to 4 until all the objects are in one cluster.
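
The steps above can be sketched as a naive agglomerative loop. This assumes Euclidean distance between genes and average linkage between clusters; the slides do not fix either choice. The returned merge history (which clusters joined, at what distance) is the information a dendrogram draws. It recomputes cluster distances from scratch on every pass, so it is only suitable for small examples.

```python
import math


def agglomerative(points):
    """Naive agglomerative clustering with average linkage; returns the merge history."""
    clusters = [(i,) for i in range(len(points))]          # every gene starts alone
    # Step 1: pairwise distances between all genes
    d = {(i, j): math.dist(points[i], points[j])
         for i in range(len(points)) for j in range(len(points))}

    def dist(c1, c2):                                       # average linkage
        return sum(d[i, j] for i in c1 for j in c2) / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Steps 2-3: find and merge the closest pair of clusters
        a, b = min(((x, y) for k, x in enumerate(clusters) for y in clusters[k + 1:]),
                   key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        # Step 4: distances to the new cluster are computed by dist() on the next pass
    return merges


genes = [(0.0,), (1.0,), (5.0,), (6.5,)]   # hypothetical one-experiment log ratios
for a, b, h in agglomerative(genes):
    print(a, "+", b, "at height", h)
# (0,) + (1,) at height 1.0
# (2,) + (3,) at height 1.5
# (0, 1) + (2, 3) at height 5.25
```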

HC contd. Data table

HC contd.
Calculation of the distance matrix from the data table:
- Each experiment is an axis.
- The log ratios of a gene are its coordinates.
- For n experiments, each gene is a point in an n-dimensional space.
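
A small sketch of this geometric view, assuming Euclidean distance and a hypothetical three-experiment data table: each gene's row of log ratios is used directly as its coordinates, and the distance matrix holds the distance between every pair of genes.

```python
import math

# Hypothetical data table: rows = genes, columns = log2 ratios from 3 experiments.
data_table = {
    "geneA": [1.0, 2.0, 0.5],
    "geneB": [1.0, 2.0, 1.5],
    "geneC": [-2.0, 0.0, 3.0],
}


def distance_matrix(table):
    """Treat each gene as a point whose coordinates are its log ratios (one axis per experiment)."""
    genes = list(table)
    return genes, [[math.dist(table[g], table[h]) for h in genes] for g in genes]


genes, dm = distance_matrix(data_table)
for g, row in zip(genes, dm):
    print(g, ["%.2f" % d for d in row])
```

The matrix is symmetric with zeros on the diagonal, so in practice only one triangle needs to be stored.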

HC contd.
Distance between genes:
- Euclidean distance
- Pearson correlation
- Vector angle (semi-metric distance)
- Manhattan or city-block distance (metric distance)
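
The four measures can be written as small functions on two expression profiles. This sketch assumes the common conventions of turning similarities into distances as 1 - cos(angle) for the vector angle and 1 - r for Pearson correlation; other texts use the angle itself or other transforms. Both correlation-style distances are undefined for a profile that is constant (all zeros after centring).

```python
import math


def euclidean(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def vector_angle(x, y):
    """1 - cosine of the angle between the two profiles (uncentred correlation)."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1 - dot / (math.hypot(*x) * math.hypot(*y))


def pearson_distance(x, y):
    """1 - Pearson r: the vector-angle distance after centring each profile on its mean."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return vector_angle([a - mx for a in x], [b - my for b in y])


x, y = [1, 2, 3], [2, 4, 6]    # hypothetical profiles: same shape, different scale
print(euclidean(x, y), manhattan(x, y), vector_angle(x, y), pearson_distance(x, y))
```

Here the two profiles are far apart by Euclidean or Manhattan distance but essentially identical by vector angle or correlation, which is why correlation-based distances are popular for grouping genes by expression pattern rather than by magnitude.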

HC contd.
Distance between clusters:
- Single linkage clustering
- Complete linkage clustering
- Average linkage clustering (UPGMA)
- Weighted pair-group average
- Within-groups clustering
- Ward’s method
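
A sketch of four of these cluster-to-cluster rules, assuming clusters are lists of gene profiles and `d` is any gene-to-gene distance function. Ward's method is expressed as the increase in within-cluster sum of squares caused by merging two clusters; weighted pair-group average and within-groups clustering are not shown.

```python
import math


def single_linkage(A, B, d):
    """Distance between the closest pair of members."""
    return min(d(a, b) for a in A for b in B)


def complete_linkage(A, B, d):
    """Distance between the farthest pair of members."""
    return max(d(a, b) for a in A for b in B)


def average_linkage(A, B, d):
    """UPGMA: mean of all pairwise distances between the two clusters."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))


def centroid(C):
    return [sum(col) / len(C) for col in zip(*C)]


def ward_cost(A, B):
    """Ward's criterion: increase in within-cluster sum of squares if A and B merge."""
    ca, cb = centroid(A), centroid(B)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(A) * len(B) / (len(A) + len(B)) * sq


A = [(0.0,), (1.0,)]   # hypothetical clusters of one-experiment profiles
B = [(4.0,), (6.0,)]
print(single_linkage(A, B, math.dist), complete_linkage(A, B, math.dist),
      average_linkage(A, B, math.dist), ward_cost(A, B))
# 3.0 6.0 4.5 20.25
```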

HC contd.
The result of HC is displayed as a branching tree diagram called a “dendrogram”.
Pros and cons of HC:
- Pros: easy to implement; gives a quick visualization of the data set.
- Cons: ignores negative associations between genes; it is a greedy algorithm, so merges made early are never revisited.
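
The point about negative associations can be illustrated with a correlation distance. With 1 - r, two perfectly anti-correlated genes (for example, one induced and one repressed under the same conditions) get the largest possible distance and are never grouped. Using 1 - |r| is one common workaround; it is not part of the original slides, and the two gene profiles below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


up = [1.0, 2.0, 3.0, 4.0]      # hypothetical induced gene
down = [4.0, 3.0, 2.0, 1.0]    # hypothetical repressed gene, perfectly anti-correlated

r = pearson_r(up, down)
print("1 - r   =", 1 - r)          # 1 - r   = 2.0  (maximally dissimilar)
print("1 - |r| =", 1 - abs(r))     # 1 - |r| = 0.0  (grouped together)
```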

K-means clustering
Partitional (non-hierarchical) approach. Steps:
1. Specify K initial clusters and find their centroids.
2. For each data point, calculate the distance to each centroid.
3. Assign each data point to its nearest centroid.
4. Move each centroid to the center of the data points assigned to it.
5. Repeat steps 2-4 until the centroids no longer move.
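
The five steps map directly onto the standard iteration (Lloyd's algorithm). This sketch assumes Euclidean distance and, for reproducibility, uses the first K points as the initial centroids; real analyses usually pick random starting points and run several restarts. A centroid that loses all its points is simply left where it was.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means; initial centroids are the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]             # step 1
    for _ in range(max_iter):
        # steps 2-3: assign each point to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # step 4: move each centroid to the mean of its assigned points
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:                               # step 5: stop when nothing moves
            break
        centroids = new
    return labels, centroids


profiles = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]   # hypothetical data
labels, centroids = kmeans(profiles, k=2)
print(labels)      # [0, 1, 0, 1]
print(centroids)   # [[0.0, 0.5], [10.0, 10.5]]
```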

K-means clustering contd.
Pros and cons:
- No dendrogram is produced.
- It is a powerful method if one has a prior idea of the number of clusters, so it works well in combination with PCA, which can help estimate that number.

Future work
Similar analyses using:
- Self-Organizing Map (SOM)
- Support Vector Machine (SVM)
- Relevance Networks
- Gene Shaving
- Self-Organizing Tree Algorithm (SOTA)
- Cluster Affinity Search Technique (CAST)

Acknowledgements
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bangalore
- Dr. Ashwini K Heerekar (Siri Technologies Pvt. Ltd., Bangalore)
- Dr. Jonnlagada Srinivas (Siri Technologies Pvt. Ltd., Bangalore)
- Mr. Kiran Kumar (Siri Technologies Pvt. Ltd., Bangalore)
- Mr. Mahantha Swamy MV. (Siri Technologies Pvt. Ltd., Bangalore)


Questions?

Thank You