Microarray II

What is a microarray

Microarray Experiment [Figure labels: high glucose, low glucose, RT-PCR, DNA "chip", laser]

Raw data: images
- Red (Cy5) dot: overexpressed or up-regulated
- Green (Cy3) dot: underexpressed or down-regulated
- Yellow dot: equally expressed
- Intensity: "absolute" level
[Image: cDNA spotted microarray]

Levels of analysis
- Level 1: Which genes are induced / repressed? Gives a good understanding of the biology. Methods: factor-2 rule, t-test.
- Level 2: Which genes are co-regulated? Inference of function. Methods: clustering algorithms.
- Level 3: Which genes regulate others? Reconstruction of networks. Methods: transcription factor binding sites.

Level 1
- 2-fold rule: Is a gene 2-fold up- (or down-) regulated?
- Student's t-test: Is the regulation significantly different from background variation? (Needs repeated measurements.)
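The two Level 1 criteria can be sketched in a few lines of Python. This is only an illustration, not the method used in the lecture: the function names, the choice of a one-sample t statistic on log2 ratios, and the replicate values are all assumptions made for the example.

```python
import math
from statistics import mean, variance

def is_two_fold(ratio):
    """2-fold rule: True if the expression ratio is >= 2 or <= 0.5."""
    return ratio >= 2.0 or ratio <= 0.5

def one_sample_t(log_ratios, mu0=0.0):
    """One-sample Student's t statistic for repeated log-ratio measurements.

    H0: the mean log ratio equals mu0 (no regulation).
    Returns (t, degrees_of_freedom).
    """
    n = len(log_ratios)
    if n < 2:
        raise ValueError("the t-test needs repeated measurements")
    se = math.sqrt(variance(log_ratios) / n)  # standard error of the mean
    return (mean(log_ratios) - mu0) / se, n - 1

# Hypothetical log2(red/green) values for one gene on four replicate arrays.
replicates = [1.4, 1.1, 1.5, 1.2]
t, df = one_sample_t(replicates)
print(is_two_fold(2 ** mean(replicates)), round(t, 2), df)
```

The t statistic would then be compared with a t distribution with `df` degrees of freedom to obtain a p-value. A gene can pass the 2-fold rule and still fail the t-test if its replicates are noisy.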

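T-test
$X \sim N(\mu, \sigma^2)$
[Figure: distribution of the test statistic, with a central region where $H_0$ cannot be rejected and tails where $H_0$ is rejected]
The p-value is the probability, assuming $H_0$ is true, of a result at least as extreme as the one observed. Rejecting $H_0$ whenever the p-value falls below $\alpha$ means accepting a probability $\alpha$ of wrongly rejecting a true null hypothesis.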

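Multiple testing
In a microarray experiment, we perform one test per gene. With a per-comparison error rate $\alpha_c$ and $n$ independent tests:
- Prob(correct) $= 1 - \alpha_c$
- Prob(globally correct) $= (1 - \alpha_c)^n$
- Prob(wrong somewhere) $= 1 - (1 - \alpha_c)^n$
The experiment-wise error rate is therefore $\alpha_e = 1 - (1 - \alpha_c)^n$.
For small $\alpha_e$: $\alpha_c \approx \alpha_e / n$ (Bonferroni correction).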
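To see how quickly the experiment-wise error grows with the number of genes, the relations above can be evaluated numerically. The sketch below assumes independent tests, as the product formula does; the function names are illustrative.

```python
def experimentwise_alpha(alpha_c, n):
    """Probability of at least one false rejection among n independent tests."""
    return 1.0 - (1.0 - alpha_c) ** n

def bonferroni_alpha(alpha_e, n):
    """Bonferroni approximation: per-test level alpha_e / n."""
    return alpha_e / n

def exact_alpha(alpha_e, n):
    """Exact per-test level, solving alpha_e = 1 - (1 - alpha_c)**n for alpha_c."""
    return 1.0 - (1.0 - alpha_e) ** (1.0 / n)

n_genes = 6000  # hypothetical number of genes on the array
print(experimentwise_alpha(0.05, n_genes))  # essentially 1 without correction
print(bonferroni_alpha(0.05, n_genes))      # per-gene threshold after correction
print(exact_alpha(0.05, n_genes))           # slightly larger than the Bonferroni value
```

The Bonferroni threshold is always a little stricter than the exact value, so it is conservative.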

Multiple Experiments: Time course (Chu et al)
Explore changes in gene expression during a biological process. mRNA is extracted at time points 0, 0.5, 2, 5, 7, 9, and 11 hours, and expression profiles are compared across time points. Array variability is compensated for by using the 0 time point as a common reference (green channel).

Experiment: time course [Figure: array for time 0, showing red and green intensities per gene, with sample and gene annotations]

Experiment: time course [Figure: arrays for time 0 and time 0.5, showing red and green intensities per gene]

Experiment: time course [Figure: genes versus time (hours)]

Gene expression database [Figure: gene expression matrix of expression levels, indexed by genes and samples, with gene annotations and sample annotations]

Gene expression database [Figure: gene expression matrix, genes by samples. Samples can be time series, conditions A, B, ..., mutants in genes a, b, ..., etc.]

Data normalization
Ratio: (expression of gene x in experiment i) / (expression of gene x in reference)
The logarithm of the ratio treats induction and repression of identical magnitude as numerically equal but with opposite sign.
red/green (ratio of expression):
- 2: 2x overexpressed
- 0.5: 2x underexpressed
$\log_2(\text{red/green})$ ("log ratio"):
- 1: 2x overexpressed
- -1: 2x underexpressed
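A short Python sketch of the log ratio shows the symmetry between induction and repression. The function name and the intensity values are made up for the example.

```python
import math

def log_ratio(red, green):
    """log2 of the red/green intensity ratio for one spot."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)

# Hypothetical spot intensities (arbitrary units).
print(log_ratio(200, 100))  # 2x overexpressed
print(log_ratio(100, 200))  # 2x underexpressed: same magnitude, opposite sign
print(log_ratio(100, 100))  # equally expressed
```

On the raw ratio scale, repression is squeezed into the interval between 0 and 1 while induction is unbounded; the logarithm removes that asymmetry.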

Analysis of multiple experiments
The expression of gene x in m experiments can be represented by an expression vector with m elements.
Z-transformation: if $X \sim N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim N(0, 1)$.
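Applied to an expression vector, the Z-transformation subtracts the vector's mean and divides by its standard deviation. The sketch below is one plausible reading: it assumes the mean and standard deviation are estimated from the vector itself and uses the population standard deviation. The slide does not say which estimator is intended.

```python
from statistics import mean, pstdev

def z_transform(vector):
    """Rescale an expression vector to mean 0 and standard deviation 1."""
    mu = mean(vector)
    sigma = pstdev(vector)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("a constant vector cannot be z-transformed")
    return [(v - mu) / sigma for v in vector]

# Hypothetical log ratios of one gene across four experiments.
print(z_transform([1.0, 2.0, 3.0, 4.0]))
```

After the transformation, genes with the same profile shape but different amplitudes become directly comparable.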

Clustering
Hierarchical clustering:
- Transforms the n (genes) * m (experiments) matrix into a symmetric n * n similarity (or distance) matrix
Similarity (or distance) measures:
- Euclidean distance
- Pearson's correlation coefficient
Eisen et al PNAS 95:

Distance Measures: Minkowski Metric
Suppose two objects $x$ and $y$ both have $m$ features: $x = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_m)$.
The Minkowski metric is defined by
$$d(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^r \right)^{1/r}$$

Most Common Minkowski Metrics
1. $r = 2$ (Euclidean distance): $d(x, y) = \sqrt{\sum_{i=1}^{m} |x_i - y_i|^2}$
2. $r = 1$ (Manhattan distance): $d(x, y) = \sum_{i=1}^{m} |x_i - y_i|$
3. $r \to +\infty$ ("sup" distance): $d(x, y) = \max_{1 \le i \le m} |x_i - y_i|$

An Example [Figure: two points x and y, with legs labelled 4 and 3]
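As a worked check of the three common Minkowski metrics, here is a minimal Python sketch. It assumes the figure shows two points whose coordinates differ by 4 along one axis and by 3 along the other; the coordinates chosen below are hypothetical.

```python
def minkowski(x, y, r):
    """Minkowski distance of order r between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of features")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def sup_distance(x, y):
    """Limit of the Minkowski distance as r goes to infinity."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (0, 0), (4, 3)      # assumed coordinates
print(minkowski(x, y, 1))  # Manhattan distance: 4 + 3
print(minkowski(x, y, 2))  # Euclidean distance: sqrt(16 + 9)
print(sup_distance(x, y))  # "sup" distance: max(4, 3)
```

For these points the Manhattan distance is 7, the Euclidean distance is 5, and the "sup" distance is 4.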

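Similarity Measures: Correlation Coefficient
$$s(x, y) = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2 \, \sum_{i=1}^{m} (y_i - \bar{y})^2}}$$
with averages $\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i$ and $\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i$.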

Similarity Measures: Correlation Coefficient [Figure: three panels of expression level versus time for Gene A and Gene B]
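The correlation coefficient compares the shape of two profiles rather than their magnitude. The following Python sketch implements the Pearson formula directly; the two gene profiles are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    m = len(x)
    if m != len(y):
        raise ValueError("x and y must have the same length")
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                    * sum((b - mean_y) ** 2 for b in y))
    if den == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return num / den

gene_a = [1.0, 2.0, 3.0, 4.0]  # hypothetical time course
gene_b = [2.0, 4.0, 6.0, 8.0]  # same shape, twice the amplitude
gene_c = [4.0, 3.0, 2.0, 1.0]  # mirror image of gene_a
print(pearson(gene_a, gene_b))  # close to +1 despite a large Euclidean distance
print(pearson(gene_a, gene_c))  # close to -1
```

Two genes can therefore be far apart in Euclidean distance and still be perfectly correlated, which is why the choice of measure affects which genes cluster together.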

Distance-based Clustering
Assign a distance measure between data points. Find a partition such that:
- Distance between objects within a partition (i.e. the same cluster) is minimized
- Distance between objects from different clusters is maximized
Issues:
- Requires defining a distance (similarity) measure in situations where it is unclear how to assign it
- What relative weighting to give to one attribute vs another?
- The number of possible partitions is super-exponential

Clustering of Genes and Conditions Unsupervised: –Hierarchical clustering –K-means clustering –Self Organizing Maps (SOMs)

Ordered dendrograms
Hierarchical clustering. Hypothesis: guilt-by-association (common regulation -> common function). Eisen98

Hierarchical Clustering Techniques At the beginning, each object (gene) is a cluster. In each of the subsequent steps, two closest clusters will merge into one cluster until there is only one cluster left.

Hierarchical Clustering
Given a set of n items to be clustered, and an n*n distance (or similarity) matrix, the basic process of hierarchical clustering is this:
1. Start by assigning each item to its own cluster, so that if you have n items, you now have n clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.
2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster.
3. Compute distances (similarities) between the new cluster and each of the old clusters.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size n.

Merge two clusters by: Single-Link Method / Nearest Neighbor (NN): minimum of pairwise dissimilarities Complete-Link / Furthest Neighbor (FN): maximum of pairwise dissimilarities Unweighted Pair Group Method with Arithmetic Mean (UPGMA): average of pairwise dissimilarities
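The agglomerative procedure and the three merge rules can be sketched in plain Python. This is a deliberately naive illustration, not an efficient or canonical implementation: the function name, the dictionary-based distance matrix, and the four items placed on a line are all assumptions for the example.

```python
from itertools import combinations

# How to turn the pairwise dissimilarities between two clusters into one number.
LINKAGE = {
    "single": min,                            # nearest neighbor
    "complete": max,                          # furthest neighbor
    "average": lambda ds: sum(ds) / len(ds),  # UPGMA
}

def agglomerate(dist, linkage="single"):
    """Agglomerative clustering.

    dist maps frozenset({i, j}) to the dissimilarity between items i and j.
    Returns the merges in order as (cluster_a, cluster_b, height) tuples.
    """
    link = LINKAGE[linkage]
    items = set().union(*dist)                          # all item labels
    clusters = [frozenset([i]) for i in sorted(items)]  # one cluster per item
    merges = []

    def cluster_dist(a, b):
        return link([dist[frozenset((i, j))] for i in a for j in b])

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        merges.append((set(a), set(b), cluster_dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Four hypothetical items on a line; dissimilarity is the absolute difference.
pos = {"a": 0.0, "b": 1.0, "c": 2.5, "d": 4.5}
dist = {frozenset((i, j)): abs(pos[i] - pos[j]) for i, j in combinations(pos, 2)}

for method in ("single", "complete"):
    print(method, agglomerate(dist, method))
```

With these values, single linkage chains the items together one at a time (a,b, then c, then d), while complete linkage first forms the pairs a,b and c,d and joins them last. The merge heights are what a dendrogram would display.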

Single-Link Method [Figure: n*n Euclidean distance matrix for items a, b, c, d. Merges: (1) a,b | c | d; (2) a,b,c | d; (3) a,b,c,d]

Complete-Link Method [Figure: Euclidean distance matrix for items a, b, c, d. Merges: (1) a,b | c | d; (2) a,b | c,d; (3) a,b,c,d]

Compare Dendrograms [Figure: Single-Link versus Complete-Link]

Serum stimulation of human fibroblasts (24h)
Clusters: Cholesterol biosynthesis; Cell cycle; I-E response; Signalling / Angiogenesis; Wound healing

Partitioning k-means clustering Self organizing maps (SOMs)

k-means clustering Tavazoie et al Nature Genet. 22:

k-Means Clustering Algorithm
1) Select an initial partition of k clusters
2) Assign each object to the cluster with the closest centre
3) Compute the new centres of the clusters
4) Repeat steps 2 and 3 until no object changes cluster
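The four steps translate almost directly into code. The sketch below is a minimal pure-Python version under two assumptions that the slide does not specify: the initial partition is given as a list of starting centres, and closeness is measured by squared Euclidean distance. The names and the toy data are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centres, max_iter=100):
    """Return (assignment, centres) after iterating steps 2 and 3 to convergence."""
    centres = list(initial_centres)  # step 1: initial centres (assumed given)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the closest centre.
        new_assignment = [
            min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))
            for p in points
        ]
        # Step 4: stop when no object changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: compute the new centre of each non-empty cluster.
        for c in range(len(centres)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centres

# Hypothetical two-dimensional expression profiles and starting centres.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centres = kmeans(points, [(0.0, 0.0), (1.0, 1.0)])
print(labels, centres)
```

The result depends on the starting centres, so in practice the algorithm is often run from several initialisations and the best partition is kept.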

[Figure: k-means example with k = 6, showing centroids 1 to 6 over successive iterations]

Self organizing maps Tamayo et al PNAS 96:

[Figure: self organizing map with centroids 1 to 6 arranged on a 2 x 3 grid, k = (2,3) = 6]

Cluster Co-regulation (DeRisi et al, 1997)

Cluster of co-expressed genes, pattern discovery in regulatory regions
[Figure: from expression profiles, take a cluster of co-expressed genes, retrieve their upstream regions (600 basepairs), and find patterns over-represented in the cluster]
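One common way to score whether a pattern is over-represented in a cluster is a hypergeometric tail probability. The slide does not say which statistic was used, so the sketch below is an assumption chosen for illustration, and the sequences and counts are made up.

```python
from math import comb

def count_with_pattern(sequences, pattern):
    """Number of upstream sequences containing the pattern at least once."""
    return sum(pattern in seq for seq in sequences)

def hypergeom_tail(k, cluster_size, with_pattern, total):
    """P(X >= k) when cluster_size sequences are drawn at random from
    total sequences, of which with_pattern contain the pattern."""
    upper = min(cluster_size, with_pattern)
    favourable = sum(
        comb(with_pattern, i) * comb(total - with_pattern, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total, cluster_size)

# Toy upstream regions (hypothetical); real ones would be about 600 bp long.
cluster = ["TTACGCGTAA", "GGACGCGTCC", "AAACGCGTTT"]
genome = cluster + ["TTTTTTTTTT", "GGGGCCCCAA", "ATATATATAT", "CCCCGGGGTT"]
k = count_with_pattern(cluster, "ACGCGT")
total_hits = count_with_pattern(genome, "ACGCGT")
print(hypergeom_tail(k, len(cluster), total_hits, len(genome)))
```

A small probability means the pattern occurs in the cluster more often than expected by chance given its frequency across all upstream regions.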

Some Discovered Patterns (Vilo et al. 2001)
Pattern              Probability
ACGCG                6.41E
ACGCGT               5.23E
CCTCGACTAA           5.43E
GACGCG               7.89E
TTTCGAAACTTACAAAAAT  2.08E
TTCTTGTCAAAAAGC      2.08E
ACATACTATTGTTAAT     3.81E
GATGAGATG            5.60E
TGTTTATATTGATGGA     1.90E
GATGGATTTCTTGTCAAAA  5.04E
TATAAATAGAGC         1.51E
GATTTCTTGTCAAA       3.40E
GATGGATTTCTTG        3.40E
GGTGGCAA             4.18E
TTCTTGTCAAAAAGCA     5.10E

Results
- Over 6000 "interesting" patterns
- Many came from homologous upstream regions and were removed, leaving 1500 patterns
- These patterns were clustered into 62 groups, for which alignments, consensus sequences, and profiles were found
- Of the 62 clusters, 48 had patterns matching the SCPD (experimentally mapped) binding site database

The " GGTGGCAA " Cluster

Clustering and promoter elements Harmer et al Science 290:

Two sided Clustering

Δ-Deletion mutations [Figure labels: vector, chromosomes, homologous recombination]

Transcriptional profiling of mutants [Figure: Δ-mutants versus genes]

Microarray and cancer Alizadeh et al Nature 403:

Diffuse large B-cell lymphoma

Human tumor patient and normal cells; various conditions. Cluster genes across tumors. Classify tumors according to genes.

Regulatory pathways: KEGG

Regulatory pathway reconstruction Ideker et al Science 2001

Perturbations
- Selected genes are deleted.
- RNA is extracted from Δ-strains and from WT under +/- galactose conditions.
- Repeated measurements enable estimation of statistical significance.
- Compare data with the model; design new experiments.
- Clustering: Self Organizing Maps
- Protein-mRNA correlations
- Network correlations: Protein-DNA (promoter analysis), Protein-Protein

Correlation mRNA – protein levels Mass-spectrometry

ICAT reagent Isotope coded affinity tags

ICAT procedure

Mapping of gene expression changes onto interaction network Yellow: Protein-DNA Blue: Protein-protein

Hierarchical clustering of Δ-perturbations

Conclusion
- Significance
- Database (matrix), data normalization
- Distances
- HCL, SOM, k-means
- Two-sided clustering
- Promoter elements
- Metabolic / regulatory pathways
- Deletion mutants
- ICAT technology; MS/MS