Linear Models for Microarray Data


LIMMA: Linear Models for Microarray Data

Difficulties with microarray data
- The variability of the expression values differs between genes.
- Expression values are neither identically distributed nor independent across genes.
- Tens of thousands of genes are tested at once, which creates a multiple-testing problem.

Correct for multiple comparisons
- Multiple testing: control the family-wise error rate, the false discovery rate, etc.
- The parallel nature of the inference allows for compensating possibilities.
- Information is borrowed from the ensemble of genes to assist inference about individual genes.
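
The two error rates named above can be illustrated with a minimal sketch. It assumes the Bonferroni correction as the family-wise method and the Benjamini-Hochberg procedure as the false discovery rate method; the slides do not say which procedures were used, and the p-values are made up.

```python
def bonferroni(pvalues):
    """Family-wise error rate control: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """False discovery rate control: Benjamini-Hochberg adjusted p-values."""
    m = len(pvalues)
    # Visit the p-values from largest to smallest so that a running minimum
    # keeps the adjusted values monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
print("Bonferroni:        ", bonferroni(pvals))
print("Benjamini-Hochberg:", benjamini_hochberg(pvals))
```

With tens of thousands of genes the Bonferroni adjustment becomes very conservative, which is one reason false discovery rate control is common for microarray data.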

Empirical Bayes
- In frequentist methods, a hypothesis is typically rejected or not rejected without directly assigning it a probability.
- Bayesian methods specify a prior probability, which is then updated in the light of new data.
- In Bayesian techniques, the prior distribution is assigned independently of the data and fixed before any data are observed.

Empirical Bayes
- Superficially similar to Bayesian methods in that a prior distribution is assigned.
- However, the prior distribution is estimated from the data.
- Empirical Bayes can therefore be regarded as a frequentist technique.

LIMMA
- Empirical Bayes techniques have previously been applied to microarray data, but the analyses were specific to the experiment and very difficult to implement.
- LIMMA uses a simple model with a simple expression for the posterior odds.
- This allows linear modelling to be applied to microarray data.
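
The idea of borrowing information across genes can be sketched as variance shrinkage: each gene's sample variance is pulled towards a prior value before the t-statistic is computed. The function name and the prior values `d0` and `s0_sq` below are hypothetical and taken as given; an empirical Bayes analysis would estimate them from the variances of all genes.

```python
import math


def moderated_t(beta_hat, s2, df_resid, v, s0_sq, d0):
    """Moderated t-statistic with the gene variance shrunk towards a prior value.

    beta_hat : estimated coefficient or contrast (log2 scale)
    s2       : the gene's own residual sample variance
    df_resid : residual degrees of freedom for the gene
    v        : unscaled variance factor of the estimate
    s0_sq    : prior variance (assumed given here)
    d0       : prior degrees of freedom (assumed given here)
    """
    # Posterior variance: a weighted average of prior and sample variance.
    s2_post = (d0 * s0_sq + df_resid * s2) / (d0 + df_resid)
    t_mod = beta_hat / math.sqrt(s2_post * v)
    return t_mod, s2_post, d0 + df_resid


# Hypothetical gene with a very small sample variance. Eight arrays in four
# groups leave 4 residual degrees of freedom; a difference of two group means
# with two replicates each has v = 1/2 + 1/2 = 1.
beta_hat, s2, df_resid, v = 1.0, 0.001, 4, 1.0
ordinary_t = beta_hat / math.sqrt(s2 * v)
t_mod, s2_post, df_total = moderated_t(beta_hat, s2, df_resid, v, s0_sq=0.05, d0=4)
print(f"ordinary t = {ordinary_t:.2f}")
print(f"moderated t = {t_mod:.2f} on {df_total} degrees of freedom")
```

Shrinkage stops a gene from looking highly significant only because its variance happened to be estimated as very small from a handful of replicates.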

Estrogen Data
- 2x2 factorial experiment on MCF7 breast cancer cells using Affymetrix HGU95av2 arrays.
- Factors: estrogen (present/absent) and length of exposure (10 hr/48 hr).
- The aim of the study is to identify genes that respond to estrogen treatment.

Read in the Data
- Load the estrogen data.
- Normalise the data.
- Define the targets (factors) for the linear model.

Design Matrix
Eight arrays, four pairs of replicates, four parameters in the linear model.

Array  File           Estrogen  Time (hr)
1      low10-1.cel    absent    10
2      low10-2.cel    absent    10
3      high10-1.cel   present   10
4      high10-2.cel   present   10
5      low48-1.cel    absent    48
6      low48-2.cel    absent    48
7      high48-1.cel   present   48
8      high48-2.cel   present   48
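
A minimal sketch of how the eight arrays map onto a four-parameter design matrix. It assumes a group-means parametrisation with one column per estrogen/time combination; the slides do not show which parametrisation was used, and the column names are illustrative.

```python
# Targets as listed on the slide: (file name, estrogen, hours of exposure).
targets = [
    ("low10-1.cel", "absent", 10),
    ("low10-2.cel", "absent", 10),
    ("high10-1.cel", "present", 10),
    ("high10-2.cel", "present", 10),
    ("low48-1.cel", "absent", 48),
    ("low48-2.cel", "absent", 48),
    ("high48-1.cel", "present", 48),
    ("high48-2.cel", "present", 48),
]

# One column per treatment group (assumed parametrisation, illustrative names).
groups = ["absent.10", "present.10", "absent.48", "present.48"]

# Each row has a single 1 marking the group the array belongs to.
design = [
    [1 if f"{estrogen}.{hours}" == group else 0 for group in groups]
    for _, estrogen, hours in targets
]

print("file".ljust(14), *groups)
for (filename, _, _), row in zip(targets, design):
    print(filename.ljust(14), *(str(x).center(len(g)) for x, g in zip(row, groups)))
```

With this parametrisation each of the four coefficients is the mean log-expression of one group, estimated from its pair of replicates.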

Contrast Matrix
Contrasts defined on the eight arrays listed under Design Matrix:
- Estrogen effect at 10 hours
- Time effect without estrogen
- Estrogen effect at 48 hours
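
The three contrasts can be written as weight vectors over the group coefficients. This sketch assumes one coefficient per estrogen/time group, ordered as in `groups` below; the contrast names and the group means for the example gene are hypothetical.

```python
# Coefficient order assumed for the group-means parametrisation.
groups = ["absent.10", "present.10", "absent.48", "present.48"]

# Each contrast is a vector of weights over the four group coefficients.
contrasts = {
    "estrogen_10h": [-1, 1, 0, 0],   # present.10 - absent.10
    "time_absent":  [-1, 0, 1, 0],   # absent.48  - absent.10
    "estrogen_48h": [0, 0, -1, 1],   # present.48 - absent.48
}


def apply_contrast(weights, coefficients):
    """Linear combination of the fitted coefficients for one gene."""
    return sum(w * c for w, c in zip(weights, coefficients))


# Hypothetical fitted group means (log2 expression) for a single gene.
gene_means = [7.0, 9.0, 7.5, 10.5]

for name, weights in contrasts.items():
    print(name, apply_contrast(weights, gene_means))
```

Because the values are on the log2 scale, each contrast estimate is a log2 fold change between the two groups it compares.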

Differential Expression
- Extract the linear model fit for the contrasts.
- Obtain a list of differentially expressed genes for each contrast.
- Look for overlap among the differentially expressed genes.
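
Once each contrast has its list of differentially expressed genes, the overlap reduces to set operations. The probe identifiers below are placeholders, not results from the estrogen data.

```python
# Hypothetical lists of differentially expressed probes for two contrasts.
de_10h = {"probe_01", "probe_02", "probe_03", "probe_04"}
de_48h = {"probe_03", "probe_04", "probe_05"}

both = de_10h & de_48h        # respond at both time points
only_10h = de_10h - de_48h    # respond only after 10 hours
only_48h = de_48h - de_10h    # respond only after 48 hours

# The three counts are the regions of a two-set Venn diagram.
print("10h only:", len(only_10h))
print("both:    ", len(both), sorted(both))
print("48h only:", len(only_48h))
```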

Linear Model Fit
- logFC: estimate of the log2 fold change corresponding to the effect or contrast
- AveExpr: average log2 expression for the probe over all arrays/channels
- t: moderated t-statistic
- P.Value: raw p-value
- adj.P.Val: adjusted p-value
- B: log-odds that the gene is differentially expressed
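
Two of these columns are on log scales and are easier to read after conversion. The helper names below are illustrative and not part of any library.

```python
import math


def log_odds_to_probability(b):
    """Convert a log-odds value such as B into a probability."""
    return 1.0 / (1.0 + math.exp(-b))


def log2fc_to_fold_change(logfc):
    """Convert a log2 fold change into a plain fold change."""
    return 2.0 ** logfc


# B = 0 means even odds that the gene is differentially expressed.
print(log_odds_to_probability(0.0))
# A positive B means differential expression is more likely than not.
print(round(log_odds_to_probability(1.5), 3))
# logFC = 1 is a doubling; logFC = -2 is a four-fold decrease.
print(log2fc_to_fold_change(1.0), log2fc_to_fold_change(-2.0))
```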

Annotating Data
- Probe arrays can be annotated with external data.
- Multiple sources of gene annotation are available.

Gene Set Enrichment
- All biochemical pathways are determined by sets of genes.
- Gene sets are defined from prior biological knowledge about co-expression, function, location or known biochemical pathways.
- If a pathway is related to a biological trait, its co-functioning genes should show a higher degree of enrichment than the rest of the transcriptome.
- Gene Set Enrichment (GSE) is a computational technique that determines whether an a priori defined set of genes shows a statistically significant overlap with the differentially expressed genes.
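
One common way to test whether an overlap is statistically significant is a one-sided hypergeometric (over-representation) test. The sketch below uses that test as an assumption; the slides do not say which enrichment method was applied, and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_set, n_de, n_overlap):
    """P(overlap >= n_overlap) if the n_de genes were drawn at random.

    n_universe : number of genes measured on the array
    n_set      : number of genes in the predefined gene set
    n_de       : number of differentially expressed genes
    n_overlap  : number of gene-set members among the differentially expressed genes
    """
    total = comb(n_universe, n_de)
    upper = min(n_set, n_de)
    # Sum the hypergeometric probabilities for the observed overlap and above.
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes on the array, a gene set of 50,
# 200 differentially expressed genes, 8 of which belong to the set.
p = hypergeom_enrichment_p(10_000, 50, 200, 8)
print(f"enrichment p-value: {p:.3g}")
```

Only one gene-set member would be expected among 200 random genes here (50 x 200 / 10,000), so an overlap of 8 gives a small p-value.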

Estrogen receptor (ER) gene set
- If estrogen is present, the estrogen receptor binds it and becomes activated.
- The activated receptor gains the ability to regulate gene expression, resulting in differential expression between cells with and without estrogen.
- This should lead to up-regulation of the genes in the ER gene set.