Canadian Bioinformatics Workshops (www.bioinformatics.ca)

Essential Statistics in Biology: Getting the Numbers Right
Raphael Gottardo, Clinical Research Institute of Montreal (IRCM)

Day 1, Slide 4: Outline
- Exploratory Data Analysis
- 1-2 sample t-tests, multiple testing
- Clustering
- SVD/PCA
- Frequentists vs. Bayesians

PCA and SVD (Multivariate analysis)

Day 1 - Section 4, Slide 6: Outline
- What is SVD? Mathematical definition
- Relation to Principal Component Analysis (PCA)
- Applications of PCA and SVD
- Illustration with gene expression data

Day 1 - Section 4, Slides 7-9: SVD
Let X be a matrix of size m × n (m ≥ n) and rank r ≤ n. Then we can decompose X as X = U S V^T, where
- U (m × n) is the matrix of left singular vectors,
- V (n × n) is the matrix of right singular vectors,
- S (n × n) is a diagonal matrix whose diagonal entries are the singular values.
The singular vectors give the directions of the decomposition; the singular values give the amplitudes.
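
As a quick numerical check (an added sketch, not part of the original slides), R's svd() returns exactly these three factors, and X can be rebuilt from them:

set.seed(1)
X <- matrix(rnorm(20), 5, 4)              # m = 5, n = 4
s <- svd(X)                               # s$u is 5x4, s$d length 4, s$v is 4x4
X.rec <- s$u %*% diag(s$d) %*% t(s$v)     # U S V^T
max(abs(X - X.rec))                       # ~ 1e-15: equal up to rounding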

Day 1 - Section 4, Slide 10: Relation to PCA
Assume that the rows of X (the observations) are centered. Then X^T X is, up to a constant, the empirical covariance matrix, and SVD is equivalent to PCA. The columns of V are the singular vectors, or principal components; the projected coordinates X V = U S are the new variables, whose variances are given (up to the same constant) by the squared singular values. In gene expression these are called eigengenes or eigenassays.
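
A minimal sketch of this equivalence (my addition, assuming variables in columns, which is how prcomp() works), comparing svd() on centered data with prcomp():

set.seed(1)
X <- matrix(rnorm(300), 100, 3)
Xc <- scale(X, center=TRUE, scale=FALSE)   # center each column
s <- svd(Xc)
p <- prcomp(X)                             # prcomp centers by default
max(abs(abs(s$v) - abs(p$rotation)))       # loadings agree up to sign
all.equal(s$d^2/(nrow(X)-1), p$sdev^2)     # variances are d^2/(m-1)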

Day 1 - Section 4, Slide 11: Applications of SVD and PCA
- Dimension reduction (simplify a dataset)
- Clustering
- Discriminant analysis
- Exploratory data analysis tool: find the most important signal in the data, 2D projections

Day 1 - Section 4, Slide 12: Toy example, s = (13.47, 1.45)

set.seed(100)
x1 <- rnorm(100, 0, 1)
y1 <- rnorm(100, 1, 1)
var0.5 <- matrix(c(1, -.5, -.5, .1), 2, 2)
data1 <- t(var0.5 %*% t(cbind(x1, y1)))
set.seed(100)
x2 <- rnorm(100, 2, 1)
y2 <- rnorm(100, 2, 1)
var0.5 <- matrix(c(1, .5, .5, 1), 2, 2)
data2 <- t(var0.5 %*% t(cbind(x2, y2)))
data <- rbind(data1, data2)
svd1 <- svd(data1)
plot(data1, xlab="x", ylab="y", xlim=c(-6,6), ylim=c(-6,6))
abline(coef=c(0, svd1$v[2,1]/svd1$v[1,1]), col=2)
abline(coef=c(0, svd1$v[2,2]/svd1$v[1,2]), col=3)

Day 1 - Section 4, Slide 13: Toy example, s = (47.79, 13.25)

svd2 <- svd(data2)
plot(data2, xlab="x", ylab="y", xlim=c(-6,6), ylim=c(-6,6))
abline(coef=c(0, svd2$v[2,1]/svd2$v[1,1]), col=2)
abline(coef=c(0, svd2$v[2,2]/svd2$v[1,2]), col=3)
svd <- svd(data)
plot(data, xlab="x", ylab="y", xlim=c(-6,6), ylim=c(-6,6))
abline(coef=c(0, svd$v[2,1]/svd$v[1,1]), col=2)
abline(coef=c(0, svd$v[2,2]/svd$v[1,2]), col=3)

Day 1 - Section 4, Slide 14: Toy example

### Projection
data.proj <- svd$u %*% diag(svd$d)
svd.proj <- svd(data.proj)
plot(data.proj, xlab="x", ylab="y", xlim=c(-6,6), ylim=c(-6,6))
abline(coef=c(0, svd.proj$v[2,1]/svd.proj$v[1,1]), col=2)
### svd.proj$v[1,2] = 0
abline(v=0, col=3)

Day 1 - Section 4, Slide 15: Toy example, s = (47.17, 11.88)
[Figure: the projected data plotted in the new coordinate system.]

Day 1 - Section 4, Slide 16: Toy example

### New data
set.seed(100)
x1 <- rnorm(100, -1, 1)
y1 <- rnorm(100, 1, 1)
var0.5 <- matrix(c(1, -.5, -.5, 1), 2, 2)
data1 <- t(var0.5 %*% t(cbind(x1, y1)))
set.seed(100)
x2 <- rnorm(100, 1, 1)
y2 <- rnorm(100, 1, 1)
var0.5 <- matrix(c(1, .5, .5, 1), 2, 2)
data2 <- t(var0.5 %*% t(cbind(x2, y2)))
data <- rbind(data1, data2)
svd1 <- svd(data1)
plot(data1, xlab="x", ylab="y", xlim=c(-6,6), ylim=c(-6,6))
abline(coef=c(0, svd1$v[2,1]/svd1$v[1,1]), col=2)
abline(coef=c(0, svd1$v[2,2]/svd1$v[1,2]), col=3)
svd2 <- svd(data2)
plot(data2, xlab="x", ylab="y", xlim=c(-6,6), ylim=c(-6,6))
abline(coef=c(0, svd2$v[2,1]/svd2$v[1,1]), col=2)
abline(coef=c(0, svd2$v[2,2]/svd2$v[1,2]), col=3)
svd <- svd(data)
plot(data, xlab="x", ylab="y", xlim=c(-6,6), ylim=c(-6,6))
abline(coef=c(0, svd$v[2,1]/svd$v[1,1]), col=2)
abline(coef=c(0, svd$v[2,2]/svd$v[1,2]), col=3)

Day 1 - Section 4, Slide 17: Toy example, s = (26.48, 24.98)

Day 1 - Section 4, Slide 18: Application to microarrays
- Dimension reduction (simplify a dataset)
- Clustering (too many samples)
- Discriminant analysis (find a group of genes)
- Exploratory data analysis tool: find the most important signal in the data, 2D projections (clusters?)

Day 1 - Section 4, Slide 19: Application to microarrays
Cho cell cycle data set, 384 genes. We have standardized the data.

cho.data <- as.matrix(read.table("logcho_237_4class.txt", skip=1)[,3:19])
cho.mean <- apply(cho.data, 1, "mean")
cho.sd <- apply(cho.data, 1, "sd")
cho.data.std <- (cho.data - cho.mean)/cho.sd
svd.cho <- svd(cho.data.std)
### Contribution of each PC
barplot(svd.cho$d/sum(svd.cho$d), col=heat.colors(17))
### First three singular vectors (PCA)
plot(svd.cho$v[,1], xlab="time", ylab="Expression profile", type="b")
plot(svd.cho$v[,2], xlab="time", ylab="Expression profile", type="b")
plot(svd.cho$v[,3], xlab="time", ylab="Expression profile", type="b")
### Projection
plot(svd.cho$u[,1]*svd.cho$d[1], svd.cho$u[,2]*svd.cho$d[2], xlab="PCA 1", ylab="PCA 2")
plot(svd.cho$u[,1]*svd.cho$d[1], svd.cho$u[,3]*svd.cho$d[3], xlab="PCA 1", ylab="PCA 3")
plot(svd.cho$u[,2]*svd.cho$d[2], svd.cho$u[,3]*svd.cho$d[3], xlab="PCA 2", ylab="PCA 3")
### Select a cluster, e.g. genes with a large PCA 1 score, positive PCA 2
### and negative PCA 3 scores (the comparison operators in this condition
### are a plausible reconstruction)
ind <- svd.cho$u[,1]*svd.cho$d[1] > 5 &
  svd.cho$u[,2]*svd.cho$d[2] > 0 &
  svd.cho$u[,3]*svd.cho$d[3] < 0
plot(svd.cho$u[,2]*svd.cho$d[2], svd.cho$u[,3]*svd.cho$d[3], xlab="PCA 2", ylab="PCA 3")
points(svd.cho$u[ind,2]*svd.cho$d[2], svd.cho$u[ind,3]*svd.cho$d[3], col=2)
matplot(t(cho.data.std[ind,]), xlab="time", ylab="Expression profiles", type="l")

Day 1 - Section 4, Slide 20: Application to microarrays
[Figure: barplot of the singular values, showing the relative contribution of each component; the first component carries the main contribution.]

Day 1 - Section 4, Slides 21-23: Application to microarrays
[Figures: expression profiles of the first three singular vectors (PC1, PC2, PC3) over time.]

Day 1 - Section 4, Slides 24-26: Application to microarrays
[Figures: projections of the data onto PC1-PC2, PC1-PC3, and PC2-PC3.]

Day 1 - Section 4, Slides 27-28: Application to microarrays
[Figure: projection onto PC2-PC3 with a selected cluster of 24 genes highlighted.]

Day 1 - Section 4, Slide 29: Conclusion
- SVD is a powerful tool
- Can be very useful for gene expression data: SVD of genes (eigengenes), SVD of samples (eigenassays)
- Mostly an EDA tool

Overview of Statistical Inference: Bayes vs. Frequentists (if time permits)

Day 1 - Section 5, Slides 31-32: Introduction
Parametric statistical model: the observations y = (y₁, ..., yₙ) are drawn from a probability distribution f(y | θ), where θ is the parameter vector. The likelihood function is L(θ | y) = f(y | θ), i.e. the density read as a function of θ (an "inverted density").

Day 1 - Section 5, Slide 33: Introduction
Normal distribution: the probability distribution for one observation is f(y | μ, σ²) = (2πσ²)^(-1/2) exp(-(y-μ)²/(2σ²)). If the observations are independent, the joint density is the product f(y | μ, σ²) = ∏ᵢ f(yᵢ | μ, σ²).
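
For instance (a small added sketch, not from the slides), independence lets us compute the joint log-density as a sum of log densities:

### Log-likelihood of an iid normal sample = sum of log densities
loglik <- function(mu, sigma, y) sum(dnorm(y, mean=mu, sd=sigma, log=TRUE))
set.seed(100)
y <- rnorm(15, 1, 1)
loglik(1, 1, y)    # joint log-density at the true parameters
loglik(0, 1, y)    # smaller: mu = 0 fits these data worse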

Day 1 - Section 5, Slides 34-35: Introduction
[Figure: 15 observations drawn from N(1,1), shown with the true probability distribution overlaid.]
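
Code along these lines would reproduce such a figure (a hypothetical reconstruction; the original plotting code is not in the slides):

set.seed(100)
y <- rnorm(15, mean=1, sd=1)                       # 15 observations from N(1,1)
hist(y, freq=FALSE, main="15 observations, N(1,1)")
curve(dnorm(x, mean=1, sd=1), add=TRUE, col=2)     # true probability distribution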

Day 1 - Section 5, Slide 36: Inference
The parameters are unknown; we want to "learn" something about the parameter vector θ from the data, i.e. make inference about θ:
- Estimate θ
- Construct a confidence region
- Test a hypothesis (e.g. θ = 0)

Day 1 - Section 5, Slide 37: The frequentist approach
- The parameters are fixed but unknown
- Inference is based on the relative frequency of occurrence when repeating the experiment
- For example, one can look at the variance of an estimator to evaluate its efficiency

Day 1 - Section 5, Slide 38: The Normal Example: Estimation
Normal distribution: μ̂ = ȳ = (1/n) Σᵢ yᵢ estimates the mean, and σ̂² = s² = (1/(n-1)) Σᵢ (yᵢ - ȳ)² estimates the variance (the sample mean and sample variance). Numerical example: 15 obs. from N(1,1). Use the theory of repeated samples to evaluate the estimators.

Day 1 - Section 5, Slide 39: The Normal Example: Estimation
In our toy example the data are normal, and we can derive the sampling distribution of the estimators. For example, we know that ȳ is normal with mean μ and variance σ²/n. The standard deviation of an estimator is called the standard error. What if we can't derive the sampling distribution? Use the bootstrap!
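
One can verify the σ/√n result by simulating many repeated experiments (an illustrative added sketch):

### Standard error of the mean over repeated samples
set.seed(1)
means <- replicate(10000, mean(rnorm(15, 1, 1)))
sd(means)      # close to the theoretical value
1/sqrt(15)     # sigma/sqrt(n) = 1/sqrt(15), about 0.258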

Day 1 - Section 5, Slide 40: The Bootstrap
The basic idea is to resample the data we have observed and compute a new value of the statistic/estimator for each resampled data set. Then one can assess the estimator by looking at the empirical distribution across the resampled data sets.

### Bootstrap standard error of the mean
set.seed(100)
x <- rnorm(15)
mu.hat <- mean(x)
sigma.hat <- sd(x)
B <- 100
mu.hatNew <- rep(0, B)
for (i in 1:B) {
  x.new <- sample(x, replace=TRUE)    # resample with replacement
  mu.hatNew[i] <- mean(x.new)
}
se <- sd(mu.hatNew)

### Same idea for the median
set.seed(100)
x <- rnorm(15)
B <- 100
med.hatNew <- rep(0, B)
for (i in 1:B) {
  x.new <- sample(x, replace=TRUE)
  med.hatNew[i] <- median(x.new)
}
se <- sd(med.hatNew)

Day 1 - Section 5, Slide 41: The Normal Example: CI
Confidence interval for the mean μ: ȳ ± t_{n-1, α/2} · s/√n. The t quantile depends on n, but when n is large t_{n-1, 0.025} ≈ 2, and usually one uses ȳ ± 2 · SE(ȳ), where SE(ȳ) = s/√n. Numerical example, 15 obs. from N(1,1). What does this mean?

set.seed(100)
x <- rnorm(15)
t.test(x, mean=0)

### Output:
### One Sample t-test
### data: x
### t = ..., df = 14, p-value = ...
### alternative hypothesis: true mean is not equal to 0
### 95 percent confidence interval: ...
### sample estimates: mean of x ...
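
The interval printed by t.test() can be computed directly from the formula above (an added sketch on the same simulated data):

set.seed(100)
x <- rnorm(15)
mean(x) + qt(c(0.025, 0.975), df=14) * sd(x)/sqrt(15)   # 95% CI for the mean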

Day 1 - Section 5, Slide 42: The Normal Example: Testing
Test a hypothesis about the mean, H₀: μ = μ₀, with the t-test: t = (ȳ - μ₀)/(s/√n). If H₀ is true, t follows a t-distribution with n-1 degrees of freedom; the p-value is the probability, under H₀, of observing a |t| at least as large as the one computed from the data.
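
A sketch of the same computation by hand (my addition), which should agree with t.test(x, mean=0):

set.seed(100)
x <- rnorm(15)
t.obs <- (mean(x) - 0) / (sd(x)/sqrt(15))   # t statistic for H0: mu = 0
2 * pt(-abs(t.obs), df=14)                  # two-sided p-value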

Day 1 - Section 5, Slide 43: The Bayesian Approach
Parametric statistical model: the observations are drawn from a probability distribution f(y | θ), where θ is the parameter vector.
- The parameters are unknown but random.
- The uncertainty about the parameter vector is modeled through a prior distribution π(θ).

Day 1 - Section 5, Slide 44: The Bayesian Approach
A Bayesian statistical model is made of:
1. A parametric statistical model
2. A prior distribution
Q: How can we combine the two? A: Bayes' theorem!

Day 1 - Section 5, Slide 45: The Bayesian Approach
Bayes' theorem ↔ inversion of probability. If A and E are events such that P(E) ≠ 0 and P(A) ≠ 0, then P(A|E) and P(E|A) are related by
P(A|E) = P(E|A) P(A) / P(E).

Day 1 - Section 5, Slide 46: The Bayesian Approach
From prior to posterior:
π(θ | y) = f(y | θ) π(θ) / ∫ f(y | θ) π(θ) dθ
where f(y | θ) carries the information on θ contained in the observation y (the likelihood), π(θ) carries the prior information, and the integral is the normalizing constant.
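
A numerical sketch of this formula (my addition, assuming a N(0,1) prior on the normal mean with σ = 1 known), computing the normalizing constant on a grid:

### Posterior on a grid: likelihood x prior, then normalize
set.seed(100)
y <- rnorm(15, 1, 1)
theta <- seq(-2, 3, length=1000)
prior <- dnorm(theta, mean=0, sd=1)                     # assumed prior
lik <- sapply(theta, function(t) prod(dnorm(y, t, 1)))  # f(y | theta)
step <- theta[2] - theta[1]
post <- lik*prior / sum(lik*prior*step)                 # normalizing constant
plot(theta, post, type="l", xlab="theta", ylab="posterior density")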

Day 1 - Section 5, Slide 47: The Bayesian Approach
Sequential nature of Bayes' theorem: π(θ | y₁, y₂) ∝ f(y₂ | θ) π(θ | y₁). The posterior is the new prior!

Day 1 - Section 5, Slide 48: The Bayesian Approach
Justifications:
- Actualizes the information about θ by extracting the information about θ from the data
- Conditions upon the observations (likelihood principle)
- Avoids averaging over the unobserved values of y
- Provides a complete, unified inferential scope

Day 1 - Section 5, Slide 49: The Bayesian Approach
Practical aspect: calculation of the normalizing constant can be difficult.
- Conjugate priors (exact calculation is possible)
- Markov chain Monte Carlo (see the sketch below)
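
As a minimal illustration of the MCMC route (an added sketch with an assumed N(0,1) prior on the normal mean, σ = 1 known), a random-walk Metropolis sampler needs only the unnormalized posterior:

### Random-walk Metropolis for the mean of N(theta,1), prior theta ~ N(0,1)
set.seed(100)
y <- rnorm(15, 1, 1)
logpost <- function(t) sum(dnorm(y, t, 1, log=TRUE)) + dnorm(t, 0, 1, log=TRUE)
B <- 5000
theta <- numeric(B)                                # chain starts at 0
for (i in 2:B) {
  prop <- theta[i-1] + rnorm(1, 0, 0.5)            # propose a move
  accept <- runif(1) < exp(logpost(prop) - logpost(theta[i-1]))
  theta[i] <- if (accept) prop else theta[i-1]
}
mean(theta[-(1:500)])                              # posterior mean after burn-in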

Day 1 - Section 5, Slide 50: The Bayesian Approach
Conjugate priors, normal mean with one observation: if y | θ ~ N(θ, σ²) and θ ~ N(μ₀, τ²), then
θ | y ~ N( (τ² y + σ² μ₀)/(τ² + σ²), σ² τ²/(τ² + σ²) ).

Day 1 - Section 5, Slide 51: The Bayesian Approach
Conjugate priors, normal mean with n observations: if y₁, ..., yₙ | θ ~ iid N(θ, σ²) and θ ~ N(μ₀, τ²), then
θ | y ~ N( w ȳ + (1-w) μ₀, σ² τ²/(n τ² + σ²) ), with w = n τ²/(n τ² + σ²).
The posterior mean pulls the sample mean toward the prior mean: shrinkage.
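
The update is easy to code (an added sketch with hypothetical names, following the formulas above):

### Conjugate normal update: y_i ~ N(theta, sigma2), theta ~ N(mu0, tau2)
post.normal <- function(y, sigma2, mu0, tau2) {
  n <- length(y)
  w <- n*tau2 / (n*tau2 + sigma2)                  # shrinkage weight
  c(mean = w*mean(y) + (1 - w)*mu0,
    var = sigma2*tau2 / (n*tau2 + sigma2))
}
set.seed(100)
post.normal(rnorm(15, 1, 1), sigma2=1, mu0=0, tau2=1)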

Day 1 - Section 5, Slides 52-54: Introduction
[Figure: 15 observations from N(1,1); the standardized likelihood, a prior, and the resulting posterior.]

Day 1 - Section 5, Slides 55-56: Introduction
[Figure: the same standardized likelihood with another prior and the corresponding posterior.]
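
A sketch that would produce this kind of figure (assumptions mine: N(0,1) prior, σ = 1 known):

set.seed(100)
y <- rnorm(15, 1, 1)
theta <- seq(-2, 3, length=500)
step <- theta[2] - theta[1]
lik <- sapply(theta, function(t) prod(dnorm(y, t, 1)))
lik <- lik / sum(lik*step)                          # standardized likelihood
prior <- dnorm(theta, 0, 1)
post <- lik*prior / sum(lik*prior*step)
matplot(theta, cbind(lik, prior, post), type="l", lty=1, col=1:3, ylab="density")
legend("topleft", c("standardized likelihood", "prior", "posterior"),
       col=1:3, lty=1)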

Day 1 - Section 5, Slide 57: The Bayesian Approach
Criticisms of the Bayesian choice: many! The subjectivity of the prior is the most critical, since the prior distribution is the key to Bayesian inference.

Day 1 - Section 5, Slide 58: The Bayesian Approach
Response:
- Prior information is (almost) always available
- There is no such thing as "the" prior distribution: the prior is a tool summarizing available information as well as the uncertainty related to this information
- The use of your prior is OK as long as you can justify it

Day 1 - Section 5, Slide 59: The Bayesian Approach
Bayesian statistics and bioinformatics:
- Makes the best of available prior information
- Unified framework
- The prior information can be used to regularize noisy estimates (few replicates)
- Computationally demanding?