
1 Canadian Bioinformatics Workshops www.bioinformatics.ca

2

3 Essential Statistics in Biology: Getting the Numbers Right. Raphael Gottardo, Clinical Research Institute of Montreal (IRCM). raphael.gottardo@ircm.qc.ca, http://www.rglab.org

4 Day 1: Outline
- Exploratory data analysis
- One- and two-sample t-tests, multiple testing
- Clustering
- SVD/PCA
- Frequentists vs. Bayesians

5 PCA and SVD (Multivariate analysis)

6 Day 1 - Section 4: Outline
- What is SVD? Mathematical definition
- Relation to principal component analysis (PCA)
- Applications of PCA and SVD
- Illustration with gene expression data

7 Day 1 - Section 4: SVD. Let X be a matrix of size m x n (m ≥ n) and rank r ≤ n. Then we can decompose X as
$X = U S V^\top$, where X is $m \times n$, U is $m \times n$, S is $n \times n$, and $V^\top$ is $n \times n$:
- U is the matrix of left singular vectors
- V is the matrix of right singular vectors
- S is a diagonal matrix whose diagonal entries are the singular values
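A quick numerical check of the decomposition (a minimal sketch using base R's svd(); the data here are illustrative, not from the slides):

set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)     # m = 5, n = 4
s <- svd(X)                                    # s$u is 5x4, s$d holds the 4 singular values, s$v is 4x4
max(abs(X - s$u %*% diag(s$d) %*% t(s$v)))     # reconstruction error, ~1e-15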

8 Day 1 - Section 4: SVD. The same decomposition $X = U S V^\top$, shown schematically with the matrix dimensions $m \times n = (m \times n)(n \times n)(n \times n)$.

9 Day 1 - Section 4: SVD. In the decomposition $X = U S V^\top$, the singular vectors give the directions of the signal and the singular values give its amplitude.

10 Day 1 - Section 4: Relation to PCA. Assume that the rows of X are centered (the mean row has been subtracted). Then $X^\top X$ is, up to a constant ($\tfrac{1}{m-1}$), the empirical covariance matrix, and SVD is equivalent to PCA. The columns of V (the rows of $V^\top$) are the singular vectors, i.e. the principal components; the projections $XV = US$ give the new variables, and the squared singular values give (up to the same constant) their variances. In gene expression analysis these components are called eigengenes or eigenassays.
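A sketch of this equivalence (data and variable names are illustrative, not from the slides): on centered data, svd() and prcomp() give the same directions, scores, and variances up to sign.

set.seed(1)
X <- scale(matrix(rnorm(200), 50, 4), center = TRUE, scale = FALSE)  # subtract the mean row
s <- svd(X)
p <- prcomp(X, center = FALSE)
max(abs(abs(p$rotation) - abs(s$v)))           # loadings match V, up to sign
max(abs(abs(p$x) - abs(s$u %*% diag(s$d))))    # scores match U S, up to sign
(s$d^2/(nrow(X) - 1)) - p$sdev^2               # component variances d_k^2/(m-1), ~0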

11 Day 1 - Section 4: Applications of SVD and PCA
- Dimension reduction (simplify a dataset)
- Clustering
- Discriminant analysis
- Exploratory data analysis tool: find the most important signal in the data, 2D projections

12 Day 1 - Section 4: Toy example, s = (13.47, 1.45)
set.seed(100)
x1 <- rnorm(100, 0, 1)
y1 <- rnorm(100, 1, 1)
var0.5 <- matrix(c(1, -.5, -.5, .1), 2, 2)
data1 <- t(var0.5 %*% t(cbind(x1, y1)))
set.seed(100)
x2 <- rnorm(100, 2, 1)
y2 <- rnorm(100, 2, 1)
var0.5 <- matrix(c(1, .5, .5, 1), 2, 2)
data2 <- t(var0.5 %*% t(cbind(x2, y2)))
data <- rbind(data1, data2)
svd1 <- svd(data1)
plot(data1, xlab = "x", ylab = "y", xlim = c(-6, 6), ylim = c(-6, 6))
abline(coef = c(0, svd1$v[2,1]/svd1$v[1,1]), col = 2)  # first singular direction
abline(coef = c(0, svd1$v[2,2]/svd1$v[1,2]), col = 3)  # second singular direction

13 Day 1 - Section 4: Toy example, s = (47.79, 13.25)
svd2 <- svd(data2)
plot(data2, xlab = "x", ylab = "y", xlim = c(-6, 6), ylim = c(-6, 6))
abline(coef = c(0, svd2$v[2,1]/svd2$v[1,1]), col = 2)
abline(coef = c(0, svd2$v[2,2]/svd2$v[1,2]), col = 3)
svd <- svd(data)
plot(data, xlab = "x", ylab = "y", xlim = c(-6, 6), ylim = c(-6, 6))
abline(coef = c(0, svd$v[2,1]/svd$v[1,1]), col = 2)
abline(coef = c(0, svd$v[2,2]/svd$v[1,2]), col = 3)

14 Day 1 - Section 4: Toy example
### Projection onto the singular directions: scores = U S
data.proj <- svd$u %*% diag(svd$d)
svd.proj <- svd(data.proj)
plot(data.proj, xlab = "x", ylab = "y", xlim = c(-6, 6), ylim = c(-6, 6))
abline(coef = c(0, svd.proj$v[2,1]/svd.proj$v[1,1]), col = 2)
### svd.proj$v[1,2] = 0, so the second direction is vertical
abline(v = 0, col = 3)

15 Day 1 - Section 4: Toy example, s = (47.17, 11.88). The projected data in the new coordinates.

16 Day 1 - Section 4: Toy example
### New data
set.seed(100)
x1 <- rnorm(100, -1, 1)
y1 <- rnorm(100, 1, 1)
var0.5 <- matrix(c(1, -.5, -.5, 1), 2, 2)
data1 <- t(var0.5 %*% t(cbind(x1, y1)))
set.seed(100)
x2 <- rnorm(100, 1, 1)
y2 <- rnorm(100, 1, 1)
var0.5 <- matrix(c(1, .5, .5, 1), 2, 2)
data2 <- t(var0.5 %*% t(cbind(x2, y2)))
data <- rbind(data1, data2)
svd1 <- svd(data1)
plot(data1, xlab = "x", ylab = "y", xlim = c(-6, 6), ylim = c(-6, 6))
abline(coef = c(0, svd1$v[2,1]/svd1$v[1,1]), col = 2)
abline(coef = c(0, svd1$v[2,2]/svd1$v[1,2]), col = 3)
svd2 <- svd(data2)
plot(data2, xlab = "x", ylab = "y", xlim = c(-6, 6), ylim = c(-6, 6))
abline(coef = c(0, svd2$v[2,1]/svd2$v[1,1]), col = 2)
abline(coef = c(0, svd2$v[2,2]/svd2$v[1,2]), col = 3)
svd <- svd(data)
plot(data, xlab = "x", ylab = "y", xlim = c(-6, 6), ylim = c(-6, 6))
abline(coef = c(0, svd$v[2,1]/svd$v[1,1]), col = 2)
abline(coef = c(0, svd$v[2,2]/svd$v[1,2]), col = 3)

17 Day 1 - Section 4: Toy example, s = (26.48, 24.98). The two singular values are now comparable, so no single direction dominates.

18 Day 1 - Section 4: Application to microarrays
- Dimension reduction (simplify a dataset)
- Clustering (too many samples)
- Discriminant analysis (find a group of genes)
- Exploratory data analysis tool: find the most important signal in the data, 2D projections (clusters?)

19 Day 1 - Section 4: Application to microarrays. Cho cell cycle data set, 384 genes. We standardize each gene before the SVD.
cho.data <- as.matrix(read.table("logcho_237_4class.txt", skip = 1)[, 3:19])
cho.mean <- apply(cho.data, 1, "mean")
cho.sd <- apply(cho.data, 1, "sd")
cho.data.std <- (cho.data - cho.mean)/cho.sd
svd.cho <- svd(cho.data.std)
### Contribution of each PC
barplot(svd.cho$d/sum(svd.cho$d), col = heat.colors(17))
### First three singular vectors (PCA)
plot(svd.cho$v[,1], xlab = "time", ylab = "Expression profile", type = "b")
plot(svd.cho$v[,2], xlab = "time", ylab = "Expression profile", type = "b")
plot(svd.cho$v[,3], xlab = "time", ylab = "Expression profile", type = "b")
### Projection
plot(svd.cho$u[,1]*svd.cho$d[1], svd.cho$u[,2]*svd.cho$d[2], xlab = "PCA 1", ylab = "PCA 2")
plot(svd.cho$u[,1]*svd.cho$d[1], svd.cho$u[,3]*svd.cho$d[3], xlab = "PCA 1", ylab = "PCA 3")
plot(svd.cho$u[,2]*svd.cho$d[2], svd.cho$u[,3]*svd.cho$d[3], xlab = "PCA 2", ylab = "PCA 3")
### Select a cluster by thresholding the PC scores
ind <- svd.cho$u[,1]*svd.cho$d[1] > 5 & svd.cho$u[,2]*svd.cho$d[2] > 0 & svd.cho$u[,3]*svd.cho$d[3] < 0
plot(svd.cho$u[,2]*svd.cho$d[2], svd.cho$u[,3]*svd.cho$d[3], xlab = "PCA 2", ylab = "PCA 3")
points(svd.cho$u[ind,2]*svd.cho$d[2], svd.cho$u[ind,3]*svd.cho$d[3], col = 2)
matplot(t(cho.data.std[ind,]), xlab = "time", ylab = "Expression profiles", type = "l")
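A small check (a sketch, not part of the original code) that the plotted projections are simply the data expressed in the singular basis, scores = X V = U S:

scores <- cho.data.std %*% svd.cho$v
max(abs(scores - svd.cho$u %*% diag(svd.cho$d)))               # ~0
plot(scores[, 1], scores[, 2], xlab = "PCA 1", ylab = "PCA 2")  # same plot as above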

20 Day 1 - Section 4: Application to microarrays. Barplot of the singular values: the relative contribution of each component. Why? The first component carries the main contribution.
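One caveat: the barplot above shows the relative singular values, d_i/sum(d). The proportion of variance explained by each component is proportional to the squared singular values; a small variant (not in the original code) makes the first component's dominance even clearer:

barplot(svd.cho$d^2/sum(svd.cho$d^2), col = heat.colors(17),
        ylab = "Proportion of variance")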

21 Day 1 - Section 4: Application to microarrays. PC1.

22 Day 1 - Section 4: Application to microarrays. PC2.

23 Day 1 - Section 4: Application to microarrays. PC3.

24 Day 1 - Section 4: Application to microarrays. Projection onto PC1 and PC2.

25 Day 1 - Section 4: Application to microarrays. Projection onto PC1 and PC3.

26 Day 1 - Section 4: Application to microarrays. Projection onto PC2 and PC3.

27 Day 1 - Section 4: Application to microarrays. Projection onto PC2 and PC3; 24 genes selected.

28 Day 1 - Section 4: Application to microarrays. Projection onto PC2 and PC3; 24 genes selected.

29 Day 1 - Section 4: Conclusion
- SVD is a powerful tool
- Can be very useful for gene expression data: SVD of genes (eigengenes), SVD of samples (eigenassays)
- Mostly an EDA tool

30 Overview of statistical inference: Bayesians vs. frequentists (if time permits)

31 Day 1 - Section 5: Introduction. Parametric statistical model: the observations $y = (y_1, \ldots, y_n)$ are drawn from a probability distribution $f(y \mid \theta)$, where $\theta$ is the parameter vector. The likelihood function is $L(\theta \mid y) = f(y \mid \theta)$, viewed as a function of $\theta$ (an "inverted" density).


33 Day 1 - Section 5: Introduction. Normal distribution: the probability distribution for one observation is
$f(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)$
If the observations are independent, $f(y_1, \ldots, y_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} f(y_i \mid \mu, \sigma^2)$.
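As a concrete sketch (the function and variable names are illustrative, not from the slides), the log-likelihood of an i.i.d. normal sample is just a sum of log densities:

loglik <- function(mu, sigma, y) sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
set.seed(100)
y <- rnorm(15, 1, 1)   # 15 observations from N(1,1)
loglik(1, 1, y)        # log f(y | mu = 1, sigma = 1)
loglik(0, 1, y)        # lower: mu = 0 fits these data worse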

34 Day 1 - Section 5: Introduction. Histogram of 15 observations from N(1,1).

35 Day 1 - Section 5: Introduction. The same 15 observations from N(1,1), with the true probability distribution overlaid.

36 Day 1 - Section 5: Inference. The parameters are unknown; we want to "learn" something about the parameter vector θ from the data, i.e. make inference about θ:
- Estimate θ
- Construct a confidence region
- Test a hypothesis (e.g. θ = 0)

37 Day 1 - Section 5: The frequentist approach
- The parameters are fixed but unknown
- Inference is based on the relative frequency of occurrence when repeating the experiment
- For example, one can look at the variance of an estimator to evaluate its efficiency

38 Day 1 - Section 5: The Normal Example: Estimation. For the normal distribution, $\hat{\mu} = \bar{y} = \frac{1}{n}\sum_i y_i$ estimates the mean and $\hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum_i (y_i - \bar{y})^2$ estimates the variance (the sample mean and sample variance). Numerical example: 15 observations from N(1,1). Use the theory of repeated samples to evaluate the estimators.
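A minimal simulation of the repeated-sampling idea (not from the slides): draw many samples of size 15 from N(1,1) and look at the distribution of the sample mean.

set.seed(100)
mu.hats <- replicate(1000, mean(rnorm(15, 1, 1)))
mean(mu.hats)   # close to 1: the estimator is unbiased
sd(mu.hats)     # close to 1/sqrt(15), about 0.258: its standard error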

39 Day 1 - Section 5: The Normal Example: Estimation. In our toy example the data are normal, so we can derive the sampling distribution of the estimators; for example, we know that $\bar{y}$ is normal with mean $\mu$ and variance $\sigma^2/n$. The standard deviation of an estimator is called the standard error. What if we can't derive the sampling distribution? Use the bootstrap!

40 Day 1 - Section 5: The Bootstrap. The basic idea is to resample the data we have observed and compute a new value of the statistic/estimator for each resampled data set. One can then assess the estimator by looking at the empirical distribution across the resampled data sets.
set.seed(100)
x <- rnorm(15)
mu.hat <- mean(x)
sigma.hat <- sd(x)
B <- 100
mu.hatNew <- rep(0, B)
for (i in 1:B) {
  x.new <- sample(x, replace = TRUE)  # resample with replacement
  mu.hatNew[i] <- mean(x.new)
}
se <- sd(mu.hatNew)  # bootstrap standard error of the mean
### The same scheme works for the median:
for (i in 1:B) {
  x.new <- sample(x, replace = TRUE)
  mu.hatNew[i] <- median(x.new)
}
se <- sd(mu.hatNew)  # bootstrap standard error of the median
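For the mean there is also a closed form to compare against: the analytic standard error is $s/\sqrt{n}$, which should be close to the bootstrap value above.

sd(x)/sqrt(length(x))   # analytic standard error of the mean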

41 Day 1 - Section 5: The Normal Example: CI. A confidence interval for the mean:
$\bar{y} \pm t_{n-1,\,1-\alpha/2}\; s/\sqrt{n}$
The critical value depends on n, but when n is large $t_{n-1,\,1-\alpha/2} \approx z_{1-\alpha/2}$ (e.g. 1.96 for a 95% interval). Numerical example, 15 observations (here drawn from N(0,1)). What does this mean?
> set.seed(100)
> x <- rnorm(15)
> t.test(x, mu = 0)
One Sample t-test
data: x
t = 0.3487, df = 14, p-value = 0.7325
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval: -0.2294725 0.3185625
sample estimates: mean of x 0.044545

42 Day 1 - Section 5: The Normal Example: Testing. Test a hypothesis about the mean ($H_0: \mu = \mu_0$) with the t-test:
$t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
If $H_0$ is true, t follows a t-distribution with n-1 degrees of freedom, which gives the p-value $p = P(|T_{n-1}| \geq |t|)$.
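Computing the statistic by hand (a sketch; it reproduces the t.test() output shown on slide 41):

set.seed(100)
x <- rnorm(15)
t.obs <- (mean(x) - 0)/(sd(x)/sqrt(length(x)))
p.val <- 2*pt(-abs(t.obs), df = length(x) - 1)
c(t.obs, p.val)   # t = 0.3487, p = 0.7325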

43 Day 1 - Section 5: The Bayesian Approach. Parametric statistical model: the observations are drawn from a probability distribution $f(y \mid \theta)$, where $\theta$ is the parameter vector.
- The parameters are unknown but random
- The uncertainty about the parameter vector is modeled through a prior distribution $\pi(\theta)$

44 Day 1 - Section 5: The Bayesian Approach. A Bayesian statistical model is made of
1. A parametric statistical model $f(y \mid \theta)$
2. A prior distribution $\pi(\theta)$
Q: How can we combine the two? A: Bayes' theorem!

45 Day 1 - Section 5: The Bayesian Approach. Bayes' theorem is an inversion of probability: if A and E are events such that P(E) ≠ 0 and P(A) ≠ 0, then P(A|E) and P(E|A) are related by
$P(A \mid E) = \frac{P(E \mid A)\, P(A)}{P(E)}$
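A tiny numerical check of the formula, with made-up probabilities (all values here are hypothetical):

p.E.given.A <- 0.9; p.E.given.notA <- 0.2; p.A <- 0.1
p.E <- p.E.given.A*p.A + p.E.given.notA*(1 - p.A)  # law of total probability: 0.27
p.E.given.A*p.A/p.E                                 # P(A|E) = 0.09/0.27 = 1/3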

46 Day 1 - Section 5: The Bayesian Approach. From prior to posterior:
$\pi(\theta \mid y) = \frac{f(y \mid \theta)\,\pi(\theta)}{\int f(y \mid \theta)\,\pi(\theta)\,d\theta}$
Here $f(y \mid \theta)$ carries the information on θ contained in the observation y, $\pi(\theta)$ carries the prior information, and the integral is the normalizing constant.

47 Day 1 - Section 5: The Bayesian Approach. Sequential nature of Bayes' theorem: $\pi(\theta \mid y_1, y_2) \propto f(y_2 \mid \theta)\,\pi(\theta \mid y_1)$. The posterior is the new prior!

48 Day 1 - Section 5: The Bayesian Approach. Justifications:
- Actualizes the information about θ by extracting the information about θ from the data
- Conditions upon the observations (likelihood principle)
- Avoids averaging over the unobserved values of y
- Provides a complete, unified inferential scope

49 Day 1 - Section 5: The Bayesian Approach. Practical aspects:
- Calculation of the normalizing constant can be difficult
- Conjugate priors (exact calculation is possible)
- Markov chain Monte Carlo

50 Day 1 - Section 5: The Bayesian Approach. Conjugate priors, example: normal mean, one observation. If $y \mid \theta \sim N(\theta, \sigma^2)$ with $\sigma^2$ known, and $\theta \sim N(\mu, \tau^2)$, then
$\theta \mid y \sim N\!\left(\frac{\sigma^2 \mu + \tau^2 y}{\sigma^2 + \tau^2},\; \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}\right)$

51 Day 1 - Section 5: The Bayesian Approach. Conjugate priors, example: normal mean, n observations. If $y_i \mid \theta \sim N(\theta, \sigma^2)$ i.i.d. and $\theta \sim N(\mu, \tau^2)$, then
$\theta \mid y \sim N\!\left(\frac{\sigma^2 \mu + n\tau^2 \bar{y}}{\sigma^2 + n\tau^2},\; \frac{\sigma^2 \tau^2}{\sigma^2 + n\tau^2}\right)$
The posterior mean is a weighted average of the prior mean and the sample mean: the estimate is shrunk toward the prior (shrinkage).
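Plugging the deck's 15-observation example into these formulas (the prior θ ~ N(0, 1) is an assumed choice for illustration, not from the slides):

set.seed(100)
y <- rnorm(15, 1, 1)
n <- length(y); sigma2 <- 1   # known data variance
mu0 <- 0; tau2 <- 1           # assumed prior: theta ~ N(0, 1)
post.mean <- (sigma2*mu0 + n*tau2*mean(y))/(sigma2 + n*tau2)
post.var  <- sigma2*tau2/(sigma2 + n*tau2)
c(post.mean, sqrt(post.var))  # the posterior mean shrinks ybar (about 1.04) toward mu0 = 0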

52 Day 1 - Section 5: Introduction. 15 observations from N(1,1): standardized likelihood.

53 Day 1 - Section 5: Introduction. 15 observations from N(1,1): standardized likelihood and prior.

54 Day 1 - Section 5: Introduction. 15 observations from N(1,1): standardized likelihood, prior, and posterior.

55 Day 1 - Section 5: Introduction. 15 observations from N(1,1): standardized likelihood and prior.

56 Day 1 - Section 5: Introduction. 15 observations from N(1,1): standardized likelihood, prior, and posterior.

57 Day 1 - Section 5: The Bayesian Approach. Criticisms of the Bayesian choice: many! The most critical is the subjectivity of the prior; the prior distribution is the key to Bayesian inference.

58 Day 1 - Section 5: The Bayesian Approach. Response:
- Prior information is (almost) always available
- There is no such thing as the prior distribution: the prior is a tool summarizing the available information as well as the uncertainty related to this information
- The use of your prior is fine as long as you can justify it

59 Day 1 - Section 5: The Bayesian Approach. Bayesian statistics and bioinformatics:
- Make the best of available prior information
- Unified framework
- The prior information can be used to regularize noisy estimates (few replicates)
- Computationally demanding?

