Dealing With Statistical Uncertainty Richard Mott Wellcome Trust Centre for Human Genetics.


Synopsis
– Hypothesis testing
– P-values
– Confidence intervals
– T-tests
– Non-parametric tests
– Permutations
– Bootstrap
– Contingency tables

Hypothesis testing
Null Hypothesis H0. Examples of H0:
– The mean of a population is 3.0
– In a genetic association study, there is no association between disease state and the genotypes of a particular SNP
Alternative Hypothesis H1. Examples of H1:
– Mean is 4.0
– Mean > 3.0
– Mean != 3.0
– There is an association between disease and genotype

The Normal Distribution
X ~ N(μ, σ²), with mean μ and variance σ².
Density function: f(y) = exp(−(y − μ)²/(2σ²)) / √(2πσ²)
Many quantitative traits have Normal distributions:
– Height
– Weight
Central Limit Theorem: the average of many random quantities usually has a Normal distribution even if the individual observations do not:
– mean(X) ~ N(μ, σ²/n)

P-values
The P-value of a statistic Z is the probability of sampling a value more extreme than the observed value z if the null hypothesis is true.
For a one-sided test, the p-value is α = Prob(Z > z | H0); for a two-sided test it is α = Prob(|Z| > z | H0).
Note this is not the probability density at z, but the area of the tails.
The upper and lower 2.5% tails of the N(0,1) distribution correspond to |z| = 1.96.
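As a sketch, the tail areas above can be computed directly in base R (z = 1.96 is an illustrative observed value, matching the 2.5% tails mentioned above):

```r
# Tail-area p-values for an observed statistic z under H0: Z ~ N(0,1).
z <- 1.96
p_one <- pnorm(z, lower.tail = FALSE)           # one-sided: Prob(Z > z | H0)
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided: Prob(|Z| > z | H0)
round(c(p_one, p_two), 3)                       # approximately 0.025 and 0.050
```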

Power
The power of a statistic Z is the probability of sampling a value more extreme than the critical value z if the alternative hypothesis is true. For a one-sided test, the power is Prob(Z > z | H1); for a two-sided test it is Prob(|Z| > z | H1). Note this is not the probability density at z, but the area in the tails.
Example with H0: N(0,1), H1: N(1.5,1), H2: N(4,1):
– 0.05 = Prob(Z > 1.645 | H0) (5% upper tail)
– 0.44 = Prob(Z > 1.645 | H1)
– 0.99 = Prob(Z > 1.645 | H2)

The Likelihood
Likelihood = "probability of the data given the model". It is the basis of parametric statistical inference. Different hypotheses can often be expressed in terms of different values of the parameters θ. We often work with the log of the likelihood.
For the Normal distribution, the likelihood of data y1, …, yn given mean μ and variance σ² is
L(μ, σ²) = Πᵢ exp(−(yᵢ − μ)²/(2σ²)) / √(2πσ²)
Hypothesis testing is equivalent to comparing the likelihoods for different μ.

The Likelihood Ratio Test
A general framework for constructing hypothesis tests:
S = Likelihood(data | H1) / Likelihood(data | H0)
Reject H0 if S > s(H0, H1). The threshold s is chosen such that there is a probability α of making a false rejection under H0. α is the size of the test (the false positive rate, or Type I error), e.g. α = 0.05.
The power of the test, 1 − β, is the probability of correctly rejecting H0 when H1 is true; β is the Type II error, the false negative rate.
Generally, for fixed sample size n, if we fix α then we can't also fix β. If we fix both α and β then we must let n vary.
The Neyman–Pearson Lemma states that the likelihood ratio test is the most powerful of all tests of a given size.
Type III error: "correctly rejecting the null hypothesis for the wrong reason".
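A minimal sketch of the likelihood ratio for two simple hypotheses about a Normal mean (the data are simulated and σ = 1 is assumed known; none of these values come from the slides):

```r
# Log likelihood ratio for H1: mu = 1 versus H0: mu = 0, sigma = 1 known.
set.seed(1)
y <- rnorm(30, mean = 1, sd = 1)   # simulate data under H1
loglik <- function(mu) sum(dnorm(y, mean = mu, sd = 1, log = TRUE))
logS <- loglik(1) - loglik(0)      # log of the likelihood ratio S
logS > 0                           # TRUE here: the data favour H1
```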

Example: The mean of a Normal Distribution
H0: mean = μ0 vs H1: mean = μ1
Data: y1, …, yn, independently and identically distributed with a Normal distribution N(μ, σ²). Assume the variance is the same under both hypotheses and is known. We therefore base all inferences on the sample mean, compared to the difference in the hypothesised means. Note the distribution of the sample mean is N(μ, σ²/n).

The Normal and T distributions
For a sample from a Normal distribution with known mean and variance, the sample mean can be standardised to follow the standard Normal distribution:
z = (ȳ − μ) / (σ/√n) ~ N(0,1)
and the 95% confidence interval for the mean is
ȳ ± 1.96 σ/√n
But we often wish to make inferences about samples from a Normal distribution when the variance is unknown. The variance must then be estimated from the data as the sample variance s². Because of this uncertainty the distribution of the standardised sample mean is broader, and follows the T distribution with n−1 degrees of freedom:
t = (ȳ − μ) / (s/√n) ~ T(n−1)
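A quick sketch contrasting the known-variance Normal interval with the estimated-variance T interval (the data are simulated; μ = 5 and σ = 2 are assumed for illustration):

```r
# 95% confidence intervals for a Normal mean, known vs unknown variance.
set.seed(2)
y <- rnorm(10, mean = 5, sd = 2)
n <- length(y)
ci_z <- mean(y) + c(-1, 1) * qnorm(0.975) * 2 / sqrt(n)         # sigma = 2 known
ci_t <- mean(y) + c(-1, 1) * qt(0.975, n - 1) * sd(y) / sqrt(n) # sigma estimated
qt(0.975, n - 1) > qnorm(0.975)   # TRUE: the T quantile is larger, so the CI is wider
```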

The T distribution
The T(n) and Normal distributions are almost identical for n > 20. As a rule of thumb, an approximate 95% confidence interval for the mean when n > 20 is
ȳ ± 2 s/√n

T-tests
The T-test compares the means of samples. It is optimal (in the likelihood-ratio sense) if the data are sampled from a Normal distribution, and even if the data are not Normal the test is useful in large samples. There are several versions, depending on the details.

T tests for the comparison of sample means
One-Sample Test
One sample y1, …, yn
– H0: the population mean = μ0
– H1: the population mean != μ0
– Test statistic: t = (ȳ − μ0) / (s/√n)
– Reject H0 if |t| > t(n−1, α/2), where t(n−1, α) is the quantile of the T(n−1) distribution such that the probability of exceeding t(n−1, α) is α
– Two-sided test
– in R, t(n−1, 0.025) = qt(0.975, n−1)
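The one-sample recipe can be checked by hand against t.test() (the data are simulated and μ0 = 3 is an assumed null value, chosen for illustration):

```r
# One-sample T statistic and two-sided p-value by hand, versus t.test().
set.seed(3)
y <- rnorm(15, mean = 3.5, sd = 1)
n <- length(y); mu0 <- 3
t_stat <- (mean(y) - mu0) / (sd(y) / sqrt(n))
p_hand <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
fit <- t.test(y, mu = mu0)
all.equal(p_hand, fit$p.value)   # TRUE: both routes agree
```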

T tests for the comparison of sample means
Paired T-test
Two samples of paired data (x1,y1), (x2,y2), …, (xn,yn).
Example: two experimental replicates taken from n individuals; we wish to test if the replicates are statistically similar.
H0: mean(x) = mean(y). Take differences d1 = x1 − y1, …, dn = xn − yn; then H0: mean(d) = 0, and we apply the one-sample T-test with μ0 = 0.

T tests for the comparison of sample means
Two-Sample T-test
Two samples of unpaired data x1, …, xn and y1, …, ym.
H0: mean(x) = mean(y)
If we assume the variance is the same in each group then we can estimate it with the pooled estimator
s² = ((n−1)s_x² + (m−1)s_y²) / (n + m − 2)
The test statistic is
t = (x̄ − ȳ) / (s √(1/n + 1/m)), which follows T(n+m−2) under H0.
(The case with unequal variances is more complicated.)
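The pooled estimator above can be verified against t.test(var.equal = TRUE) on simulated samples (the group means 0 and 1 are arbitrary choices for the illustration):

```r
# Pooled-variance two-sample T statistic by hand, checked against t.test().
set.seed(4)
x <- rnorm(12, mean = 0); y <- rnorm(15, mean = 1)
n <- length(x); m <- length(y)
s2 <- ((n - 1) * var(x) + (m - 1) * var(y)) / (n + m - 2)  # pooled variance
t_stat <- (mean(x) - mean(y)) / sqrt(s2 * (1 / n + 1 / m))
fit <- t.test(x, y, var.equal = TRUE)
all.equal(t_stat, unname(fit$statistic))   # TRUE
```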

T tests in R
t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...)
– x: a numeric vector of data values
– y: an optional numeric vector of data values
– alternative: a character string specifying the alternative hypothesis; must be one of "two.sided" (default), "greater" or "less"
– mu: a number indicating the true value of the mean (or of the difference in means for a two-sample test)
– paired: a logical indicating whether you want a paired t-test
– var.equal: a logical indicating whether to treat the two variances as being equal
– conf.level: confidence level of the interval
The formula interface, t.test(formula, data, subset, ...), takes:
– formula: a formula of the form lhs ~ rhs, where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups
– data: an optional data frame containing the variables in the model formula
– subset: an optional vector specifying a subset of observations to be used

T tests in R
samp1 <- rnorm(20, 0, 1)   # sample 20 numbers from the N(0,1) distribution
samp2 <- rnorm(20, 1.5, 1) # sample from N(1.5,1)
t.test(samp1, samp2, var.equal=TRUE)

        Two Sample t-test
data: samp1 and samp2
t = , df = 38, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x  mean of y

Outliers
A small number of contaminating observations with a different distribution can destroy genuine statistical significance. Reason: the estimate of the variance is inflated, reducing the T statistic. Outliers can also sometimes create false significance (Type III error: "correctly rejecting the null hypothesis for the wrong reason"). There are robust alternatives to the T-test: the Wilcoxon tests.

> samp3 <- samp2
> samp3[1] <- 40   # add one outlier
> t.test(samp1, samp3, var.equal=TRUE)

        Two Sample t-test
data: samp1 and samp3
t = , df = 38, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x  mean of y

> var(samp2)
[1]
> var(samp3)
[1]

Non-parametric tests

Non-parametric tests: Wilcoxon signed rank test (alternative to the paired T-test)
Two samples of paired data (x1,y1), (x2,y2), …, (xn,yn).
Take differences d1 = x1 − y1, …, dn = xn − yn.
H0: the distribution of the d's is symmetric about 0.
Compute the ranks of the absolute differences, rank(|d1|) etc., and the signs of the differences. Compute W, the sum of the ranks with positive signs. If H0 is true then W should be close to its null expectation n(n+1)/4, half the total rank sum. The distribution of W is known, and for n > 20 a Normal approximation is used.
in R:
wilcox.test(x, y, alternative = c("two.sided", "less", "greater"), mu = 0, paired = TRUE, exact = NULL, correct = TRUE, conf.int = FALSE, conf.level = 0.95, ...)

Wilcoxon rank sum test (alternative to the unpaired T-test)
Two samples of unpaired data x1, …, xn and y1, …, ym.
Arrange all the observations into a single ranked series of length N = n + m.
R1 = sum of the ranks for the observations which came from sample 1.
– The sum of ranks in sample 2 follows by subtraction, since the sum of all the ranks equals N(N+1)/2.
Compute U1 = R1 − n(n+1)/2 and U2 = R2 − m(m+1)/2, and take U = min(U1, U2). For the Normal approximation use z = (U − μ)/σ with μ = nm/2 and σ² = nm(n+m+1)/12.
The Wilcoxon rank sum test is only about 5% less efficient than the unpaired T-test (in terms of the sample size required to achieve the same power) in large samples.
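The rank-sum bookkeeping can be checked directly in R (simulated, tie-free samples; the sizes 8 and 10 are arbitrary):

```r
# Rank sums over the combined series, and the W reported by wilcox.test().
set.seed(5)
x <- rnorm(8); y <- rnorm(10, mean = 1)
r <- rank(c(x, y))                      # ranks in the combined series
R1 <- sum(r[seq_along(x)])              # rank sum for sample 1
R2 <- sum(r[-seq_along(x)])             # rank sum for sample 2
N <- length(x) + length(y)
R1 + R2 == N * (N + 1) / 2              # TRUE: all ranks sum to N(N+1)/2
W <- R1 - length(x) * (length(x) + 1) / 2
W == unname(wilcox.test(x, y)$statistic)  # TRUE: R reports U for the first sample
```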

Wilcoxon rank sum test (alternative to unpaired T-test) >wilcox.test(samp1,samp2) Wilcoxon rank sum test data: samp1 and samp2 W = 82, p-value = alternative hypothesis: true mu is not equal to 0 > wilcox.test(samp1,samp3) # with outliers Wilcoxon rank sum test data: samp1 and samp3 W = 82, p-value = alternative hypothesis: true mu is not equal to 0

Binomial Sign Test (one-sample alternative to the T-test)
One sample y1, …, yn.
H0: the population median = μ0 (equal to the mean for a symmetric distribution).
Count M, the number of y's that are > μ0. If H0 is true, M should be about n/2, and is distributed as a Binomial random variable B(n, 1/2).
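A worked sketch with made-up numbers (μ0 = 3 is an assumed null value), using binom.test() as the exact Binomial reference:

```r
# Sign test: count observations above the hypothesised value mu0 = 3.
y <- c(3.4, 2.9, 3.7, 4.1, 3.6, 3.2, 4.4, 3.9, 2.8, 4.0)  # illustrative data
M <- sum(y > 3)                        # M = 8 of n = 10 lie above mu0
fit <- binom.test(M, length(y), p = 0.5)
round(fit$p.value, 3)                  # 0.109: no strong evidence against H0
```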

The Binomial Distribution B(n,p)
The probability distribution of getting r successes in n independent trials; a discrete distribution (takes values 0, 1, …, n). p = probability of success in one trial (coin tossing with a fair coin: p = 0.5).
P(r|n,p) = n!/(r!(n−r)!) p^r (1−p)^(n−r)
n!/(r!(n−r)!) is the number of ways of choosing r objects from a set of n; p^r (1−p)^(n−r) is the probability that a particular choice contains exactly r successes.
Mean = np, Variance = np(1−p).
Normal approximation for the proportion of successes r/n:
– valid in the limit for large n
– r/n ~ N(p, p(1−p)/n)

Permutation Tests
Another non-parametric alternative. Permutation is a general principle with many applications to hypothesis testing, but it is not useful for parameter estimation. The method is best understood by an example:
– Suppose a data set is classified into N groups, and we are interested in whether there are differences between the group means (e.g. if N = 2 then the two-sample T test or Wilcoxon rank sum test is appropriate).
– Compute a statistic S (e.g. the difference between the maximum and minimum group mean).
– If there are no differences between groups, then any permutation of the group labellings between individuals should produce much the same value of S.
– Repeat i = 1 to M times: permute the group labels and compute the statistic on the permuted data, Si.
– The permutation p-value is the fraction of permutations for which Si exceeds S.
It's that simple!
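The recipe above, sketched for two groups with the absolute difference in means as the statistic S (the data are simulated, and M = 2000 permutations is an arbitrary choice):

```r
# Permutation test for a difference between two group means.
set.seed(7)
x <- rnorm(10, mean = 0); y <- rnorm(10, mean = 1)
obs <- abs(mean(x) - mean(y))               # observed statistic S
pooled <- c(x, y)
perm <- replicate(2000, {
  idx <- sample(length(pooled), length(x))  # permute the group labellings
  abs(mean(pooled[idx]) - mean(pooled[-idx]))
})
p_perm <- mean(perm >= obs)                 # fraction at least as extreme as S
```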

Bootstrapping
Given a set of n things S = {x1, …, xn}, a bootstrap sample is formed by sampling with replacement from S, to create a new set S* of size n. Some elements of S will be missing from S* and some will be present multiple times (on average about 1 − 1/e ≈ 63% of the elements will be present in S*).
S* can be used as a replacement for S in any algorithm that works on S, and we can repeatedly draw further bootstrap samples. Any statistic Z that can be calculated on S can also be calculated on S*, and so we can construct the bootstrap sampling distribution of Z from repeated samples. From this we can make inferences about the variation of Z, e.g. construct a 95% confidence interval (if Z is numeric).
Bootstrapping can be used for parameter estimation.
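A sketch of a percentile bootstrap confidence interval for the median, a statistic with no convenient analytic distribution (the exponential sample and the choice of 5000 resamples are illustrative assumptions):

```r
# Percentile bootstrap 95% CI for the median, plus the ~63% coverage fact.
set.seed(8)
S <- rexp(50, rate = 1)                    # a skewed sample of size n = 50
boot_med <- replicate(5000, median(sample(S, replace = TRUE)))
ci <- quantile(boot_med, c(0.025, 0.975))  # percentile interval for the median
# Fraction of distinct elements appearing in a bootstrap resample:
frac <- mean(replicate(200, length(unique(sample(50, replace = TRUE))) / 50))
round(frac, 2)                             # close to 1 - 1/e, about 0.63
```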

Bootstrapping
Bootstrapping is a general way to evaluate the uncertainty in estimates of statistics while making few assumptions about the data. It is particularly useful for complex statistics whose distribution cannot be evaluated analytically.
Example:
– In a gene expression study, we wish to find a minimal set of SNPs that in combination will predict the variation in a phenotype.
– Suppose we have a deterministic algorithm A that will find us a plausible set of SNPs (i.e. a model). This is likely to be a complex programme, to contain some arbitrary steps, and to give no measure of uncertainty for the model.
– There may be many possible sets of SNPs with similar claims, so we would like a way to average across different models, and to evaluate the evidence for a SNP as the fraction of models in which the SNP occurs.
To do this we need a way to perturb the data, to generate new data sets on which we can apply the procedure A. Bootstrapping is one way to do this.

Contingency Tables

          Smoker    Non-smoker
Male      10        20
Female

Do males and females have different rates of smoking?

Contingency Tables
Data are a table of counts, with the row and column margins fixed.
H0: the counts in each cell are consistent with the rows and columns acting independently, i.e. the expected count in cell (i,j) is ri cj / N.

           observed (expected)   observed (expected)   row total
row 1:     n11 (r1 c1 / N)       n12 (r1 c2 / N)       r1 = n11 + n12
row 2:     n21 (r2 c1 / N)       n22 (r2 c2 / N)       r2 = n21 + n22
col total: c1 = n11 + n21        c2 = n12 + n22        N

The Chi-Squared statistic
X² = Σij (nij − ri cj / N)² / (ri cj / N)
Under H0, X² approximately follows a chi-squared distribution with (rows − 1)(columns − 1) degrees of freedom (1 df for a 2×2 table).
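The statistic can be computed from the observed and expected counts and checked against chisq.test(); the 2×2 counts here are illustrative, and correct = FALSE disables the Yates continuity correction so that the two computations match:

```r
# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected.
obs <- matrix(c(10, 10, 5, 12), nrow = 2)  # a 2x2 table of counts
r <- rowSums(obs); cs <- colSums(obs); N <- sum(obs)
E <- outer(r, cs) / N                      # expected counts r_i * c_j / N
X2 <- sum((obs - E)^2 / E)
fit <- chisq.test(obs, correct = FALSE)    # no continuity correction
all.equal(X2, unname(fit$statistic))       # TRUE; df = (2-1)(2-1) = 1
```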

Fisher’s Exact Test for 2 x 2 Contingency tables
Similar in spirit to a permutation test. Given the marginal totals in a 2x2 table, we can compute the probability P of getting the observed cell counts, assuming the null hypothesis of no association between the classifying factors:
– the hypergeometric distribution.
Find all tables with the same margins and with probabilities no bigger than P; the P-value of the table is the sum of these probabilities.

Fisher’s Exact Test

          Smoker    Non-smoker
Male      a         b            a+b
Female    c         d            c+d
          a+c       b+d          n

R implementation:
m <- matrix(c(10,10,5,12), nrow=2)
fisher.test(m)

        Fisher's Exact Test for Count Data
data: m
p-value =
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
sample estimates:
odds ratio