# Class 6: Hypothesis testing and confidence intervals

Data Analysis – Class 6: Hypothesis testing and confidence intervals

What to ‘expect’? We have studied various distributions:

- Bernoulli – binary
- Geometric – number of Bernoulli experiments until the first success
- Binomial – number of successes in n Bernoulli experiments
- Gaussian – real-valued, bell-shaped curve
- Exponential – real-valued time to the first event
- Poisson – number of events in a unit time interval

These tell us what to expect from measurements. But what if we are not sure about the parameters? Can we use measurements as evidence?

- Fitting distributions to data
- Testing hypotheses based on data
- Confidence intervals

Remember…

| Type | Distribution / density function | Mean | Variance |
|---|---|---|---|
| Bernoulli | P(X=1) = p, P(X=0) = 1−p | p | p(1−p) |
| Geometric | P(X=k) = (1−p)^(k−1) p | 1/p | (1−p)/p² |
| Binomial | P(X=k) = C(n,k) p^k (1−p)^(n−k) | np | np(1−p) |
| Gaussian | f(x) = (1/(σ√(2π))) exp(−(x−μ)²/(2σ²)) | μ | σ² |
| Exponential | f(x) = λ e^(−λx) | 1/λ | 1/λ² |
| Poisson | P(X=k) = e^(−λ) λ^k / k! | λ | λ |

Properties of distributions

- Mean μ = E[X]; sample mean (average): x̄ = (1/n) Σᵢ xᵢ
- Variance σ² = E[(X − μ)²]; sample variance: s² = (1/n) Σᵢ (xᵢ − x̄)²

Moment matching: choose the parameters such that the first-, second-, … order moments of the distribution are equal to their empirical estimates.

Bernoulli / geometric / binomial

- Bernoulli first-order moment: E[X] = p (similar for geometric and binomial)
- Only one parameter, so higher-order moments are not needed

Bernoulli / geometric / binomial

Empirical mean: p = 0.34, estimated from 100*10 = 1000 Bernoulli outcomes
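A minimal sketch of this estimate in Python. The outcomes are simulated here (the lecture's actual 1000 outcomes are not reproduced), with an assumed true p of 0.34:

```python
import random

random.seed(0)

# Hypothetical data: 1000 Bernoulli outcomes with true p = 0.34,
# matching the slide's 100*10 outcomes; in practice these would be
# the observed binary measurements.
outcomes = [1 if random.random() < 0.34 else 0 for _ in range(1000)]

# Moment matching for a Bernoulli: the first-order moment is
# E[X] = p, so the estimate is simply the sample mean.
p_hat = sum(outcomes) / len(outcomes)
print(p_hat)
```

With this much data, `p_hat` lands close to the true 0.34.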

Gaussian

First- and second-order moments: E[X] = μ and E[X²] = μ² + σ²
Only two parameters, so no need for higher moments

Gaussian

Empirical means: 3.2 and 15.6; empirical standard deviations: 1.1 and 2.0

Multivariate Gaussian

Multivariate Gaussian density function:

f(x) = (2π)^(−d/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))

Parameters: mean vector mu and covariance matrix Sigma

Multivariate Gaussian

Parameters can be estimated from a set of samples {x1, x2, …, xn}:

mu = (1/n) Σᵢ xᵢ and Sigma = (1/n) Σᵢ (xᵢ − mu)(xᵢ − mu)ᵀ
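A sketch of these two estimators in plain Python (the 2-dimensional sample points are made up for illustration):

```python
# Estimating the mean vector and covariance matrix of a multivariate
# Gaussian from samples, using the maximum-likelihood (1/n)
# normalization. The sample points below are invented.
samples = [
    [2.1, 0.5],
    [1.9, 0.7],
    [2.3, 0.4],
    [2.0, 0.6],
]
n = len(samples)
d = len(samples[0])

# Mean vector: mu_j = (1/n) * sum_i x_ij
mu = [sum(x[j] for x in samples) / n for j in range(d)]

# Covariance matrix: Sigma_jk = (1/n) * sum_i (x_ij - mu_j)(x_ik - mu_k)
sigma = [[sum((x[j] - mu[j]) * (x[k] - mu[k]) for x in samples) / n
          for k in range(d)] for j in range(d)]
print(mu, sigma)
```

The result is symmetric with non-negative diagonal entries, as a covariance matrix must be.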

Exponential / Poisson

Exponential mean: E[X] = 1/lambda. Poisson mean: E[X] = lambda.
Both give the same result: lambda = the empirical number of events per unit of time

Exponential / Poisson Significant plane crashes since 1 January 1998…

Exponential / Poisson

Lambda = 1/mean(time between crashes)

Exponential / Poisson

Lambda = 1.5 = average number of crashes in 100 days (note: lambda is 100 times larger here, because the unit time interval is 100 times longer)

Hypothesis testing

Given a distribution (and its parameters), e.g. a binomial for the number of faulty items in a lot of a factory pipeline, empirical data may urge us to revise our hypothesis.

Binomial distribution

Consider a pipeline in a factory. If the expected probability of a fault is (should be) 0.01, what can we conclude if we see a batch with 10% faults?

- p = 0.01, n = 100 (batch size), x = 10 (number faulty)
- The probability of seeing something equally or more surprising is the p-value: P(X ≥ 10)
- This probability is extremely small → we should reject the hypothesis that p = 0.01 → the pipeline must be broken!?

Binomial distribution

In practice, this can be computed with the cumulative binomial distribution function. In MATLAB (with p = 0.01): `1 - binocdf(9, 100, p)`
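The same tail probability can be sketched in Python with only the standard library, summing the binomial probability mass from 10 upward:

```python
from math import comb

# P(X >= 10) for X ~ Binomial(n=100, p=0.01),
# i.e. the equivalent of MATLAB's 1 - binocdf(9, 100, p).
n, p, x = 100, 0.01, 10
p_value = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))
print(p_value)
```

The result is on the order of 1e-8, far below any usual threshold, so the null hypothesis p = 0.01 is rejected.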

Poisson distribution

Assume the expected number of plane crashes in 100 days is supposed to be 1.5. What can we conclude if there are 5 in a given 100 days? (This is indeed the case in the 16th unit time interval.)

- The p-value = P(X ≥ 5)
- The p-value is small – should we reject the null hypothesis that lambda = 1.5?

Poisson distribution

In practice, this can be computed by means of the cumulative Poisson distribution function. In MATLAB (with lambda = 1.5): `1 - poisscdf(4, lambda)`
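A Python sketch of the same computation, summing the Poisson probability mass for 0 through 4 events and taking the complement:

```python
from math import exp, factorial

# P(X >= 5) for X ~ Poisson(lambda = 1.5),
# i.e. the equivalent of MATLAB's 1 - poisscdf(4, lambda).
lam = 1.5
p_value = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(5))
print(p_value)
```

This evaluates to roughly 0.019, the "very close to 0.02" value mentioned later for the permutation test.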

Hypothesis testing

In general:

1. Assume a null hypothesis for the data, e.g. faults are Bernoulli random variables with a given p, or crashes occur with a fixed rate lambda per unit time interval
2. Gather data
3. Compute a test statistic of the data, e.g. the number of faults in a batch of n, or the number of crashes in a unit time interval
4. Compute the p-value: the probability that the test statistic is at least as large on random data drawn from the null hypothesis
5. If the p-value is smaller than a threshold (0.01, 0.05, …), reject the null hypothesis

Hypothesis testing

In general, hypothesis testing quantifies the fact that a random variable will typically be close to its mean. This holds more strongly as the standard deviation gets smaller.

Permutation testing

Sometimes the distribution of the test statistic is too complex to compute analytically. Then: permutation testing.

- Generate random data sets by permuting the sampled one (e.g. 1000 times)
- Compute the fraction of times the test statistic is at least as large in those permuted versions
- This fraction is an approximation of the p-value
- (Assumption of this approach: permuted versions of the data are equally likely under the null hypothesis)

Permutation testing

Test statistic = number of plane crashes in the 16th unit time interval of 100 days

Permutation testing

- Generate 1000 random crash time series with the same number of crashes in the same period (e.g. by permuting the days)
- Compute the number of crashes in the 16th unit time interval (of 100 days)
- Compute the proportion of those 1000 permutations where the number of crashes in this interval was at least 5
- This proportion is the p-value estimate!
- Result (in my experiment): very close to 0.02, the value computed using the Poisson distribution
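The procedure above can be sketched as follows. The crash days here are simulated (the lecture's real crash data is not reproduced), and the observation period, interval index, and observed count are assumptions made for illustration; permuting the days is implemented as re-drawing each crash day uniformly over the period, which is equivalent under the null hypothesis:

```python
import random

random.seed(1)

# Made-up crash data: each entry is the day index of a crash since
# day 0; the period is split into unit intervals of 100 days.
period = 2000                      # total days observed (assumption)
crash_days = sorted(random.randrange(period) for _ in range(30))
interval = 15                      # 0-based index of the tested interval
observed = 5                       # observed number of crashes there

def crashes_in(days, idx, width=100):
    """Count crashes falling in the idx-th interval of `width` days."""
    return sum(1 for d in days if idx * width <= d < (idx + 1) * width)

# Re-draw the crash days 1000 times and count how often the test
# statistic is at least as extreme as the observation.
hits = 0
for _ in range(1000):
    permuted = [random.randrange(period) for _ in range(len(crash_days))]
    if crashes_in(permuted, interval) >= observed:
        hits += 1
p_estimate = hits / 1000
print(p_estimate)
```

With 30 crashes over 2000 days the null rate is 1.5 crashes per 100-day interval, so the estimate should land near the analytical 0.02.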

Confidence intervals

Rather than computing a point estimate for the mean, we can compute an interval for the mean: a range of values in which the mean lies with high confidence.

Confidence intervals

Consider a pipeline in a factory. If the expected probability of a fault is (should be) 0.01, what can we conclude if we see a batch with 10% faults? n = 100 (batch size), x = 10 (number faulty).

Let's say we reject the null hypothesis if the p-value < delta = 0.05:

- p = 0.01 → p-value = 7.6e-8
- p = 0.05 → p-value = …
- p = 0.055 → p-value = 0.05
- p = 0.1 → p-value = 0.54

The set of all values of p for which the p-value ≥ 0.05 is the confidence interval at level 1 − delta = 0.95: [0.055, 1]
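This interval can be found numerically by scanning candidate values of p and keeping those whose p-value survives the threshold. A sketch (the grid step of 0.001 is an arbitrary choice):

```python
from math import comb

# One-sided confidence interval [p_lo, 1] for the fault probability,
# given x = 10 faults in a batch of n = 100 and threshold delta = 0.05.
# p_lo is the smallest p whose p-value P(X >= 10) is still >= delta.
n, x, delta = 100, 10, 0.05

def p_value(p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

p_lo = next(p / 1000 for p in range(1, 1000) if p_value(p / 1000) >= delta)
print(p_lo)
```

The scan lands near 0.055, matching the interval [0.055, 1] on the slide.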

Confidence intervals

Assume in a given unit time interval of 100 days there are 5 crashes; the p-value threshold used is delta = 0.01:

- lambda = 1 → p-value = …
- lambda = 1.28 → p-value = 0.01
- lambda = 2 → p-value = …
- lambda = 4 → p-value = 0.37

Confidence interval at level 1 − delta = 0.99: [1.28, ∞)

Confidence intervals

This was one-sided. Two-sided: for all lambda values in the interval, P(at least 5 crashes) ≥ 0.005 and P(at most 4 crashes) ≥ 0.005.

Two-sided confidence interval at level 1 − delta = 0.99: [1.08, 12.6]. Indeed: `1 - poisscdf(4, 1.08)` = `poisscdf(4, 12.6)` = 0.005
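The endpoint check can be reproduced in Python with a hand-rolled Poisson CDF:

```python
from math import exp, factorial

# Verifying the two-sided interval endpoints from the slide: at
# lambda = 1.08 the upper tail P(X >= 5) is about 0.005, and at
# lambda = 12.6 the lower tail P(X <= 4) is about 0.005.
def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

upper_tail_at_low = 1 - poisson_cdf(4, 1.08)   # expected ~0.005
lower_tail_at_high = poisson_cdf(4, 12.6)      # expected ~0.005
print(upper_tail_at_low, lower_tail_at_high)
```

Both tails come out at roughly 0.005, i.e. delta/2 on each side, confirming [1.08, 12.6].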

Confidence intervals

Another (more common) interpretation of confidence intervals: with probability (over the sampled data) at least 1 − delta, the confidence interval will contain the actual value. You can verify that this is the case: for any mean outside the interval, the probability of the observed test statistic (or something more extreme) is less than delta. Hence, the probability over the data that the interval contains the actual mean is at least 1 − delta.

Lab session

On the temperature time series data:

- Compute the 12-dimensional mean temperature over the year
- Compute the covariance matrix
- Visualize both in the report (using plot and imagesc)

On the Titanic data:

- Compute the probability of having died among first-class passengers (report)
- Compute the p-value for the null hypothesis that the probability of having died for third-class passengers is the same (report)
- Compute the probability of survival among all male passengers (report)
- Compute the p-value for the null hypothesis that the probability of survival for female passengers is the same (report)

On the plane crash data:

- Make a histogram of the number of plane crashes per time unit, starting on 1/1/1998, and fit a Poisson to it (as in the lecture), but with the unit time interval equal to 50 days (report)
- Find the unit time interval with the largest number of crashes (report)
- Compute the p-value for this time interval both analytically, using the Poisson cumulative distribution function, and using permutation testing. What can you conclude, e.g. with a p-value threshold equal to 0.01? (report)