An Introduction to Statistical Inference Mike Wasikowski June 12, 2008.


1 An Introduction to Statistical Inference Mike Wasikowski June 12, 2008

2 Statistics
- Up till now, we have looked at probability: analyzing data in which chance played some part of its development
- Two main branches: estimation of parameters, and testing hypotheses about parameters
- To use statistical analysis, we must ensure we have a random sample of the population
- The methods described here are classical methods: "probability of data D given hypothesis H"
- Bayesian methods are also sometimes used: "probability of hypothesis H given data D"

3 Contents
- What is an estimator?
- Unbiased estimators
- Biased estimators
- Parametric hypothesis tests
- Nonparametric hypothesis tests
- Multiple tests/experiments

4 Classical Estimation Methods
- Probability distributions P_X(x; θ) and density functions f_X(x; θ) have parameters
- Can use the observed value x of X to estimate θ
- To estimate the parameters, must use multiple iid observations x_1, x_2, ..., x_n
- An estimator of the parameter θ is a function of the RVs X_1, X_2, ..., X_n, written as either θ̂(X_1, X_2, ..., X_n) or simply θ̂
- The observed value of θ̂ is the estimate of θ

5 Desirable Properties of Estimators
- Unbiased: E(θ̂) = θ
- Small variance: the observed value of θ̂ should then be close to θ
- Normal distribution, either exactly or approximately: lets us use the properties of the normal distribution to derive properties of θ̂

6 Estimating μ
- Use the sample mean X̄ to estimate μ
- The mean value of X̄ is μ, so X̄ is unbiased
- The variance of X̄ is σ²/n, so it is small when n is large
- The central limit theorem tells us that the distribution of X̄ will be approximately normal with a large number of observations
- Our estimated value of μ is the observed mean x̄

7 Confidence Intervals
- From section 1.10.2, for large n, P(X̄ − 2σ/√n < μ < X̄ + 2σ/√n) ≈ 0.95
- The probability that the random interval (X̄ − 2σ/√n, X̄ + 2σ/√n) contains μ is approximately 95%
- The observed value of the interval, given all x_i, is (x̄ − 2σ/√n, x̄ + 2σ/√n)
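As a minimal sketch (not from the slides), this approximate 95% interval for μ with known σ can be computed directly; the function name and data are hypothetical, and the factor 2 is the slides' rounding of 1.96:

```python
import math

def approx_ci(xs, sigma):
    """Approximate 95% CI for mu when sigma is known and n is large:
    (xbar - 2*sigma/sqrt(n), xbar + 2*sigma/sqrt(n))."""
    n = len(xs)
    xbar = sum(xs) / n
    half = 2 * sigma / math.sqrt(n)   # slides use 2 in place of 1.96
    return (xbar - half, xbar + half)

# Hypothetical example: 100 observations, known sigma = 1
lo, hi = approx_ci([0.5] * 100, sigma=1.0)   # interval of half-width 0.2
```

Note how the interval narrows as n grows: quadrupling n halves the half-width, per the √n in the denominator.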

8 Estimating σ²
- Can develop an unbiased estimator of σ²: S² = Σ(X_i − X̄)² / (n − 1)
- Our estimated value of σ² is s² = Σ(x_i − x̄)² / (n − 1)
- One potential problem: unless n is very large, the variance of this estimator will itself typically be large
- The estimated variance of X̄ is S²/n = Σ(X_i − X̄)² / (n(n − 1))
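A small sketch of the n − 1 divisor (the function and sample values are illustrative, not from the slides):

```python
def unbiased_variance(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1), the unbiased estimator of sigma^2."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# xbar = 2.5, squared deviations sum to 5, so s^2 = 5/3
s2 = unbiased_variance([1.0, 2.0, 3.0, 4.0])
```

Dividing by n instead of n − 1 would give the biased (maximum-likelihood) version, which underestimates σ² on average.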

9 Estimated Confidence Intervals
- We then have the approximate 95% confidence interval for μ as (X̄ − 2S/√n, X̄ + 2S/√n)
- The observed interval from the data is (x̄ − 2s/√n, x̄ + 2s/√n)
- Again, a warning: unless n is very large, this interval will be wide and may not be useful

10 Binomial and Multinomial Probability Estimates
- Consider the binomial RV Y with parameter p and index n
- The mean value of Y/n is p and the variance of Y/n is p(1 − p)/n
- By the above, p̂ = Y/n is an unbiased estimator of p
- The typical estimate of the variance of p̂ is p̂(1 − p̂)/n = y(n − y)/n³, where y is the observed number of successes
- This estimate is biased; the unbiased estimate is y(n − y)/(n³ − n), analogous to the σ² estimate
- Estimates of {p_i} in the multinomial case are calculated similarly, by converting the multinomial problem into a series of binomial problems
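The biased and unbiased variance estimates above differ only in the denominator, as this sketch shows (function name and the y = 30, n = 100 example are hypothetical):

```python
def binomial_estimates(y, n):
    """p_hat = y/n, plus the biased and unbiased estimates of Var(p_hat)."""
    p_hat = y / n
    biased = y * (n - y) / n**3          # = p_hat * (1 - p_hat) / n
    unbiased = y * (n - y) / (n**3 - n)  # divides by n^3 - n instead
    return p_hat, biased, unbiased

p_hat, b, u = binomial_estimates(y=30, n=100)
```

For large n the two variance estimates are nearly identical, since n³ − n ≈ n³.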

11 Biased Estimators
- Not all estimators are unbiased
- A biased estimator θ̂ is one where E(θ̂) differs from θ: Bias(θ̂) = E(θ̂) − θ
- Assess the accuracy of θ̂ by MSE rather than variance: MSE(θ̂) = E((θ̂ − θ)²) = Var(θ̂) + Bias(θ̂)²
- When E(θ̂) = θ + O(n⁻¹), we call the estimator asymptotically unbiased; its MSE and variance then differ by O(n⁻²)

12 Why use biased estimators?
- Some parameters cannot be estimated in an unbiased manner
- A biased estimator is preferable to an unbiased one when its MSE is smaller than the unbiased estimator's variance (which is that estimator's MSE)

13 Hypothesis Testing
Test a null hypothesis (H_0) versus an alternate hypothesis (H_1 or H_a). Five steps:
1) Declare the null hypothesis and the alternate hypothesis
2) Select the significance level α
3) Determine the test statistic to be used
4) Determine what observed values of the test statistic would lead to rejection of H_0
5) Use the data to determine whether the observed value of the test statistic meets or exceeds the significance point from step 4

14 Declaring Hypotheses
- Must declare the null and alternate hypotheses before seeing any data, to avoid bias
- Hypotheses can be simple (specifying all values of the unknown parameters) or composite (not specifying all values of the unknown parameters)
- The natural alternate hypothesis is usually composite
- Alternate hypotheses can be either one-sided (θ > θ_0 or θ < θ_0) or two-sided (θ ≠ θ_0)

15 Selecting Significance Level
- Two types of errors can be made in a hypothesis test
- Type I: reject H_0 when it is true
- Type II: fail to reject H_0 when it is false
- Unless we have limitless observations, we cannot make the probabilities of both errors simultaneously small
- The typical method is to focus on type I errors and fix α at a suitably low value
- Common values of α are 1% and 5%

16 Choosing Test Statistic
- There is much theory available for choosing good test statistics
- Chapter 9 (Alex) discusses finding the optimal test statistic: the one that, for a given type I error rate and number of observations, minimizes the rate of type II errors

17 Finding Significance Points
- Find the value of the significance point K for the test statistic
- General example: for α = 0.05, choose K so that P(type I error) = P(X ≥ K | H_0) = 0.05
- If the RV is discrete, it may be impossible to find a value of K such that the type I error rate is exactly α
- In practice, we err conservatively and round K up

18 Finding Conclusions
- Compare the observed value of the test statistic to the significance point K
- Two conclusions can be drawn from a hypothesis test: fail to reject the null, or reject the null in favor of the alternate
- A hypothesis test never tells you whether a hypothesis is true or false

19 P-values
- An equivalent method skips calculating the significance point K
- Instead, calculate the achieved significance level (p-value) of the observed test statistic
- Then compare the p-value to α: if p-value ≤ α, reject H_0; if p-value > α, fail to reject H_0

20 Power of Hypothesis Tests
- Recall that step 3 involves choosing an optimal test statistic
- If both hypotheses are simple, the choice of α implicitly determines β, the type II error rate
- The power of a hypothesis test is 1 − β, the probability of avoiding a type II error
- With a composite alternate hypothesis, the probability of rejecting H_0 depends on the actual parameter value, so there is no unique value of β
- Chapter 9 discusses how to find the power of tests against composite alternate hypotheses

21 Z-test
- Classic example: what is the mean of data drawn from a normal distribution with known σ?
- H_0: μ = μ_0, H_1: μ > μ_0
- Use X̄ as our optimal test statistic
- The RV Z = (X̄ − μ_0)√n/σ has the N(0, 1) distribution when H_0 is true
- For α = 0.05, the significance point is Z ≥ 1.645
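The one-sided Z-test can be sketched with the standard library alone (the function and example data are illustrative; `statistics.NormalDist` supplies the N(0, 1) quantile 1.645):

```python
import math
from statistics import NormalDist

def z_test(xs, mu0, sigma, alpha=0.05):
    """One-sided Z-test of H0: mu = mu0 vs H1: mu > mu0, with sigma known.
    Returns the observed z and whether H0 is rejected at level alpha."""
    n = len(xs)
    xbar = sum(xs) / n
    z = (xbar - mu0) * math.sqrt(n) / sigma
    k = NormalDist().inv_cdf(1 - alpha)   # ~1.645 for alpha = 0.05
    return z, z >= k

# Hypothetical data: 25 observations all equal to 1, testing mu0 = 0
z, reject = z_test([1.0] * 25, mu0=0.0, sigma=1.0)
```

Here z = (1 − 0)·√25/1 = 5, well past 1.645, so H_0 is rejected.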

22 One-sample t-test
- When σ is unknown, we must estimate the variance with s²
- Now use the one-sample t-test: t = (x̄ − μ_0)√n/s
- If we know that X_1, X_2, ..., X_n are NID(μ, σ²), the H_0 distribution of T = (X̄ − μ_0)√n/S is well known: the t-distribution with n − 1 degrees of freedom
- T is asymptotically equal to Z, but differs greatly from it for small n
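Computing the observed t statistic is a one-liner given the sample mean and standard deviation (the function and four-point sample are hypothetical; comparing t against a t-table with n − 1 degrees of freedom is left out):

```python
import math
from statistics import mean, stdev

def one_sample_t(xs, mu0):
    """Observed t = (xbar - mu0) * sqrt(n) / s, with n - 1 degrees of freedom.
    stdev() already uses the n - 1 divisor from the s^2 estimator."""
    n = len(xs)
    return (mean(xs) - mu0) * math.sqrt(n) / stdev(xs)

# Hypothetical sample with xbar = 5, testing mu0 = 3
t = one_sample_t([2.0, 4.0, 6.0, 8.0], mu0=3.0)
```

The resulting t would be referred to the t-distribution with 3 degrees of freedom, whose tails are noticeably heavier than the normal's at such a small n.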

23 Two-sample t-test
- What if we need to compare two different RVs? Example: a repeated experiment comparing two methods
- H_0: μ_1 = μ_2, H_1: μ_1 ≠ μ_2
- Consider X_11, X_12, ..., X_1m ~ NID(μ_1, σ²) and X_21, X_22, ..., X_2n ~ NID(μ_2, σ²) to be the RVs from which our observations are drawn
- Use the two-sample t-test; large positive or negative values cause rejection of H_0

24 Two-sample t-test
- [Slide displays the two-sample t statistic as a t-distributed RV and its observed value; with a pooled estimate S² of the common variance, the standard form is T = (X̄_1 − X̄_2) / (S√(1/m + 1/n)), with m + n − 2 degrees of freedom]

25 Paired t-test
- Suppose the values of X_1i and X_2i are logically paired in some manner
- Can instead perform a paired t-test: use D_i = X_1i − X_2i for our test
- H_0: μ_D = 0, H_1: μ_D ≠ 0
- Then use T = D̄√n/S_D as our test statistic
- This method can eliminate sources of variance
- These are the beginnings of ANOVA, where we break variation into different components, and the foundations of the F-test, a test of the ratio between two variances

26 Chi-square test
- Consider a multinomial distribution
- H_0: p_i equals a specified value for each i = 1..k; H_1: at least one p_i differs from its specified value
- Use X² as our test statistic: X² = Σ(Y_i − np_i)²/(np_i)
- Larger observed values of X² lead to rejection of H_0
- When H_0 is true and n is large, X² approximately follows the chi-square distribution with k − 1 degrees of freedom
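The X² statistic is a direct translation of the formula above; this sketch (function and fair-die counts are hypothetical) computes it for k = 6 categories:

```python
def chi_square_stat(counts, probs):
    """X^2 = sum over categories of (Y_i - n*p_i)^2 / (n*p_i)."""
    n = sum(counts)
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))

# Hypothetical fair-die example: H0 says p_i = 1/6 for every face;
# 60 rolls, so each expected count n*p_i is 10
x2 = chi_square_stat([12, 8, 10, 9, 11, 10], [1 / 6] * 6)
```

The observed X² would then be compared against the chi-square significance point with k − 1 = 5 degrees of freedom.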

27 Association tests
- Compare elements of a population by placing each into one of a number of categories for two properties
- Fisher's exact test compares two binary properties of a population
- H_0: the two properties are independent of one another; H_1: the two properties are dependent in some manner
- Can also use the chi-square test on tables with an arbitrary number of rows and columns

28 Hypothesis Testing with Maximum as Test Statistic
- Bioinformatics has several areas where the maximum of many RVs is a useful test statistic
- BLAST, local alignment of sequences: we only care about the most likely alignment
- Let X_1, X_2, ..., X_n ~ N(μ_i, 1)
- H_0: μ_i = 0 for all i; H_1: one μ_i > 0 with the rest μ_i = 0
- Optimal test statistic: X_max
- Reject H_0 if P(X_max > x_max | H_0) < α
- Use 1 − F(x_max)ⁿ to find the p-value, where F is the N(0, 1) cdf
- Some options still exist if we cannot calculate the cdf; one possibility is the total variation distance
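Since the X_i are independent under H_0, P(X_max ≤ x) = F(x)ⁿ, giving the p-value 1 − F(x_max)ⁿ. A sketch with the standard-normal cdf from the standard library (function name and the x_max = 3, n = 10 example are hypothetical):

```python
from statistics import NormalDist

def max_stat_pvalue(x_max, n):
    """P(X_max > x_max | H0) = 1 - F(x_max)^n for n iid N(0,1) variables,
    where F is the standard normal cdf."""
    return 1 - NormalDist().cdf(x_max) ** n

# Hypothetical example: largest of 10 standard normals observed at 3.0
p = max_stat_pvalue(x_max=3.0, n=10)   # roughly 0.013
```

Note the multiplicity effect: a single observation at 3.0 has p ≈ 0.00135, but the maximum of 10 reaching 3.0 is about ten times less surprising.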

29 Nonparametric Tests
- The two-sample t-test is a distribution-dependent test: it relies on the RVs having the normal distribution
- If we use the t-test when at least one of the underlying RVs is not normal, the calculated p-value yields an invalid testing procedure
- Nonparametric, or distribution-free, tests avoid the problems of tests specific to a distribution

30 Permutation Tests
- Avoid the assumption of a normal distribution
- Have RVs X_11, X_12, ..., X_1m iid and X_21, X_22, ..., X_2n iid, with possibly differing distributions
- Assume X_1i is independent of X_2j for all (i, j)
- H_0: the X_1i's are distributed identically to the X_2j's; H_1: the distributions differ
- There are Q = C(m + n, m) possible placements ("permutations") of X_11, ..., X_1m, X_21, ..., X_2n into two groups of sizes m and n
- H_0 says each of these Q placements has the same probability of arising

31 Permutation Tests
- Calculate the test statistic for each permutation
- Reject H_0 if the observed value of the statistic is among the most extreme 100α% of the permutation values
- The choice of test statistic depends on what we think may differ between the two distributions: t-statistics if we suspect different means, the F-statistic if different variances
- Problems with these tests: granularity with too few samples, computational complexity with too many
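A small exhaustive permutation test, enumerating all C(m + n, m) regroupings (the function, the choice of |difference of means| as statistic, and the toy samples are illustrative):

```python
from itertools import combinations
from statistics import mean

def permutation_test(xs, ys):
    """Exhaustive two-sided permutation test with |mean(xs) - mean(ys)|
    as the statistic; returns the fraction of regroupings at least as
    extreme as the observed split."""
    pooled = xs + ys
    m, total_n = len(xs), len(xs) + len(ys)
    observed = abs(mean(xs) - mean(ys))
    extreme = total = 0
    for idx in combinations(range(total_n), m):
        chosen = set(idx)
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in range(total_n) if i not in chosen]
        if abs(mean(g1) - mean(g2)) >= observed:
            extreme += 1
        total += 1
    return extreme / total   # permutation p-value

# Hypothetical toy samples: only the original split and its mirror
# achieve the observed |difference| of 5, so p = 2/20 = 0.1
p = permutation_test([1.0, 2.0, 3.0], [6.0, 7.0, 8.0])
```

The granularity problem is visible here: with m = n = 3 there are only 20 regroupings, so the smallest attainable two-sided p-value is 2/20 = 0.1.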

32 Mann-Whitney Test
- A frequently used alternative to the two-sample t-test
- The observed values x_11, x_12, ..., x_1m and x_21, x_22, ..., x_2n are listed in increasing order
- Associate each observation with its rank in this list; the sum of all ranks is (m + n)(m + n + 1)/2
- H_0: the X_1i's and X_2j's are identically distributed; H_1: at least one parameter of the distributions differs
- For large sample sizes, use the central limit theorem to test the null hypothesis with a z-score
- For small sample sizes, can calculate an exact p-value as a permutation test
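The rank-sum statistic itself is simple to compute when there are no ties (function name and sample values are hypothetical; tie handling and the z-score comparison are omitted):

```python
def rank_sum(xs, ys):
    """Sum of the ranks of the xs in the combined sorted sample
    (ranks start at 1; assumes no tied values)."""
    combined = sorted(xs + ys)
    return sum(combined.index(x) + 1 for x in xs)

# Hypothetical interleaved samples: xs take ranks 1, 3, 5
w = rank_sum([1.0, 3.0, 5.0], [2.0, 4.0, 6.0])
```

As a sanity check, the two groups' rank sums must add up to (m + n)(m + n + 1)/2, which is 21 here.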

33 Wilcoxon Signed-rank Test
- A test for the value of the median of a generic continuous RV; if the distribution is symmetric, it also tests the mean
- H_0: med = M_0, H_1: med ≠ M_0
- Calculate the absolute differences |x_i − M_0|, order them from smallest to largest, and assign ranks
- The observed test statistic is the sum of the ranks of the positive differences
- Use the central limit theorem to compare groups with large numbers of samples
- Can also calculate an exact p-value as a permutation test for small sample sizes

34 Multiple Associated Tests
- If we test many associated hypotheses where each H_0 is true, chance alone will lead to one or more being rejected
- A family-wide p-value can be used to avoid this result
- If we want a family-wide significance level of 0.05, each test should use α = 0.05/g, where g is the number of different tests we are performing (the Bonferroni correction)
- This correction applies even if the tests are not independent of one another; recall the indicator variable discussion
- Obvious problem: if we perform many different tests, this procedure results in a very low required p-value to reject H_0 in each individual test
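A minimal sketch of the per-test α = family-wide α / g rule (function name and the p-values are hypothetical):

```python
def bonferroni_reject(p_values, family_alpha=0.05):
    """Reject each H0 whose p-value is at most family_alpha / g,
    where g is the number of tests performed."""
    g = len(p_values)
    threshold = family_alpha / g
    return [p <= threshold for p in p_values]

# Hypothetical: 3 tests, so each must beat 0.05/3 ~ 0.0167
decisions = bonferroni_reject([0.001, 0.02, 0.04])
```

Note the "obvious problem" from the slide: 0.02 and 0.04 would each be significant on their own at α = 0.05, but fail the corrected threshold.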

35 Multiple Experiments
- In science, it is common to repeat tests to verify results
- What if the p-value of each test is close to α but not below it?
- Use a combined p-value to assess the significance of the p-values in conjunction with one another
- V = −2 ln(P_1 P_2 ... P_k) has a chi-square distribution with 2k degrees of freedom when all the null hypotheses are true
- This can reveal significant results even when no individual null hypothesis was rejected
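A sketch of this combined test (often called Fisher's method). The function names and the three example p-values are hypothetical; since the standard library has no chi-square cdf, the sketch uses the closed-form survival function that holds for even degrees of freedom, which 2k always is:

```python
import math

def fisher_combined(p_values):
    """V = -2 * ln(p_1 * ... * p_k); chi-square with 2k df when every H0 holds."""
    return -2 * sum(math.log(p) for p in p_values)

def chisq_sf_even_df(x, df):
    """P(X > x) for a chi-square RV with even df = 2k:
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!  (closed form for even df)."""
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))

# Hypothetical: three experiments, each just missing alpha = 0.05
v = fisher_combined([0.08, 0.07, 0.06])       # ~16.0
p_combined = chisq_sf_even_df(v, df=6)        # well below 0.05
```

This reproduces the slide's point: no single p-value clears 0.05, yet the combined evidence does.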

36 Questions?

