For 95 out of 100 (large) samples, the interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ will contain the true population mean. But we don’t know $\sigma$?!

Inference for the Mean of a Population
To estimate $\mu$, we use a confidence interval around $\bar{x}$. The confidence interval is built with $\sigma$, which we replace with $s$ (the sample standard deviation) if $\sigma$ is not known.

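t-distributions
The “standard error” of $\bar{x}$ is $SE_{\bar{x}} = s/\sqrt{n}$. For an SRS of size $n$ from a Normal population, the one-sample t-statistic $t = (\bar{x} - \mu)/(s/\sqrt{n})$ has the t-distribution with $n-1$ degrees of freedom (see Table D).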

t-distributions
t-distributions with $k$ ($= n-1$) degrees of freedom
– are labeled $t(k)$,
– are symmetric around 0,
– are bell-shaped,
– but have more variability than Normal distributions, due to the substitution of $s$ in place of $\sigma$.

Example: Estimating the level of vitamin C
Data: 26 31 23 22 11 22 14 31
Find a 95% confidence interval for $\mu$.
A: ( , )
Write it as “estimate plus margin of error”.
STATA Exercise 1
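A minimal Python sketch of this calculation, assuming the data are the eight two-digit readings 26, 31, 23, 22, 11, 22, 14, 31 and taking the 95% critical value for 7 degrees of freedom (2.365) from a standard t table instead of computing it. The variable names are illustrative.

```python
import math
import statistics

# Eight vitamin C readings, assuming the digit run on the slide splits
# into two-digit values (an assumption, not confirmed by the source).
data = [26, 31, 23, 22, 11, 22, 14, 31]

n = len(data)
mean = statistics.mean(data)   # sample mean, the estimate of mu
s = statistics.stdev(data)     # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)          # standard error of the mean

# Critical value t* for 95% confidence and n - 1 = 7 degrees of freedom,
# copied from a t table rather than computed here.
T_STAR_7DF_95 = 2.365

margin = T_STAR_7DF_95 * se    # margin of error
ci = (mean - margin, mean + margin)
print(f"{mean:.1f} +/- {margin:.1f}  ->  ({ci[0]:.1f}, {ci[1]:.1f})")
```

Under those assumptions the script prints `22.5 +/- 6.0  ->  (16.5, 28.5)`, which is the “estimate plus margin of error” form.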

STATA Exercise 2

STATA Exercises 3 and 4

Paired, unpaired tests
“Paired” tests compare each individual between two variables and ask whether the mean difference (“gain” in this example) is zero.
$H_0$: mean(pretest - posttest) = mean(diff) = 0
STATA Exercise 5

STATA Exercise 6

Robustness of t procedures
t-tests are appropriate for testing a hypothesis about a single mean only in these cases:
– If n < 15: only if the data are close to Normally distributed (no outliers or strong skewness)
– If n ≥ 15: only if there are no outliers or strong skewness
– If n ≥ 40: even if the data are clearly skewed (because of the Central Limit Theorem)

Comparing Two Means

Suppose we make a change to the registration procedure. Does this reduce the number of mistakes? Basically, we’re looking at two populations:
– the before-change population (population 1)
– the after-change population (population 2)
Is the mean number of mistakes (per student) different? Is $\mu_1 - \mu_2 = 0$ or $\neq 0$?

Comparing Two Means Notice that we are not matching pairs. We compare two groups.

Comparing Two Means
Population   Variable   Mean      Standard deviation
1            $x_1$      $\mu_1$   $\sigma_1$
2            $x_2$      $\mu_2$   $\sigma_2$

Comparing Two Means
Population   Sample size   Sample mean   Sample standard deviation
1            $n_1$         $\bar{x}_1$   $s_1$
2            $n_2$         $\bar{x}_2$   $s_2$

Comparing Two Means
The population, really, is every single student using each registration procedure, an infinite number of times.
– Suppose we get a “good” result today: how do we know it will be repeated tomorrow?
We can’t repeat the procedure an infinite number of times; we only have a “sample”: numbers from one year. We estimate $(\mu_1 - \mu_2)$ with $(\bar{x}_1 - \bar{x}_2)$.

Comparing Two Means
Remember that $\bar{x}$ is a random variable. To estimate $\mu$ we need both $\bar{x}$ and the margin of error around $\bar{x}$, which is $t^* \, s/\sqrt{n}$. So we need to know $s/\sqrt{n}$, or rather, the appropriate standard error for this estimation. Because we are estimating a difference, we need the standard error of a difference.

Comparing Two Means
If the standard error for $\bar{x}$ is $s/\sqrt{n}$, then the standard error for $(\bar{x}_1 - \bar{x}_2)$, with independent samples, is $\sqrt{s_1^2/n_1 + s_2^2/n_2}$.
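The standard error of a difference comes from adding variances, not standard deviations. A short derivation, sketched under the assumption that the two samples are independent (so there is no covariance term):

```latex
\begin{align*}
\operatorname{Var}(\bar{x}_1 - \bar{x}_2)
  &= \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2)
   = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \\
SE_{\bar{x}_1 - \bar{x}_2}
  &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
  \quad \text{(replacing each $\sigma_i$ by its sample estimate $s_i$).}
\end{align*}
```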

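Two-sample significance test
To test $H_0: \mu_1 = \mu_2$, compute the two-sample t statistic $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$. This statistic does not have exactly a t-distribution, because we are replacing two standard deviations by their sample equivalents. When the unequal option is given, STATA uses the Satterthwaite approximation for the degrees of freedom as a default.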
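A Python sketch of the unpooled two-sample t statistic together with the Satterthwaite approximation for its degrees of freedom. The function name and the summary numbers are hypothetical, chosen only for illustration; this is not STATA's own code.

```python
import math

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Unpooled two-sample t statistic and Satterthwaite degrees of freedom."""
    v1 = s1**2 / n1              # estimated variance of the first sample mean
    v2 = s2**2 / n2              # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)      # standard error of the difference
    t = (mean1 - mean2) / se
    # Satterthwaite approximation: usually not a whole number.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics for two groups (not from the slides).
t_stat, df = two_sample_t(5.0, 2.0, 25, 4.0, 1.5, 30)
print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```

The approximate degrees of freedom always fall between the smaller of $n_1 - 1$ and $n_2 - 1$ and the pooled value $n_1 + n_2 - 2$.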


STATA Exercise 7

Paired, unpaired tests
“Paired” tests compare each individual between two variables and ask whether the mean difference (“gain” in this example) is zero.
$H_0$: mean(pretest - posttest) = mean(diff) = 0
“Unpaired” tests take the mean of each variable and test whether the difference of the means is zero.
$H_0$: mean(pretest) - mean(posttest) = diff = 0
STATA Exercise 5
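The two hypotheses look alike, but the tests use different standard errors. A small Python sketch with made-up pretest and posttest scores (hypothetical data, not taken from the exercises) computes both statistics; the unpaired version here uses the unpooled standard error.

```python
import math
import statistics

# Hypothetical scores for the same five individuals (not from the slides).
pretest = [20, 25, 30, 35, 40]
posttest = [22, 28, 31, 38, 43]
n = len(pretest)

# Paired: take each individual's difference, then run a one-sample t on it.
diffs = [pre - post for pre, post in zip(pretest, posttest)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Unpaired: compare the two sample means, ignoring who is who.
se_unpaired = math.sqrt(
    statistics.variance(pretest) / n + statistics.variance(posttest) / n
)
t_unpaired = (statistics.mean(pretest) - statistics.mean(posttest)) / se_unpaired

print(f"paired t   = {t_paired:.2f} (df = {n - 1})")
print(f"unpaired t = {t_unpaired:.2f}")
```

Both statistics share the numerator -2.4, but the paired standard error is much smaller because the variation between individuals cancels out in the differences.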

STATA Exercise 8
ttest ego, by(group) unequal

Robustness and Small Samples
Two-sample methods are more robust than one-sample methods.
– More so if the two samples have similar shapes and sample sizes.
STATA assumes that the variances are the same (what the book calls “pooled t procedures”) unless you tell it otherwise with the unequal option.
Small samples, as always, make the test less robust.

Pooled two-sample t procedures

Suppose the two Normal population distributions have the same standard deviation. Then the t-statistic that compares the means of samples from those two populations has exactly a t-distribution.

Pooled two-sample t procedures
The common but unknown standard deviation of both populations is $\sigma$. The sample standard deviations $s_1$ and $s_2$ both estimate $\sigma$. The best way to combine these estimates is to take a “weighted average” of the two sample variances, using the degrees of freedom as the weights: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.

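THE POOLED TWO-SAMPLE T PROCEDURES
(assuming $\sigma$ is the same for both populations)
A level C confidence interval for $\mu_1 - \mu_2$ is $(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{1/n_1 + 1/n_2}$, where $s_p$ is the pooled standard deviation. Here, $t^*$ is the value for the $t(n_1 + n_2 - 2)$ density curve with area C between $-t^*$ and $t^*$.
To test the hypothesis $H_0: \mu_1 = \mu_2$, compute the pooled two-sample t statistic $t = (\bar{x}_1 - \bar{x}_2)/(s_p \sqrt{1/n_1 + 1/n_2})$ and use P-values from the $t(n_1 + n_2 - 2)$ distribution.
ttest ego, by(group)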
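A Python sketch of the pooled procedure, assuming the two populations share one standard deviation. The function name and the summary numbers are hypothetical and serve only to show the arithmetic.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled two-sample t statistic, its degrees of freedom, and s_p."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances, weights = degrees of freedom.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    sp = math.sqrt(sp2)
    se = sp * math.sqrt(1 / n1 + 1 / n2)   # standard error of the difference
    t = (mean1 - mean2) / se
    return t, df, sp

# Hypothetical summary statistics for two groups (not from the slides).
t_pooled, df_pooled, s_pooled = pooled_t(5.0, 2.0, 25, 4.0, 1.5, 30)
print(f"s_p = {s_pooled:.3f}, t = {t_pooled:.3f}, df = {df_pooled}")
```

The pooled standard deviation always lies between the two sample standard deviations, and the degrees of freedom are a whole number, unlike the Satterthwaite approximation used for unequal variances.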
