This Week Review of estimation and hypothesis testing


1 This Week Review of estimation and hypothesis testing
Reading, Le (review): Chapter 4, Sections 4.1–4.3; Chapter 5, Sections 5.1 and 5.4; Chapter 7, Sections 7.1–7.3. Reading, C&S: Chapter 2, A–E; Chapter 6, A, B, F

2 Point Estimate
Population parameter → point estimate:
μ (population mean) → sample mean x̄
p (population proportion) → sample proportion
ρ (population correlation) → sample correlation r
μ1 − μ2 → difference between 2 sample means
p1 − p2 → difference between 2 sample proportions
σ (population standard deviation) → sample standard deviation s
Sampling error: true value − estimate (unknown)

3 Statistical Inference
A simple random sample of n elements is selected from a population with mean μ = ? The sample data provide a value for the sample mean x̄, which is used to make inferences about the value of μ.

4 Interval Estimation
In general, confidence intervals are of the form:
estimate ± (multiplier × SE)
Estimate = mean, proportion, regression coefficient, odds ratio, ...
SE = standard error of your estimate
Multiplier = 1.96 for a 95% CI based on the normal distribution
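The general form above can be sketched in Python (the slides themselves use SAS; `conf_int` is a hypothetical helper, shown with the cholesterol numbers used later in the slides):

```python
import math

def conf_int(estimate, se, multiplier=1.96):
    """Confidence interval of the general form: estimate +/- multiplier * SE."""
    return (estimate - multiplier * se, estimate + multiplier * se)

# Cholesterol example from the slides: n = 100, mean 215 mg/dL, SD 20
n, mean, sd = 100, 215, 20
se = sd / math.sqrt(n)          # standard error of the mean = 2.0
low, high = conf_int(mean, se)
print(low, high)                # approximately (211.08, 218.92)
```

The same function covers any estimate with a normal-based interval; only the estimate, its SE, and the multiplier change.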

5 Standard normal distribution
[Figure: standard normal curve with 2.5% probability in each tail, leaving 95% in the middle]

6 Estimation for Population Mean μ
Point estimate: x̄
Estimate of variability in population: s (a slightly larger multiplier based on the t-distribution is used for smaller n)
Estimate of variability in point estimate (SE): s/√n
95% Confidence Interval: x̄ ± 1.96 × s/√n

7 Assumptions
Data in the population follow a normal distribution, or
Sample size is large enough to apply the central limit theorem (CLT)
CLT: no matter the shape of the population distribution, the distribution of the sample mean approaches a normal distribution as the sample size gets large
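A quick simulation illustrates the CLT. This is a sketch under assumed inputs (an exponential population with mean 1, samples of size 50): even though the population is strongly skewed, the sample means cluster symmetrically around the population mean with spread near σ/√n = 1/√50 ≈ 0.141.

```python
import random
import statistics

random.seed(0)

# Population: a strongly skewed (exponential) distribution with mean 1.
# Draw many samples of size n and examine the distribution of sample means.
n = 50
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(1000)
]

# The sample means cluster near the population mean (1)
# with spread close to sigma / sqrt(n) = 1 / sqrt(50) ~ 0.141.
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```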

8 Meaning of Confidence Interval
Before sampling, there is a 95% chance that the interval you construct will contain μ; across repeated samples, 95% of such intervals "capture" the true value μ.

9 Example
Suppose a sample of n = 100 persons has mean = 215 mg/dL and standard deviation = 20.
95% CI: Lower limit: 215 − 1.96 × 20/10 = 211.08; Upper limit: 215 + 1.96 × 20/10 = 218.92, i.e., approximately (211, 219)
"We are about 95% confident that the interval contains μ"
We can pretty much rule out that μ > 220

10 Properties of Confidence Intervals
As sample size increases, the CI gets smaller (because the SE gets smaller)
Can use different levels of confidence; 90%, 95%, and 99% are common
More confidence means a larger interval, so a 90% CI is smaller than a 99% CI (what would a 100% CI look like?)
Changes with the population standard deviation: a more variable population means a larger interval

11 Effect of sample size
Suppose we had only 10 observations. What happens to the confidence interval?
For n = 100: 215 ± 1.96 × 20/√100 = (211.1, 218.9)
For n = 10: 215 ± 1.96 × 20/√10 = (202.6, 227.4)
Larger sample size = smaller interval

12 Effect of confidence level
Suppose we use a 90% interval. What happens to the confidence interval?
90%: 215 ± 1.645 × 20/10 = (211.7, 218.3)
Lower confidence level = smaller interval (a 99% interval would use 2.58 as the multiplier, and the interval would be larger)
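The multipliers 1.645, 1.96, and 2.58 come from the inverse CDF of the standard normal distribution. A Python sketch, reusing the slides' numbers (mean 215, SE = 20/√100 = 2):

```python
from statistics import NormalDist

# For a two-sided interval at a given confidence level, the multiplier is
# the standard normal quantile at (1 + level) / 2.
z = NormalDist()
for level in (0.90, 0.95, 0.99):
    mult = z.inv_cdf((1 + level) / 2)
    low = 215 - mult * 2.0
    high = 215 + mult * 2.0
    print(f"{level:.0%}: multiplier {mult:.3f}, CI ({low:.1f}, {high:.1f})")
```

Higher confidence pushes the quantile further into the tail, so the interval widens.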

13 Effect of standard deviation
Suppose we had an SD of 40 (instead of 20). What happens to the confidence interval?
215 ± 1.96 × 40/10 = (207.2, 222.8)
More variation = larger interval

14 Effect of different sample
Suppose a new sample has a mean of 212 (but the same standard deviation). What happens to the confidence interval?
212 ± 1.96 × 20/10 = (208.1, 215.9)
Same size, moves a little

15 How Big a Sample to Take?
Depends on the variability in the population
Depends on how precise an estimate you want
Cost: if it doesn't cost much to sample an element, then sample many
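One common back-of-the-envelope rule for estimating a mean: to get a 95% CI with half-width (margin) E, solve 1.96 × σ/√n ≤ E for n, giving n = (1.96 σ/E)². A sketch (`sample_size` is a hypothetical helper; the slides do not give this formula explicitly):

```python
import math

def sample_size(sigma, margin, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= margin (95% CI half-width)."""
    return math.ceil((z * sigma / margin) ** 2)

# With SD 20 (the cholesterol example), to pin the mean down
# to within +/- 2 mg/dL at 95% confidence:
n = sample_size(sigma=20, margin=2)
print(n)
```

Note the tradeoff: halving the margin quadruples the required sample size.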

16 95% Confidence Intervals for μ Using SAS
PROC MEANS DATA = datasetname CLM;
  VAR list-of-variables;
RUN;
This will display the following statistics: N, Mean, Standard Deviation, Standard Error of Mean, Lower 95% Confidence Limit, Upper 95% Confidence Limit

17 Assessing Normality with Graphs
Boxplots, stem-and-leaf plots, histograms
Look for skewness (non-symmetry); it is hard to get normal-looking graphs with small sample sizes
Can check the effect of transformations
Normal probability plots: x-axis related to the inverse of the standard normal distribution; y-axis is the actual data
In the SAS plots, * marks the actual data and + marks what we would expect if the data were really normal
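The coordinates behind a normal probability plot can be computed directly: pair each sorted observation with a standard normal quantile at an evenly spaced plotting position. A sketch (hypothetical data; the (i − 0.5)/n plotting position is one common convention, and SAS's may differ slightly):

```python
from statistics import NormalDist

def qq_points(data):
    """Coordinates for a normal probability plot: each sorted observation
    is paired with the standard normal quantile at position (i - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

pts = qq_points([3.1, 2.4, 5.9, 4.2, 3.8])
for theo, obs in pts:
    print(f"{theo:+.3f}  {obs}")
```

If the data really are normal, these points fall close to a straight line; systematic curvature signals skewness, as in the urinary sodium example that follows.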

18 Assessing normality - PROC UNIVARIATE
PROC UNIVARIATE DATA = demo NORMAL PLOT;
  VAR ursod;  * ursod is urinary sodium excretion over 8 hours;
RUN;
NORMAL and PLOT are two options that test for normality and display simple graphs. Plots are best: with enough data, tests for normality almost always reject the normality assumption.

19 STEM AND LEAF PLOT
[Stem-and-leaf plot and boxplot of ursod; multiply Stem.Leaf by 10**+1]

20 The UNIVARIATE Procedure
Variable: ursod
[Normal probability plot of ursod: the pattern of points (*) against the reference line (+) is not linear, suggesting non-normality]

21 Log transformed value shows a better linear pattern
Variable: lursod
[Normal probability plot of lursod: the points follow the reference line more closely than for the untransformed ursod]

22 Hypothesis Testing
Hypothesis: a statement about parameters of a population or of a model (μ = 200?)
Test: do the data agree with the hypothesis? (sample mean 220)
Measure the agreement with probability

23 Steps in hypothesis testing
State the null and alternative hypotheses (H0 and Ha); H0 is usually a statement of no effect or no difference between groups
Choose the α level: the probability of falsely rejecting H0 (Type I error)

24 Steps in hypothesis testing
Calculate the test statistic and find the p-value (p), which measures how far the data are from what you expect under the null hypothesis
State the conclusion: if p < α, reject H0; if p ≥ α, there is insufficient evidence to reject H0
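The steps above can be sketched in Python with the slides' cholesterol numbers, testing H0: μ = 220 (the value the earlier CI "pretty much ruled out"). Since n = 100 is large, a normal approximation to the t distribution is used here; the slides' SAS examples use the exact t distribution.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: H0: mu = 220 vs Ha: mu != 220.  Step 2: alpha = 0.05.
n, xbar, s, mu0, alpha = 100, 215, 20, 220, 0.05

# Step 3: test statistic and two-sided p-value (normal approximation).
se = s / sqrt(n)                    # 2.0
t = (xbar - mu0) / se               # -2.5
p = 2 * NormalDist().cdf(-abs(t))   # ~ 0.012

# Step 4: conclusion.
reject = p < alpha
print(t, round(p, 3), reject)
```

Since p ≈ 0.012 < 0.05, we reject H0, consistent with 220 lying outside the 95% CI of (211, 219).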

25 Possible results of tests
                          Reality
What we decide        H0 true             H0 false
Reject H0             Type I error (α)    Correct
Do not reject H0      Correct             Type II error (β)

26 Details
α is related to the confidence level; commonly set at 0.05 or 0.01
β (the probability of a Type II error) is usually predetermined by the sample size

27 One sample t-test; test for population mean
Simple random sample from a normal population (or n large enough for CLT)
H0: μ = μ0
Ha: μ ≠ μ0, pick α
Test statistic: t = (x̄ − μ0) / (s/√n), with n − 1 degrees of freedom
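The test statistic can be computed from raw data in a few lines of Python (hypothetical ages; `one_sample_t` is an illustrative helper, not part of any library):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(data)
    xbar, s = mean(data), stdev(data)   # stdev uses the n - 1 divisor
    return (xbar - mu0) / (s / sqrt(n)), n - 1

# Hypothetical sample of ages; H0: mu = 25
t, df = one_sample_t([23, 25, 28, 22, 30, 27, 24, 26], mu0=25)
print(round(t, 2), df)
```

The resulting t is compared against the t distribution with df degrees of freedom to get the p-value, which is what PROC TTEST reports.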

28 Matched pairs data
Recall the independence requirement for CIs; a similar issue arises for t-tests when observations are not independent
Examples: pre- and post-test, left and right eyes, brother-sister pairs
Solution: look at the paired differences d = X2 − X1 and do a one-sample test on the differences
H0: μd = 0, Ha: μd ≠ 0
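The paired approach reduces two dependent columns to one independent sample of differences. A sketch with hypothetical pre/post measurements on 5 subjects:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical pre/post measurements on the same 5 subjects
pre  = [120, 130, 125, 118, 140]
post = [115, 128, 120, 119, 135]

# Reduce paired data to one sample of differences d = X2 - X1,
# then run the usual one-sample t statistic on the differences (H0: mu_d = 0).
d = [x2 - x1 for x1, x2 in zip(pre, post)]
n = len(d)
t = mean(d) / (stdev(d) / sqrt(n))
print(d, round(t, 2))
```

Each difference comes from a different subject, so the differences satisfy the independence requirement even though pre and post values within a subject do not.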

29 PROC TTEST, one sample test
PROC TTEST DATA = DEMO;
  VAR age;
RUN;
This tests whether the mean age is different from zero. Not very useful; we need to be tricky...

30 Use a DATA step to calculate a new variable
Subtract the value of the mean under the null hypothesis, then test the new variable for a difference from zero:
DATA DEMO;
  SET DEMO;
  dage = age - 25;
RUN;
PROC TTEST DATA = DEMO;
  VAR dage;
RUN;
This tests whether the mean age is different from 25

31 PROC TTEST one sample output
T-Tests
Variable   DF   t Value   Pr > |t|
dage                      0.69
Conclusion: we have insufficient evidence to claim that the mean age is different from 25 (p = 0.69)

