This Week Review of estimation and hypothesis testing


1 This Week Review of estimation and hypothesis testing
Reading, Le (review): Chapter 4, Sections 4.1–4.3; Chapter 5, Sections 5.1 and 5.4; Chapter 7, Sections 7.1–7.3. Reading, C&S: Chapter 2, A–E; Chapter 6, A, B, F

2 Point Estimate
Population parameter → point estimate:
μ (population mean) → sample mean x̄
p (population proportion) → sample proportion
ρ (population correlation) → sample correlation r
μ1 − μ2 → difference between 2 sample means
p1 − p2 → difference between 2 sample proportions
σ (population standard deviation) → sample standard deviation s
Sampling error: true value − estimate (unknown)

3 Statistical Inference
A simple random sample of n elements is selected from a population with mean μ = ? The sample data provide a value for the sample mean x̄, which is used to make inferences about the value of μ.

4 Interval Estimation
In general, confidence intervals are of the form:
estimate ± (multiplier × SE)
Estimate = mean, proportion, regression coefficient, odds ratio, ...
SE = standard error of your estimate
Multiplier = 1.96 for a 95% CI based on the normal distribution
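The general form above can be sketched in Python (the slides themselves use SAS; `conf_int` is a hypothetical helper, shown with the cholesterol numbers used later in the slides):

```python
import math

def conf_int(estimate, se, multiplier=1.96):
    """Confidence interval of the general form: estimate +/- multiplier * SE."""
    return (estimate - multiplier * se, estimate + multiplier * se)

# Cholesterol example from the slides: n = 100, mean 215 mg/dL, SD 20
n, mean, sd = 100, 215, 20
se = sd / math.sqrt(n)          # standard error of the mean = 2.0
low, high = conf_int(mean, se)
print(low, high)                # approximately (211.08, 218.92)
```

The same function covers any estimate with a normal-based interval; only the estimate, its SE, and the multiplier change.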

5 Standard normal distribution
[Figure: standard normal curve with 2.5% probability in each tail, leaving 95% in the middle]

6 Estimation for Population Mean μ
Point estimate: x̄
Estimate of variability in population: s (a slightly larger multiplier based on the t-distribution is used for smaller n)
Estimate of variability in point estimate (SE): s/√n
95% Confidence Interval: x̄ ± 1.96 × s/√n

7 Assumptions
Data in the population follow a normal distribution, or
Sample size is large enough to apply the central limit theorem (CLT)
CLT: no matter the shape of the population distribution, the distribution of the sample mean approaches a normal distribution as the sample size gets large
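A quick simulation illustrates the CLT. This is a sketch under assumed inputs (an exponential population with mean 1, samples of size 50): even though the population is strongly skewed, the sample means cluster symmetrically around the population mean with spread near σ/√n = 1/√50 ≈ 0.141.

```python
import random
import statistics

random.seed(0)

# Population: a strongly skewed (exponential) distribution with mean 1.
# Draw many samples of size n and examine the distribution of sample means.
n = 50
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(1000)
]

# The sample means cluster near the population mean (1)
# with spread close to sigma / sqrt(n) = 1 / sqrt(50) ~ 0.141.
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```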

8 Meaning of Confidence Interval
Before sampling, there is a 95% chance that the interval you construct will contain μ; across repeated samples, 95% of such intervals "capture" the true value μ.

9 Example
Suppose a sample of n = 100 persons has mean = 215 mg/dL and standard deviation = 20.
95% CI: Lower limit: 215 − 1.96 × 20/10 = 211.08; Upper limit: 215 + 1.96 × 20/10 = 218.92, i.e., approximately (211, 219)
"We are about 95% confident that the interval contains μ"
We can pretty much rule out that μ > 220

10 Properties of Confidence Intervals
As sample size increases, the CI gets smaller (because the SE gets smaller)
Can use different levels of confidence; 90%, 95%, and 99% are common
More confidence means a larger interval, so a 90% CI is smaller than a 99% CI (what would a 100% CI look like?)
Changes with the population standard deviation: a more variable population means a larger interval

11 Effect of sample size
Suppose we had only 10 observations. What happens to the confidence interval?
For n = 100: 215 ± 1.96 × 20/√100 = (211.1, 218.9)
For n = 10: 215 ± 1.96 × 20/√10 = (202.6, 227.4)
Larger sample size = smaller interval

12 Effect of confidence level
Suppose we use a 90% interval. What happens to the confidence interval?
90%: 215 ± 1.645 × 20/10 = (211.7, 218.3)
Lower confidence level = smaller interval (a 99% interval would use 2.58 as the multiplier, and the interval would be larger)
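The multipliers 1.645, 1.96, and 2.58 come from the inverse CDF of the standard normal distribution. A Python sketch, reusing the slides' numbers (mean 215, SE = 20/√100 = 2):

```python
from statistics import NormalDist

# For a two-sided interval at a given confidence level, the multiplier is
# the standard normal quantile at (1 + level) / 2.
z = NormalDist()
for level in (0.90, 0.95, 0.99):
    mult = z.inv_cdf((1 + level) / 2)
    low = 215 - mult * 2.0
    high = 215 + mult * 2.0
    print(f"{level:.0%}: multiplier {mult:.3f}, CI ({low:.1f}, {high:.1f})")
```

Higher confidence pushes the quantile further into the tail, so the interval widens.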

13 Effect of standard deviation
Suppose we had an SD of 40 (instead of 20). What happens to the confidence interval?
215 ± 1.96 × 40/10 = (207.2, 222.8)
More variation = larger interval

14 Effect of different sample
Suppose a new sample has a mean of 212 (but the same standard deviation). What happens to the confidence interval?
212 ± 1.96 × 20/10 = (208.1, 215.9)
Same size, moves a little

15 How Big a Sample to Take?
Depends on the variability in the population
Depends on how precise an estimate you want
Cost: if it doesn't cost much to sample an element, then sample many
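One common back-of-the-envelope rule for estimating a mean: to get a 95% CI with half-width (margin) E, solve 1.96 × σ/√n ≤ E for n, giving n = (1.96 σ/E)². A sketch (`sample_size` is a hypothetical helper; the slides do not give this formula explicitly):

```python
import math

def sample_size(sigma, margin, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= margin (95% CI half-width)."""
    return math.ceil((z * sigma / margin) ** 2)

# With SD 20 (the cholesterol example), to pin the mean down
# to within +/- 2 mg/dL at 95% confidence:
n = sample_size(sigma=20, margin=2)
print(n)
```

Note the tradeoff: halving the margin quadruples the required sample size.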

16 95% Confidence Intervals for μ Using SAS
PROC MEANS DATA = datasetname CLM;
  VAR list-of-variables;
RUN;
This will display the following statistics: N, Mean, Standard Deviation, Standard Error of Mean, Lower 95% Confidence Limit, Upper 95% Confidence Limit

17 Assessing Normality with Graphs
Boxplots, stem-and-leaf plots, histograms
Look for skewness (non-symmetry); it is hard to get normal-looking graphs with small sample sizes
Can check the effect of transformations
Normal probability plots: x-axis related to the inverse of the standard normal distribution; y-axis is the actual data
In the SAS plots, * marks the actual data and + marks what we would expect if the data were really normal
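The coordinates behind a normal probability plot can be computed directly: pair each sorted observation with a standard normal quantile at an evenly spaced plotting position. A sketch (hypothetical data; the (i − 0.5)/n plotting position is one common convention, and SAS's may differ slightly):

```python
from statistics import NormalDist

def qq_points(data):
    """Coordinates for a normal probability plot: each sorted observation
    is paired with the standard normal quantile at position (i - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

pts = qq_points([3.1, 2.4, 5.9, 4.2, 3.8])
for theo, obs in pts:
    print(f"{theo:+.3f}  {obs}")
```

If the data really are normal, these points fall close to a straight line; systematic curvature signals skewness, as in the urinary sodium example that follows.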

18 Assessing normality - PROC UNIVARIATE
PROC UNIVARIATE DATA = demo NORMAL PLOT;
  VAR ursod;  * ursod is urinary sodium excretion over 8 hours;
RUN;
NORMAL and PLOT are two options that test for normality and display simple graphs. Plots are best: with enough data, tests for normality almost always reject the normality assumption.

19 STEM AND LEAF PLOT
[Stem-and-leaf plot and boxplot of ursod; multiply Stem.Leaf by 10**+1]

20 The UNIVARIATE Procedure
Variable: ursod
[Normal probability plot of ursod: the pattern of points (*) against the reference line (+) is not linear, suggesting non-normality]

21 Log transformed value shows a better linear pattern
Variable: lursod
[Normal probability plot of lursod: the points follow the reference line more closely than for the untransformed ursod]

22 Hypothesis Testing
Hypothesis: a statement about parameters of a population or of a model (μ = 200?)
Test: do the data agree with the hypothesis? (sample mean 220)
Measure the agreement with probability

23 Steps in hypothesis testing
State the null and alternative hypotheses (H0 and Ha); H0 is usually a statement of no effect or no difference between groups
Choose the α level: the probability of falsely rejecting H0 (Type I error)

24 Steps in hypothesis testing
Calculate the test statistic and find the p-value (p), which measures how far the data are from what you expect under the null hypothesis
State the conclusion: if p < α, reject H0; if p ≥ α, there is insufficient evidence to reject H0
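The steps above can be sketched in Python with the slides' cholesterol numbers, testing H0: μ = 220 (the value the earlier CI "pretty much ruled out"). Since n = 100 is large, a normal approximation to the t distribution is used here; the slides' SAS examples use the exact t distribution.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: H0: mu = 220 vs Ha: mu != 220.  Step 2: alpha = 0.05.
n, xbar, s, mu0, alpha = 100, 215, 20, 220, 0.05

# Step 3: test statistic and two-sided p-value (normal approximation).
se = s / sqrt(n)                    # 2.0
t = (xbar - mu0) / se               # -2.5
p = 2 * NormalDist().cdf(-abs(t))   # ~ 0.012

# Step 4: conclusion.
reject = p < alpha
print(t, round(p, 3), reject)
```

Since p ≈ 0.012 < 0.05, we reject H0, consistent with 220 lying outside the 95% CI of (211, 219).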

25 Possible results of tests
                          Reality
What we decide        H0 true             H0 false
Reject H0             Type I error (α)    Correct
Do not reject H0      Correct             Type II error (β)

26 Details
α is related to the confidence level; commonly set at 0.05 or 0.01
β (the probability of a Type II error) is usually predetermined by the sample size

27 One sample t-test; test for population mean
Simple random sample from a normal population (or n large enough for CLT)
H0: μ = μ0
Ha: μ ≠ μ0, pick α
Test statistic: t = (x̄ − μ0) / (s/√n), with n − 1 degrees of freedom
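The test statistic can be computed from raw data in a few lines of Python (hypothetical ages; `one_sample_t` is an illustrative helper, not part of any library):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(data)
    xbar, s = mean(data), stdev(data)   # stdev uses the n - 1 divisor
    return (xbar - mu0) / (s / sqrt(n)), n - 1

# Hypothetical sample of ages; H0: mu = 25
t, df = one_sample_t([23, 25, 28, 22, 30, 27, 24, 26], mu0=25)
print(round(t, 2), df)
```

The resulting t is compared against the t distribution with df degrees of freedom to get the p-value, which is what PROC TTEST reports.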

28 Matched pairs data
Recall the independence requirement for CIs; a similar issue arises for t-tests when observations are not independent
Examples: pre- and post-test, left and right eyes, brother-sister pairs
Solution: look at the paired differences d = X2 − X1 and do a one-sample test on the differences
H0: μd = 0, Ha: μd ≠ 0
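The paired approach reduces two dependent columns to one independent sample of differences. A sketch with hypothetical pre/post measurements on 5 subjects:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical pre/post measurements on the same 5 subjects
pre  = [120, 130, 125, 118, 140]
post = [115, 128, 120, 119, 135]

# Reduce paired data to one sample of differences d = X2 - X1,
# then run the usual one-sample t statistic on the differences (H0: mu_d = 0).
d = [x2 - x1 for x1, x2 in zip(pre, post)]
n = len(d)
t = mean(d) / (stdev(d) / sqrt(n))
print(d, round(t, 2))
```

Each difference comes from a different subject, so the differences satisfy the independence requirement even though pre and post values within a subject do not.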

29 PROC TTEST, one sample test
PROC TTEST DATA = DEMO;
  VAR age;
RUN;
This tests whether the mean age is different from zero. Not very useful; we need to be tricky...

30 Use a DATA step to calculate a new variable
Subtract the value of the mean under the null hypothesis, then test the new variable for a difference from zero:
DATA DEMO;
  SET DEMO;
  dage = age - 25;
RUN;
PROC TTEST DATA = DEMO;
  VAR dage;
RUN;
This tests whether the mean age is different from 25

31 PROC TTEST one sample output
T-Tests
Variable   DF   t Value   Pr > |t|
dage                      0.69
Conclusion: we have insufficient evidence to claim that the mean age is different from 25 (p = 0.69)

