Introduction to Inference


1 Introduction to Inference

2 What is the goal in statistics?
To infer from the sample data a conclusion about the population.

3 Statistical Inference
Provides methods for drawing conclusions about a population from sample data. So what’s new about this? We can use probability to express the strength of our conclusions.

4 Confidence Intervals
Have the form estimate ± margin of error.
Estimate: our guess for the value of the unknown parameter.
Margin of error (also denoted m): shows how accurate we believe our guess is.

5 Confidence Interval
A level C confidence interval for a parameter has two parts:
An interval calculated from the data, usually of the form estimate ± margin of error.
A confidence level C, which gives the probability that the interval will capture the true parameter value in repeated samples.

6 10.1 Estimating with Confidence
Example 10.2, p. 537
Suppose you want to estimate the mean SAT Math score for the more than 350,000 high school seniors in California. Only about 49% of California students take the SAT. These self-selected seniors are planning to attend college and so are not representative of all California seniors. You know better than to make inferences about the population based on just any sample data. At considerable effort and expense, you give the test to a simple random sample (SRS) of 500 California high school seniors. The mean for your sample is $\bar{x} = 461$. What can you say about the mean score µ in the population of all 350,000 seniors?

7 Some things to notice:
We are not provided the population mean or standard deviation.
$n = 500$ and $\bar{x} = 461$.
Since $n \geq 30$, the central limit theorem (CLT) applies: the sampling distribution of $\bar{x}$ is approximately normal.

8 Let us suppose that $\sigma = 100$
$n = 500$, $\bar{x} = 461$
Then $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 100/\sqrt{500} \approx 4.5$.
We’ll learn how to deal with not being given the standard deviation later.
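The arithmetic on this slide can be checked in a couple of lines. This is only a sketch: the function name `standard_error` is made up for illustration, and the value 100 for the population standard deviation is the slide's own assumption, not something measured.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean for an SRS of size n."""
    return sigma / math.sqrt(n)

# sigma = 100 is assumed (as on the slide); n = 500 is the sample size.
se = standard_error(sigma=100, n=500)
print(round(se, 2))  # 4.47, which the slide rounds to 4.5
```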

9 What if we were to collect data on more SRSs of 500?
$n = 500$, $\bar{x} = 461$, $\sigma = 100$, $\sigma_{\bar{x}} \approx 4.5$
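One way to explore that question is a small simulation. The sketch below rests on assumptions that are not in the slides: the population is modeled as normal, and `MU_TRUE = 455` is a hypothetical stand-in for the real population mean, which is unknown. It draws many samples of 500 and looks at how the sample means vary.

```python
import random
import statistics

MU_TRUE = 455.0     # hypothetical population mean (unknown in practice)
SIGMA = 100.0       # population standard deviation, assumed as on the slides
N = 500             # size of each sample
NUM_SAMPLES = 1000  # how many samples we pretend to collect

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(rng, mu, sigma, n):
    """Mean of one simulated sample of n scores from a normal population."""
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n

sample_means = [sample_mean(rng, MU_TRUE, SIGMA, N) for _ in range(NUM_SAMPLES)]

center = statistics.mean(sample_means)   # should sit close to MU_TRUE
spread = statistics.stdev(sample_means)  # should sit close to 100 / sqrt(500)

print(f"mean of sample means: {center:.1f}")
print(f"std dev of sample means: {spread:.2f}")
```

Each sample gives a different mean, but the means cluster around the population mean with a spread close to 4.5.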

11 Recall the Empirical Rule:
68% is within 1 standard deviation of the mean.
95% is within 2 standard deviations of the mean.
99.7% is within 3 standard deviations of the mean.
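The three percentages are rounded values for a normal distribution. As a quick check, the probability of landing within k standard deviations of the mean can be computed with the standard library's error function; the helper name `prob_within` is just illustrative.

```python
import math

def prob_within(k: float) -> float:
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```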

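12 Statistical Confidence
$n = 500$, $\bar{x} = 461$, $\sigma = 100$, $\sigma_{\bar{x}} \approx 4.5$
So in 95% of the samples, the unknown µ lies between $\bar{x} - 2\sigma_{\bar{x}}$ and $\bar{x} + 2\sigma_{\bar{x}}$.
$\bar{x} + 2\sigma_{\bar{x}} = \bar{x} + 2(4.5) = \bar{x} + 9$
$\bar{x} - 2\sigma_{\bar{x}} = \bar{x} - 2(4.5) = \bar{x} - 9$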

13 What does this mean?
$n = 500$, $\bar{x} = 461$, $\sigma = 100$, $\sigma_{\bar{x}} \approx 4.5$
In 95% of all samples, $\bar{x}$ lies within 9 points of the unknown population mean µ. So µ also lies within 9 points of $\bar{x}$ in those samples.
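The step from the first statement to the second is just the symmetry of distance. Written out as a chain of equivalent inequalities, using the 9-point margin from the slides:

```latex
\begin{align*}
|\bar{x} - \mu| \le 9
  &\iff -9 \le \mu - \bar{x} \le 9 \\
  &\iff \bar{x} - 9 \le \mu \le \bar{x} + 9
\end{align*}
```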

14 95% Confidence
Our sample of 500 California seniors gave $\bar{x} = 461$. We say that we are 95% confident that the unknown mean SAT Math score for all California high school seniors lies between
$\bar{x} - 9 = 461 - 9 = 452$ and $\bar{x} + 9 = 461 + 9 = 470$.
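The same computation can be sketched in code. The helper `confidence_interval` is a made-up name for illustration; it simply applies estimate ± margin of error. The second part shows the margin before rounding, again under the slides' assumption that the population standard deviation is 100.

```python
import math

def confidence_interval(estimate: float, margin: float) -> tuple:
    """Return (lower, upper) for an interval of the form estimate +/- margin."""
    return (estimate - margin, estimate + margin)

# Using the rounded margin of 9 from the slides.
low, high = confidence_interval(461, 9)
print(low, high)  # 452 470

# Margin before rounding: 2 standard deviations of the sample mean,
# with sigma = 100 assumed and n = 500.
margin_exact = 2 * 100 / math.sqrt(500)
print(round(margin_exact, 2))  # 8.94
```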

15 Understanding the Grounds for Confidence
$\bar{x} - 9 = 461 - 9 = 452$ and $\bar{x} + 9 = 461 + 9 = 470$
Two possibilities:
The interval between 452 and 470 contains the true µ.
Our SRS was one of the few samples for which $\bar{x}$ is not within 9 points of the true µ. Only 5% of all samples give such inaccurate results.

16 Writing in context (Because you should!)
We are 95% confident that the unknown mean SAT Math score µ for all California high school seniors lies between 452 and 470. This is called a 95% confidence interval.
“We got these numbers by a method that gives correct results 95% of the time.”
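The phrase "a method that gives correct results 95% of the time" can be illustrated by simulation. This is a sketch under assumptions that go beyond the slides: the population is modeled as normal, and `MU_TRUE = 455` is a hypothetical value standing in for the unknown mean. The code repeats the whole procedure many times and counts how often the interval of the form sample mean ± 9 captures the true mean.

```python
import random

MU_TRUE = 455.0     # hypothetical population mean (unknown in practice)
SIGMA = 100.0       # population standard deviation, assumed as on the slides
N = 500             # sample size
MARGIN = 9.0        # margin of error from the slides
NUM_SAMPLES = 1000  # number of repeated samples

rng = random.Random(1)  # fixed seed so the run is repeatable

def sample_mean(rng, mu, sigma, n):
    """Mean of one simulated sample of n scores from a normal population."""
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n

hits = 0
for _ in range(NUM_SAMPLES):
    x_bar = sample_mean(rng, MU_TRUE, SIGMA, N)
    # The interval "works" when it contains the true mean.
    if x_bar - MARGIN <= MU_TRUE <= x_bar + MARGIN:
        hits += 1

coverage = hits / NUM_SAMPLES
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should come out near 95%. Any single interval either contains µ or it does not; the 95% describes the method over many samples.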

17 Homework
p. 542: 10.1, 10.3, 10.4 (a-d)
Due: Tuesday

