Confidence Intervals I 2/1/12 Correlation (continued) Population parameter versus sample statistic Uncertainty in estimates Sampling distribution Confidence.

Presentation transcript:

Confidence Intervals I 2/1/12 Correlation (continued) Population parameter versus sample statistic Uncertainty in estimates Sampling distribution Confidence interval Section 3.1 Professor Kari Lock Morgan Duke University

Correlation Guessing Game Highest scorer in the class gets one extra point on the first exam!

Correlation r = 0.43 NFL Teams

Correlation Same plot, but with Dolphins and Raiders (outliers) removed r = 0.08

Human Cannonball Y X Plot Y vs. X What is the correlation between X and Y? (a) r > 0 (b) r < 0 (c) r = 0 Are X and Y associated? (a) Yes (b) No

Correlation Cautions 1. Correlation can be heavily affected by outliers. Always plot your data! 2. r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data! 3. Correlation does not imply causation!
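The first two cautions can be checked numerically. The sketch below is illustrative only: the data values are made up, and `pearson_r` is a hypothetical helper written here, not something from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Caution 1: a single outlier can create a strong correlation.
# (Made-up data with no linear trend.)
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 1, 2]
r_without = pearson_r(x, y)
r_with = pearson_r(x + [20], y + [20])  # same data plus one extreme point

# Caution 2: r = 0 although y is completely determined by x (y = x^2).
x_sym = [-2, -1, 0, 1, 2]
y_sym = [v ** 2 for v in x_sym]
r_curve = pearson_r(x_sym, y_sym)

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
print(f"r for y = x^2:     {r_curve:.2f}")
```

In both cases a scatterplot would reveal immediately what the single number r hides.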

Summary: Two Quantitative Variables Summary Statistics – Correlation Visualization – Scatterplot

Variable(s) | Visualization | Summary Statistics
Categorical | bar chart, pie chart | frequency table, relative frequency table, proportion
Quantitative | dotplot, histogram, boxplot | mean, median, max, min, standard deviation, range, IQR, five number summary
Categorical vs Categorical | side-by-side bar chart, segmented bar chart, mosaic plot | two-way table, difference in proportions
Quantitative vs Categorical | side-by-side boxplots | statistics by group
Quantitative vs Quantitative | scatterplot | correlation

The Big Picture Population Sample Sampling Statistical Inference

Parameter vs Statistic A sample statistic is a number computed from sample data. A population parameter is a number that describes some aspect of a population. We usually have a sample statistic and want to make inferences about the population parameter.

The Big Picture Population Sample Sampling Statistical Inference PARAMETERS STATISTICS

Parameter vs Statistic Population parameters: μ (mean), σ (standard deviation), ρ (correlation), β (slope)

Obama’s Approval Rating Gallup surveyed 1500 Americans between Jan 28-30, 2012, and 46% of these people approve of the job Barack Obama is doing as president. What do you think is the true proportion of Americans who approve of the job Barack Obama is doing as president?

Point and Interval Estimates The sample statistic gives a point estimate of the population parameter (a single number). Usually, it is more useful to provide an interval estimate, which gives a range of plausible values for the population parameter: statistic ± margin of error. How do we determine the margin of error?

Obama

Obama’s Approval Rating Between 43% and 49% of Americans currently approve of the job Obama is doing as president

IMPORTANT POINTS Sample statistics vary from sample to sample. (they will not match the parameter exactly) KEY QUESTION: For a given sample statistic, what are plausible values for the population parameter? How much uncertainty surrounds the sample statistic? KEY ANSWER: It depends on how much the statistic varies from sample to sample!

Reese’s Pieces What proportion of Reese’s pieces are orange? Take a random sample of 10 Reese’s pieces. What is your sample proportion? → class dotplot. Give a range of plausible values for the population proportion.

Sampling Distribution A sampling distribution is the distribution of statistics computed for different samples of the same size taken from the same population. The sampling distribution shows us how the statistic varies from sample to sample. We can use the spread of the sampling distribution to determine the margin of error for a statistic.

Sampling Distribution In the Reese’s pieces sampling distribution, what does each dot represent? (a) One Reese’s piece (b) One sample statistic

Sampling Distribution The higher the standard deviation of the sampling distribution, the (a) higher (b) lower the margin of error

Sample Size For a larger sample size you get less variability in the statistics, so less uncertainty in your estimate. (Figure: sampling distributions for n = 10, n = 50, and n = 100)
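The effect of sample size can be simulated. This is a minimal sketch under stated assumptions: the true proportion `TRUE_P = 0.5` is a hypothetical value (the slides do not give one), and the helper names are invented for illustration.

```python
import random
import statistics

def sample_proportion(p, n, rng):
    """Proportion of successes in one random sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n

def sampling_distribution(p, n, num_samples, rng):
    """Sample proportions from many samples of the same size."""
    return [sample_proportion(p, n, rng) for _ in range(num_samples)]

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_P = 0.5             # hypothetical population proportion

spread = {}
for n in (10, 50, 100):
    props = sampling_distribution(TRUE_P, n, 2000, rng)
    # Spread of the sampling distribution for this sample size
    spread[n] = statistics.stdev(props)
    print(f"n = {n:3d}: standard deviation of sample proportions = {spread[n]:.3f}")
```

The standard deviation of the sample proportions shrinks as n grows, which is the narrowing seen across the three dotplots.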

Sampling Distribution A sampling distribution is the distribution of statistics computed for different samples of the same size taken from the same population. The sampling distribution shows us how the statistic varies from sample to sample. This gives us an idea of the uncertainty surrounding the estimate of a parameter.

Random Samples If you take random samples, the sampling distribution will be centered around the true population parameter. If sampling bias exists (if you do not take random samples), your sampling distribution may give you bad information about the true parameter.
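A small simulation can illustrate the difference. The population below is a made-up list of word lengths (not real data), and the biased scheme, in which longer words are more likely to be picked, is just one assumed form of sampling bias.

```python
import random
import statistics

# Hypothetical population of word lengths (illustrative values only).
population = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10, 11]
pop_mean = statistics.mean(population)

rng = random.Random(0)
SAMPLE_SIZE = 5
NUM_SAMPLES = 2000

# Random samples: every word is equally likely to be chosen.
random_means = [
    statistics.mean(rng.sample(population, SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

# Biased samples (assumed scheme): a word's chance of selection is
# proportional to its length, so long words are over-represented.
biased_means = [
    statistics.mean(rng.choices(population, weights=population, k=SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

center_random = statistics.mean(random_means)
center_biased = statistics.mean(biased_means)

print(f"population mean:                 {pop_mean:.2f}")
print(f"center of random-sample means:   {center_random:.2f}")
print(f"center of biased-sample means:   {center_biased:.2f}")
```

The random-sample means center on the population mean, while the biased-sample means center well above it, so no amount of extra biased samples would fix the estimate.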

Lincoln’s Gettysburg Address

Confidence Interval A confidence interval for a parameter is an interval computed from sample data by a method that will capture the parameter for a specified proportion of all samples. The success rate (the proportion of all samples whose intervals contain the parameter) is known as the confidence level. A 95% confidence interval will contain the true parameter for 95% of all samples.

Confidence Intervals The parameter is fixed The statistic is random (depends on the sample) The interval is random (depends on the sample) Parameter Sampling Distribution

Sampling Distribution If you had access to the sampling distribution, how would you find the margin of error to ensure that intervals of the form statistic ± margin of error would capture the parameter for 95% of all samples?

Standard Error The standard error (SE) of a statistic is the standard deviation of the sample statistic. A 95% confidence interval can be created by statistic ± 2 × SE.
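Whether intervals of the form statistic ± 2 × SE really capture the parameter for about 95% of samples can be checked by simulation. This is a sketch under assumptions: the population proportion `TRUE_P`, the sample size `N`, and the helper name are hypothetical choices, not values from the slides.

```python
import random
import statistics

def sample_proportion(p, n, rng):
    """Proportion of successes in one random sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(1)
TRUE_P = 0.5        # hypothetical population parameter
N = 100             # hypothetical sample size
NUM_SAMPLES = 2000

# Build the sampling distribution by taking many samples.
sample_stats = [sample_proportion(TRUE_P, N, rng) for _ in range(NUM_SAMPLES)]

# Standard error = standard deviation of the sample statistics.
se = statistics.stdev(sample_stats)

# An interval statistic +/- 2*SE contains the parameter exactly when
# the statistic lies within 2*SE of the parameter.
captured = sum(abs(s - TRUE_P) <= 2 * se for s in sample_stats)
coverage = captured / NUM_SAMPLES

print(f"SE = {se:.3f}")
print(f"share of intervals capturing the parameter = {coverage:.3f}")
```

The capture rate comes out close to 95%, though not exactly, because the factor 2 is a rounded value and sample proportions can only take a limited set of values.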

Economy A recent survey of 1,502 Americans in January 2012 found that 86% consider the economy a “top priority” for the president and Congress this year. The standard error for this statistic is What is the 95% confidence interval for the true proportion of all Americans that consider the economy a “top priority” for the president and Congress this year? (a) (0.85, 0.87) (b) (0.84, 0.88) (c) (0.82, 0.90)
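The interval arithmetic for this survey can be sketched as follows. One loud assumption: the standard error is computed here with the usual formula for a sample proportion, sqrt(p̂(1 − p̂)/n), which this lecture has not introduced; the slides obtain the SE from the sampling distribution instead.

```python
import math

n = 1502        # sample size from the survey
p_hat = 0.86    # sample proportion from the survey

# Assumption: formula-based standard error for a proportion
# (not the simulation-based SE used in the slides).
se = math.sqrt(p_hat * (1 - p_hat) / n)

# 95% confidence interval: statistic +/- 2 * SE
lower = p_hat - 2 * se
upper = p_hat + 2 * se

print(f"SE is about {se:.3f}")
print(f"95% CI is about ({lower:.2f}, {upper:.2f})")
```

Under that assumption the margin of error is roughly two percentage points on either side of 86%.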

Summary To create a plausible range of values for a parameter: 1. Take many random samples from the population, and compute the sample statistic for each sample. 2. Compute the standard error as the standard deviation of all these statistics. 3. Use statistic ± 2 × SE. One small problem…

Reality … WE ONLY HAVE ONE SAMPLE!!!! How do we know how much sample statistics vary, if we only have one sample?!? … to be continued

Project 1 Pose a question that you would like to investigate. If possible, choose something related to your major! Find or collect data that will help you answer this question (you may need to edit your question based on available data) – If using existing data, you have to find your own (do not use a dataset already used in this class) – If collecting data, wait until your proposal has been approved to collect the data You can choose either a single variable or a relationship between two variables

Project 1 The result will be a five page paper including – Description of the data collection method, and the implications this has for statistical inference – Descriptive statistics (summary stats, visualization) – Confidence intervals – Hypothesis testing (following week) – Distribution-based inference (after Exam 1) Proposal due 2/15 – Can submit earlier if you want feedback sooner – Include data if you are using existing data – If collecting your own data, proposal should include a detailed data collection plan

To Do Homework 2 (due Monday) Idea and data for Project 1 (proposal due 2/15)

FINDING DATA Joel Herndon