1 The Basics of Regression Regression is a statistical technique that can ultimately be used for forecasting.

Presentation transcript:

1 The Basics of Regression Regression is a statistical technique that can ultimately be used for forecasting.

2 Overview In this section I want to: 1) review the basic idea of inferential statistics, and 2) present the elementary information needed to understand regression techniques. In another file I will show you how you can get Microsoft Excel to give you the numbers you need to evaluate relationships between variables. As an example of a relationship between variables, we might think about how education influences income.

3 Normal distribution mean value As a start we can think about the normal distribution. Along the horizontal axis we measure the variable we think has a normal distribution. The variable might be age, income or whatever. Note the mean value is in the center of the distribution.

4 Normal distribution mean value The curve above the axis helps us find the probability that the variable falls in a given range of values. As an example, the probability of having a value above the mean is 50%: 50% of the area under the curve lies to the right of the mean. The z table helps us find the probability of other ranges of values.
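The table lookup described on this slide can also be sketched in code. The block below is only an illustration, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the printed z table; the variable names are made up for the example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve to the right of the mean.
p_above_mean = 1 - z.cdf(0)

# Area between the mean and one standard deviation above it,
# the kind of value a z table reports.
p_mean_to_one_sd = z.cdf(1) - z.cdf(0)

print(f"P(Z > 0)     = {p_above_mean:.4f}")
print(f"P(0 < Z < 1) = {p_mean_to_one_sd:.4f}")
```

Any other range works the same way: subtract the cumulative area at the lower end from the cumulative area at the upper end.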

5 Example We could imagine that the people in a typical classroom represent a population. The population would be the people who meet in the class on a regular basis. As we think of this population, we might want to know about characteristics of the population such as age, income, or educational attainment. If we looked at the whole population, we would call the population mean and standard deviation of a variable (say, age) parameters of the population.

6 example When we look at the people in the class we could find out the population mean by asking everyone to give their age and then we could calculate the mean. But in many statistical studies we do not collect information from everyone. We only take a sample. The sample will have a mean and standard deviation as well. Since a sample does not include everyone in the population, the sample mean (and sample standard deviation) will have a value that depends on which people made it into the sample.

7 example Let’s take a sample of 5 people in the class and determine the average age. If we took a different sample of 5, we would generally get a different average. So in principle we could look at every possible sample of size 5 and calculate the mean for each sample. The means from all the samples of size 5 could then be looked at as a distribution.
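To make "every possible sample of size 5" concrete, here is a minimal sketch. The class ages are invented for illustration (they are not from the slides), and the class is assumed to have only 10 people so that the list of samples stays small.

```python
from itertools import combinations

# Hypothetical ages for a class of 10 people (assumed data).
class_ages = [19, 20, 20, 21, 22, 23, 25, 28, 31, 40]
population_mean = sum(class_ages) / len(class_ages)

# Every possible sample of size 5, and the mean of each one.
sample_means = [sum(sample) / 5 for sample in combinations(class_ages, 5)]

print("number of samples:     ", len(sample_means))
print("smallest sample mean:  ", min(sample_means))
print("largest sample mean:   ", max(sample_means))
print("population mean:       ", population_mean)
print("mean of sample means:  ", sum(sample_means) / len(sample_means))
```

The sample means vary from sample to sample, but their average matches the population mean.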

8 sampling distribution When we think about repeated sampling, a statistic like the sample mean can be thought of as having a distribution of its own, called a sampling distribution. Due to the central limit theorem, we know a great deal about the sampling distribution of the sample mean. The nice thing about the central limit theorem is that it holds whether we know all about the population or not.

9 central limit theorem The basic idea of the central limit theorem is that if you consider repeated samples of the same size from a population, the sampling distribution of sample means 1) is approximately normal when the sample size is reasonably large (and exactly normal when the population itself is normal), 2) has mean value equal to the mean of the population, and 3) has standard deviation or, in this context, standard error equal to the standard deviation of the population divided by the square root of the sample size. The standard error is just the standard deviation of the sampling distribution, given a special name.
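The three claims can be checked with a small simulation. This is a sketch under assumptions that are not in the slides: the population is taken to be exponential with mean 1 and standard deviation 1 (deliberately not normal), the sample size is 30, and the seed is fixed only so the run is repeatable.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(0)            # fixed seed so the run is repeatable
n = 30                    # sample size (assumed)
num_samples = 5000        # number of repeated samples (assumed)
population_mean = 1.0     # exponential with rate 1 has mean 1
population_sd = 1.0       # and standard deviation 1

# Mean of each repeated sample drawn from the skewed population.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = sum(sample_means) / num_samples
observed_se = stdev(sample_means)
predicted_se = population_sd / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {population_mean})")
print(f"observed std. error:  {observed_se:.3f}")
print(f"predicted std. error: {predicted_se:.3f}")
```

Even though the population is skewed, the sample means center on the population mean and spread out by about the population standard deviation divided by the square root of the sample size.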

10 central limit theorem So we see the variable in the population can have a normal distribution and the sample mean can have a normal distribution. Example: If in the population age ~ N(30, 3), where 3 is the standard deviation, then the means of samples of size, say, 9 have xbar ~ N(30, 1). How did I get this? The standard error is 3 divided by the square root of 9, which is 3/3 = 1. Do you get it?
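The arithmetic behind the age example, written out as a short sketch. It assumes, as the slides' N(30, 3) notation appears to, that the second number is the standard deviation rather than the variance.

```python
from math import sqrt

population_mean = 30   # mean age in the population
population_sd = 3      # standard deviation of age in the population
n = 9                  # sample size

# Standard error = population standard deviation / sqrt(sample size).
standard_error = population_sd / sqrt(n)

print(f"sample mean ~ N({population_mean}, {standard_error})")
```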

11 68-95-99.7 rule For a normal distribution it is known that 1) approximately 68% of the values are within 1 standard deviation of the mean, 2) approximately 95% of the values are within 2 standard deviations of the mean, and 3) approximately 99.7% of the values are within 3 standard deviations of the mean. So from our example of age before, in the population approximately 68% of the people are between 27 and 33, but approximately 68% of the sample means (for samples of size 9) would fall between 29 and 31.
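The percentages in the rule, and the two age intervals, can be reproduced with a quick check. This is a sketch using the standard-library `statistics.NormalDist`; the N(30, 3) population and N(30, 1) sample means are the ones from the age example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal

# Share of values within 1, 2 and 3 standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} sd: {share:.4f}")

# Age example: individuals vs. means of samples of size 9.
population = NormalDist(mu=30, sigma=3)
sample_means = NormalDist(mu=30, sigma=1)
p_population = population.cdf(33) - population.cdf(27)
p_sample_means = sample_means.cdf(31) - sample_means.cdf(29)
print(f"people between 27 and 33:       {p_population:.4f}")
print(f"sample means between 29 and 31: {p_sample_means:.4f}")
```

Both intervals are one standard deviation (or one standard error) either side of 30, so they capture the same share of their respective distributions.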

12 68-95-99.7 rule in a graph [Figure: the rule shown on graphs of population age and of mean age.]

13 statistical inference Up to this point we have operated as if we knew the population mean. (What we have done will act as a model for what we are about to do.) But most of the time we don’t - that is why we have statistics. We will take a sample and try to infer what the population mean is from the sample we draw. The two methods of inference are 1) confidence intervals and 2) hypothesis tests. Let’s briefly look at these for the unknown population mean because the same basic idea applies to regression as well.

14 confidence interval When we take a sample and calculate the mean of the sample we could use this sample mean as our estimate of the population mean. But remember that the mean of the sample would vary depending on the sample. Instead of just a point estimate of the mean of the population we use an interval or range of values for our estimate of where the population mean might be. To account for sampling variability, we use an interval.

15 confidence interval [Figure: distribution of sample means centered on the true mean; we just don’t know it.] The lines in the figure tell us where 95% of the sample means should fall. The distance from the center to each line is 1.96(s)/(square root of sample size), where s is the population standard deviation, which we will assume is known.

16 confidence interval [Figure: distribution of sample means with the interval drawn around a sample mean, xbar.] Now when we get the sample mean we use the same distance, 1.96(s)/(square root of sample size), around the sample mean. We are then 95% confident that our interval will contain the true unknown mean.
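As a sketch of the calculation, the function below builds the 95% interval from a sample mean. The function name and the sample mean of 29.4 are hypothetical; the population standard deviation of 3 and the sample size of 9 are borrowed from the earlier age example and, as on the slide, the population standard deviation is assumed known.

```python
from math import sqrt

def confidence_interval_95(sample_mean, population_sd, n):
    """95% confidence interval for the population mean (population sd known)."""
    margin = 1.96 * population_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: 9 people with an average age of 29.4.
low, high = confidence_interval_95(sample_mean=29.4, population_sd=3, n=9)
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```

A different sample would give a different xbar and so a different interval, but the width stays the same because it depends only on the population standard deviation and the sample size.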

17 Where did I get the 1.96 on the previous page? Before we said approximately 95% of the sample means are within 2 standard deviations of the mean. To be more precise we say 95% of the sample means are within 1.96 standard deviations. If you look at the standard normal table in the book you see associated with Z = 1.96 the value .475, the area between the mean and Z. So .5 - .475 = .025 is in the upper tail and, due to symmetry, .025 is in the lower tail of a normal distribution. So to be precise we use 1.96 in the formulas when we refer to the middle 95%.
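The same lookup can be done without the printed table. The sketch below assumes the book's table reports the area between the mean and Z, which is what the .475 value suggests, and uses the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal

# Area between the mean and Z = 1.96 (the table value).
table_value = z.cdf(1.96) - 0.5

# Area left over in the upper tail.
upper_tail = 1 - z.cdf(1.96)

# Going the other way: the Z that leaves .025 in the upper tail.
critical_value = z.inv_cdf(0.975)

print(f"area between mean and 1.96: {table_value:.4f}")
print(f"upper tail area:            {upper_tail:.4f}")
print(f"Z for the middle 95%:       {critical_value:.4f}")
```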

18 Analogy Say I have a stick and it has a certain length. Also say if I sit in the middle of the room I can whack 95% of you with the stick. This also means that if each of you is given the stick, 95% of you will be able to hit me when I am sitting in the middle. (Let’s play who can hit the lightest, you go first.) The length of the stick is 1.96(s)/(square root of sample size), which depends on the sample size. If we were at the true center we could use this stick and “hit” 95% of the sample means. So if we take a sample and get xbar, then 95% of the time we should be able to “hit” the true center.
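The stick analogy can be tried out by simulation. This is a sketch under assumptions: the population is taken to be the N(30, 3) ages from the earlier example, samples have size 9, and the seed and number of trials are arbitrary choices for a repeatable run.

```python
import random
from math import sqrt

random.seed(1)                       # fixed seed so the run is repeatable
true_mean, population_sd, n = 30, 3, 9
stick = 1.96 * population_sd / sqrt(n)   # length of the stick

trials = 10_000
hits = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, population_sd) for _ in range(n)]
    xbar = sum(sample) / n
    # Can a stick held at xbar reach the true center?
    if abs(xbar - true_mean) <= stick:
        hits += 1

coverage = hits / trials
print(f"share of samples whose interval reaches the true mean: {coverage:.3f}")
```

The share comes out close to 0.95, which is what "95% confident" refers to: the procedure reaches the true mean in about 95% of repeated samples.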

19 hypothesis test In a hypothesis test we don’t know the population mean, but we have a value in mind (the hypothesized value), say from other research or the like. What we then do is use the hypothesized value as if it were the true value and see how likely our sample mean value would be, coming from the population with the center at the hypothesized value. Low probabilities of occurrence (less than 5%, or .05) would have us reject our hypothesized value as the true mean.

20 hypothesis test [Figure: distribution of sample means centered on the hypothesized value, with the area beyond the sample mean xbar shaded and labeled p-value.] With the hypothesized value as the center, we would look at the probability of getting the sample mean value or a more extreme value. If the shaded area, the p-value, is .05 or less (for a one-tail test), we reject the hypothesized value as the true value.

21 hypothesis test [Figure: distribution of sample means with the sample mean xbar marked.] When this shaded area is .05 or less we are saying that, based on the hypothesized value as the center, the probability of getting a sample mean at least as extreme as the one we obtained is so small that we will reject our hypothesized value and conclude the center value must be something else.
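A minimal sketch of the one-tail test described on the last three slides. The function name and the numbers are hypothetical: a hypothesized mean age of 30, a known population standard deviation of 3, and a sample of 9 people with a mean age of 32. Only the upper-tail case is shown.

```python
from math import sqrt
from statistics import NormalDist

def one_tail_p_value(sample_mean, hypothesized_mean, population_sd, n):
    """Upper-tail p-value for a sample mean, with the population sd known."""
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Area under the curve to the right of the observed sample mean.
    return 1 - NormalDist().cdf(z)

p_value = one_tail_p_value(sample_mean=32, hypothesized_mean=30,
                           population_sd=3, n=9)
reject = p_value <= 0.05

print(f"p-value: {p_value:.4f}")
print("reject the hypothesized value" if reject else "do not reject")
```

Here the sample mean sits 2 standard errors above the hypothesized value, so the shaded area is below .05 and the hypothesized value is rejected.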