Published by Philip Townsend. Modified over 6 years ago.
1
i.e. How to get an A on the big project I’m about to assign you…
Regression Notes i.e. How to get an A on the big project I’m about to assign you…
2
Bivariate (2-variable) Statistics
PHASE I. Scatter-Plots and Bivariate (2-variable) Statistics
3
Describing Variables
Two-Variable Descriptors: LINEARITY? DIRECTION? SCATTER? Anything unusual? One-Variable Descriptors: SHAPE, CENTER, SPREAD, anything unusual?
4
What do you see in these scatter plots?
[Scatter plot: Mean January Air Temperatures for 30 U.S. Locations; x-axis: Latitude (°N), y-axis: Temperature (°C)] LINEAR TREND. NEGATIVE ASSOCIATION. CONSTANT SCATTER. ANYTHING UNUSUAL?
5
What do you see in these scatter plots?
[Scatter plot: % of population who are Internet Users vs GDP per capita for 202 Countries; x-axis: GDP per capita (thousands of dollars), y-axis: Internet Users (%)] NON-LINEAR TREND. POSITIVE ASSOCIATION. NON-CONSTANT SCATTER. And an OUTLIER!!!
6
What do you see in these scatter plots?
[Scatter plot: Average Age Americans are First Married; x-axis: Year, y-axis: Age] 2 SEPARATE, NON-LINEAR TRENDS (…gap in data in 1940s?). NEGATIVE ASSOCIATION TIL ~1970, THEN POSITIVE. NO SCATTER.
7
What to look for in scatter plots
1. Trend: Linear or non-linear?
8
What to look for in scatter plots
1. Trend: Positive or negative association?
9
What to look for in scatter plots
2. Scatter: Strong or weak relationship?
10
What to look for in scatter plots
2. Scatter: Constant or non-constant scatter?
11
What to look for in scatter plots
3. Anything unusual: Outlier
12
What to look for in scatter plots
3. Anything unusual: Groupings
13
Rank relationships: weakest (1) to strongest (4)
4 2 1 3
14
Correlation Coefficient
r
15
Correlation Coefficient little r – what is it?
r measures the strength of a linear relationship. Your calculator will find it for you. -1 ≤ r ≤ 1. r is a multiple of the slope of the best-fit line (r = slope × s_x / s_y), so r and the slope always have the same sign.
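What the calculator does when it finds r can be sketched in a few lines of Python. This is only an illustration: the data values and the function names `pearson_r` and `best_fit_slope` are made up for this example and do not come from the slides.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data xs, ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def best_fit_slope(xs, ys):
    """Slope of the least squares line through the data."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data, chosen only so the arithmetic is easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
b = best_fit_slope(xs, ys)

print(round(r, 4))                                  # 0.7746
print(-1 <= r <= 1)                                 # True
# r is the slope rescaled by the two standard deviations.
print(abs(r - b * stdev(xs) / stdev(ys)) < 1e-9)    # True
```

Because the standard deviations are never negative, the rescaling cannot flip the sign: a positive slope always comes with a positive r.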
16
Only use r if the scatter plot is linear
r – when can it be used? Only use r if the scatter plot is linear. [Scatter plot of y against x: r = 0.99]
17
Don’t use r if the scatter plot is non-linear!
r – when can it be used? Don’t use r if the scatter plot is non-linear! [Scatter plot: r = 0.00]
18
r – when can it be used? Tick the plots where it’s OK to use a correlation coefficient to describe the strength of the relationship:
19
r – when can it be used? Tick the plots where it’s OK to use a correlation coefficient to describe the strength of the relationship:
20
How close the points in the scatter plot come to lying on the line
r – what does it tell you? How close the points in the scatter plot come to lying on the line. [Scatter plots of y against x: r = 0.99 and r = 0.57] Difficult ones
21
Playing with Outliers (1)…
Playing with Outliers (1)… LINEAR TREND. POSITIVE ASSOCIATION. MOSTLY CONSTANT SCATTER. And an OUTLIER!!! What will happen to the correlation coefficient if we remove the tallest 12th grader? Bigger or smaller? Hint: …correlation measures how linear the data are.
22
Playing with Outliers (2)…
Playing with Outliers (2)… LINEAR TREND. POSITIVE ASSOCIATION. CONSTANT SCATTER. And an OUTLIER!!! What will happen to the correlation coefficient if we remove the elephant? Bigger or smaller? Hint: …make your brain zoom in on that main cluster of points.
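The two outlier questions can be explored with a small experiment. The sketch below uses invented numbers (not the 12th-grader or elephant data from the slides) to show that removing an outlier can push r in either direction, depending on where the outlier sits relative to the trend.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data xs, ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Scenario A (hypothetical): a tight linear cluster plus one point far OFF the line.
a_xs = [1, 2, 3, 4, 5, 6]
a_ys = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
a_with = pearson_r(a_xs + [3.5], a_ys + [9.0])
a_without = pearson_r(a_xs, a_ys)

# Scenario B (hypothetical): a loose cluster plus one point far away ALONG the trend.
b_xs = [1, 2, 3, 4, 5]
b_ys = [3, 1, 4, 2, 3]
b_with = pearson_r(b_xs + [50], b_ys + [40])
b_without = pearson_r(b_xs, b_ys)

print("A: with outlier", round(a_with, 2), "without", round(a_without, 2))
print("B: with outlier", round(b_with, 2), "without", round(b_without, 2))
```

In scenario A the outlier breaks the line, so removing it makes r bigger. In scenario B the outlier is what makes the data look linear, so zooming in on the main cluster makes r smaller.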
23
PHASE II. Lurking Variables
24
Guess which are correlated with Test Scores? 4 are… 4 aren’t…
- Highly educated parents
- Mom’s age >30 at birth
- Mom stays home until Kindergarten
- Intact family (live with mom and dad)
- Attended Head Start program
- Parents have money
- Move to a better neighborhood
- Low birthweight (including premature)
25
Guess which are correlated with Test Scores? 3 are… 4 aren’t…
- Parents speak English
- Family goes to museums, zoos, concerts…
- Parents involved in PTA
- Child spanked regularly
- Watches TV a lot
- Parents own a lot of books
- Child is read to every day
26
Life Expectancy Example
[Scatter plot: Life Expectancy and Availability of Doctors for a Sample of 40 Countries; x-axis: People per Doctor, y-axis: Life Expectancy]
- Non-linear trend
- Negative association
- Fairly constant scatter
Can you suggest how to increase life expectancy in a country? Get fewer people per doctor! Duh!
27
Life Expectancy Example
[Scatter plot: Life Expectancy and Availability of Televisions for a Sample of 40 Countries; x-axis: People per Television, y-axis: Life Expectancy]
Can you suggest how to increase life expectancy in a country? Get fewer people per TV?!? BEWARE LURKING VARIABLES!!!
28
Kinds of Lurking Variables (1)
CAUSATION: “People who take showers have better organizational skills.” Perceived correlation: Shower (x) and Organized (y). Maybe changes in x CAUSE changes in y.
29
Kinds of Lurking Variables (2)
CAUSATION again... but in reverse: “People who take showers have better organizational skills.” Perceived correlation: Shower (x) and Organized (y). Maybe changes in y CAUSE changes in x.
30
Kinds of Lurking Variables (3)
COMMON RESPONSE: “People who take showers have better organizational skills.” Perceived correlation: Shower (x) and Organized (y). Lurking variable: Good Habits in General (z). Maybe something else, z, is causing changes in both x and y at the same time!
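A common response is easy to fake with simulated data. In the sketch below, which is purely illustrative, a made-up variable z ("good habits") drives both x ("showers") and y ("organized"), and x has no effect on y at all. All names, the random seed, and the noise levels are assumptions chosen for the demonstration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data xs, ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(0)   # fixed seed so the run is repeatable
n = 5000

# z is the lurking variable; x and y each depend on z plus their own noise.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)

# Strip out the part of x and y that comes from z and correlate what is left.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = pearson_r(x_rest, y_rest)

print("r between x and y:", round(r_xy, 2))
print("r once z is removed:", round(r_rest, 2))
```

The first correlation comes out clearly positive even though x never influences y; once z is taken out, what remains is close to zero.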
31
Kinds of Lurking Variables (4)
CONFOUNDING: “People who take showers have better organizational skills.” Perceived correlation: Shower (x) and Organized (y). Lurking variable: Good Habits in General (z). …we don’t know which variable (x or z) is causing the changes in y. They’re hopelessly mixed up with each other.
32
LURKING VARIABLES Heh, Heh, Heh…
33
(Clearly children can hold their drink better than adults)
How Regression gets you in Trouble… Famous examples of strong correlations: Instances of drunkenness in those below 18 years of age are significantly lower than for those above. (Clearly children can hold their drink better than adults)
34
(eating ice cream makes you tastier?)
How Regression gets you in Trouble… Famous examples of strong correlations: Whenever ice cream sales rise, so do the number of shark attacks. (eating ice cream makes you tastier?)
35
(learning words makes you hungry?)
How Regression gets you in Trouble… Famous examples of strong correlations: As vocabulary in infancy rises, so does appetite. (learning words makes you hungry?)
36
(firetrucks cause damage?)
How Regression gets you in Trouble… Famous examples of strong correlations: The more fire trucks you send to a fire, the worse the damage is. (firetrucks cause damage?)
37
The more you pay teachers in a town, the more expensive alcohol is.
How Regression gets you in Trouble… Famous examples of strong correlations: The more you pay teachers in a town, the more expensive alcohol is.
38
How Regression gets you in Trouble…
How Regression gets you in Trouble… Famous examples of strong correlations: In Scandinavia, storks appear more often on the rooftops of families with more babies.
39
Left-handed people die earlier than right-handed people.
How Regression gets you in Trouble… Famous examples of strong correlations: Left-handed people die earlier than right-handed people. (No. Older people grew up in an era when being left-handed was discouraged. Righties are more common among older people; lefties are more common among the young. When you look at deaths, lefties die younger because lefties in general are younger!)
40
Deer and cattle orient themselves along a north/south axis when grazing.
41
Correlation is not Causation
How Regression gets you in Trouble… Famous examples of strong correlations: Correlation is not Causation
42
The story: The smoking ban in Wales "caused" a 13% fall in heart attacks from October to December 2007, compared with the same period in 2006. The flaw: The ban began in April. In April, we also observed a 13% fall in heart attacks. Presumably the ban "caused" me to spill my coffee, for that happened during April too.
43
!! TRADE UNIONS SECURE BETTER PAY !!
See? Union membership can get you as much as 30% more pay!!
44
!! TRADE UNIONS SECURE BETTER PAY !!
Perceived correlation: Union Membership (x) and Better Pay! (y). Lurking variables (z): education level of employee, experience level of employee, age of employee.
45
Gapminder
46
Least Squares Regression Lines (LSRL)
PHASE III. Residuals and Least Squares Regression Lines (LSRL)
47
Residuals = Actual – Predicted
[Scatter plot with prediction line y = 5 + 2x] The actual point is (8, 25). The predicted point is (8, 21), because 5 + 2(8) = 21. Residual = Actual – Predicted = 25 – 21 = 4.
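The residual calculation on this slide can be checked with a couple of lines of Python. The function names `predict` and `residual` are just labels chosen for this sketch; the line y = 5 + 2x and the point (8, 25) are the ones from the slide.

```python
def predict(x, intercept=5.0, slope=2.0):
    """Predicted y on the prediction line y = 5 + 2x."""
    return intercept + slope * x

def residual(x, actual_y):
    """Residual = Actual - Predicted."""
    return actual_y - predict(x)

print(predict(8))        # 21.0
print(residual(8, 25))   # 4.0
```

A positive residual means the actual point sits above the line; a negative one means it sits below.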
48
“Actual” – “Predicted”
[Residuals labelled on the plot: 7, -3, 17, 1, -10, -4]
49
Least Squares Regression
Least Squares Regression: We’ll try to get the least squares. Σ (Resids)² = 439.2988
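"Trying to get the least squares" means comparing candidate lines by their sum of squared residuals and keeping the smallest. A rough sketch with made-up data (these five points are not the data behind the 439.2988 on the slide, and `sum_sq_resids` is a name invented here):

```python
# Hypothetical data points, not taken from the slides.
xs = [4, 6, 8, 10, 12]
ys = [15, 14, 25, 24, 30]

def sum_sq_resids(intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Two candidate lines with the same slope but different intercepts.
print(sum_sq_resids(5, 2))   # 31
print(sum_sq_resids(3, 2))   # 63
```

Of these two candidates, y = 5 + 2x fits better because its squares add up to less. The least squares line is the one candidate that beats every other line on this score.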
50
Least Squares Regression Line Facts:
- There is one and only one LSRL for every set of bivariate data.
- Σ Residuals = 0 (just like the deviations from the mean with st. dev.).
- Your calculator will give you the one equation with the “least” amount of squares: the “Least Squares Regression Line” (LSRL).
- The LSRL must go through the point (x̄, ȳ).
- You’ll only have to calculate the LSRL by hand once (…heh, heh)
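These LSRL facts can be verified numerically. The sketch below computes the line from the standard formulas (slope = Sxy / Sxx, intercept = ȳ − slope · x̄) on a small invented data set; the data and the helper name `lsrl` are assumptions made for the example, not part of the slides.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data points, not taken from the slides.
xs = [4, 6, 8, 10, 12]
ys = [15, 14, 25, 24, 30]

intercept, slope = lsrl(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)

print("LSRL: y =", round(intercept, 2), "+", round(slope, 2), "x")
# The residuals cancel out (up to floating-point rounding).
print("Sum of residuals is ~0:", abs(sum(residuals)) < 1e-9)
# Plugging x-bar into the line gives back y-bar.
print("Goes through (x-bar, y-bar):",
      abs(intercept + slope * mean_x - mean_y) < 1e-9)
print("Sum of squared residuals:", round(sum(r ** 2 for r in residuals), 2))
```

For these points the line works out to y = 5.6 + 2x, and no other line gives a smaller sum of squared residuals.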
51
PHASE IV. Your Project
52
Example of analysis: “Going Crackers”
An example of the type of work you’ll be doing for your REGRESSION ASSIGNMENT. You start with some raw data…
1. Predict the energy content of a cracker with 25% fat content.
2. If you reduced the salt content by 100 mg, how would the fat change?
53
Example of analysis: “Going Crackers”
1. Predict the energy content of a cracker with 25% fat content. [Scatter plot: ENERGY vs. FAT] (ENERGY) = […] + […](FAT) = […] + […](25) = 504.5
54
Example of analysis: “Going Crackers”
2. If you reduced the salt content by 100 mg, how would the fat change? [Scatter plot: FAT vs. SALT] (FAT) = […] + […](SALT). The fat content would drop by […] mg.
55
Problem 2 Analysis The data suggest a linear trend. The association is positive with constant scatter about the trend line. It is reasonable to do a linear regression.
56
Problem 2 Analysis: The LSRL is y = […] + […]x. The slope of the fitted line is […], which tells us that, on average, each 100 mg decrease in salt content is associated with a 2.4% decrease in total fat content. The moderate relationship (r = 0.69) means that predicting such a drop will not necessarily be highly accurate.