Class 4: Tues., Sept. 21 External/Internal Reliability Clarification Regression Analysis Examples: –Appropriate Dating Ages –Father’s and son’s heights Variability of Y given X in the Simple Linear Regression Model
Reliability
In general, a measurement is reliable if it gives consistent results. My distinction between internal/external reliability of a measurement (e.g., a test) was not very precise. Here’s a better categorization.
Four types of reliability for a measurement (degree of reliability can be measured by correlation):
1. Inter-observer: Different measurements of the same object/information give consistent results (e.g., two psychiatrists rate the behavior of a patient similarly; two Olympic judges score a gymnastics contestant similarly).
Types of Reliability Continued
2. Test-retest: Measurements taken at two different times are similar (e.g., a person’s pulse is similar for two different readings).
3. Parallel form: Two tests of different forms that supposedly test the same material give similar results (e.g., a person’s SAT scores are similar for two forms of the test).
4. Split-half: If the items on a test are divided in half (e.g., odd vs. even), the scores on the two halves are similar.
Examples of Reliability
Example              Type                                       Correlation
Pulse                Test-retest                                0.90
Bedtime on a Wed.    Test-retest                                0.52
SAT scores           Parallel form or split-half (not clear)    0.91
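Since the degree of reliability is measured by a correlation, a small sketch of a test-retest calculation may help. The pulse readings below are hypothetical (they are not the data behind the 0.90 in the table), and `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pulse readings (beats per minute) for six people,
# each measured at two different times.
first_reading = [62, 70, 75, 81, 68, 90]
second_reading = [64, 69, 78, 80, 66, 92]

r = pearson_r(first_reading, second_reading)
print(f"test-retest correlation: {r:.2f}")
```

A correlation near 1 indicates that the two readings rank and space people almost identically, which is what test-retest reliability asks for.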
Regression Analysis
Provides a model for the mean of Y given $X = X_0$, $E(Y \mid X = X_0)$, and for the variability of Y given $X = X_0$. Useful for understanding the association between Y and X and for predicting Y based on X.
Simple linear regression model:
– $Y_i = \beta_0 + \beta_1 X_i + e_i$
– $e_i$ has a normal distribution with mean 0 and standard deviation $\sigma$
Example: What age is too young?
In U.S. culture, an older man dating a younger woman is not uncommon, but when the age difference becomes too large, it may seem unacceptable to some. A survey was taken of ten people, each of whom was asked the minimum acceptable age for a woman to be dating a man of a given age, for a range of men’s ages.
Y = minimum acceptable age of a woman dating a man of X years of age
X = age of man
What is the mean of people’s minimum acceptable age for a woman to be dating a man of $X_0$ years of age, i.e., what is $E(Y \mid X = X_0)$?
Estimated Mean (among survey population) Minimum Acceptable Age for a Woman
Linear Fit: Minimum Woman’s Age = 5.472037 + 0.5753518 × Man’s Age
With the coefficients rounded to 5.47 and 0.58, the estimated mean for a woman dating a man who is
–20 years old: 5.47 + 0.58*20 = 17.07
–30 years old: 5.47 + 0.58*30 = 22.87
–40 years old: 5.47 + 0.58*40 = 28.67
–50 years old: 5.47 + 0.58*50 = 34.47
–60 years old: 5.47 + 0.58*60 = 40.27
–70 years old: 5.47 + 0.58*70 = 46.07
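The slide arithmetic can be reproduced with a short sketch. The coefficients are copied from the slide; the function name `estimated_min_age` is an illustrative choice. The slide’s worked values use the rounded coefficients 5.47 and 0.58, while the unrounded fit gives slightly lower estimates (about 0.3 years lower at age 70).

```python
# Coefficients as reported on the slide.
ROUNDED = (5.47, 0.58)              # used in the slide's worked values
UNROUNDED = (5.472037, 0.5753518)   # from the "Linear Fit" line

def estimated_min_age(man_age, intercept, slope):
    """Estimated mean minimum acceptable age of a woman, E(Y | X = man_age)."""
    return intercept + slope * man_age

for age in range(20, 71, 10):
    rounded = estimated_min_age(age, *ROUNDED)
    unrounded = estimated_min_age(age, *UNROUNDED)
    print(f"man aged {age}: {rounded:.2f} (rounded fit), {unrounded:.2f} (unrounded fit)")
```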
Father and Son’s Height
Y = son’s height, X = father’s height (Galton’s data from 19th-century England)
Variability of Y given X
The simple linear regression model tells us more than the mean of Y given $X = X_0$; it also tells us about the variability and distribution of Y given $X = X_0$.
Simple linear regression model:
– $Y_i = \beta_0 + \beta_1 X_i + e_i$
– $e_i$ has a normal distribution with mean 0 and standard deviation (SD) $\sigma$
–The subpopulation of Y with corresponding $X = X_0$ has a normal distribution with mean $\beta_0 + \beta_1 X_0$ and SD $\sigma$
Residuals and Estimating $\sigma$
–Use least squares to estimate the slope and intercept of the simple linear regression model. Denote the slope estimate by $\hat{\beta}_1$ and the intercept estimate by $\hat{\beta}_0$.
–Predicted value of $Y_i$ for observation i based on $X_i$ and the regression model estimate: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
–Residual for observation i: $\hat{e}_i = Y_i - \hat{Y}_i$, the prediction error of using the least squares line to predict $Y_i$ for observation i
–Root mean square error = (approximately) standard deviation of residuals. Root mean square error is an estimate of $\sigma$.
For the father-son height data, root mean square error = 2.4. This means that, according to the simple linear regression model, sons whose fathers are 72 inches tall have a mean height of 33.89 + 0.51*72 = 70.6 inches with a standard deviation of 2.4 inches.
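The estimation steps can be sketched in plain Python. The heights below are hypothetical, not Galton’s data, and the helper names `fit_line` and `root_mean_square_error` are illustrative. The sketch assumes the usual convention of dividing the sum of squared residuals by n − 2 for simple linear regression, which is why the root mean square error is only approximately the standard deviation of the residuals.

```python
import math

def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def root_mean_square_error(x, y, intercept, slope):
    """Estimate of sigma: sqrt(sum of squared residuals / (n - 2))."""
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (len(x) - 2))

# Hypothetical father and son heights in inches.
fathers = [65, 67, 68, 70, 72, 74]
sons = [67, 68, 70, 69, 72, 73]

b0, b1 = fit_line(fathers, sons)
rmse = root_mean_square_error(fathers, sons, b0, b1)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}, RMSE = {rmse:.2f}")
```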
Normal Distribution
About 68% of the observations from a normal distribution will fall within one standard deviation ($\sigma$) of the mean ($\mu$).
About 95% of the observations from a normal distribution will fall within two standard deviations of the mean.
About 99.7% of the observations will fall within three standard deviations of the mean.
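These percentages can be checked with the standard library. For a normal distribution, the probability of falling within k standard deviations of the mean is erf(k / √2); the function name `coverage` is an illustrative choice.

```python
import math

def coverage(k):
    """P(|Y - mu| < k * sigma) for a normally distributed Y."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage(k):.4f}")
```

The three probabilities come out to roughly 0.683, 0.954, and 0.997.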
Variability of Y given X
According to the estimated regression model, the distribution of heights for sons whose fathers are 72 inches tall is a normal distribution with a mean of 70.6 inches and a standard deviation of 2.4 inches.
If a son’s father’s height is 72 inches,
–about 68% of the time, the son’s height will be between 68.2 and 73.0 inches (70.6 ± 2.4)
–about 95% of the time, the son’s height will be between 65.8 and 75.4 inches (70.6 ± 4.8)
–about 99.7% of the time, the son’s height will be between 63.4 and 77.8 inches (70.6 ± 7.2)
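The intervals for a 72-inch father follow from the estimated mean plus or minus k root mean square errors. A minimal sketch, using the intercept (33.89), slope (0.51), and root mean square error (2.4) reported in the slides; the function name `son_height_interval` is an illustrative choice.

```python
# Estimates reported in the slides for the father-son height data.
INTERCEPT = 33.89
SLOPE = 0.51
RMSE = 2.4

def son_height_interval(father_height, k):
    """Estimated mean son's height plus or minus k root mean square errors."""
    center = INTERCEPT + SLOPE * father_height
    return center - k * RMSE, center + k * RMSE

for k, share in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = son_height_interval(72, k)
    print(f"about {share} of sons: {low:.1f} to {high:.1f} inches")
```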
Summary
The regression model provides information about both the mean of Y given X and the variability of Y given X.
For the simple linear regression model, the standard deviation of Y given X is estimated by the root mean square error.
For the simple linear regression model, approximately 68% of the time, Y given X will be within one root mean square error of the estimated mean of Y given X ($\hat{\beta}_0 + \hat{\beta}_1 X$). Approximately 95% of the time, Y given X will be within two root mean square errors of the estimated mean of Y given X.