Lecture 3 HSPM J716
Efficiency in an estimator. Efficiency = low bias and low variance. Unbiased with high variance: not very useful. Biased with low variance: worthless.
A no-variance, reliable estimator? The 0 estimator
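A small simulation can make the bias and variance comparison concrete. The sketch below is not from the lecture: the true mean of 10, the normally distributed errors, the sample size of 7, and the function names are all assumptions chosen for illustration. It compares the sample mean with a "0 estimator" that always answers 0.

```python
import random
import statistics

def simulate(estimator, true_mean=10.0, sd=5.0, n=7, trials=2000, seed=1):
    """Return (bias, variance) of an estimator over repeated samples.

    Every parameter value here is an illustrative assumption.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        estimates.append(estimator(sample))
    bias = statistics.fmean(estimates) - true_mean
    variance = statistics.pvariance(estimates)
    return bias, variance

def zero_estimator(sample):
    # Ignores the data entirely and always answers 0.
    return 0.0

mean_bias, mean_var = simulate(statistics.fmean)
zero_bias, zero_var = simulate(zero_estimator)

print(f"sample mean: bias={mean_bias:.3f}, variance={mean_var:.3f}")
print(f"0 estimator: bias={zero_bias:.3f}, variance={zero_var:.3f}")
```

The 0 estimator never varies, but it misses the true value by the full amount every time, which is why no variance alone does not make an estimator useful.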
Eyeball vs. least squares for assignment 1: http://hspm.sph.sc.edu/COURSES/J716/demos/StudentLines/StudentLines.html
Hypothesis testing: parallels among the coin toss, card trick, and assignment 1A experiments. Each has a statistic calculated from our data, and a critical value for that statistic calculated theoretically based on a hypothesis about how the data were generated. If our statistic were greater than the critical value, we would reject the hypothesis.
Hypothesis testing is all about calculating the probability of what you got and drawing an inference. With the coin toss experiment: – A statistic calculated from our data: we counted how many tails came up. – A critical value for that statistic, calculated theoretically based on the hypothesis that the coin was fair: 5 consecutive results that are all the same. – When our statistic was greater than the critical value, we rejected the hypothesis.
Hypothesis testing is all about calculating the probability of what you got and drawing an inference. With the card experiment: – A statistic calculated from our data: we counted how many times I guessed the card. – A critical value for that statistic, calculated theoretically based on the hypothesis that any of the 52 cards could come up: even one right guess has a probability less than 0.05, so the critical value is 1. – When our statistic was as big as the critical value, we rejected the hypothesis.
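The probabilities behind the coin and card experiments can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the coin statistic is taken to be a count of tails (so the event is five tails in a row), and 0.05 is used as the significance level. The variable names are illustrative.

```python
from fractions import Fraction

ALPHA = Fraction(5, 100)  # conventional 5% significance level (assumed)

# Coin toss: chance that a fair coin gives tails on 5 tosses in a row.
p_five_tails = Fraction(1, 2) ** 5

# Card trick: chance of naming one specific card out of 52 by pure luck.
p_card_guess = Fraction(1, 52)

for label, p in [("5 tails in a row", p_five_tails),
                 ("one correct card guess", p_card_guess)]:
    verdict = "reject" if p < ALPHA else "do not reject"
    print(f"{label}: p = {float(p):.4f} -> {verdict} the chance hypothesis")
```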
T statistic hypothesis tests calculate a probability and draw an inference. With the assignment 1A spreadsheet: – A statistic calculated from our data: the estimated coefficient divided by its standard error. – A critical value for that statistic, calculated theoretically based on the hypothesis that the true line’s slope is 0: 2.571. – When the absolute value of our statistic is greater than the critical value, we reject the hypothesis.
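The t test for a slope can be sketched in code. The value 2.571 is the two-tailed 5% critical value of t with 5 degrees of freedom, which in a straight-line fit corresponds to 7 data points. The data below are made up for illustration and are not the assignment's numbers; `simple_ols` is a hypothetical helper name.

```python
import math

def simple_ols(xs, ys):
    """Least squares slope, intercept, and the slope's standard error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    return slope, intercept, se_slope

# Hypothetical data: 7 points, so n - 2 = 5 degrees of freedom.
xs = [100, 200, 300, 400, 500, 600, 700]
ys = [12, 19, 33, 38, 54, 57, 71]

slope, intercept, se_slope = simple_ols(xs, ys)
t_stat = slope / se_slope
T_CRITICAL = 2.571  # two-tailed 5% critical value of t with 5 degrees of freedom

print(f"slope={slope:.4f}, se={se_slope:.4f}, t={t_stat:.2f}")
print("reject slope = 0" if abs(t_stat) > T_CRITICAL else "do not reject slope = 0")
```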
Not rejecting a false hypothesis: Type II error in assignment 1A part 2
How the assumptions apply to the eyeball line and the least squares line
Assumption 1 is that there is a true line and that what you see differs from the true line because of random errors up or down for each point. Eyeball line: It's why you drew a line through the points, instead of using a curve or a wiggly line that goes from one point to the next. Least squares: It’s why you built a spreadsheet that calculates the slope and intercept of a line.
Assumption 2 is that the errors have an expected value of 0. Eyeball line: it's why you try to draw the line through the middle of the points, rather than off to one side or tilting differently. Least squares: The average of the residuals is 0. (The residuals are your estimates of the errors.)
Assumption 3 is that the errors all have the same variance. Eyeball line: It's why you don't favor one point over another in drawing the line. Least squares: The spreadsheet’s sum and average rows are simple sums and averages. No data row gets a different weight from another.
Assumption 4 is that the errors are independent, not correlated with each other. Eyeball line: It's why you predict for X=800 using a point on the line. Least squares: It's why you predict for X=800 with 800*slope + intercept.
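Two of the least squares statements above can be checked numerically: the residuals average to 0, and the prediction for X=800 is 800*slope + intercept. The sketch below uses made-up data and a hypothetical eyeball line drawn slightly too high; none of these numbers come from the assignment.

```python
def fit_line(xs, ys):
    """Return the least squares (slope, intercept) for one X variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mean_residual(xs, ys, slope, intercept):
    """Average vertical distance from the line to the data points."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return sum(residuals) / len(residuals)

# Made-up data for illustration only.
xs = [100, 200, 300, 400, 500, 600, 700]
ys = [12, 19, 33, 38, 54, 57, 71]

slope, intercept = fit_line(xs, ys)
ls_mean = mean_residual(xs, ys, slope, intercept)

# A hypothetical eyeball line (slope 0.10, intercept 6.0) drawn a little too high.
eyeball_mean = mean_residual(xs, ys, 0.10, 6.0)

prediction_800 = 800 * slope + intercept

print(f"least squares mean residual: {ls_mean:.10f}")
print(f"eyeball line mean residual:  {eyeball_mean:.3f}")
print(f"least squares prediction at X=800: {prediction_800:.2f}")
```

An eyeball line whose residuals average well away from 0 is sitting off to one side of the points, which is what assumption 2 tells you to avoid.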
Confidence interval for a coefficient: coefficient ± its standard error × t from table. Is there a 95% probability that the true coefficient is in the 95% confidence interval? If you do a lot of studies, you can expect that, for 95% of them, the true coefficient will be in the 95% confidence interval. If 0 is in the confidence interval, then the coefficient is not significant.
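The "lot of studies" reading of a confidence interval can be illustrated by simulation. The sketch below assumes a true line with slope 0.1 and intercept 2.0, normal errors with standard deviation 4, and 7 points per study, so the t value from the table is 2.571 (5 degrees of freedom). All of these values and the function names are assumptions for illustration.

```python
import math
import random

T_CRITICAL = 2.571  # two-tailed 5% t value for 5 degrees of freedom (7 points)

def slope_confidence_interval(xs, ys, t_crit=T_CRITICAL):
    """Return (low, high): slope plus or minus standard error times t."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope - t_crit * se_slope, slope + t_crit * se_slope

def coverage(true_slope=0.1, true_intercept=2.0, sd=4.0, studies=4000, seed=7):
    """Share of simulated studies whose interval contains the true slope."""
    rng = random.Random(seed)
    xs = [100, 200, 300, 400, 500, 600, 700]
    hits = 0
    for _ in range(studies):
        ys = [true_intercept + true_slope * x + rng.gauss(0, sd) for x in xs]
        low, high = slope_confidence_interval(xs, ys)
        if low <= true_slope <= high:
            hits += 1
    return hits / studies

print(f"share of intervals containing the true slope: {coverage():.3f}")
```

The share should land close to 0.95. Any single interval either contains the true slope or does not; the 95% describes the procedure across many studies.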
Assignment 2: All regression results are the same. Graphs differ. Need a reason to use or doubt the least squares prediction. The reason is in the form of rejecting one or more of the assumptions.
Durbin-Watson statistic. Serial correlation: – Finds a significant pattern for clinic 2.
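For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order serial correlation, values toward 0 suggest positive correlation, and values toward 4 suggest negative correlation. The residual sequences below are made up to show the two extremes and are not the clinic data.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals in data order.
trending = [-3, -2, -1, 0, 1, 2, 3]      # neighbors resemble each other
alternating = [2, -2, 2, -2, 2, -2, 2]   # neighbors have opposite signs

print(f"trending residuals:    DW = {durbin_watson(trending):.3f}")
print(f"alternating residuals: DW = {durbin_watson(alternating):.3f}")
```

Whether a given value is significant depends on the sample size and the number of X variables, so it is judged against a Durbin-Watson table rather than a single cutoff.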
Confidence interval for prediction: the hyperbolic outline
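The hyperbolic shape comes from a term that grows with the squared distance of X from the mean of the X data. The sketch below assumes the interval is for a single new observation in a one-variable regression (hence the "1 +" inside the square root), uses t = 2.571 for 5 degrees of freedom, and runs on made-up data.

```python
import math

T_CRITICAL = 2.571  # two-tailed 5% t value for 5 degrees of freedom

def prediction_interval(xs, ys, x0, t_crit=T_CRITICAL):
    """Interval for a new observation of Y at X = x0 (simple regression)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # standard error of the regression
    # The (x0 - x_bar)^2 term widens the interval away from the mean of X.
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    y_hat = intercept + slope * x0
    return y_hat - half_width, y_hat + half_width

# Made-up data for illustration only.
xs = [100, 200, 300, 400, 500, 600, 700]
ys = [12, 19, 33, 38, 54, 57, 71]

for x0 in (100, 400, 800):
    low, high = prediction_interval(xs, ys, x0)
    print(f"X={x0}: {low:.1f} to {high:.1f} (width {high - low:.1f})")
```

The interval is narrowest at the mean of X (400 here) and widens in both directions, so the upper and lower bounds plotted against X trace the two branches of a hyperbola.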
Formal outlier test? Use the confidence interval of prediction. – With and without the suspect point? How do you predict when your data have an outlier? – Totally ignoring it seems wrong. – So does letting it sway your results too much. – Investigate and use judgment.
Multiple regression: 3 or more dimensions, 2 or more X variables. Y = α + βX + γZ + error. Y = α + β_1 X_1 + β_2 X_2 + … + β_p X_p + error.
Fitting a plane in 3D space. Linear assumption: – Now a flat plane. – The effect of a change in X_1 on Y is the same at all levels of X_1 and X_2 and any other X variables. Residuals are vertical distances from the plane to the data points floating in space.
Multiple regression: separating effects. – Example from literature. – Example from handout.
β interpretation in Y = α + βX + γZ + error: β is the effect on Y of changing X by 1, holding Z constant. When X is one unit bigger than you would predict it to be from what Z is, then we expect Y to be β more than what you would predict it would be from what Z is. – Those predictions are based on linear relationships.
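The "bigger than you would predict from Z" reading of β can be checked directly: predict X from Z, predict Y from Z, and then regress the leftover part of Y on the leftover part of X. The sketch below uses made-up data built so that Y = 1 + 2X + 3Z exactly, with no error term, so the multiple regression β is known to be 2. The helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Return the least squares (slope, intercept) for one X variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def unexplained(predictor, target):
    """Target minus its linear prediction from the predictor."""
    slope, intercept = fit_line(predictor, target)
    return [t - (intercept + slope * p) for p, t in zip(predictor, target)]

# Made-up data: Y = 1 + 2X + 3Z exactly, so beta = 2 and gamma = 3.
zs = [1, 2, 3, 4, 5, 6]
xs = [2, 1, 4, 3, 7, 5]  # related to Z, but not perfectly
ys = [1 + 2 * x + 3 * z for x, z in zip(xs, zs)]

x_left_over = unexplained(zs, xs)  # X beyond what Z predicts
y_left_over = unexplained(zs, ys)  # Y beyond what Z predicts
beta, _ = fit_line(x_left_over, y_left_over)

# Regressing Y on X alone mixes in the effect of Z.
naive_slope, _ = fit_line(xs, ys)

print(f"beta holding Z constant: {beta:.4f}")
print(f"slope ignoring Z:        {naive_slope:.4f}")
```

The slope that ignores Z comes out larger than 2 because X and Z rise together in these data, so X gets credit for part of Z's effect. Separating those effects is the purpose of multiple regression.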