
# First we will start with fake data x = runif(150, 0, 50) # next we create fake linear data y = 3.142 + x + rnorm(150)*2.5 # Let’s see what it looks like.



Presentation transcript:

1

2
# First we will start with fake data
x = runif(150, 0, 50)
# next we create fake linear data
y = 3.142 + x + rnorm(150)*2.5
# Let's see what it looks like
plot(y ~ x)

3
# Use the ordinary least squares model in R
m = lm(y ~ x)
# R provides a command for looking at the details of the fitted model
str(m)

4
# Huh? That is a lot of details! Let's try something more concise.
print(m)
# Let's visualize the model
par(mfrow=c(2,2))
plot(m)

5
What are the 4 plots?
1. Residuals vs Fitted - residuals should be evenly distributed around zero. Why?
2. Q-Q plot - tests whether the residuals are approximately normally distributed.
3. Scale-Location - is the variance approximately constant?
4. Residuals vs Leverage - look for outliers; points far from the centroid on the y-axis are potential outliers.
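The same checks can be run numerically rather than by eye. A minimal sketch in base R, refitting the simulated model from earlier slides (the seed is an assumption added here for reproducibility; the original slides set none):

```r
# Refit the simulated model so this sketch is self-contained
set.seed(42)                      # assumed seed, not in the original slides
x = runif(150, 0, 50)
y = 3.142 + x + rnorm(150) * 2.5
m = lm(y ~ x)

res = resid(m)                    # residuals (plot 1: should center on 0)
mean(res)                         # ~0 by construction for OLS with an intercept
shapiro.test(res)                 # a formal normality check (plot 2: Q-Q)
# High-leverage points (plot 4): hat values well above the average hat value
hat = hatvalues(m)
which(hat > 2 * mean(hat))
```

This is a complement to the plots, not a replacement: the diagnostic plots reveal patterns (curvature, funneling) that single summary numbers can miss.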

6
# Let's visualize the difference between the observed and predicted values of y
ypred = predict(m)
par(mfrow=c(1,1))
plot(y, y, type='l', xlab="observed y", ylab="predicted y")
points(y, ypred)
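A numeric companion to that plot is the size of the prediction error. A short sketch, again refitting the simulated model (seed assumed, as before):

```r
set.seed(42)                       # assumed seed for reproducibility
x = runif(150, 0, 50)
y = 3.142 + x + rnorm(150) * 2.5
m = lm(y ~ x)
ypred = predict(m)

rmse = sqrt(mean((y - ypred)^2))   # root mean squared error of the fit
rmse                               # should sit near 2.5, the noise sd we injected
cor(y, ypred)                      # near 1 when the points hug the diagonal line
```

When the points in the plot hug the diagonal, the RMSE is small relative to the spread of y and the correlation is close to 1.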

7
# We can get a more detailed view from the summary function
summary(m)

8
Explanation of summary statistics:
1. Coefficients - intercept and slope estimates as well as their std. errors
2. t-value - coefficient/std. error ==> a small value indicates the variable has little/no effect on the outcome
3. Pr(>|t|) - probability of observing a t-value this large if the true coefficient is zero
4. R-squared - goodness of fit from 0 to 1; close to 1 is a good fit
5. F-statistic - is the model better than "guessing"? A value close to 1 would be as good as the null model
6. p-value - probability of seeing an F-value this large by chance
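Each of these quantities can also be pulled out of the fitted object programmatically, which is handy once you move beyond reading printouts. A sketch using the simulated model (seed assumed):

```r
set.seed(42)                          # assumed seed for reproducibility
x = runif(150, 0, 50)
y = 3.142 + x + rnorm(150) * 2.5
m = lm(y ~ x)
s = summary(m)

coef(s)            # coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
s$r.squared        # R-squared (close to 1 here, since the data are truly linear)
s$fstatistic       # F-statistic plus its two degrees of freedom
# p-value of the overall F-test, recomputed from the stored statistic:
pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3], lower.tail = FALSE)
```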

9
# Okay, let's look at actual data
RealEstate = read.csv("RealEstate.csv")
# Visualize data
plot(RealEstate)
# Get summary statistics
summary(RealEstate)

10
# Create the linear model
r = lm(RealEstate$Price ~ RealEstate$Size)
# Try the structure command again
str(r)
# Still not so helpful for us neophytes.
# Why do you keep doing what I tell you to do? ;-)

11
# Ok, take the simple approach
print(r)
# Let's visualize the model as before
par(mfrow=c(2,2))
plot(r)

12
# Let's look at these 4 plots now that we have real data.
1. Residuals vs Fitted - are the residuals evenly distributed?
2. Q-Q plot - are the residuals approximately normally distributed?
3. Scale-Location - is the variance approximately constant?
4. Residuals vs Leverage - points far from the centroid on the y-axis are potential outliers. Any outliers?
5. Do we think that a linear model is good for this data?
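The outlier hunt on the Residuals vs Leverage plot can be automated with Cook's distance. Since RealEstate.csv may not be at hand, here is a sketch on simulated data with one outlier planted deliberately (the seed, the planted outlier, and the 4/n threshold are all assumptions for illustration; 4/n is one common rule of thumb, not the only one):

```r
set.seed(42)                       # assumed seed for reproducibility
x = runif(150, 0, 50)
y = 3.142 + x + rnorm(150) * 2.5
y[150] = y[150] + 40               # plant an obvious outlier at observation 150
m = lm(y ~ x)

d = cooks.distance(m)              # influence of each point on the fitted line
# Flag points whose Cook's distance exceeds the common 4/n cutoff
which(d > 4 / length(y))
```

The planted point 150 should appear in the flagged set; on real data, flagged points deserve inspection, not automatic deletion.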

13
# As before, let's visualize the difference between the observed and predicted values of y
ypred = predict(r)
par(mfrow=c(1,1))
plot(RealEstate$Price, RealEstate$Price, type="l", xlab="Observed y", ylab="Predicted y")
points(RealEstate$Price, ypred)

14
# Print summary statistics of the model
summary(r)

15
Recall: Explanation of summary statistics:
1. Coefficients - intercept and slope estimates as well as their std. errors
2. t-value - coefficient/std. error ==> a small value indicates the variable has little/no effect on the outcome
3. Pr(>|t|) - probability of observing a t-value this large if the true coefficient is zero
4. R-squared - goodness of fit from 0 to 1; close to 1 is a good fit
5. F-statistic - is the model better than "guessing"? A value close to 1 would be as good as the null model
6. p-value - probability of seeing an F-value this large by chance

