Baseball Statistics, by Krishna Hajari, Faraz Hyder, and William Walker

Objective: Our goal is to find out whether, over the past 10 years, there is a consistent factor that affects the winning percentage of the 30 teams in Major League Baseball.

Explanatory Variables: team batting average, baseball stadium dimensions, team payroll, average game attendance, and ERA (earned run average).

Explanation of Variables: Team Batting Average. This statistic is used to evaluate a batter’s performance and is calculated as hits divided by official at-bats. Stadium Dimensions. Each stadium has a different field size, so we will test whether the distances from home plate to the left-field, center-field, and right-field walls have an impact on a team’s performance.
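As a minimal sketch of the batting-average formula (hits divided by official at-bats), the hypothetical helper below computes it in Python. The function name and the sample totals are illustrative assumptions, not figures from the 2004 data.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Return hits / official at-bats, the standard batting average."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


# Hypothetical team season totals, for illustration only.
team_avg = batting_average(hits=1450, at_bats=5500)
print(f"{team_avg:.3f}")  # batting averages are conventionally shown to 3 decimals
```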

Explanation of Variables: Team Payroll. Payroll differs from team to team in MLB. In 2004, the highest-paying team, the New York Yankees, had a payroll of $184 million, more than 6 times that of the lowest-paying team, the Devil Rays, at $27.5 million. We hypothesize that higher-paying teams perform better. The average payroll for the 2004 season was approximately $69 million, with a standard deviation of $33 million. Average Game Attendance. Average game attendance in 2004 was approximately 30,300 people, with a standard deviation of about 8,900.
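The payroll comparison can be checked with a short Python sketch that uses only the figures quoted on this slide. The z-score is an added illustration of how far the top payroll sits from the league average; it is not a statistic the authors report.

```python
# Figures quoted on the payroll slide (2004 season, in millions of dollars).
YANKEES_PAYROLL = 184.0
DEVIL_RAYS_PAYROLL = 27.5
MEAN_PAYROLL = 69.0  # approximate league average
SD_PAYROLL = 33.0    # approximate standard deviation


def payroll_ratio(high: float, low: float) -> float:
    """How many times larger the highest payroll is than the lowest."""
    return high / low


def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / sd


ratio = payroll_ratio(YANKEES_PAYROLL, DEVIL_RAYS_PAYROLL)
z_yankees = z_score(YANKEES_PAYROLL, MEAN_PAYROLL, SD_PAYROLL)
print(f"ratio = {ratio:.2f}, Yankees z = {z_yankees:.2f}")
```

The ratio of about 6.7 matches the "more than 6 times" claim, and a z-score near 3.5 shows how unusual the Yankees' payroll was relative to the rest of the league.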

Explanation of Variables: Earned Run Average (ERA). This statistic is used to evaluate a pitcher’s performance. It is calculated as ERA = (earned runs allowed × 9) / innings pitched.
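A minimal Python sketch of the ERA formula follows. One practical wrinkle is that box scores write innings pitched in thirds ("200.1" means 200 1/3 innings), so the sketch converts that notation to an exact fraction before dividing. The helper names and the sample pitching line are hypothetical.

```python
from fractions import Fraction


def innings_from_box_score(ip: str) -> Fraction:
    """Convert box-score innings notation to an exact number of innings.

    "200.1" means 200 and 1/3 innings and "200.2" means 200 and 2/3,
    because each out is one third of an inning.
    """
    whole, _, outs = ip.partition(".")
    outs_count = int(outs) if outs else 0
    if outs_count not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + Fraction(outs_count, 3)


def earned_run_average(earned_runs: int, innings_pitched: Fraction) -> float:
    """ERA = earned runs allowed * 9 / innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return float(earned_runs * 9 / Fraction(innings_pitched))


# Hypothetical team pitching line, for illustration only.
ip = innings_from_box_score("1450.2")
print(f"{earned_run_average(700, ip):.2f}")
```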

Response Variable: Winning percentage for each team, calculated as games won / games played.
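For completeness, the response variable can be computed the same way. This is a hypothetical helper, and the record used below is illustrative rather than any team's actual 2004 result.

```python
def winning_percentage(wins: int, games_played: int) -> float:
    """Winning percentage = games won / games played."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return wins / games_played


# Hypothetical record over a 162-game regular season, for illustration only.
pct = winning_percentage(wins=90, games_played=162)
print(f"{pct:.3f}")
```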

2004 Data: In this presentation we use a single season, 2004, to show how we intend to analyze the data for each of the past 10 years, year by year.

Hypotheses: H0: None of these variables has an effect on winning percentage (every slope coefficient in the regression equals zero). Ha: At least one of the variables has an effect on winning percentage.

Initial Summary: This is the summary of the most general linear model, with all five explanatory variables present. The p-value for the overall model is very small, so we reject H0 and conclude that at least one of the variables is significant.
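The overall p-value in a regression summary is normally based on an F statistic that compares the full model with an intercept-only model. Assuming ordinary least squares with n observations (30 teams) and k explanatory variables (5 here), that statistic can be written in terms of R². The sketch below computes it; the R² value is a hypothetical placeholder, not the authors' result.

```python
def overall_f_statistic(r_squared: float, n: int, k: int) -> float:
    """F statistic for H0: all k slope coefficients are zero.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k and n - k - 1
    degrees of freedom.
    """
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


# 30 teams, 5 explanatory variables; R^2 = 0.5 is a hypothetical placeholder.
f_stat = overall_f_statistic(0.5, n=30, k=5)
print(f"F = {f_stat:.2f} on (5, 24) degrees of freedom")
```

Turning F into a p-value requires the F distribution's survival function, for example `scipy.stats.f.sf(f_stat, 5, 24)` if SciPy is available.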

ANOVA Table: This ANOVA table shows that at least three variables are significant, because their p-values are less than 0.05.

Variance Inflation Factor: The VIF for each of the five explanatory variables is less than 10, so multicollinearity is not severe enough to justify excluding any of them from the regression.
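The VIF for explanatory variable j is 1 / (1 − R_j²), where R_j² comes from regressing variable j on all of the other explanatory variables. A cutoff of 10 therefore corresponds to R_j² = 0.9. The sketch below applies that rule; the auxiliary R² values are hypothetical, not taken from the 2004 fit.

```python
def variance_inflation_factor(r_squared_j: float) -> float:
    """VIF_j = 1 / (1 - R_j^2).

    R_j^2 is the R^2 from regressing explanatory variable j on all the
    other explanatory variables.
    """
    if not 0 <= r_squared_j < 1:
        raise ValueError("r_squared_j must be in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)


VIF_CUTOFF = 10  # rule-of-thumb threshold used on this slide

# Hypothetical auxiliary R^2 values, for illustration only.
for name, r2 in {"Payroll": 0.45, "Attendance": 0.40, "ERA": 0.10}.items():
    vif = variance_inflation_factor(r2)
    print(f"{name}: VIF = {vif:.2f}, exclude = {vif >= VIF_CUTOFF}")
```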

Correlation Matrix: The correlation matrix shows a somewhat high correlation between attendance and payroll. This is to be expected, since teams with higher attendance generate more revenue and can therefore afford higher payrolls.
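Each entry of a correlation matrix is a Pearson correlation coefficient between two columns. The sketch below builds one with the standard library only; the five-team values are hypothetical and chosen only to show a positive payroll-attendance correlation like the one described on this slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each (name_a, name_b) pair to the correlation of those columns."""
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for a in columns
        for b in columns
    }


# Hypothetical values for five teams (payroll in $M, attendance in thousands).
data = {
    "Payroll": [150.0, 95.0, 70.0, 55.0, 30.0],
    "Attendance": [42.0, 35.0, 31.0, 24.0, 19.0],
    "ERA": [4.1, 4.6, 4.0, 4.8, 5.2],
}
matrix = correlation_matrix(data)
for a in data:
    print(a.ljust(12), " ".join(f"{matrix[(a, b)]:6.2f}" for b in data))
```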

All Possible Regressions: According to all of the goodness-of-fit criteria, the best model appears to be the one with ERA, payroll, and batting average.
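With five candidate predictors there are 2⁵ − 1 = 31 non-empty models to compare. The slides do not list which criteria were used; adjusted R² is one common choice, and the sketch below uses it to illustrate why a smaller model can win. The two R² values are hypothetical.

```python
from itertools import combinations

PREDICTORS = ["Batting Average", "Stadium Dimensions", "Payroll", "Attendance", "ERA"]


def all_subsets(predictors):
    """Yield every non-empty subset of the predictors (2^k - 1 models)."""
    for size in range(1, len(predictors) + 1):
        yield from combinations(predictors, size)


def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p explanatory variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


models = list(all_subsets(PREDICTORS))
print(len(models), "candidate models")

# Hypothetical R^2 values with n = 30 teams: the 5-variable model has a
# slightly higher R^2 but a lower adjusted R^2 than the 3-variable model.
adj_3 = adjusted_r_squared(0.60, n=30, p=3)
adj_5 = adjusted_r_squared(0.61, n=30, p=5)
print(f"3 predictors: {adj_3:.3f}, 5 predictors: {adj_5:.3f}")
```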

Summary of Stepwise Regression

The residuals appear to be distributed evenly above and below the 0 line. However, the residuals tend to be more negative when the predicted winning percentage falls below .450. The points in the Q-Q plot do not fall perfectly on a straight line, but they are close to one, suggesting the residuals are approximately normally distributed.
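A normal Q-Q plot pairs the sorted residuals with standard normal quantiles. The sketch below computes those pairs using the plotting positions (i − 0.5)/n, which is one common convention; statistical packages may use slightly different positions. The residuals are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Return (theoretical quantile, observed residual) pairs for a normal Q-Q plot.

    Sorted residuals are paired with standard normal quantiles at the
    plotting positions (i - 0.5) / n, i = 1..n.
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("residuals must be non-empty")
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (standard_normal.inv_cdf((i - 0.5) / n), r)
        for i, r in enumerate(sorted(residuals), start=1)
    ]


# Hypothetical residuals from a fitted winning-percentage model.
residuals = [0.021, -0.034, 0.005, -0.012, 0.040, -0.018, 0.009]
points = normal_qq_points(residuals)
for theoretical, observed in points:
    print(f"{theoretical:7.3f} {observed:7.3f}")
```

If these points lie close to a straight line, the normality assumption for the residuals is reasonable.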

Variance Test: The variance test shows that most of the variances are very close to each other, which supports the assumption that the variances are approximately equal.
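The slide does not name the variance test used. As a simple stand-in, not necessarily the authors' test, Hartley's F_max statistic compares the largest and smallest group variances, and values near 1 indicate similar spreads. The residual groups below (split by ranges of fitted winning percentage) are hypothetical.

```python
from statistics import variance


def hartley_fmax(groups):
    """Ratio of the largest to the smallest sample variance across groups.

    Values near 1 suggest roughly equal variances. Each group needs at
    least two observations for a sample variance.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    variances = [variance(g) for g in groups]
    smallest = min(variances)
    if smallest == 0:
        raise ValueError("a group has zero variance")
    return max(variances) / smallest


# Hypothetical residuals grouped by fitted winning percentage.
residual_groups = [
    [-0.030, 0.012, -0.021, 0.018],  # fitted below .450
    [0.025, -0.015, 0.004, -0.019],  # fitted .450 to .550
    [0.017, -0.022, 0.011, -0.008],  # fitted above .550
]
print(f"F_max = {hartley_fmax(residual_groups):.2f}")
```

Formal critical values come from Hartley's F_max table and assume equal group sizes.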

The only influential outlier, observation 19, is the New York Yankees. This is understandable given their exceptionally high payroll.
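The slides do not say how influence was measured. Cook's distance is a common choice, and the sketch below computes it for a simple regression with a single predictor. This is a simplification: the actual model has several predictors and would need the full hat matrix. The data are hypothetical, with the last point placed far off the trend.

```python
def cooks_distance_simple(xs, ys):
    """Cook's distance for each point in a simple linear regression y = b0 + b1*x.

    D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2, with p = 2 parameters
    and leverage h_i = 1/n + (x_i - x_mean)^2 / Sxx.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    p = 2  # intercept and slope
    mse = sum(e * e for e in residuals) / (n - p)
    distances = []
    for x, e in zip(xs, residuals):
        h = 1 / n + (x - x_mean) ** 2 / sxx  # leverage of this point
        distances.append(e * e / (p * mse) * h / (1 - h) ** 2)
    return distances


# Hypothetical data: the last point sits far above the trend of the others.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 30]
d = cooks_distance_simple(xs, ys)
most_influential = max(range(len(d)), key=d.__getitem__)
print("most influential index:", most_influential)
```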

The Box-Cox plot indicates that a Box-Cox transformation with power λ = 2 can be used to improve the model.
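The Box-Cox family transforms a positive response y to (y^λ − 1)/λ, or to log y when λ = 0. The sketch below applies it with a power of 2, as suggested by the Box-Cox plot, to a few hypothetical winning percentages.

```python
from math import log


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform: (y**lam - 1) / lam for lam != 0, log(y) for lam == 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive response")
    if lam == 0:
        return log(y)
    return (y ** lam - 1) / lam


# Power 2, applied to hypothetical winning percentages before refitting.
win_pcts = [0.40, 0.50, 0.62]
print([round(box_cox(y, 2), 4) for y in win_pcts])
```

For λ > 0 the transform is increasing in y, so predictions on the transformed scale z can be mapped back with y = (λz + 1)^(1/λ).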

The Box-Cox transformation improved the model, as these graphs show. The residuals appear much more normally distributed, and the line is much closer to 0 once the outlier is removed. The Q-Q plot is also closer to a straight line, indicating an improved model.

Summary of Final Model with Box-Cox Transformation

Conclusion: The Box-Cox transformation improved the model. Unexpectedly, payroll was found to play a comparatively minor role in the 2004 season, and it does not appear in the stepwise regression models for 5 of the past 10 years. The two explanatory variables that were consistent factors over the past 10 years were ERA and batting average.