Overview Motivation Data and Sources Methods Results Summary.

Slides:



Advertisements
Similar presentations
Baseball Statistics By Krishna Hajari Faraz Hyder William Walker.
Advertisements

Inference for Regression
Objectives (BPS chapter 24)
July 1, 2008Lecture 17 - Regression Testing1 Testing Relationships between Variables Statistics Lecture 17.
LECTURE 3 Introduction to Linear Regression and Correlation Analysis
LINEAR REGRESSION: Evaluating Regression Models Overview Assumptions for Linear Regression Evaluating a Regression Model.
LINEAR REGRESSION: Evaluating Regression Models. Overview Assumptions for Linear Regression Evaluating a Regression Model.
LINEAR REGRESSION: Evaluating Regression Models. Overview Standard Error of the Estimate Goodness of Fit Coefficient of Determination Regression Coefficients.
Correlation and Simple Regression Introduction to Business Statistics, 5e Kvanli/Guynes/Pavur (c)2000 South-Western College Publishing.
Regression Analysis. Unscheduled Maintenance Issue: l 36 flight squadrons l Each experiences unscheduled maintenance actions (UMAs) l UMAs costs $1000.
Chapter 13 Introduction to Linear Regression and Correlation Analysis
Multivariate Data Analysis Chapter 4 – Multiple Regression.
Lecture 24: Thurs., April 8th
Simple Linear Regression Analysis
Chapter 14 Introduction to Linear Regression and Correlation Analysis
Least Squares Regression
Simple Linear Regression Analysis
Introduction to Linear Regression and Correlation Analysis
Regression Analysis Regression analysis is a statistical technique that is very useful for exploring the relationships between two or more variables (one.
Inference for regression - Simple linear regression
Is Integrated Kinetic Energy a Comprehensive Index to Describe Tropical Cyclone Destructiveness? Emily Madison.
Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis.
Regression Analysis (2)
Lecture 14 Multiple Regression Model
© 2002 Prentice-Hall, Inc.Chap 14-1 Introduction to Multiple Regression Model.
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
1 Chapter 3: Examining Relationships 3.1Scatterplots 3.2Correlation 3.3Least-Squares Regression.
A Comparison Between the Mets and the Yankees Many baseball fans criticize the New York Yankees for “buying” the best players in Major League Baseball.
STAT E100 Section Week 3 - Regression. Review  Descriptive Statistics versus Hypothesis Testing  Outliers  Sample vs. Population  Residual Plots.
1.5 Cont. Warm-up (IN) Learning Objective: to create a scatter plot and use the calculator to find the line of best fit and make predictions. (same as.
Statistical Analysis. Statistics u Description –Describes the data –Mean –Median –Mode u Inferential –Allows prediction from the sample to the population.
Introduction to Linear Regression
Chap 12-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 12 Introduction to Linear.
1 Chapter 12 Simple Linear Regression. 2 Chapter Outline  Simple Linear Regression Model  Least Squares Method  Coefficient of Determination  Model.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Chapter 14 Inference for Regression © 2011 Pearson Education, Inc. 1 Business Statistics: A First Course.
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
Lesson Multiple Regression Models. Objectives Obtain the correlation matrix Use technology to find a multiple regression equation Interpret the.
Review Lecture 51 Tue, Dec 13, Chapter 1 Sections 1.1 – 1.4. Sections 1.1 – 1.4. Be familiar with the language and principles of hypothesis testing.
28. Multiple regression The Practice of Statistics in the Life Sciences Second Edition.
Correlation – Recap Correlation provides an estimate of how well change in ‘ x ’ causes change in ‘ y ’. The relationship has a magnitude (the r value)
Least Squares Regression.   If we have two variables X and Y, we often would like to model the relation as a line  Draw a line through the scatter.
Example x y We wish to check for a non zero correlation.
The Relationship of Weather and Movie Ticket Sales An analysis to show if temperature and precipitation influence ticket sales of Atlantic Station Regal.
AP Statistics Final Project Philadelphia Phillies Attendance Kevin Carter, Devon Dundore, Ryan Smith.
Bailey Wright.  Tornadoes are formed when the vertical wind shear, vertical vorticity, and stream line vorticity conditions are favorable. ◦ Storms and.
1 1 Slide The Simple Linear Regression Model n Simple Linear Regression Model y =  0 +  1 x +  n Simple Linear Regression Equation E( y ) =  0 + 
1 Regression Review Population Vs. Sample Regression Line Residual and Standard Error of Regression Interpretation of intercept & slope T-test, F-test.
Module 25: Confidence Intervals and Hypothesis Tests for Variances for One Sample This module discusses confidence intervals and hypothesis tests.
Chapter 13 Simple Linear Regression
Introduction to Inference
Regression and Correlation
Inference for Least Squares Lines
Basic Estimation Techniques
ENM 310 Design of Experiments and Regression Analysis
Least Square Regression
Confidence Intervals and Hypothesis Tests for Variances for One Sample
Inference for Regression
Chapter 11: Simple Linear Regression
Least Square Regression
Basic Estimation Techniques
CHAPTER 29: Multiple Regression*
The Effects of Atmospheric Pressure on a Baseball
The greatest blessing in life is
Inferences Between Two Variables
Part I Review Highlights, Chap 1, 2
Inference for Regression
Sample Presentation – Mr. Linden
Introductory Statistics
Presentation transcript:

Overview Motivation Data and Sources Methods Results Summary

Motivation Figure 1. From Jason Samenow, Capital Weather Gang, Washington Post. Temperatures compared to a baseline since 1880 and the average home runs per team per game since 1880.

Motivation Figure 2. From Alan M. Nathan, Baseball Prospectus. For a given home run hit at temp T, R-R1 is the extra distance the ball travels at that temperature relative to how far it would have traveled had the temperature been 72.7°F. Plot shows the average value of R-R1 for each temperature bucket, slope of line is 0.25 ft/°F.

Data and sources Focus on Turner Field regular season games from 1997 – 2013 (≈ 1300 games) Baseball data from retrosheet.org Offensive stats: total home runs and runs per game Pitching stats: total strikeouts and walks per game Weather data from Iowa State University’s archive of Automated Surface Observing Network (ASOS) Weather data: Temperature, dew point, relative humidity, MSLP, cloud cover Figure 3. Locations of Turner Field and ASOS station at Hartsfield-Jackson International Airport. Distance between them approximately 7 miles.

Methods Correlation coefficients calculated to determine a statistically significant relationship Regression analysis used to identify whether least squares, RMA, or PC best for data. Bootstrap done for LSR slope and correlation coefficient.

Results: Sky condition Impact Sky condition only impacts day games CLR vs. OVC only Day games significantly smaller sample (387 games, 29%) Unexpected results for strikeouts, compared to Kent and Sheridan, 2011 Figure 4. Sky conditional impact for day games at Turner Field,

Results: Correlations TempDew Point RHMSLP Home runs Runs Strikeouts Walks Overall low correlations somewhat surprising Only two statistically significant (p < 0.05) correlations Two also within the CC 95% confidence interval HR and temperature p = (CI: to ) Walks and temperature p = (CI: to ) Table 1. Correlation Coefficients. Red indicates statistically significant.

Results: LSR regressions Figure 5. LSR home run vs. temperature Figure 6. LSR run vs. temperature Figure 7. LSR walks vs. temperature

Results: Home run regressions Figure 8. LSR, RMA, and PC regression fit comparison for home run vs. temperature. Slopes LSRPC LSR slope 95% CI LowerUpper R 2 (variance) LSRPC

Results: Home run residuals Residuals taken and tested. Chi-squared test for LSR and PC residuals indicates that neither are normally distributed. Residual Chi-squared Chi critical value LSR chi- squared value PC chi- squared value e3 Figure 9. Stem plot of LSR and PC residuals.

Results: Bootstrap LSR slope, correlation coefficient Bootstrap mean slope and mean CC similar to original. Bootstrap chi-squared test confirms what we already knew: data not normally distributed. Bootstrap Mean LSR slope (original) Mean correlation coefficient (original) CI: to.0090 ( CI: to 0.297) CI:0.059 to (0.060 CI: to ) Figure 10. Histogram of bootstrapped slope and correlation coefficient.

Summary Physics of baseball tell us that temperature, dew point, and other variables should have impact on ball flight (batters) and ball movement (pitchers). Correlation of temperature and home runs and walks at Turner Field not high but it is statistically significant. Too many other non atmospheric factors: baseball is a “game of inches”. Non-normality of data does call into question confidence intervals for regression analysis.