Does X significantly affect Y? Does the inclusion of X in a model increase our ability to predict Y?
High-Level Statistical Overview
Wraps around any predictive algorithm: Linear Regression, Logistic Regression, Random Forests, …
Cross-validation is used to obtain an accurate measure of error
An exact test is used to obtain accurate p-values
No parametric assumptions (other than independence between observations)
(Even independence may be violated if compensated for)
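The wrapping idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the method's actual implementation: the function names (`knn_predict`, `cv_r2`, `feature_test`) are hypothetical, a small k-nearest-neighbours regressor stands in for "any predictive algorithm", five folds are an arbitrary choice, and the p-value comes from a Monte Carlo permutation of the feature column, which approximates an exact test but is not necessarily the test used in the talk.

```python
import random


def knn_predict(train_X, train_y, test_X, k=3):
    """Stand-in predictive algorithm (assumption): mean of the k nearest training targets."""
    preds = []
    for q in test_X:
        order = sorted(
            range(len(train_X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], q)),
        )
        nearest = order[:k]
        preds.append(sum(train_y[i] for i in nearest) / len(nearest))
    return preds


def cv_r2(X, y, predict, folds=5, seed=0):
    """Cross-validated R^2: 1 - (out-of-fold squared error) / (total sum of squares)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sse = 0.0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        preds = predict([X[i] for i in train], [y[i] for i in train],
                        [X[i] for i in test])
        sse += sum((y[i] - p) ** 2 for i, p in zip(test, preds))
    mean_y = sum(y) / len(y)
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - sse / sst


def feature_test(X, y, col, predict, n_perm=99, seed=0):
    """Return (gain in CV R^2 from including column `col`, permutation p-value)."""
    rng = random.Random(seed)
    full = cv_r2(X, y, predict)
    # Model without the feature: drop the column entirely.
    without = cv_r2([r[:col] + r[col + 1:] for r in X], y, predict)
    # Null distribution: shuffle the column so it carries no information about y.
    values = [r[col] for r in X]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = [r[:col] + [v] + r[col + 1:] for r, v in zip(X, values)]
        if cv_r2(shuffled, y, predict) >= full:
            hits += 1
    return full - without, (1 + hits) / (1 + n_perm)
```

Because `predict` is just a callable taking training rows, training targets, and test rows, any other algorithm with the same shape could be passed in its place. Rows are expected as lists of numbers. The smallest p-value this sketch can report is 1 / (1 + `n_perm`).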
Housing Market
Predicting housing price based on house and market attributes
Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management 5: 81–102.
A3: Random Forest Model

                  CrVa R²    p-Value
-Full Model-       74.3 %     < 0.01
AGE               - 1.5 %       0.01
ROOMS            + 20.4 %     < 0.01
NOX               + 6.3 %     < 0.01
PUPIL/TEACHER     - 1.4 %     < 0.01
HIGHWAY           - 2.6 %       0.03
Linear Regression: CrVa R² 0.593; significant at p = 0.05: ROOMS, NOX, PUPIL/TEACHER
Random Forest: CrVa R² 0.743; significant at p = 0.05: AGE, ROOMS, NOX, PUPIL/TEACHER, HIGHWAY
Support Vector Machines: CrVa R² 0.711; significant at p = 0.05: AGE, ROOMS, NOX, PUPIL/TEACHER
Not significant at p = 0.05: AGE, HIGHWAY
Environmental Productivity
Measure utility of an ecosystem based on different physical attributes
Maestre FT, Quero JL, Gotelli NJ, Escudero A, Ochoa V, et al. (2012) Plant Species Richness and Ecosystem Multifunctionality in Global Drylands. Science 335: 214–218.