More than just maps. A Toolkit for Spatial Analysis GUI access to the most frequently used tools ArcToolbox – an expandable collection of ready-to-use.

Slides:



Advertisements
Similar presentations
Autocorrelation and Heteroskedasticity
Advertisements

Managerial Economics in a Global Economy
Chapter 12 Inference for Linear Regression
Irwin/McGraw-Hill © Andrew F. Siegel, 1997 and l Chapter 12 l Multiple Regression: Predicting One Factor from Several Others.
11 Pre-conference Training MCH Epidemiology – CityMatCH Joint 2012 Annual Meeting Intermediate/Advanced Spatial Analysis Techniques for the Analysis of.
Regression Analysis Once a linear relationship is defined, the independent variable can be used to forecast the dependent variable. Y ^ = bo + bX bo is.
Correlation and regression
Spatial statistics Lecture 3.
Analysis of Economic Data
Statistics for Managers Using Microsoft® Excel 5th Edition
Chapter 12 Simple Regression
BA 555 Practical Business Analysis
Statistics for Managers Using Microsoft® Excel 5th Edition
The Simple Regression Model
Nemours Biomedical Research Statistics April 2, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility.
Chapter Topics Types of Regression Models
Linear Regression and Correlation Analysis
Lecture 24: Thurs., April 8th
Multiple Regression and Correlation Analysis
Linear Regression Example Data
Ch. 14: The Multiple Regression Model building
Empirical Estimation Review EconS 451: Lecture # 8 Describe in general terms what we are attempting to solve with empirical estimation. Understand why.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 11 th Edition.
1 1 Slide © 2003 South-Western/Thomson Learning™ Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Correlation and Regression Analysis
Introduction to Regression Analysis, Chapter 13,
Chapter 12 Section 1 Inference for Linear Regression.
IS415 Geospatial Analytics for Business Intelligence
Copyright ©2011 Pearson Education 15-1 Chapter 15 Multiple Regression Model Building Statistics for Managers using Microsoft Excel 6 th Global Edition.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS.
Quantitative Business Analysis for Decision Making Multiple Linear RegressionAnalysis.
Introduction to Linear Regression and Correlation Analysis
Inference for regression - Simple linear regression
Chapter 13: Inference in Regression
Copyright ©2011 Pearson Education, Inc. publishing as Prentice Hall 15-1 Chapter 15 Multiple Regression Model Building Statistics for Managers using Microsoft.
Esri International User Conference | San Diego, CA Technical Workshops | Spatial Statistics: Best Practices Lauren Rosenshein, MS Lauren M. Scott, PhD.
© 2002 Prentice-Hall, Inc.Chap 14-1 Introduction to Multiple Regression Model.
Food Store Location Analysis Albuquerque New Mexico, 2010 Prepared for: Geography 586L - Spring Semester, 2014 Larry Spear M.A., GISP Sr. Research Scientist.
Inference for Linear Regression Conditions for Regression Inference: Suppose we have n observations on an explanatory variable x and a response variable.
1 1 Slide © 2007 Thomson South-Western. All Rights Reserved OPIM 303-Lecture #9 Jose M. Cruz Assistant Professor.
1 1 Slide Multiple Regression n Multiple Regression Model n Least Squares Method n Multiple Coefficient of Determination n Model Assumptions n Testing.
Author(s): Brenda Gunderson, Ph.D., 2011 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Non-commercial–Share.
Chap 14-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 14 Additional Topics in Regression Analysis Statistics for Business.
Spatial Analysis Workshop In-Class Exercise Tuesday, December 11, 2012.
Managerial Economics Demand Estimation & Forecasting.
MGS3100_04.ppt/Sep 29, 2015/Page 1 Georgia State University - Confidential MGS 3100 Business Analysis Regression Sep 29 and 30, 2015.
Lesson Multiple Regression Models. Objectives Obtain the correlation matrix Use technology to find a multiple regression equation Interpret the.
Taking ‘Geography’ Seriously: Disaggregating the Study of Civil Wars. John O’Loughlin and Frank Witmer Institute of Behavioral Science University of Colorado.
14- 1 Chapter Fourteen McGraw-Hill/Irwin © 2006 The McGraw-Hill Companies, Inc., All Rights Reserved.
Chapter 13 Multiple Regression
28. Multiple regression The Practice of Statistics in the Life Sciences Second Edition.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Multiple Regression Model Building Statistics for Managers.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice- Hall, Inc. Chap 14-1 Business Statistics: A Decision-Making Approach 6 th Edition.
Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 10 th Edition.
Statistical methods for real estate data prof. RNDr. Beáta Stehlíková, CSc
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Multiple Regression Model Building Statistics for Managers.
Multiple Regression Learning Objectives n Explain the Linear Multiple Regression Model n Interpret Linear Multiple Regression Computer Output n Test.
Assumptions of Multiple Regression 1. Form of Relationship: –linear vs nonlinear –Main effects vs interaction effects 2. All relevant variables present.
Biostatistics Regression and Correlation Methods Class #10 April 4, 2000.
Regression Analysis: A statistical procedure used to find relations among a set of variables B. Klinkenberg G
2011 Data Mining Industrial & Information Systems Engineering Pilsung Kang Industrial & Information Systems Engineering Seoul National University of Science.
Spatial statistics Lecture 3 2/4/2008. What are spatial statistics Not like traditional, a-spatial or non-spatial statistics But specific methods that.
Chapter 15 Multiple Regression Model Building
Multiple Regression Analysis and Model Building
CHAPTER 29: Multiple Regression*
CHAPTER 26: Inference for Regression
Chapter 13 Additional Topics in Regression Analysis
Presentation transcript:

more than just maps

A Toolkit for Spatial Analysis GUI access to the most frequently used tools ArcToolbox – an expandable collection of ready-to-use tools ModelBuilder – a visual programming environment Python – A FOSS scripting language integrated with ArcGIS

Most analyses involve repeated use of common tools like “Select by Attribute” (SQL)

We can also create a selection interactively The selected river is highlighted The associated rows in the attribute table are also highlighted

Creating a Buffer allows identification of all objects that fall within a specified distance of a selected feature

Buffers allow further selection, i. e., all patients within 1000 meters of the river

ModelBuilder is a visual programming environment where data and tools can be dragged unto a blank slate to create a working program. Models can be saved Models can be exported as graphic documentation Models can be exported as scripts for further development

Models are an excellent way to write bug-free code fragments, which can then be assembled into larger scripts Models and scripts can be called in Models and Scripts and can be saved as tools in ArcToolbox

PYTHON is a modern Language that is well supported and easy to learn.

Observed Values Predicted Values Ordinary Least Square (OLS) Geographically Weighted Regression (GWR) Regression analysis 10 Regression analysis allows you to:Regression analysis allows you to: –Model, examine, and explore spatial relationships –Better understand the factors behind observed spatial patterns –Predict outcomes based on that understanding

What’s the big deal? Pattern analysis (without regression): o Are there places where people persistently die young? o Where are test scores consistently high? o Where are 911 emergency call hot spots? 11

Why use regression? Understand key factors What are the most important habitat characteristics for an endangered bird? Predict unknown values How much rainfall will occur in a given location? Test hypotheses “Broken Window” Theory: Is there a positive relationship between vandalism and residential burglary? 12

Applications Education Why are literacy rates so low in particular regions? Natural resource management What are the key variables that explain high forest fire frequency? Ecology Which environments should be protected, to encourage reintroduction of an endangered species? Transportation What demographic characteristics contribute to high rates of public transportation usage? Many more… Business, crime prevention, epidemiology, finances, public safety, public health

Regression analysis terms and concepts Dependent variable (Y): What you are trying to model or predict (e.g., residential burglary). Explanatory variables (X): Variables you believe cause or explain the dependent variable (e.g., income, vandalism, number of households). Coefficients (β): Values, computed by the regression tool, reflecting the relationship between explanatory variables and the dependent variable. Residuals (ε): The portion of the dependent variable that isn’t explained by the model; the model under- and over-predictions. 14

Intercept INCOME VANDALISM HOUSEHOLDS LOWER CITY Regression model coefficients Coefficient sign (+/-) and magnitude reflect each explanatory variable’s relationship to the dependent variable The asterisk * indicates the explanatory variable is statistically significant

Building a global OLS regression model Choose your dependent variable (Y). Identify potential explanatory variables (X). Explore those explanatory variables. Run OLS regression with different combinations of explanatory variables, until you find a properly specified model. 16

Regression analysis output Adjusted R-Squared [2]: Akaike’s Information Criterion (AIC) [2]:

Why are people dying young in South Dakota? Do economic factors explain this spatial pattern? Poverty rates explain 66% of the variation in the average age of death dependent variable: Adjusted R-Squared [2]: However, significant spatial autocorrelation among model residuals indicates important explanatory variables are missing from the model. Use OLS to test hypotheses 18

19 Build a multivariate regression model Explore variable relationships using the scatterplot matrixExplore variable relationships using the scatterplot matrix Consult theory and field expertsConsult theory and field experts Look for spatial variablesLook for spatial variables Run OLS (this is an iterative, often tedious, trial and error, process)Run OLS (this is an iterative, often tedious, trial and error, process)

Check OLS results 1 Coefficients have the expected sign. 2 No redundancy among model explanatory variables. 3 Coefficients are statistically significant. 4 Residuals are normally distributed. 5 Strong Adjusted R-Square value. 6 Residuals are not spatially autocorrelated.

Online help is … helpful!

Coefficient significance Look for statistically significant explanatory variables. 22 * Statistically significant at the 0.05 level.

Multicollinearity Find a set of explanatory variables that have low VIF values. In a strong model, each explanatory variable gets at a different facet of the dependent variable. What did one regression coefficient say to the other regression coefficient? …I’m partial to you! 23 VIF [1] Large VIF (> 7.5, for example) indicates explanatory variable redundancy.

Model performance Compare models by looking for the lowest AIC value. As long as the dependent variable remains fixed, the AIC value for different OLS/GWR models are comparable Look for a model with a high Adjusted R-Squared value. Akaike’s Information Criterion (AIC) [2]: Adjusted R-Squared [2]: [2] Measure of model fit/performance.

Model bias When the Jarque-Bera test is statistically significant: The model is biased Results are not reliable Often indicates that a key variable is missing from the model 25 [6] Significant p-value indicates residuals deviate from a normal distribution. Jarque-Bera Statistic [6]: Prob(>chi-sq), (2) degrees of freedom:

Spatial Autocorrelation 26 Statistically significant clustering of under and over predictions. Random spatial pattern of under and over predictions.

For each explanatory variable, GWR creates a coefficient surface showing you where relationships are strongest. Global vs. local regression models OLS Global regression model One equation, calibrated using data from all features Relationships are fixed GWR Local regression model One equation for every feature, calibrated using data from nearby features Relationships are allowed to vary across the study area 27

Running GWR GWR is a local spatial regression model Modeled relationships are allowed to vary GWR variables are the same as OLS, except: Do not include spatial regime (dummy) variables Do not include variables with little value variation

Defining local GWR constructs an equation for each feature Coefficients are estimated using nearby feature values GWR requires a definition for nearby Kernel type Fixed: Nearby is determined by a fixed distance band Adaptive: Nearby is determined by a fixed number of neighbors Bandwidth method AIC or Cross Validation (CV): GWR will find the optimal distance or optimal number of neighbors Bandwidth parameter: User-provided distance or user-provided number of neighbors 29

Interpreting GWR results Compare GWR R2 and AIC values to OLS R2 and AIC values The better model has a lower AIC and a high R2. Residual maps show model under- and over-predictions. They shouldn’t be clustered. Coefficient maps show how modeled relationships vary across the study area. Model predictions, residuals, standard errors, coefficients, and condition numbers are written to the output feature class. Check condition numbers: > 30 indicates a less reliable result 30

GWR prediction Observed Modeled Predicted Calibrate the GWR model using known values for the dependent variable and all of the explanatory variables. Provide a feature class of prediction locations containing values for all of the explanatory variables. GWR will create an output feature class with the computed predictions.