COMPUTATIONAL ASPECTS OF LOCAL REGRESSION MODELLING: taking spatial analysis to another level Stewart Fotheringham Martin Charlton Chris Brunsdon Spatial.

Slides:



Advertisements
Similar presentations
Test of (µ 1 – µ 2 ),  1 =  2, Populations Normal Test Statistic and df = n 1 + n 2 – 2 2– )1– 2 ( 2 1 )1– 1 ( 2 where ] 2 – 1 [–
Advertisements

I OWA S TATE U NIVERSITY Department of Animal Science Using Basic Graphical and Statistical Procedures (Chapter in the 8 Little SAS Book) Animal Science.
Objectives 10.1 Simple linear regression
Geographically weighted regression
Forecasting Using the Simple Linear Regression Model and Correlation
STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for.
8 - 1 Multivariate Linear Regression Chapter Multivariate Analysis Every program has three major elements that might affect cost: – Size » Weight,
Regression Analysis Using Excel. Econometrics Econometrics is simply the statistical analysis of economic phenomena Here, we just summarize some of the.
LECTURE 3 Introduction to Linear Regression and Correlation Analysis
More than just maps. A Toolkit for Spatial Analysis GUI access to the most frequently used tools ArcToolbox – an expandable collection of ready-to-use.
The Multiple Regression Model Prepared by Vera Tabakova, East Carolina University.
Chapter 12 Simple Regression
Chapter 13 Introduction to Linear Regression and Correlation Analysis
Applied Geostatistics
Lecture 24: Thurs. Dec. 4 Extra sum of squares F-tests (10.3) R-squared statistic (10.4.1) Residual plots (11.2) Influential observations (11.3,
Deterministic Solutions Geostatistical Solutions
Inferences About Process Quality
1 1 Slide © 2003 South-Western/Thomson Learning™ Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Week 14 Chapter 16 – Partial Correlation and Multiple Regression and Correlation.
Slide 1 Detecting Outliers Outliers are cases that have an atypical score either for a single variable (univariate outliers) or for a combination of variables.
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Regression Chapter 14.
Multiple Linear Regression A method for analyzing the effects of several predictor variables concurrently. - Simultaneously - Stepwise Minimizing the squared.
Briggs Henan University 2010
IS415 Geospatial Analytics for Business Intelligence
1 Doing Statistics for Business Doing Statistics for Business Data, Inference, and Decision Making Marilyn K. Pelosi Theresa M. Sandifer Chapter 11 Regression.
Correlation and Regression
Inference for regression - Simple linear regression
1 MULTI VARIATE VARIABLE n-th OBJECT m-th VARIABLE.
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 9 Hypothesis Testing.
Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University ECON 4550 Econometrics Memorial University of Newfoundland.
One-Factor Experiments Andy Wang CIS 5930 Computer Systems Performance Analysis.
CPE 619 Simple Linear Regression Models Aleksandar Milenković The LaCASA Laboratory Electrical and Computer Engineering Department The University of Alabama.
Simple Linear Regression Models
APPENDIX B Data Preparation and Univariate Statistics How are computer used in data collection and analysis? How are collected data prepared for statistical.
Introduction to Linear Regression
Chap 12-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 12 Introduction to Linear.
Production Planning and Control. A correlation is a relationship between two variables. The data can be represented by the ordered pairs (x, y) where.
Testing Hypotheses about Differences among Several Means.
Lecture 8 Simple Linear Regression (cont.). Section Objectives: Statistical model for linear regression Data for simple linear regression Estimation.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
MGS3100_04.ppt/Sep 29, 2015/Page 1 Georgia State University - Confidential MGS 3100 Business Analysis Regression Sep 29 and 30, 2015.
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
Spatially Assessing Model Error Using Geographically Weighted Regression Shawn Laffan Geography Dept ANU.
Taking ‘Geography’ Seriously: Disaggregating the Study of Civil Wars. John O’Loughlin and Frank Witmer Institute of Behavioral Science University of Colorado.
MARKETING RESEARCH CHAPTER 18 :Correlation and Regression.
Copyright ©2011 Brooks/Cole, Cengage Learning Inference about Simple Regression Chapter 14 1.
VI. Regression Analysis A. Simple Linear Regression 1. Scatter Plots Regression analysis is best taught via an example. Pencil lead is a ceramic material.
5-1 ANSYS, Inc. Proprietary © 2009 ANSYS, Inc. All rights reserved. May 28, 2009 Inventory # Chapter 5 Six Sigma.
Correlation & Regression Analysis
Methods for point patterns. Methods consider first-order effects (e.g., changes in mean values [intensity] over space) or second-order effects (e.g.,
Copyright © 2010 Pearson Education, Inc Chapter Seventeen Correlation and Regression.
Machine Learning 5. Parametric Methods.
Tutorial I: Missing Value Analysis
Biostatistics Regression and Correlation Methods Class #10 April 4, 2000.
Stats Methods at IC Lecture 3: Regression.
Chapter 13 Simple Linear Regression
Chapter 14 Introduction to Multiple Regression
Chapter 7. Classification and Prediction
Week 14 Chapter 16 – Partial Correlation and Multiple Regression and Correlation.
Chapter 11 Simple Regression
Further Inference in the Multiple Regression Model
I271B Quantitative Methods
Chapter 9 Hypothesis Testing.
Discrete Event Simulation - 4
Simple Linear Regression
Product moment correlation
MGS 3100 Business Analysis Regression Feb 18, 2016
Interval Estimation of mean response
Presentation transcript:

COMPUTATIONAL ASPECTS OF LOCAL REGRESSION MODELLING: taking spatial analysis to another level Stewart Fotheringham Martin Charlton Chris Brunsdon Spatial Analysis Research Group Department of Geography, University of Newcastle upon Tyne, Newcastle upon Tyne, ENGLAND NE1 7RU

GEOGRAPHICALLY WEIGHTED REGRESSION The mechanics of GWR New software for GWR: GWR 2.0 GWR in practice: an example of the determinants of London house prices Won’t discuss the math of GWR in much detail

Some Definitions Spatial nonstationaritySpatial nonstationarity exists when the same stimulus provokes a different response in different parts of the study region Global modelsGlobal models are statements about processes which are assumed to be stationary and as such are location independent Local modelsLocal models are spatial disaggregations of global models, the results of which are location-specific

Local versus Global LocalglobaldataLocal versus global data: the example of US climate data LocalglobalrelationshipsLocal versus global relationships: the example of house price determinants LocalglobalmodelsLocal versus global models: the example of regression

Why might relationships vary spatially? Sampling variation Relationships intrinsically different across space e.g. differences in attitudes, preferences or different administrative, political or other contextual effects produce different responses to the same stimuli Model misspecification - suppose a global statement can ultimately be made but models not properly specified to allow us to make it. Local models good indicator of how model is misspecified. Can all contextual effects ever be removed? Can all significant variations in local relationships be removed?

Regression In a typical linear regression model applied to spatial data we assume a stationary process: y i =  0 +  1 x 1i +  2 x 2i +…  n x ni +  i

so that... The parameter estimates obtained in the calibration of such a model are constant over space:  ’ = (X T X) -1 X T Y which means that any spatial variations in the processes being examined can only be measured by the error term

Consequently... We might map the residuals from the regression to determine whether there are any spatial patterns. Or compute an autocorrelation statistic We might even try to ‘model’ the error dependency with various types of spatial regression models.

However... Why not address the issue of spatial nonstationarity directly and allow the relationships we are measuring to vary over space? This is the essence of GWR y( g ) =  0 ( g ) +  1 ( g ) x 1 +  2 (g) x 2 +…  n (g) x n +  ( g ) where (g) refers to a location at which estimates of the parameters are obtained

… with the estimator  ’ = (X T W (g) X) -1 X T W (g) Y where W (g) is a matrix of weights specific to location g such that observations nearer to g are given greater weight than observations further away.

w g1 0.……..…..0 0 w g2 …..……..0 W(g)=0 0 w g3 …… ………w gn where w gn is the weight given to data point n for the estimate of the local parameters at location g

Weighting schemes fixedadaptive Numerous weighting schemes can be used. They can be either fixed or adaptive. Two examples of a fixed weighting scheme are the Gaussian function: w ij = exp[-(d ij 2 / h 2 )/2] where h is known as the bandwidth and controls the degree of distance-decay and the bisquare function: w ij = [1-(d ij 2 / h 2 )] 2 if d ij < h = 0 otherwise

Perhaps better... Is to use a spatially adaptive weighting function such as: w ij = exp(-R ij / h) where R is the ranked distance or w ij = [1-(d ij 2 / h 2 )] 2 if j is one of the Nth nearest neighbours of i = 0 otherwise In the latter, we estimate an optimal value of N in the GWR routine

Calibration The results of GWR appear to be relatively insensitive to the choice of weighting function as long as it is a continuous distance-based function Whichever weighting function is used, the results will, however, be sensitive to the degree of distance-decay. Therefore an optimal value of either h or N has to be obtained. This can be found by minimising a crossvalidation score or the Akaike Information Criterion

where... CV =  i [y i - y  i ’ (h) ] 2 where y  i ’ (h) is the fitted value of y i with data from point i omitted from the calibration AIC = 2n ln(  ’) + n ln(2  ) SS + n[n+ Tr(S)] / [n Tr(S)] S where n is the number of data points,  ’ is the estimated standard deviation of the error term, and Tr(S) is the trace of the hat matrix. y’ = Sy

GWR Jargon Sample pointsSample points –locations at which your data are measured Regression pointsRegression points –locations at which you require parameter estimates These need not be the same locations This can be handy if you want to map the results from very large data sets

In GWR, we can also... estimate local standard errors derive local t statistics calculate local goodness-of-fit measures calculate local leverage measures perform tests to assess the significance of the spatial variation in the local parameter estimates perform tests to determine if the local model performs better than the global one, accounting for differences in degrees of freedom

Software for GWR (GWR 2.0) User-friendly and menu-driven Provide a range of options Provide some helpful diagnostics Provide results for other software

The architecture of GWR 2.0 Currently about 3000 lines of FORTRAN VB front-end to create a control file and run the program Can run FORTRAN code under Unix with a control file or, more easily, by using the VB model editor in Windows (95; 98; ME)

Running the program There are several routes through the program depending on what you want to do

Creating a new model Data Dependent variable Independent variables Sample point locational variables Regression point locational variables Weighting scheme/fitting Output/listing

Data first... Usual Windows form Assumes filetype of.dat

Data structure First line is a comma separated list of variable names (<= 8 characters) Data lines have numeric items only terminated by a carriage return One line of data per location Space or comma delimited (easily imported)

User’s decision Are the sample points and regression points identical?

Output File This will appear in the results folder What type do you want?

Types of output Currently the program will output parameter estimates and other information in these formats: –ArcINFO ungenerated export.e00 –MapInfo Interface File.mif –Comma Separated Variables.csv

The Model Editor The variable names have already been extracted from the file

The Model Editor To specify a dependent variable, highlight it in the list on the left and click on the [->] symbol

The Model Editor To specify independent variables, highlight them in the list on the left and click on the [- >] symbol

The Model Editor To specify location variables, highlight them in the list on the left and click on the relevant [->] symbols

The Model Editor To specify a weight variable, highlight it in the list on the left and click on the [->] symbol

The Model Editor Next you specify the type of kernel: this can be fixed (Gaussian) or variable (bisquare)

The Model Editor You can either preset the bandwidth in the units that the location variables are measured in (for example, metres)

Or if you want the program to determine the optimal bandwidth, specify crossvalidation. For large files, there is a sampling option to speed the process.

The Model Editor The parameter estimates can be output in different formats..csv is text format. ArcInfo and MapINFO will load directly.

The Model Editor The type of output in the printed listing can also be controlled.

The Model Editor Once the model specification is completed, save it before you run it. The file appears in the model folder.

Running the Model The saved model is identified. You must also specify a listing file for the printed output Then hit Run

Dependent (y) variable PctBatch Easting (x-coord) variable.....Longitud Northing (y-coord) variable.....Latitude No weight variable specified Dependent variables in this model... PctPov Kernel type: Variable width Determine kernel size using crossvalidation... ** Crossvalidation report requested ** Prediction report requested ** Parameter estimates to be written to.e00 file ** Monte Carlo significance tests for spatial variation ** Casewise diagnostics to be printed ** Regression models will include intercept Number of data cases read: 159 Observation points read... Dependent mean= Number of observations, nobs= 159 Number of predictors, nvar= 1 Obseration Easting extent: Observation Northing extent: The output listing Some useful statistics

CrossValidation begins... CV will be based on 159 cases Initial estimates for bandwidth Bandwidth AIC … … … … … … … … … … ** Convergence: NNN= 23 Calibration report In this case we are using an adaptive kernel. Each kernel will contain the locations of the 23 nearest sample points to each regression point.

********************************************************** * GLOBAL REGRESSION PARAMETERS * ********************************************************** Diagnostic information... Residual sum of squares Effective degrees of freedom Sigma Akaike Information Criterion Parameter Estimate Std Err T Intercept PctPov The parameter estimates for the “global” model are reported first. Both parameters are significant. Global Results

********************************************************** * GWR ESTIMATION * ********************************************************** Fitting Geographically Weighted Regression Model... Number of observations Number of independent variables... 2 (Intercept is variable 1) Number of nearest neighbours Number of locations to fit model Diagnostic information... Residual sum of squares Effective degrees of freedom Sigma Akaike Information Criterion The equivalent diagnostic information for the GWR model is printed. The AIC value is smaller than that of the global model, suggesting that the GWR model is “better”. GWR Results

Predictions from this model... Obs Y(i) Yhat(i) Res(i) X(i) Y(i) F F F F F F F If you have requested an output file, this information and the diagnostics are also written to this file.

********************************************************* * PARAMETER 5-NUMBER SUMMARIES * ********************************************************** Label Minimum Lwr Quartile Median Upr Quartile Maximum Intrcept PctPov As there are as many sets of parameter estimates as there are regression points a “5 number summary” is provided to give the main “highlights” of each distribution. As well as the maximum, minimum and median, the program prints the upper and lower quartiles.

************************************************* * * Test for spatial variability of parameters * * ************************************************* Tests based on the Monte Carlo significance test procedure due to Hope [1968,JRSB,30(3), ] Parameter p-value Intercept PctPov It is useful to know whether there is sufficient evidence to suggest that the variation in parameter estimates is not random. Two tests have been programmed: a Monte Carlo significance test and a test after Leung.

To import the results into ArcVIEW, use the Import71 utility. This creates an ArcINFO point coverage. Parameter estimates

Spatial variation in the intercept term Spatial variation in the intercept term

Spatial variation in the poverty parameter

What’s planned for GWR 3.0 ? Regression through the origin Further kernel types Poisson regression Further export options Other suggestions from users of GWR 2.0 The GWR website is: Details of program availability given