Geographically weighted regression


Geographically weighted regression Danlin Yu Yehua Dennis Wei Dept. of Geog., UWM

Outline of the presentation: 1. Spatial non-stationarity: an example. 2. GWR: some definitions. 3. 6 good reasons for using GWR. 4. Calibration and tests of GWR. 5. An example: housing hedonic model in Milwaukee. 6. Further information.

1. Stationary vs. non-stationary. Stationary process (assumed): $y_i = \beta_0 + \beta_1 x_{1i}$. Non-stationary process (more realistic): $y_i = \beta_{i0} + \beta_{i1} x_{1i}$. [Figure: points labeled e1 to e4, shown once for each process.]

Simpson’s paradox. [Figure: house price plotted against house density, once for spatially aggregated data and once for spatially disaggregated data.]

Stationary vs. non-stationary. If a non-stationary process is modeled with a stationary model, wrong conclusions may be drawn, and the residuals of the model may be highly spatially autocorrelated.

Why do relationships vary spatially? Sampling variation: nuisance variation, not real spatial non-stationarity. Relationships intrinsically different across space: real spatial non-stationarity. Model misspecification: can significant local variations be removed?

2. Some definitions. Spatial non-stationarity: the same stimulus provokes a different response in different parts of the study region. Global models: statements about processes that are assumed to be stationary and are therefore location independent.

Some definitions. Local models: spatial decompositions of global models. The results of local models are location dependent, a characteristic we usually anticipate from geographic (spatial) data.

Regression. Regression establishes the relationship between a dependent variable and a set of independent variables. A typical linear regression model looks like: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_n x_{ni} + \varepsilon_i$, with $y_i$ the dependent variable, $x_{ji}$ ($j$ from 1 to $n$) the set of independent variables, and $\varepsilon_i$ the residual, all at location $i$.
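
As a minimal sketch of the global model above, the following fits the one-predictor case $y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$ by ordinary least squares. The function name `fit_global_ols` and the five observations are made up for illustration. They are not taken from the Milwaukee data.

```python
def fit_global_ols(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x, fitted once for all locations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # One residual per location; the coefficients themselves do not depend on location.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return b0, b1, residuals


# Made-up observations, one (x, y) pair per location.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, residuals = fit_global_ols(xs, ys)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

Every location shares the same pair of coefficients here, which is exactly the stationarity assumption that GWR relaxes.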

Regression. When applied to spatial data, this model assumes a stationary spatial process: the same stimulus provokes the same response in all parts of the study region. This assumption is highly untenable for spatial processes.

Geographically weighted regression. A local statistical technique to analyze spatial variations in relationships. Spatial non-stationarity is assumed and will be tested. Based on the “First Law of Geography”: everything is related to everything else, but closer things are more related.

GWR. Addresses the non-stationarity directly. Allows the relationships to vary over space, i.e., the $\beta$s do not need to be the same everywhere. This is the essence of GWR; in the linear form: $y_i = \beta_{i0} + \beta_{i1} x_{1i} + \beta_{i2} x_{2i} + \dots + \beta_{in} x_{ni} + \varepsilon_i$. Instead of remaining the same everywhere, the $\beta$s now vary with location ($i$).

3. 6 good reasons for using GWR. GWR is part of a growing trend in GIS towards local analysis. Local statistics are spatial disaggregations of global ones. Local analysis aims to understand spatial data in more detail.

Global vs. local statistics. Global statistics: similarity across space; single-valued statistics; not mappable; GIS “unfriendly”; search for regularities; aspatial. Local statistics: difference across space; multi-valued statistics; mappable; GIS “friendly”; search for exceptions; spatial.

6 good reasons for using GWR. GWR provides a useful link to GIS. GISs are very useful for the storage, manipulation and display of spatial data, but their analytical functions are not fully developed. In some cases the link between GIS and spatial analysis has been a step backwards. Better spatial analytical tools are called for to take advantage of GIS’s functions.

GWR and GIS. An important catalyst for the better integration of GIS and spatial analysis has been the development of local spatial statistical techniques. GWR is among the recent developments in local spatial analytical techniques.

6 good reasons for using GWR. GWR is widely applicable to almost any form of spatial data: the spatial link between “health” and “wealth”; presence/absence of a disease; determinants of house values; regional development mechanisms; remote sensing.

6 good reasons for using GWR. GWR is truly a spatial technique. It uses geographic information as well as attribute information. It employs a spatial weighting function, on the assumption that near places are more similar than distant ones (geography matters). The outputs are location specific and hence mappable for further analysis.

6 good reasons for using GWR. Residuals from GWR are generally much lower and usually much less spatially dependent. GWR models give much better fits to data, EVEN after accounting for the added model complexity and number of parameters (decrease in degrees of freedom).

6 good reasons for using GWR. GWR as a “spatial microscope”. Instead of determining an optimal bandwidth (or number of nearest neighbors), a bandwidth can be supplied a priori. A series of bandwidths can be selected and the resulting parameter surfaces examined at different levels of smoothing (like adjusting the magnification of a microscope).

6 good reasons for using GWR. GWR as a “spatial microscope”. Different levels of detail will exhibit different spatially varying patterns, which gives researchers more flexibility in discovering interesting spatial patterns, examining theories, and determining further steps.

4. Calibration of GWR. Local weighted least squares. Weights are attached to locations. Based on the “First Law of Geography”: everything is related to everything else, but closer things are more related than remote ones.
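
The local weighted least squares step can be sketched as follows for the one-predictor model $y_i = \beta_{i0} + \beta_{i1} x_{1i}$. This is an illustration under stated assumptions, not the GWR 3.0 or R implementation: the names `gaussian_weight` and `local_fit`, the Gaussian kernel form, the bandwidth of 1.5 and the synthetic grid are all hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    # Assumed kernel: weight 1 at the regression point, decaying with distance.
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(points, xs, ys, at, bandwidth):
    """Weighted least squares for y = b0 + b1 * x, calibrated at location `at`."""
    sw = swx = swy = swxx = swxy = 0.0
    for point, x, y in zip(points, xs, ys):
        w = gaussian_weight(math.dist(point, at), bandwidth)
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    # Closed-form solution of the weighted normal equations for two parameters.
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    return b0, b1


# Synthetic 10 x 10 grid: the true slope grows from west (1.0) to east (3.7).
points = [(float(i), float(j)) for i in range(10) for j in range(10)]
xs = [float((3 * i + 7 * j) % 5 - 2) for i in range(10) for j in range(10)]
ys = [2.0 + (1.0 + 0.3 * p[0]) * x for p, x in zip(points, xs)]

# One regression per location; each row is (location, intercept, slope).
local = [(p, *local_fit(points, xs, ys, p, bandwidth=1.5)) for p in points]
west_slope = local[5][2]   # location (0, 5)
east_slope = local[95][2]  # location (9, 5)
print(f"slope in the west: {west_slope:.2f}, slope in the east: {east_slope:.2f}")
```

Because each location gets its own coefficients, the list `local` can be mapped directly, which is the property the earlier slides describe as GIS “friendly”.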

Weighting schemes. Determine the weights. Most schemes tend to be Gaussian or Gaussian-like, reflecting the type of dependency found in most spatial processes. A scheme can be either fixed or adaptive. Both kinds of scheme, based on Gaussian or Gaussian-like functions, are implemented in GWR 3.0 and R.

Fixed weighting scheme. [Figure: weighting function and bandwidth.]
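
The formula shown on this slide is not preserved in the transcript. As a hedged illustration, one commonly used fixed Gaussian kernel is $w_{ij} = \exp\left[-\tfrac{1}{2}(d_{ij}/b)^2\right]$, where $d_{ij}$ is the distance between regression point $i$ and observation $j$ and $b$ is the bandwidth. Treat this form, the function name `gaussian_weight` and the bandwidth value below as assumptions.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Weight of an observation `distance` away from the regression point."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


bandwidth = 2.0  # hypothetical value, in the same units as the distances
for d in (0.0, 1.0, 2.0, 4.0, 8.0):
    print(f"distance {d:4.1f} -> weight {gaussian_weight(d, bandwidth):.3f}")
```

The same bandwidth is applied at every regression point, so the kernel covers the same area whether the surrounding data are dense or sparse.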

Problems of fixed schemes. They might produce large estimate variances where data are sparse, while masking subtle local variations where data are dense. In extreme conditions, fixed schemes might not be able to calibrate in local areas where data are too sparse to satisfy the calibration requirements (there must be more observations than parameters).

Adaptive weighting schemes. [Figure: weighting function and bandwidth.]

Adaptive weighting schemes. Adaptive schemes adjust themselves according to the density of data: shorter bandwidths where data are dense and longer ones where data are sparse. Finding nearest neighbors is one of the most often used approaches.
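
A nearest-neighbor adaptive scheme can be sketched as below. The bi-square kernel is one Gaussian-like choice and is an assumption here, as are the function name `adaptive_bisquare`, the value of `k` and the made-up point layout. In this sketch the regression point counts as its own nearest neighbor when it is itself an observation.

```python
import math


def adaptive_bisquare(points, at, k):
    """Return (bandwidth, weights) for a bi-square kernel with k nearest neighbors.

    The bandwidth is the distance from `at` to its k-th nearest observation,
    so it shrinks where data are dense and grows where they are sparse.
    """
    distances = [math.dist(p, at) for p in points]
    bandwidth = sorted(distances)[k - 1]
    if bandwidth == 0.0:
        raise ValueError("k is too small: the k-th nearest neighbor is at distance 0")
    # Observations at or beyond the bandwidth get zero weight.
    weights = [
        (1.0 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
        for d in distances
    ]
    return bandwidth, weights


# Hypothetical layout: a dense cluster in the west, scattered points in the east.
dense = [(0.1 * i, 0.0) for i in range(10)]
sparse = [(10.0 + 5.0 * i, 0.0) for i in range(5)]
points = dense + sparse

for at in (points[0], points[-1]):
    bandwidth, weights = adaptive_bisquare(points, at, k=4)
    used = sum(1 for w in weights if w > 0.0)
    print(f"at {at}: bandwidth = {bandwidth:.1f}, observations with weight > 0: {used}")
```

Each regression point ends up using the same number of observations, which avoids the sparse-area calibration failures described for fixed schemes.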

Calibration. Surprisingly, the results of GWR appear to be relatively insensitive to the choice of weighting function, as long as it is a continuous distance-based function (Gaussian or Gaussian-like). Whichever weighting function is used, however, the result will be sensitive to the bandwidth(s).

Calibration. An optimal bandwidth (or number of nearest neighbors) satisfies either: the least cross-validation (CV) score, where the CV score is based on the difference between the observed value and the GWR-calibrated value using the bandwidth or nearest neighbors; or the least Akaike Information Criterion (AIC), an information criterion that accounts for the added complexity of GWR models.
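
Bandwidth selection by CV score can be sketched as follows. The sketch assumes the usual leave-one-out form, in which the value at location $i$ is predicted from a local fit that omits observation $i$ and the squared errors are summed. The function names, the Gaussian kernel, the candidate bandwidths and the synthetic data are all hypothetical.

```python
import math
import random


def local_prediction(points, xs, ys, i, bandwidth):
    """Predict y at observation i from a local fit that leaves observation i out."""
    sw = swx = swy = swxx = swxy = 0.0
    for j, (point, x, y) in enumerate(zip(points, xs, ys)):
        if j == i:
            continue  # leave-one-out: observation i must not predict itself
        w = math.exp(-0.5 * (math.dist(point, points[i]) / bandwidth) ** 2)
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    return b0 + b1 * xs[i]


def cv_score(points, xs, ys, bandwidth):
    """Sum of squared leave-one-out prediction errors for one bandwidth."""
    return sum(
        (ys[i] - local_prediction(points, xs, ys, i, bandwidth)) ** 2
        for i in range(len(points))
    )


# Synthetic 10 x 10 grid with a slope that drifts from west to east, plus noise.
rng = random.Random(42)
points = [(float(i), float(j)) for i in range(10) for j in range(10)]
xs = [float((3 * i + 7 * j) % 5 - 2) for i in range(10) for j in range(10)]
ys = [2.0 + (1.0 + 0.3 * p[0]) * x + rng.gauss(0.0, 0.3) for p, x in zip(points, xs)]

candidates = [1.0, 1.5, 2.0, 3.0, 5.0, 100.0]  # hypothetical search grid
scores = {b: cv_score(points, xs, ys, b) for b in candidates}
best = min(scores, key=scores.get)
for b in candidates:
    print(f"bandwidth {b:6.1f}: CV score {scores[b]:.2f}")
print("bandwidth with the least CV score:", best)
```

The very large candidate (100.0) behaves almost like a global model, so comparing its score with the others shows how much the local fit gains on data with real spatial drift.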

Tests. Is GWR really better than OLS models? An ANOVA table test (done in GWR 3.0, R). The Akaike Information Criterion (AIC): the lower the AIC, the better the model. Rule of thumb: a decrease in AIC of 3 is regarded as a successful improvement.

Tests. Are the coefficients really varying across space? F-tests based on the variance of coefficients. Monte Carlo tests: random permutation of the data.
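
One way to read the Monte Carlo test is sketched below: the observations are randomly reassigned to locations, the local slopes are re-estimated each time, and the observed variability of the slopes is compared with the permuted ones. This is an illustration under assumptions, not the procedure as coded in GWR 3.0 or R. The function names, the Gaussian kernel, the use of the standard deviation as the variability measure, the 99 permutations and the synthetic data are all hypothetical.

```python
import math
import random
import statistics


def kernel_matrix(points, bandwidth):
    """Gaussian weights between every pair of locations (assumed kernel form)."""
    return [
        [math.exp(-0.5 * (math.dist(p, q) / bandwidth) ** 2) for q in points]
        for p in points
    ]


def local_slopes(weights, xs, ys):
    """Slope of y = b0 + b1 * x at every location, by weighted least squares."""
    slopes = []
    for row in weights:
        sw = sum(row)
        swx = sum(w * x for w, x in zip(row, xs))
        swy = sum(w * y for w, y in zip(row, ys))
        swxx = sum(w * x * x for w, x in zip(row, xs))
        swxy = sum(w * x * y for w, x, y in zip(row, xs, ys))
        slopes.append((sw * swxy - swx * swy) / (sw * swxx - swx ** 2))
    return slopes


def monte_carlo_p_value(points, xs, ys, bandwidth, n_permutations, rng):
    """Share of runs whose slope variability is at least the observed one."""
    weights = kernel_matrix(points, bandwidth)
    observed = statistics.pstdev(local_slopes(weights, xs, ys))
    records = list(zip(xs, ys))
    at_least_as_variable = 0
    for _ in range(n_permutations):
        rng.shuffle(records)  # reassign observations to locations at random
        perm_xs, perm_ys = zip(*records)
        if statistics.pstdev(local_slopes(weights, perm_xs, perm_ys)) >= observed:
            at_least_as_variable += 1
    # The observed arrangement counts as one of the runs.
    return (1 + at_least_as_variable) / (1 + n_permutations)


# Synthetic 10 x 10 grid with a slope that drifts from west to east, plus noise.
rng = random.Random(0)
points = [(float(i), float(j)) for i in range(10) for j in range(10)]
xs = [float((3 * i + 7 * j) % 5 - 2) for i in range(10) for j in range(10)]
ys = [2.0 + (1.0 + 0.3 * p[0]) * x + rng.gauss(0.0, 0.3) for p, x in zip(points, xs)]

p_value = monte_carlo_p_value(points, xs, ys, bandwidth=1.5, n_permutations=99, rng=rng)
print(f"Monte Carlo p-value for the slope: {p_value:.2f}")
```

A small p-value means that few random rearrangements produce local slopes as variable as the observed ones, which supports real spatial non-stationarity rather than sampling variation.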

5. An example. Housing hedonic model in Milwaukee. Data: MPROP 2004, 3430+ samples used. Dependent variable: the assessed value (price). Independent variables: air conditioner, floor size, fireplace, house age, number of bathrooms, soil and impervious surface (acquired by remote sensing).

The global model

The global model. 62% of the dependent variable’s variation is explained. All determinants are statistically significant. Floor size is the largest positive determinant; house age is the largest negative determinant. Deteriorated environmental conditions (a large proportion of soil & impervious surface) have a significant negative impact.

GWR run: summary. Number of nearest neighbors for calibration: 176 (adaptive scheme). AIC: 76317.39 (global: 81731.63). GWR performs better than the global model.
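
The comparison can be checked with simple arithmetic against the rule of thumb from the tests slide (a decrease in AIC of 3 counts as an improvement). The two AIC values are the ones reported above; the variable names are illustrative.

```python
aic_global = 81731.63  # global (OLS) model, as reported on the slide
aic_gwr = 76317.39     # GWR model with 176 nearest neighbors, as reported on the slide
rule_of_thumb = 3.0    # minimum decrease regarded as a successful improvement

decrease = aic_global - aic_gwr
print(f"AIC decrease: {decrease:.2f}")
print("Exceeds the rule-of-thumb threshold:", decrease > rule_of_thumb)
```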

GWR run: non-stationarity check.
Variable              F statistic   Numerator DF   Denominator DF*   Pr(>F)
Floor Size            2.51          325.76         1001.69           0.00
House Age             1.40          192.81         1001.69
Fireplace             1.46          80.62                            0.01
Air Conditioner       1.23          429.17
Number of Bathrooms   2.49          262.39
Soil&Imp. Surface     1.42          375.71
Tests are based on variance of coefficients; all independent variables vary significantly over space.

General conclusions. Except for floor size, the established relationships between house values and the predictors are not necessarily significant everywhere in the City. The same amount of change in these attributes (ceteris paribus) will bring a larger change in house values for houses located near the Lake than for those farther away.

General conclusions. In the northwest and central eastern parts of the City, house age and house value hold the opposite relationship to the one the global model suggests. This is where the original immigrants built their houses, and historical value outweighs house age’s negative impact on house values.

6. Interested groups. The GWR 3.0 software package can be obtained from Professor Stewart Fotheringham (stewart.fotheringham@MAY.IE). GWR R code is available from Danlin Yu directly (danlinyu@uwm.edu). Any interested groups can contact either Professor Yehua Dennis Wei (weiy@uwm.edu) or me for further information.

Interested groups. The book Geographically Weighted Regression: The Analysis of Spatially Varying Relationships is HIGHLY recommended for anyone who is interested in applying GWR to their own problems.

Acknowledgement. Parts of the contents of this workshop are from the CSISS 2004 summer workshop Geographically Weighted Regression & Associated Statistics. Special thanks go to Professors Stewart Fotheringham, Chris Brunsdon, Roger Bivand and Martin Charlton.

Questions and comments. Thank you all.