Spatial Data Analysis of Areas: Regression

Introduction
Basic idea: the dependent variable (Y) is determined by independent variables X1, X2, ... (e.g., Y = mX + b).
Uses of regression:
- Description
- Control
- Prediction

Simple Linear Regression
Basic model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- $Y_i$: value of the dependent variable on trial $i$
- $\beta_0, \beta_1$: unknown parameters
- $X_i$: value of the independent variable on trial $i$
- $\varepsilon_i$: $i$-th error term (unexplained variation), where $E[\varepsilon_i] = 0$ and $\sigma^2(\varepsilon_i) = \sigma^2$
- The error terms are $N(0, \sigma^2)$.
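
The basic model above can be fitted by ordinary least squares. The sketch below is a minimal illustration, not part of the original slides: the function name `fit_simple_regression` and the data are made up, and `b0`, `b1` stand for the estimates of $\beta_0$ and $\beta_1$.

```python
def fit_simple_regression(x, y):
    """Return least-squares estimates (b0, b1) for y = b0 + b1*x + error."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: covariation of x and y divided by variation of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Illustrative (made-up) data lying exactly on y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_regression(x, y)
print(b0, b1)
```

With noisy data the same function returns the line that minimizes the sum of squared residuals.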

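Multiple Regression
Basic model: $Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} + \varepsilon_i$
- $Y_i$ is the $i$-th observation of the dependent variable
- $\beta_0, \beta_1, \dots, \beta_p$ are parameters
- $X_{i1}, \dots, X_{ip}$ are observations of the independent variables
- $\varepsilon_i$ are independent and normal
Estimated model: $\hat{Y}_i = b_0 + b_1 X_{i1} + \dots + b_p X_{ip}$
$i$-th residual: $e_i = Y_i - \hat{Y}_i$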

Sometimes we need to transform the data
Predicted versus observed plots: (a) model with variables not transformed: $R^2$ = 0.61; (b) Model 7: $R^2$ = [missing].
Scatter plots: (a) Y versus PORC3_NR (percentage of large farms, by number); (b) log10 Y versus log10(PORC3_NR).
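
To illustrate why a log10 transform can help, the sketch below fits a straight line to log-transformed values. It does not use the deforestation data: the power-law values and the helper `fit_line` are invented for the example.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


# Made-up power-law data: y = 3 * x**2 is curved in the original units...
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 2 for xi in x]

# ...but linear after taking log10 of both variables:
# log10(y) = log10(3) + 2 * log10(x)
log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]
b0, b1 = fit_line(log_x, log_y)
print(b0, b1)
```

The fitted slope is the exponent of the power law and the intercept is the log10 of its multiplier, which is why a log-log scatter plot can look far more linear than the raw one.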

Precision of estimates and fit
Analysis of variation: sum of squares of Y (TSS) = sum of squares of the estimate (ESS) + sum of squares of the residuals (RSS).
Dividing both sides by TSS: 1 = ESS/TSS + RSS/TSS, where ESS/TSS = $r^2$ (coefficient of determination).
$r^2$ gives the proportion of total variation "explained" by the sample regression equation. The closer $r^2$ is to 1.00, the better the fit.
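
The decomposition above can be checked numerically. This is a minimal sketch on made-up data; the function name `variation_decomposition` is hypothetical, and the identity TSS = ESS + RSS relies on the model being fitted by least squares with an intercept.

```python
def variation_decomposition(x, y):
    """Fit y = b0 + b1*x by least squares and return (TSS, ESS, RSS)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b1 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
          / sum((xi - x_mean) ** 2 for xi in x))
    b0 = y_mean - b1 * x_mean
    fitted = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_mean) ** 2 for yi in y)               # total variation of Y
    ess = sum((fi - y_mean) ** 2 for fi in fitted)          # variation of the estimate
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # variation of the residuals
    return tss, ess, rss


# Made-up observations scattered around a line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
tss, ess, rss = variation_decomposition(x, y)
r2 = ess / tss
print(tss, ess + rss, r2)
```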

Analysis of Residuals
It is a good idea to plot the residuals against the independent variables to see if they show a trend. Possible behaviors:
- Correlation (e.g., the higher the independent variable, the higher the residual)
- Nonlinearity
- Heteroskedasticity (i.e., the variance of the residuals increases or decreases with the independent variable)
Regression assumes that the residuals have constant variance and are normally distributed.
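
Besides plotting, a rough numerical check for changing variance is to compare the mean squared residual in the lower and upper halves of the independent variable. The sketch below is an informal illustration on made-up data, not a formal test and not something the slides prescribe; the helper names are hypothetical.

```python
def ols_residuals(x, y):
    """Residuals of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b1 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
          / sum((xi - x_mean) ** 2 for xi in x))
    b0 = y_mean - b1 * x_mean
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def split_variance_ratio(x, residuals):
    """Mean squared residual for the upper half of x divided by the lower half."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    low = [residuals[i] for i in order[:half]]
    high = [residuals[i] for i in order[half:]]

    def mean_square(values):
        return sum(v * v for v in values) / len(values)

    return mean_square(high) / mean_square(low)


# Made-up data whose error size grows with x (heteroskedastic by construction).
x = [float(i) for i in range(1, 11)]
errors = [0.1 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]
y = [2.0 * xi + e for xi, e in zip(x, errors)]

ratio = split_variance_ratio(x, ols_residuals(x, y))
print(ratio)
```

A ratio well above (or below) 1 suggests that the residual spread changes with the independent variable and that the residual plot deserves a closer look.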

Good Residual Plot (figure; axes: X, Y)

Nonlinearity (figure; axes: X, residual)

Heteroskedasticity (figure; axes: X, residual)

Regression with Spatial Data: Understanding Deforestation in Amazonia

The forest...

The rains...

The rivers...

Deforestation...

Fire...

Amazon Deforestation 2003
Map legend: deforestation 2002/2003; deforestation until 2002.
Source: INPE PRODES Digital

What Drives Tropical Deforestation?
Figure: underlying factors driving proximate causes; causative interlinkages at proximate/underlying levels; internal drivers. Legend: % of the cases (5%, 10%, 50%). If less than 5% of cases, not depicted.
Source: Geist & Lambin

Courtesy: INPE/OBT

Courtesy: INPE/OBT

Deforestation in Amazonia PRODES (Total 1997) = km2 PRODES (Total 2001) = km2

Modelling Tropical Deforestation
Fine: 25 km x 25 km grid
Coarse: 100 km x 100 km grid
Trend analysis
Economic models

Amazônia in 2015?
Source: Aguiar et al., 2004

Factors Affecting Deforestation

Coarse resolution: candidate models

Coarse resolution: Hot-spots map
Hot-spots map for Model 7 (lighter cells have regression residual < -0.4). Labelled areas: Terra do Meio, Pará State; south of Amazonas State.

Modelling Deforestation in Amazonia
High coefficients of multiple determination were obtained for all models built ($R^2$ from 0.80 to 0.86).
The main factors identified were:
- Population density
- Connection to national markets
- Climatic conditions
- Indicators related to land distribution between large and small farmers
The main current agricultural frontier areas, in Pará and Amazonas States, where intense deforestation is now taking place, were correctly identified as hot-spots of change.

Spatial regression models

Spatial regression
- Specifying the structure of spatial dependence: which locations/observations interact
- Testing for the presence of spatial dependence: what type of dependence, what is the alternative
- Estimating models with spatial dependence: spatial lag, spatial error, higher order
- Spatial prediction: interpolation, missing values
Source: Luc Anselin

Nonspatial regression
Objective: predict the behaviour of a response variable, given a set of known factors (explanatory variables).
Multivariate nonspatial models: $y_k = \beta_0 + \beta_1 x_{1k} + \dots + \beta_i x_{ik} + \varepsilon_k$
- $y_k$ = estimate of the response variable for object $k$
- $\beta_i$ = regression coefficient for factor $i$
- $x_{ik}$ = explanatory variable $i$ for region $k$
- $\varepsilon_k$ = random error
Adjustment quality: $R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$

Nonspatial regression: hypotheses
Model: $Y = X\beta + \varepsilon$
- Explanatory variables are linearly independent
- $Y$: vector of samples of the response variable ($n \times 1$)
- $X$: matrix of explanatory variables ($n \times k$)
- $\beta$: coefficient vector ($k \times 1$)
- $\varepsilon$: error vector ($n \times 1$)
- $E(\varepsilon_i) = 0$ (expected value)
- $\varepsilon_i \sim N(0, \sigma_i^2)$ (normal distribution)
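
For reference, the least-squares estimate implied by this matrix model has a closed form. The derivation below is the standard textbook result rather than something stated on the slide; $\hat{\beta}$, $\hat{Y}$ and $e$ are notation introduced here. The inverse exists because the explanatory variables are assumed linearly independent.

```latex
\begin{aligned}
\hat{\beta} &= \arg\min_{\beta}\,(Y - X\beta)^{T}(Y - X\beta) \\
X^{T}X\,\hat{\beta} &= X^{T}Y \quad\text{(normal equations)} \\
\hat{\beta} &= (X^{T}X)^{-1}X^{T}Y \\
\hat{Y} &= X\hat{\beta}, \qquad e = Y - \hat{Y}
\end{aligned}
```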

Generalized linear models
$g(Y) = X\beta + U$
- The response is some function of the explanatory variables
- $g(\cdot)$ is a link function (e.g., the logarithm function)
- $U$ = error vector
- $E(U) = 0$ (expected value)
- $E(UU^T) = C$ (covariance matrix); if $C = \sigma^2 I$, the error is homoskedastic

Spatial regression
Spatial effects
- What happens if the original data is spatially autocorrelated? The results will be influenced, showing statistical association where there is none.
- How can we evaluate the spatial effects? Measure the spatial autocorrelation (Moran's I) of the regression residuals.

Regression using spatial data
- Try a linear model first
- Fit the model and calculate the residuals
- Are the residuals spatially autocorrelated?
  - No: we're OK
  - Yes: the nonspatial model will be biased and we should propose a spatial model
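
The residual check in this workflow relies on Moran's I. The sketch below computes the statistic for a small made-up example; the function name `morans_i`, the binary chain-adjacency matrix `W` and the residual values are assumptions for illustration, and no significance test is included.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]            # deviations from the mean
    s0 = sum(sum(row) for row in weights)     # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Four areas in a row; each area neighbours the next one (binary weights).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Made-up residuals: similar values side by side versus alternating signs.
clustered = morans_i([1.0, 1.0, -1.0, -1.0], W)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], W)
print(clustered, alternating)
```

Positive values indicate that neighbouring residuals tend to be alike, negative values that they tend to differ; values near zero are what the nonspatial model hopes for.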

Spatial dependence
- Estimating the form/extent of spatial interaction: substantive spatial dependence; spatial lag models
- Correcting for the effect of spatial spill-overs: spatial dependence as a nuisance; spatial error models
Source: Luc Anselin

Spatial dependence
Substantive spatial dependence
- lag dependence
- include $Wy$ as an explanatory variable in the regression
- $y = \rho W y + X\beta + \varepsilon$
Dependence as a nuisance
- error dependence
- non-spherical error variance
- $E[\varepsilon\varepsilon'] = \Omega$, where $\Omega$ incorporates the dependence structure
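
The term $Wy$ in the lag model is simply a weighted average of each area's neighbours. The sketch below assumes a row-standardized weights matrix built from binary adjacency, which is a common choice but is not specified on the slide; the helper names and the data are made up.

```python
def row_standardize(adjacency):
    """Scale each row of a binary adjacency matrix so that it sums to 1."""
    standardized = []
    for row in adjacency:
        total = sum(row)
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized


def spatial_lag(weights, y):
    """Compute Wy: for each area, the weighted average of y over its neighbours."""
    return [sum(w * yj for w, yj in zip(row, y)) for row in weights]


# Four areas in a row; each area neighbours the next one.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
y = [10.0, 20.0, 30.0, 40.0]

lag = spatial_lag(row_standardize(A), y)
print(lag)
```

Note that adding this column to an ordinary least-squares fit is not how lag models are normally estimated, because $Wy$ depends on the error term; the sketch only shows how the variable itself is constructed.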

Interpretation of spatial lag
True contagion
- related to an economic-behavioral process
- only meaningful if the areal units are appropriate (ecological fallacy)
- interesting economic interpretation (substantive)
Apparent contagion
- scale problem, spatial filtering
Source: Luc Anselin

Interpretation of Spatial Error
Spill-over in "ignored" variables
- poor match of the process with the unit of observation or level of aggregation
- apparent contagion: regional structural change
- economic interpretation less interesting: nuisance parameter
Common in empirical practice
Source: Luc Anselin

Cost of ignoring spatial dependence
Ignoring spatial lag
- omitted variable problem
- OLS estimates biased and inconsistent
Ignoring spatial error
- efficiency problem
- OLS still unbiased, but inefficient
- OLS standard errors and t-tests biased
Source: Luc Anselin
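
The omitted-variable argument for the lag case can be made explicit with a short derivation. It is a sketch under assumptions not stated on the slide: the lag model $y = \rho W y + X\beta + \varepsilon$ holds, $(I - \rho W)$ is invertible, and $E[\varepsilon \mid X] = 0$. The symbol $u$ (the error of the misspecified nonspatial model) is introduced here.

```latex
\begin{aligned}
(I - \rho W)\,y &= X\beta + \varepsilon \\
y &= (I - \rho W)^{-1}(X\beta + \varepsilon) \\
u &= \rho W y + \varepsilon
   = \rho W (I - \rho W)^{-1}(X\beta + \varepsilon) + \varepsilon \\
E[u \mid X] &= \rho W (I - \rho W)^{-1} X\beta \;\neq\; 0 \quad\text{in general}
\end{aligned}
```

If $y$ is regressed on $X$ alone, the error $u$ still depends on $X$, which is why the OLS estimates are biased and inconsistent in that case.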

Spatial regression models
Incorporate spatial dependency
Spatial lag model: two explanatory terms
- one is the dependent variable in the neighbourhood (the spatial lag $Wy$)
- the second is the other explanatory variables

Spatial regimes
- Extension of the nonspatial regression model
- Considers "clusters" of areas
- Each "cluster" is represented by its own set of explanatory variables
- $y_k = \beta_0 + \beta_1 x_{1k} + \dots + \beta_i x_{ik} + \varepsilon_k$
- Gets different parameters for each "cluster"
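
One simple way to obtain different parameters for each "cluster" is to fit the regression separately within each regime, which yields the same coefficients as interacting every term with regime indicators. The sketch below assumes a single explanatory variable and made-up data; `fit_line`, `fit_by_regime` and the regime labels are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b1 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
          / sum((xi - x_mean) ** 2 for xi in x))
    return y_mean - b1 * x_mean, b1


def fit_by_regime(x, y, regime):
    """Fit one regression per regime label and return {label: (b0, b1)}."""
    groups = {}
    for xi, yi, label in zip(x, y, regime):
        xs, ys = groups.setdefault(label, ([], []))
        xs.append(xi)
        ys.append(yi)
    return {label: fit_line(xs, ys) for label, (xs, ys) in groups.items()}


# Made-up data: regime A follows y = 1 + 2x, regime B follows y = 10 - x.
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [3.0, 5.0, 7.0, 9.0, 8.0, 7.0]
regime = ["A", "A", "A", "B", "B", "B"]

params = fit_by_regime(x, y, regime)
print(params)
```

A single pooled regression on these data would average the two opposite slopes and hide both relationships.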

A study of the spatially varying relationship between homicide rates and socio-economic data of São Paulo using GWR Frederico Roman Ramos CEDEST/Brasil

Geographically Weighted Regression
Extension of the traditional regression model in which the parameters are estimated locally:
$y_i = \beta_0(u_i, v_i) + \sum_k \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$
- $(u_i, v_i)$ are the geographical coordinates of point $i$
- The betas vary in space (each location has a different coefficient)
- We estimate an ordinary regression for each point, in which the neighbours have more weight
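
The idea of one regression per point, with neighbours weighted more heavily, can be sketched as a weighted least-squares fit repeated at every location. The Gaussian distance kernel, the fixed bandwidth and the single explanatory variable are assumptions made for this illustration; the slides do not state which kernel the study used, and the data are invented.

```python
import math


def gwr_fit(coords, x, y, bandwidth):
    """Return a list of local (b0, b1) estimates, one per location."""
    local = []
    for ui, vi in coords:
        # Gaussian kernel: weight decays with distance from the focal point.
        w = [math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
             for uj, vj in coords]
        sw = sum(w)
        x_mean = sum(wj * xj for wj, xj in zip(w, x)) / sw
        y_mean = sum(wj * yj for wj, yj in zip(w, y)) / sw
        # Weighted least squares for intercept and slope at this location.
        sxy = sum(wj * (xj - x_mean) * (yj - y_mean)
                  for wj, xj, yj in zip(w, x, y))
        sxx = sum(wj * (xj - x_mean) ** 2 for wj, xj in zip(w, x))
        b1 = sxy / sxx
        local.append((y_mean - b1 * x_mean, b1))
    return local


# Made-up transect of 10 points: the true slope is 1 in the west, 3 in the east.
coords = [(float(u), 0.0) for u in range(10)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0]
y = [(1.0 if u < 5 else 3.0) * xi for (u, _), xi in zip(coords, x)]

local = gwr_fit(coords, x, y, bandwidth=1.5)
west_slope = local[0][1]
east_slope = local[-1][1]
print(west_slope, east_slope)
```

A global regression on the same data would return a single intermediate slope; the local fits recover a slope near 1 at the western end and near 3 at the eastern end. The bandwidth controls how quickly the weights fall off and therefore how smooth the coefficient surface is.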

Introducing São Paulo
Map scale labels: 30 km, 70 km
Some numbers:
Metropolitan region:
- Population: 17,878,703 (IBGE, 200)
- 39 municipalities
Municipality of São Paulo:
- Population: 10,434,252
- HDI_M: (PNUD, 2000)
- 96 districts
- IEX: 74 out of 96 districts were classified as socially excluded (CEDEST, 2002)
- 4,637 homicide victims in 2001

Data
- Residences of 4,637 homicide victims, geocoded by address
- Census Sample Tracts 2000

Density surface of victim-based homicides
Kernel density function, bandwidth = 3 km
Map label: critical areas

Victim-based homicide rate (Tx_homic)
Tx_homic = count of homicide events (2001) × [missing] / population (census, 2000)

LISA Victim-based homicide rate

Percentage of illiterate house-head (Xanlf)
Definition: the house-head is the person responsible for the household; generally, but not necessarily, the person with the highest income in the household.

LISA Percentage of illiterate house-head

OLS regression results for TX_homic and X_analf

LISA for standardized residuals of the OLS regression for TX_homic and X_analf
Moran's I = 0.2624

GWR regression results for TX_homic and Xanlf
GWR ESTIMATION
Fitting Geographically Weighted Regression Model...
Number of observations...
Number of independent variables... 2 (Intercept is variable 1)
Bandwidth (in data units)...
Number of locations to fit model...
Diagnostic information...
Residual sum of squares...
Effective number of parameters...
Sigma...
Akaike Information Criterion...
Coefficient of Determination...

GWR regression results for TX_homic and Xanlf: residuals
Moran's I = -0.0303

GWR regression results for TX_homic and Xanlf
Maps: local Beta1; local t-value

CONCLUSIONS
- There are significant differences in the relationship between violence rates and social territorial data over the intra-urban area of São Paulo
- These results reinforce our hypothesis that we should avoid using general concepts
- The GWR technique is a useful instrument in social territorial analysis