Sugar Cane Production in Puerto Rico, 1958/59-1973/74: A Comparison of Four Model Specifications for Describing Small Heterogeneous Space-Time Datasets

1 Sugar Cane Production in Puerto Rico, 1958/59-1973/74: A Comparison of Four Model Specifications for Describing Small Heterogeneous Space-Time Datasets, by Daniel A. Griffith, Ashbel Smith Professor of Geospatial Information Sciences

2 ABSTRACT Researchers increasingly are accounting for heterogeneity in their empirical analyses. When data form a short time series—too short to utilize an ARIMA model—a random effect term can be employed to account for serial correlation. When data also are georeferenced, forming a space-time dataset, a random effect term can be included that is spatially structured in order to account for spatial autocorrelation, too. But space-time heterogeneity can be accounted for in various ways, including specifications involving recently developed spatial filtering methodology. This paper summarizes comparisons of four model specifications—simple pooled space-time; sequential, comparative statics; temporally varying coefficients with a spatially unstructured random effect; and, temporally varying coefficients with a spatially structured random effect—illustrating implementations with annual sugar cane production data for the 73 municipalities of Puerto Rico during 1958/59-1973/74. Covariates whose importance is assessed include elevation and distance from the primate city.

3 Panel data versus space-time data. Panel data are a form of longitudinal data, and can be a cross-section (i.e., the spatial dimension) of individuals (e.g., farms) that are surveyed periodically over a given time horizon. With repeated observations of the same individuals, panel data permit a researcher to study the dynamics of change with short time series. A main advantage of panel data is control for unobserved heterogeneity (the fundamental complication of non-experimental data collection). But longitudinal data need not involve the same individuals: if the sample is not the same across periods, observed changes also may result from sampling error.

4 Spatial filtering. A given random variable can be decomposed into a spatial component and an aspatial component. Approaches: the impulse-response function approach (based upon the autoregressive model); the Getis approach (based on the K function); and the eigenfunction spatial filtering approach. The spatial component relates to spatial autocorrelation.

5 High Peak district biomass index: ratio of remotely sensed data spectral bands B3 and B4. Map panels: spatially autocorrelated; geographically random.

6 Defining spatial autocorrelation. Auto: self. Correlation: degree of relative correspondence. Positive: similar values cluster together on a map. Negative: dissimilar values cluster together on a map.

7 Spatial autocorrelation: from r to MC

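8 Constructing eigenfunctions for filtering spatial autocorrelation out of georeferenced variables:

$$\mathrm{MC} = \frac{n}{\mathbf{1}^{T}\mathbf{C}\mathbf{1}} \cdot \frac{\mathbf{Y}^{T}(\mathbf{I}-\mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I}-\mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{Y}}{\mathbf{Y}^{T}(\mathbf{I}-\mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{Y}}$$

The eigenfunctions come from $(\mathbf{I}-\mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I}-\mathbf{1}\mathbf{1}^{T}/n)$.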
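The Moran Coefficient formula on this slide can be evaluated directly with matrix products. The following is a minimal sketch, not the author's code: the function names `moran_coefficient` and `rook_connectivity`, the 4-by-4 toy grid, and the two test surfaces are all assumptions made for illustration.

```python
import numpy as np

def moran_coefficient(y, C):
    """MC = (n / 1'C1) * y'MCMy / y'My, with M = I - 11'/n."""
    y = np.asarray(y, dtype=float)
    C = np.asarray(C, dtype=float)
    n = y.size
    M = np.eye(n) - np.ones((n, n)) / n   # centering projection matrix
    numerator = y @ M @ C @ M @ y
    denominator = y @ M @ y
    return (n / C.sum()) * numerator / denominator

def rook_connectivity(rows, cols):
    """Binary rook-adjacency matrix for a regular grid (hypothetical toy geography)."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:              # neighbor to the east
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < rows:              # neighbor to the south
                C[i, i + cols] = C[i + cols, i] = 1.0
    return C

C = rook_connectivity(4, 4)
gradient = np.repeat(np.arange(4.0), 4)   # smooth trend: similar values adjacent
checkerboard = np.array([(r + c) % 2 for r in range(4) for c in range(4)], dtype=float)

mc_gradient = moran_coefficient(gradient, C)          # positive autocorrelation
mc_checkerboard = moran_coefficient(checkerboard, C)  # negative autocorrelation
```

On this toy grid the smooth gradient yields a positive MC and the checkerboard a negative one, matching the positive/negative definitions given earlier in the deck.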

9 Eigenvectors for spatial filter construction. The first eigenvector, say E1, is the set of real numerical values that has the largest MC achievable by any set for the spatial arrangement defined by the geographic connectivity matrix C. The second eigenvector is the set of values that has the largest achievable MC by any set that is uncorrelated with E1. The third eigenvector is the third such set of values. And so on. This sequential construction of eigenvectors continues through En, the set of values that has the largest negative MC achievable by any set that is uncorrelated with the preceding (n-1) eigenvectors.
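The sequential construction described here corresponds to an eigendecomposition of the doubly centered connectivity matrix, sorted from largest to smallest eigenvalue. A minimal sketch follows, assuming a hypothetical 5-by-5 rook-adjacency grid in place of the Puerto Rico municipality map; `grid_connectivity` and `mcm_eigenfunctions` are illustrative names, not functions from any package.

```python
import numpy as np

def grid_connectivity(rows, cols):
    """Binary rook-adjacency matrix for a regular grid (hypothetical toy geography)."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < rows:
                C[i, i + cols] = C[i + cols, i] = 1.0
    return C

def mcm_eigenfunctions(C):
    """Eigenvectors of (I - 11'/n) C (I - 11'/n), ordered from largest to smallest MC."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    A = M @ C @ M
    A = (A + A.T) / 2.0                       # remove round-off asymmetry
    eigenvalues, eigenvectors = np.linalg.eigh(A)
    order = np.argsort(eigenvalues)[::-1]     # descending eigenvalues
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    mc_values = eigenvalues * n / C.sum()     # MC of each eigenvector
    return eigenvectors, mc_values

C = grid_connectivity(5, 5)
E, mc = mcm_eigenfunctions(C)

# Check the MC of E1 directly from the Moran Coefficient definition.
n = C.shape[0]
z = E[:, 0] - E[:, 0].mean()
mc_direct = (n / C.sum()) * (z @ C @ z) / (z @ z)
```

The rescaling of eigenvalues by n / 1'C1 is what turns each eigenvalue into the MC of its eigenvector; the eigenvector associated with the constant vector has eigenvalue zero and carries no map pattern.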

10 Useful citation

11 Random effects model: one error component is a random observation effect (differences among individual observational units); the other is a time-varying residual error (links to change over time). The composite error term is the sum of the two.

12 Random effects model: a normally distributed intercept term (mean zero, constant variance) that is uncorrelated with the covariates; supports inference beyond the nonrandom sample analyzed; the simplest version allows the intercept to vary across areal units (repeated observations are individual time series); the random effect variable is integrated out of the likelihood function (with numerical methods); accounts for missing variables and within-unit correlation (commonality across time periods).
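A small simulation can illustrate why a unit-specific random intercept accounts for within-unit correlation across time periods. This is a sketch under stated assumptions: the standard deviations `sigma_u` and `sigma_e`, the number of simulated units, and the seed are hypothetical choices; only the 16 time periods echo the 1958/59-1973/74 span.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 2000        # hypothetical; large so the sample correlation is stable
n_periods = 16        # one value per crop year, 1958/59 through 1973/74
sigma_u = 1.0         # assumed std. dev. of the unit-specific random effect
sigma_e = 0.5         # assumed std. dev. of the time-varying residual

u = rng.normal(0.0, sigma_u, size=(n_units, 1))          # constant over time within a unit
e = rng.normal(0.0, sigma_e, size=(n_units, n_periods))  # changes every period
composite = u + e                                        # composite error term

# Correlation between any two periods implied by the composite error structure.
icc_theory = sigma_u**2 / (sigma_u**2 + sigma_e**2)

# Empirical counterpart: average off-diagonal correlation between periods.
corr = np.corrcoef(composite, rowvar=False)
icc_empirical = corr[~np.eye(n_periods, dtype=bool)].mean()
```

Because the shared component `u` appears in every period for a unit, any two periods are correlated by the variance share of `u`, which is the commonality across time periods that the random effect term captures.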

13 Sugar cane production in Puerto Rico: began in the 1530s; experienced a sharp decline during 1580-1650; the introduction of slave labor resulted in considerable expansion during 1765-1823; by 1828, sugar exports were sizeable; the Spanish monarchy discouraged expansion throughout much of the 1800s; the United States took possession of the island in 1899, fully developing the long-demanded railroad on the island and channeling considerable investment into sugar cane production, which achieved maximum expansion in the 1920s; production peaked around 1950.

14 Island-wide time series (plot annotation: US intervention)

15 1924 sugar cane railroad: finally started by the Spanish Crown, but aggressively completed by US investors

16 Covariates of sugar cane production: elevation; distance from San Juan; covariate spatial filters

17 Model specifications. I-A: initial; I-B: with linear time trend; II: with random effect

18 III: with spatial filter; IV: with spatially structured random effect

19 Sugar cane production: 1958/59-1973/74. Map panels: 1958/59, 1963/64, 1968/69, 1973/74. Scale: dark red = high; dark green = low.

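20 Annual results. Covariates in every year: time-based intercept, mean elevation, distance from San Juan.

| Year | Deviance | Pseudo-R² | MC for % | Residual MC |
|---|---|---|---|---|
| 1958/59 | 1565 | 0.503 | 0.31968 | 0.04912 |
| 1959/60 | 1503 | 0.527 | 0.33317 | 0.05521 |
| 1960/61 | 1561 | 0.540 | 0.35751 | 0.06663 |
| 1961/62 | 1543 | 0.559 | 0.38669 | 0.08844 |
| 1962/63 | 1490 | 0.576 | 0.41571 | 0.10887 |
| 1963/64 | 1544 | 0.579 | 0.42272 | 0.10598 |
| 1964/65 | 1467 | 0.599 | 0.46101 | 0.12206 |
| 1965/66 | 1523 | 0.586 | 0.48383 | 0.16313 |
| 1966/67 | 1610 | 0.571 | 0.49018 | 0.18957 |
| 1967/68 | 1601 | 0.545 | 0.47420 | 0.17009 |
| 1968/69 | 1259 | 0.620 | 0.53851 | 0.17194 |
| 1969/70 | 1273 | 0.574 | 0.47448 | 0.13531 |
| 1970/71 | 1149 | 0.518 | 0.43049 | 0.18262 |
| 1971/72 | 1164 | 0.548 | 0.43207 | 0.12463 |
| 1972/73 | 1146 | 0.477 | 0.42875 | 0.19466 |
| 1973/74 | 899 | 0.566 | 0.39513 | 0.04261 |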

21 Mixed binomial regression: time varying covariate coefficients, spatially unstructured and structured random effects

| Year | Unstructured: deviance | Unstructured: pseudo-R² | Unstructured: residual MC | Selected vectors | Structured: deviance | Structured: pseudo-R² | Structured: residual MC |
|---|---|---|---|---|---|---|---|
| 58/59 | 473 | 0.881 | 0.33271 | E3, E4, E6, E7, E8, E13, E18 | 378 | 0.957 | -0.02771 |
| 59/60 | 403 | 0.906 | 0.34707 | E3, E4, E6, E7, E8, E13, E18 | 321 | 0.975 | -0.07181 |
| 60/61 | 368 | 0.938 | 0.31433 | E1, E3, E4, E6, E7, E8, E13, E18 | 326 | 0.982 | -0.03271 |
| 61/62 | 303 | 0.961 | 0.33815 | E3, E4, E6, E7, E11 | 279 | 0.988 | 0.03076 |
| 62/63 | 261 | 0.983 | 0.19217 | E4 | 252 | 0.992 | 0.17739 |
| 63/64 | 281 | 0.986 | 0.14692 | E1, E4 | 271 | 0.993 | 0.09181 |
| 64/65 | 263 | 0.984 | 0.17054 | E3, E4 | 254 | 0.989 | 0.07083 |
| 65/66 | 266 | 0.986 | 0.22023 | E3 | 254 | 0.988 | 0.04146 |

22 (continued)

| Year | Unstructured: deviance | Unstructured: pseudo-R² | Unstructured: residual MC | Selected vectors | Structured: deviance | Structured: pseudo-R² | Structured: residual MC |
|---|---|---|---|---|---|---|---|
| 66/67 | 302 | 0.977 | 0.33270 | E3, E6, E8 | 273 | 0.985 | 0.09299 |
| 67/68 | 329 | 0.964 | 0.28672 | E1, E3, E4, E6, E8 | 290 | 0.976 | 0.03851 |
| 68/69 | 320 | 0.966 | 0.30690 | E1, E3, E4, E5, E6, E8, E12, E13, E14, E16 | 218 | 0.981 | -0.08747 |
| 69/70 | 310 | 0.956 | 0.19651 | E1, E2, E3, E4, E6, E8, E11, E16, E18 | 250 | 0.976 | -0.03816 |
| 70/71 | 339 | 0.914 | 0.34359 | E1, E3, E4, E6, E7, E8, E11, E15, E18 | 181 | 0.979 | -0.04857 |
| 71/72 | 384 | 0.893 | 0.14420 | E1, E2, E3, E4, E5, E6, E8, E9, E10, E11, E12, E16, E17, E18 | 207 | 0.965 | -0.12290 |
| 72/73 | 427 | 0.806 | 0.24568 | E1, E2, E3, E4, E6, E8, E9, E10, E11, E12, E13, E16, E17, E18 | 158 | 0.964 | -0.13529 |
| 73/74 | 347 | 0.906 | 0.07071 | E1, E2, E3, E4, E6, E8, E9, E10, E11, E12, E18 | 167 | 0.945 | -0.07292 |

23 Spatial filters for space-time spatially structured random effects. Map panels: 1958/59 (MC = 0.77, GR = 0.30); 1963/64 (MC = 0.93, GR = 0.18); 1968/69 (MC = 0.86, GR = 0.18); 1973/74 (MC = 0.94, GR = 0.22).
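The maps on this slide report a Geary Ratio (GR) alongside each Moran Coefficient. As a hedged illustration, the sketch below uses the standard textbook definition of the Geary Ratio, which the slides themselves do not spell out; the function names and the 4-by-4 toy grid are assumptions made for illustration.

```python
import numpy as np

def geary_ratio(y, C):
    """GR = (n - 1) * sum_ij c_ij (y_i - y_j)^2 / (2 * sum_ij c_ij * sum_i (y_i - ybar)^2)."""
    y = np.asarray(y, dtype=float)
    C = np.asarray(C, dtype=float)
    n = y.size
    squared_differences = (y[:, None] - y[None, :]) ** 2   # all pairwise (y_i - y_j)^2
    numerator = (n - 1) * np.sum(C * squared_differences)
    denominator = 2.0 * C.sum() * np.sum((y - y.mean()) ** 2)
    return numerator / denominator

def rook_connectivity(rows, cols):
    """Binary rook-adjacency matrix for a regular grid (hypothetical toy geography)."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < rows:
                C[i, i + cols] = C[i + cols, i] = 1.0
    return C

C = rook_connectivity(4, 4)
gradient = np.repeat(np.arange(4.0), 4)   # neighbors hold similar values
checkerboard = np.array([(r + c) % 2 for r in range(4) for c in range(4)], dtype=float)

gr_gradient = geary_ratio(gradient, C)          # well below 1
gr_checkerboard = geary_ratio(checkerboard, C)  # well above 1
```

Under this definition, values well below 1 accompany positive spatial autocorrelation, which is consistent with the slide pairing high MC values with GR values between 0.18 and 0.30.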

24 (Normally distributed) random intercept: areal unit specific across all years

| Feature | Spatially unstructured | Added to spatial structure |
|---|---|---|
| Sample mean | -0.00864 | -0.00665 |
| Sample variance | 1.63044 | 1.63797 |
| Moran Coefficient (MC) | 0.08672 | 0.08778 |
| Geary Ratio (GR) | 1.10196 | 1.09907 |
| Correlations with covariates | (-0.17873, 0.32086) | (-0.17833, 0.32095) |

P(Shapiro-Wilk): < 0.0001 (4 lower tail outliers)

25 Time series plots: intercept & covariate binomial regression coefficients. Panels: intercept; mean elevation; distance. Legend: ● simple pooled model; ■ comparative static model; ♦ model with a spatially unstructured random effect; ▲ mixed model with spatially structured random effect.

26 Time series plots: covariate binomial regression coefficient standard errors. Panels: mean elevation; distance. Legend: ● simple pooled model; ■ comparative static model; ♦ model with a spatially unstructured random effect; ▲ mixed model with spatially structured random effect.

27 Residual serial correlation. The random effects estimator approximates the degree of serial correlation (or its importance in the model), and hence allows the computation of corrected estimates. The 73 residual Durbin-Watson statistics have a range of (0.140, 2.513), with a mean of 0.836 and a standard deviation of 0.546. Determining significance here is complicated because of small T, inclusion of a random effects term, and the varying number of spatial filter (SF) eigenvectors.
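For reference, the Durbin-Watson statistic for one residual series is the sum of squared successive differences divided by the sum of squared residuals. The sketch below applies that standard definition to made-up residual series of length 16 (one value per crop year); the function name and the example series are illustrative assumptions, not the study's residuals.

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, computed along the last (time) axis."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e, axis=-1) ** 2, axis=-1) / np.sum(e ** 2, axis=-1)

T = 16
trending = np.linspace(-1.0, 1.0, T)                 # slowly drifting residuals
alternating = np.array([(-1.0) ** t for t in range(T)])  # sign flips every period

dw_trending = durbin_watson(trending)        # near 0: positive serial correlation
dw_alternating = durbin_watson(alternating)  # near 4: negative serial correlation

# One statistic per areal unit: stack the series as rows of a 2-D array.
dw_per_unit = durbin_watson(np.vstack([trending, alternating]))
```

Stacking one residual series per municipality as rows would give the 73 statistics summarized on this slide; values near 2 indicate little serial correlation.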

28 Graphical portrayal of DWs for GLM residuals (heuristic using 4 dfs lost). Map legend classes: 0 – 0.74; 0.74 – 1.93; 1.93 – 2.08; 2.07 – 3.26; 3.26 – 4. Legend labels: undecided; positive serial correlation.

29 Summary of results

30 STAR-binomial specification. Components: time; space; space-time.

31 Pseudo- & quasi-likelihood estimation

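32 Extra binomial variation remains

| Year | Deviance (slide 20) | Deviance, spatially unstructured (slides 21-22) | Deviance, spatially structured (slides 21-22) |
|---|---|---|---|
| 1958/59 | 1565 | 473 | 378 |
| 1959/60 | 1503 | 403 | 321 |
| 1960/61 | 1561 | 368 | 326 |
| 1961/62 | 1543 | 303 | 279 |
| 1962/63 | 1490 | 261 | 252 |
| 1963/64 | 1544 | 281 | 271 |
| 1964/65 | 1467 | 263 | 254 |
| 1965/66 | 1523 | 266 | 254 |
| 1966/67 | 1610 | 302 | 273 |
| 1967/68 | 1601 | 329 | 290 |
| 1968/69 | 1259 | 320 | 218 |
| 1969/70 | 1273 | 310 | 250 |
| 1970/71 | 1149 | 339 | 181 |
| 1971/72 | 1164 | 384 | 207 |
| 1972/73 | 1146 | 427 | 158 |
| 1973/74 | 899 | 347 | 167 |

Plot legend: ● pineapple production; ■ milk production; ♦ sugar cane production; ▲ tobacco production.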
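One rough way to read the deviances above is to divide each by the number of observations. The sketch below assumes each annual fit uses the 73 municipalities as observations and treats deviance per observation as a lower bound on the usual deviance divided by residual degrees of freedom (which is larger, because parameters reduce the degrees of freedom). The residual degrees of freedom of the random effects fits are not given in the slides, so this is only a heuristic, and the variable names are illustrative.

```python
n_obs = 73  # municipalities of Puerto Rico (assumed to be the observations in each annual fit)

years = ["1958/59", "1959/60", "1960/61", "1961/62", "1962/63", "1963/64",
         "1964/65", "1965/66", "1966/67", "1967/68", "1968/69", "1969/70",
         "1970/71", "1971/72", "1972/73", "1973/74"]

# Deviance statistics transcribed from the table, by column.
deviance = {
    "slide 20 specification": [1565, 1503, 1561, 1543, 1490, 1544, 1467, 1523,
                               1610, 1601, 1259, 1273, 1149, 1164, 1146, 899],
    "spatially unstructured random effect": [473, 403, 368, 303, 261, 281, 263, 266,
                                             302, 329, 320, 310, 339, 384, 427, 347],
    "spatially structured random effect": [378, 321, 326, 279, 252, 271, 254, 254,
                                           273, 290, 218, 250, 181, 207, 158, 167],
}

# Deviance per observation: values well above 1 suggest extra-binomial variation.
per_observation = {
    name: dict(zip(years, (d / n_obs for d in values)))
    for name, values in deviance.items()
}

smallest_ratio = min(min(ratios.values()) for ratios in per_observation.values())
```

Even the smallest ratio, from the spatially structured fit for 1972/73, exceeds 2, which is consistent with the slide's statement that extra-binomial variation remains after the random effects are added.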

33 Implications
1. spatial autocorrelation appears to be a source of part of the overdispersion
2. random effects (e.g., missing covariates) appear to be a source of part of the overdispersion
3. land use competition may be a source of part of the overdispersion
4. spatial filters for mean elevation and distance have six eigenvectors in common; of these, one is shared with most of the annual comparative static spatial filters, and two with most of the spatially structured random effect term spatial filters

34 Implications (continued)
5. the components of spatial autocorrelation in sugar cane production vary over time
6. a spatially unstructured random effect term that seeks to account for serial correlation in multiple short time series can better highlight latent spatial autocorrelation
7. a spatial filter can effectively structure a random effect term
8. failure to include a spatially structured random effect term can result in biased parameter estimates (largely because of the nonlinear nature of the model specification)
9. spatial and temporal autocorrelation interact in a complex way

35 THE END

