A Data Partitioning Scheme for Spatial Regression


1 A Data Partitioning Scheme for Spatial Regression
S. Vucetic (1), T. Fiez (2) and Z. Obradovic (1)
(1) School of Electrical Engineering and Computer Science; (2) Department of Crop and Soil Sciences
Washington State University, Pullman

2 Purpose
The first step in training is to partition the available data randomly into training, validation and test subsets. Spatial data are a collection of variables whose dependence is strongly tied to spatial location: observations close to each other are more likely to be similar than observations widely separated in space. Training (e.g. using neural networks) on spatial data is therefore likely to produce spatially correlated errors.

3 Goal
Large amounts of agricultural spatial data are gathered using global positioning systems.
Objective: to predict wheat yield from spatial attributes in order to extrapolate this knowledge to different agricultural sites, or to the same site in different years.
Goal: to design an appropriate procedure for training neural networks and to properly estimate their extrapolation properties.

4 Method Choice of test subset
To decrease the influence of spatial proximity and to estimate predictor accuracy, the test subset should be spatially separated from the training data.
[Figure: field divided into WEST and EAST parts]
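A spatially disjoint split can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_west_east`, the cut position `split_x`, and the toy grid are all assumptions. The choice to hold out the west part follows the later figure, where the east half supplies the training and validation data.

```python
def split_west_east(points, split_x):
    """Split samples into a west part and an east part at x = split_x.

    points: list of (x, y, value) tuples; x is the easting in metres.
    split_x is a hypothetical cut position, not a value from the slides.
    """
    west = [p for p in points if p[0] < split_x]
    east = [p for p in points if p[0] >= split_x]
    return west, east


# Toy 10 m grid, 200 m wide and 50 m tall (values are placeholders).
grid = [(x, y, 0.0) for x in range(0, 200, 10) for y in range(0, 50, 10)]

# West part held out for testing, east part kept for training/validation.
test_part, train_val_part = split_west_east(grid, split_x=100)
```

Because the cut is a single line in space, most test points are far from any training point, unlike a random split, where almost every test point has a training neighbour 10 m away.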

5 Method
Using a randomly selected validation subset will result in overfitting, because validation points then lie close to, and are correlated with, training points. Instead, the training portion of the field is partitioned into squares of size M × M. Half of these squares are randomly assigned to training and the rest to validation.
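The block assignment described on this slide might look like the sketch below. The function name `assign_blocks`, the `seed` parameter, and the toy grid are illustrative assumptions; only the rule itself (M × M squares, half picked at random for training) comes from the slide.

```python
import random


def assign_blocks(points, block_size, seed=0):
    """Assign each sample to training or validation by M x M square.

    points: list of (x, y, value) tuples with coordinates in metres.
    block_size: side length M of a square, in metres.
    Returns (train, validation) lists of samples.
    """
    # Identify the square each sample falls in.
    block_of = [(int(x // block_size), int(y // block_size))
                for x, y, _ in points]
    blocks = sorted(set(block_of))

    # Randomly pick half of the squares for training; the rest validate.
    rng = random.Random(seed)
    rng.shuffle(blocks)
    train_blocks = set(blocks[: len(blocks) // 2])

    train = [p for p, b in zip(points, block_of) if b in train_blocks]
    validation = [p for p, b in zip(points, block_of) if b not in train_blocks]
    return train, validation


# Toy 10 m grid covering 400 m x 400 m, split into M = 100 m squares.
grid = [(x, y, 0.0) for x in range(0, 400, 10) for y in range(0, 400, 10)]
train, validation = assign_blocks(grid, block_size=100)
```

Every sample in a square shares that square's assignment, so training and validation points are neighbours only along square edges.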

6 BLACK: training subset; GRAY: validation subset
Figure. Spatial partitioning of the east half of the wheat field into squares of size M = 100 m. Black and gray squares represent training and validation data, while white represents missing data.

7 Method Choice of block size, M
M should be large enough to minimize the influence of spatial correlation between training and validation data, yet small enough to provide a training set representative of the whole field.
Solution: select M within the range where the correlograms (plots of the correlation coefficient as a function of the separation distance between data points) of all features approach zero.
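An empirical correlogram for one feature can be estimated roughly as below. This is a sketch under assumptions: the slides do not say which estimator was used, so the code uses a common one (mean product of deviations for all point pairs in a distance bin, divided by the overall variance). The names `correlogram`, `bin_width` and `max_dist`, and the synthetic smooth field, are hypothetical.

```python
import math


def correlogram(points, bin_width, max_dist):
    """Estimate correlation of a feature as a function of separation distance.

    points: list of (x, y, value) tuples with coordinates in metres.
    Returns (bin_centre, correlation) pairs; bins with no pairs are skipped.
    """
    n = len(points)
    mean = sum(v for _, _, v in points) / n
    var = sum((v - mean) ** 2 for _, _, v in points) / n

    n_bins = int(max_dist // bin_width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    # Accumulate products of deviations for every pair, binned by distance.
    for i in range(n):
        xi, yi, vi = points[i]
        for j in range(i + 1, n):
            xj, yj, vj = points[j]
            k = int(math.hypot(xi - xj, yi - yj) // bin_width)
            if k < n_bins:
                sums[k] += (vi - mean) * (vj - mean)
                counts[k] += 1

    return [((k + 0.5) * bin_width, sums[k] / (counts[k] * var))
            for k in range(n_bins) if counts[k] > 0]


# Synthetic smooth feature on a 10 m grid (an assumption, not field data).
field = [(x, y, math.sin(x / 30) + math.sin(y / 30))
         for x in range(0, 200, 10) for y in range(0, 200, 10)]
curve = correlogram(field, bin_width=20, max_dist=100)
```

Plotting `curve` for each feature and reading off the distances at which the curves settle near zero gives a candidate range for M. The all-pairs loop is quadratic in the number of points, so a field-sized data set would need subsampling or a spatial index.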

8 Method Spatial bagging
Train a number of neural networks, each with a different random assignment of squares to the training and validation subsets. The final prediction is the average of the predictions from all constructed neural networks.
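The procedure can be sketched end to end as below, with one loud substitution: instead of a neural network, each ensemble member is a one-feature linear model trained by gradient descent, with early stopping on the validation squares. All function names, hyperparameters (`lr`, `max_epochs`, `patience`, `n_models`) and the synthetic field are assumptions made for illustration.

```python
import random


def block_split(samples, block_size, rng):
    """Randomly assign half of the M x M squares to training, rest to validation."""
    def block_key(s):
        return (int(s[0] // block_size), int(s[1] // block_size))

    blocks = sorted({block_key(s) for s in samples})
    rng.shuffle(blocks)
    train_blocks = set(blocks[: len(blocks) // 2])
    train = [s for s in samples if block_key(s) in train_blocks]
    val = [s for s in samples if block_key(s) not in train_blocks]
    return train, val


def mse(model, samples):
    w, b = model
    return sum((w * f + b - t) ** 2 for _, _, f, t in samples) / len(samples)


def fit_with_early_stopping(train, val, lr=0.1, max_epochs=500, patience=10):
    """Stand-in for a neural network (assumption): a linear model trained by
    gradient descent, stopped when validation MSE stops improving."""
    w, b = 0.0, 0.0
    best, best_err, waited = (w, b), mse((w, b), val), 0
    for _ in range(max_epochs):
        gw = sum(2 * (w * f + b - t) * f for _, _, f, t in train) / len(train)
        gb = sum(2 * (w * f + b - t) for _, _, f, t in train) / len(train)
        w, b = w - lr * gw, b - lr * gb
        err = mse((w, b), val)
        if err < best_err:
            best, best_err, waited = (w, b), err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best


def spatial_bagging(samples, block_size, n_models=5, seed=0):
    """Train one model per random assignment of squares."""
    rng = random.Random(seed)
    return [fit_with_early_stopping(*block_split(samples, block_size, rng))
            for _ in range(n_models)]


def predict(models, feature):
    """Final prediction: the average over all trained models."""
    return sum(w * feature + b for w, b in models) / len(models)


# Synthetic field: samples are (x, y, feature, target) on a 20 m grid.
noise = random.Random(1)
field = [(x, y, x / 400, 2.0 * (x / 400) + 1.0 + noise.gauss(0, 0.05))
         for x in range(0, 400, 20) for y in range(0, 400, 20)]
models = spatial_bagging(field, block_size=100)
```

Each member sees a different half of the squares, so the members differ both in their training data and in where early stopping halts them. Averaging is done only at prediction time, in `predict`.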

9 Data Set
A precision agriculture database containing 8 topographic attributes and winter wheat yield on a 10 x 10 m grid, totaling 24,592 data points from a 220 ha field located near Pullman, WA.
Figure 3. Histograms of 8 topographic features.
Figure. Correlograms for the variables whose histograms are shown in Figure 3 (horizontal axis: distance [m]).

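10 Results Comparison of Random Testing (RTS) vs Spatially Disjoint Testing (SDT)
MODEL                MSE Train   MSE RTS   MSE SDTS
Mean predictor       369         366       390
Linear               322         317       371
NN 5 hidden nodes    253         267       370
NN 15 hidden nodes   218         238       375
Random testing is misleading: the advantage of the neural networks over the linear model on RTS (267 and 238 vs 317) disappears on SDTS (370 and 375 vs 371).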

11 Results Influence of block size on prediction
MODEL                          MSE        EPOCHS
Mean predictor                 390        -
Linear                         370  4
NN 5 hidden nodes, M = 10 m    371 ± 14   147 ± 31
NN 5 hidden nodes, M = 40 m    365 ± 17   43 ± 19
NN 5 hidden nodes, M = 100 m   365 ± 15   30 ± 14
NN 5 hidden nodes, M = 200 m   364 ± 21   25 ± 10
M determined by the correlogram range, M = [40, 200] m, resulted in better MSE and shorter training time.

12 Results Comparison between bagging and spatial bagging
METHOD            MSE, NN 5 hidden nodes   MSE, NN 15 hidden nodes
Bagging           357                      351
Spatial bagging, M = 40 m, M = 100 m, M = 200 m: 344, 348, 342, 339
Spatial bagging results in better prediction. The margin between the best models and a trivial predictor was fairly small (339 against 390 for the mean predictor).

13 Conclusions
Spatial partitioning into training and validation subsets can lead to substantial improvements in learning speed and to increased predictability, compared to traditional random partitioning. Correlograms can be used to determine the parameters of such spatial data partitioning. Spatial bagging can improve generalization capabilities compared to bagging.

