Edoardo PIZZOLI, Chiara PICCINI NTTS 2013 - New Techniques and Technologies for Statistics SPATIAL DATA REPRESENTATION: AN IMPROVEMENT OF STATISTICAL DISSEMINATION.

1 Edoardo PIZZOLI, Chiara PICCINI NTTS 2013 - New Techniques and Technologies for Statistics SPATIAL DATA REPRESENTATION: AN IMPROVEMENT OF STATISTICAL DISSEMINATION FOR POLICY ANALYSIS Brussels, 5-7 March 2013

2 Introduction/1. Hypothesis: Statistical units carry information on the territory they belong to: their individual characteristics are proxies of the territorial characteristics. Data available at the statistical-unit level can be used to expand information on administrative areas, provided that the studied phenomena show an empirical spatial correlation. If such a statistical relationship exists among a set of units with respect to a specific characteristic, a spatial analysis is possible. A map or cartogram is always used to represent in space the phenomena under investigation.

3 Introduction/2. Geostatistics: Usually applied to the analysis of natural phenomena. The main assumptions are that point values are influenced by the values at the nearest points (spatial autocorrelation) and that the phenomenon is continuously distributed over the territory. Socio-economic variables, discrete by nature, can be processed by geostatistical methods assuming both that their values are spatially dependent and that the analyzed phenomenon is continuously distributed over the territory. The basic hypothesis is that the analyzed units can represent measurement points of phenomena distributed over the whole territory. The graphical result is certainly better than a thematic cartogram, which forces the data into administrative boundaries that are often arbitrarily set.

4 Spatial autocorrelation and geostatistics. Spatial data: each data value is associated with a location in space, and there is at least an implied connection between the location and the data value. Spatial autocorrelation is defined as the variation of a property within a geo-space: characteristics at proximal locations appear to be correlated, either positively or negatively. Spatial autocorrelation is the subject matter of geostatistics.

5 Geostatistical modelling. Spatial prediction models (algorithms) fall into two groups. Deterministic: arbitrary or empirical models; no estimate of model error; e.g. Thiessen polygons, inverse distance weighting, trend surfaces, splines; more primitive and often suboptimal, although in some situations they can perform as well as the statistical methods or even better. Statistical: model parameters estimated in an objective way; estimate of the prediction error available; e.g. kriging, environmental correlation, Bayesian-based models, mixed models; the input dataset usually needs to satisfy strict statistical assumptions.
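As an illustration of the deterministic group, here is a minimal sketch of inverse distance weighting, one of the methods named on this slide. It is not taken from the presentation: the function name `idw_predict`, the default exponent of 2 and the sample coordinates are assumptions made for the example.

```python
import math

def idw_predict(points, values, target, power=2.0):
    """Inverse distance weighting estimate at `target`.

    points: list of (x, y) sample coordinates
    values: observed values, in the same order as points
    power:  distance exponent (2.0 is a common choice, assumed here)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            # The target coincides with a sample: return the observed value.
            return v
        w = 1.0 / d ** power
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical sample points and values, for illustration only.
pts = [(0.0, 0.0), (2.0, 0.0)]
vals = [10.0, 30.0]
midpoint_estimate = idw_predict(pts, vals, (1.0, 0.0))
print(midpoint_estimate)
```

Note that the function returns only an estimate, with no measure of its error, which is the limitation of deterministic interpolators stated on the slide.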

6 Basic steps of geostatistical analysis: 1. estimation of the semivariogram; 2. estimation of the parameters of the semivariogram model; 3. estimation of the surface. A validation of the estimation can be added. The kriging algorithm is an optimal interpolator: it generates the best linear unbiased estimate at each location, employing the semivariogram model. The most commonly used kriging algorithm is Ordinary Kriging (OK). A normal distribution of the data is usually a prerequisite for the application of geostatistics: OK may give unacceptable results if the data are severely non-normal.
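Step 1 can be sketched with the classical estimator, in which the semivariance for a lag bin is half the mean squared difference over all point pairs whose separation falls in that bin. The slides do not say which estimator or software was used, so the function name, the equal-width binning and the toy data below are assumptions.

```python
import math
from itertools import combinations

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return (mean_distance, semivariance, n_pairs) for each non-empty lag bin.

    Bin k collects the pairs whose distance d satisfies
    k * lag_width <= d < (k + 1) * lag_width.
    """
    sq_diff_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(points)), 2):
        d = math.hypot(points[i][0] - points[j][0],
                       points[i][1] - points[j][1])
        k = int(d // lag_width)
        if k >= n_lags:
            continue  # pair is farther apart than the largest lag considered
        sq_diff_sums[k] += (values[i] - values[j]) ** 2
        dist_sums[k] += d
        counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            result.append((dist_sums[k] / counts[k],
                           sq_diff_sums[k] / (2 * counts[k]),
                           counts[k]))
    return result

# Hypothetical data: three points on a line.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
vals = [1.0, 2.0, 4.0]
for mean_d, gamma, n in empirical_semivariogram(pts, vals, 1.5, 2):
    print(mean_d, gamma, n)
```

The resulting points are what a model is then fitted to in step 2.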

7 Indicator kriging (IK) is another geostatistical approach to geospatial modelling, which makes no assumption of normality and is essentially a non-parametric counterpart to OK. IK uses indicator (0 or 1) variables to generate the probability that a critical value is exceeded at each location in the study area, and then proceeds in the same way as OK. If a threshold is used to create the indicator variable, the resulting interpolation map shows the probabilities of exceeding (or being below) the threshold. This makes it possible to produce probability maps and risk maps.
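The indicator coding itself is simple enough to sketch. The snippet below covers only the transformation into 0/1 values, not the kriging step; the function name and the sample values are hypothetical, and the threshold of 26 is the one the presentation later reports for R/N.

```python
def to_indicator(values, threshold):
    """Code each value as 1 if it exceeds the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]

# Hypothetical R/N values, for illustration only.
ratios = [12.0, 26.0, 31.5, 400.0]
indicators = to_indicator(ratios, 26.0)
print(indicators)

# The share of ones is the observed proportion of units above the threshold.
share_above = sum(indicators) / len(indicators)
print(share_above)
```

Because the coded variable is bounded between 0 and 1, an extreme value such as 400 carries no more weight than any other value above the threshold, which is why the approach is robust to outliers.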

8 The problem of outliers. Outliers may provide useful information on the magnitude of the phenomenon; at the same time, they may unduly influence the results of the analysis. Possible solutions: logarithmic transformation; winsorization; trimming. A logarithmic transformation brings the data closer to a normal distribution, but it also flattens out the data, completely losing the information carried by the outliers. In a trimmed dataset, the extreme values are discarded; in a winsorized dataset, the extreme values are instead replaced by other values.
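The three treatments can be compared side by side in a short sketch. The slide does not say how the cut-off values were chosen, so the explicit `lower` and `upper` bounds, the function names and the data are assumptions; in practice the bounds are often taken from percentiles of the data.

```python
import math

def log_transform(values):
    """Natural logarithm of each value (values must be strictly positive)."""
    return [math.log(v) for v in values]

def trim(values, lower, upper):
    """Discard the values outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]

def winsorize(values, lower, upper):
    """Replace the values outside [lower, upper] with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]

# Hypothetical data with one large outlier.
data = [12.0, 15.0, 18.0, 20.0, 400.0]
print(log_transform(data))
print(trim(data, 12.0, 20.0))       # the outlier is dropped
print(winsorize(data, 12.0, 20.0))  # the outlier is pulled down to the upper bound
```

Trimming shortens the dataset, whereas winsorization keeps every unit and only alters the extreme values.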

9 Application example - the Province of Palermo. Data from ASIA-Agricoltura (year 2009). The process of representing the data is as follows: 1. specification of the territorial administrative level of interest; 2. normalization of the units' addresses (data quality control); 3. assignment of geographical coordinates to units, starting from addresses; 4. correction of errors in the coordinates; 5. mapping of point locations, visualizing their spatial distribution; 6. variographic analysis of the data; 7. estimation at non-sampled points by means of the appropriate algorithm; 8. mapping of the estimated values over the territory; 9. cross-validation to test the accuracy of the estimation.
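Step 9 can be illustrated with a leave-one-out scheme: each unit is removed in turn, its value is re-estimated from the remaining units, and the errors are summarized. The slides do not describe the cross-validation procedure actually used, so this is only a sketch; to stay self-contained it plugs in inverse distance weighting rather than kriging, and all names and data are hypothetical.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse distance weighting, used here as a stand-in interpolator."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def leave_one_out_rmse(points, values, predict):
    """Root mean squared error of predicting each point from all the others."""
    squared_errors = 0.0
    n = len(points)
    for i in range(n):
        other_points = points[:i] + points[i + 1:]
        other_values = values[:i] + values[i + 1:]
        estimate = predict(other_points, other_values, points[i])
        squared_errors += (estimate - values[i]) ** 2
    return math.sqrt(squared_errors / n)

# Hypothetical data: three units on a line.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
vals = [1.0, 2.0, 3.0]
rmse = leave_one_out_rmse(pts, vals, idw)
print(rmse)
```

Any interpolator with the same call signature can be passed as `predict`, so the same loop could compare different methods on the same data.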

10 Results/1. Revenues/number of employees ratio (R/N). Figure panels: a) original dataset; b) logarithmic transformation; c) winsorized dataset; d) trimmed dataset.

11 Results/2. Deterministic interpolator: the Radial Basis Function (RBF).

12 Variogram modelling. The semivariogram is a function that describes the differences between samples separated by varying distances. Figure panels: a) R/N; b) log(R/N). In both cases the spatial autocorrelation is weak, but a model can be fitted to the point semivariograms: Spherical for R/N and Exponential for log(R/N). Their parameters can be used in the OK algorithm to estimate maps.
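For reference, the two model families named on this slide can be written as small functions of the lag distance h. The fitted parameter values are not given in the transcript, so the numbers below are purely illustrative. The exponential model is written with the "practical range" convention (factor 3 in the exponent), which is one of two common conventions and an assumption here.

```python
import math

def spherical(h, nugget, partial_sill, range_):
    """Spherical semivariogram: reaches nugget + partial_sill exactly at range_."""
    if h == 0.0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, partial_sill, range_):
    """Exponential semivariogram: approaches the sill asymptotically.

    With the factor 3, about 95% of the partial sill is reached at range_.
    """
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / range_))

# Illustrative parameters only, not the values fitted in the presentation.
for h in (0.0, 5.0, 10.0, 20.0):
    print(h, spherical(h, 0.2, 0.8, 10.0), exponential(h, 0.2, 0.8, 10.0))
```

A weak spatial autocorrelation, as reported on the slide, corresponds to a nugget that is large relative to the partial sill.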

13 Estimated maps/1. Estimated map of R/N by OK; prediction error map of R/N by OK.

14 Estimated maps/2. Estimated map of log(R/N) by OK; prediction error map of log(R/N) by OK. The contour lines look very flattened out towards low values.

15 Estimated maps/3. Estimated map of R/N by OK (winsorized dataset); prediction error map of R/N by OK (winsorized dataset). The value of the outlier has been lowered.

16 Estimated maps/4. Estimated map of R/N by OK (trimmed dataset); prediction error map of R/N by OK (trimmed dataset). Deleting the outlier causes a loss of information. The estimation error is high everywhere.

17 Indicator Kriging/1. Indicator variable: the variable R/N was transformed into an indicator variable using the value 26 (proposed by the software) as the threshold. Spatial autocorrelation is almost absent, but an Exponential model can be fitted to the point semivariogram. Its parameters are used in the kriging algorithm to estimate a probability map.

18 Indicator Kriging/2. Probability of exceeding the threshold; prediction error map.

19 Conclusions. Estimated maps of socio-economic indicators, based on available micro-data at the statistical-unit level, represent a clear enhancement of the dissemination of statistical information. Visualizing statistical information on a map does not just add a further dimension, space, to the data; it improves socio-economic information by linking it to geographical characteristics. Maps are useful to identify areas of policy intervention, to plan and evaluate actions, to perform simulations and to build future scenarios. The error map is a further improvement, showing the limitations and the reliability of the available statistical information. The choice of the interpolation method depends essentially on the type of available data and on the objective of the analysis.

20 Thank you for your attention

