Presentation on theme: "Grids and GeoStatistics"— Presentation transcript:
1 Grids and GeoStatistics
A work in process
Routines for delivering products to the national spatial data interest/infrastructure community
The European Forum for Geostatistics workshop in Bled, Slovenia, 1st – 3rd October 2008
The next step towards an integration of statistics and geography for sustainable development
Vilni Verner Holst Bloch
MSc. landscape ecology and natural resources
Statistics Norway
Otervegen 23
N Kongsvinger
Tel : ++47 / Fax : ++47 /
Session 2: Concept for an integrated web solution / An infrastructure for geostatistics. (The Subproject 3)
2 Overview of the presentation
Background – introduction
Defining the grid
Making guidelines for geostatistics
Standards and metadata
Confidentiality and quality
Further work and deliveries
3 Background
Partnership in Norway Digital (obligations)
Expressed needs from partners
Statistics Norway
Registers on address level
  Population
  Buildings
  Ground properties
  Sport facilities
  Farms
  Businesses
  And many more
4 Background
Download page for Norway Digital partners
5 Background
Formalising a national grid for geostatistics
Internal drive within Statistics Norway (coming censuses, preparations for a geodatabase)

The Norwegian geospatial data infrastructure (Norge digitalt) was established in […]. All public institutions producing spatial data, including Statistics Norway, are members of the infrastructure.

Statistics Norway is the main provider of demographic and socio-economic data in Norway. Statistics Norway therefore has a special responsibility to ensure compatibility between social science data and the traditional topographic and physical data provided by other partners in the infrastructure. A standardized grid is one way to provide such compatibility.

By setting the SSBgrid as a standard, Statistics Norway intends to provide a common reference frame for spatial grid data for use in Norge digitalt and thus promote further integration of data from the different partners in the spatial infrastructure.
6 Defining the grid
SSBgrid is an open-ended definition of a family of spatial tessellation models for use in Norway.
The models are all built with square grid cells.
The naming convention is to concatenate the capital letters 'SSB' with the grid cell size, measured as the length of one side of a grid cell in meters or kilometers. The cell size is also denoted K, with either m or km indicating the unit of length.
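The naming convention can be sketched in a few lines of Python. This is only an illustration: the function name `ssb_grid_name` is hypothetical, and the rule that whole kilometers are written with `km` while everything else is written with `m` is an assumption inferred from names such as SSB100m and SSB1km.

```python
def ssb_grid_name(cell_size_m: int) -> str:
    """Return the SSBgrid name for a cell side length K given in meters."""
    if cell_size_m <= 0:
        raise ValueError("cell size must be positive")
    # Assumption: whole kilometers are written in km, everything else in m.
    if cell_size_m % 1000 == 0:
        return f"SSB{cell_size_m // 1000}km"
    return f"SSB{cell_size_m}m"


print(ssb_grid_name(250))     # SSB250m
print(ssb_grid_name(1000))    # SSB1km
print(ssb_grid_name(25000))   # SSB25km
```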
7 Defining the grid
A system of standard grids

Grid name       Cell size        Number of cells
SSB100m (1)     0.01 km2         Approx. 35 […] cells
SSB250m (1)     0.0625 km2       Approx. 5 […] cells
SSB500m (1)     0.25 km2         Approx. 1 […] cells
SSB1km          1 km2            Approx. 350 000 cells
SSB5km          25 km2           Approx. 15 000 cells
SSB10km         100 km2          Approx. 15 000 cells
SSB25km (2)     625 km2          Approx. 500 cells
SSB50km (2)     2 500 km2        Approx. 150 cells
SSB100km (2)    10 000 km2       Approx. 40 cells
SSB250km (2)    62 500 km2       Approx. 10 cells
SSB500km (2)    250 000 km2      Approx. 4 cells

(1) Because of limitations in many software packages and for practical use, these grids are recommended as grids with county coverage.
(2) These grids might also cover sea territories.
Number of cells refers to coverage of the Norwegian mainland.
One must, however, be aware of deviations in grid cell areas for regions remote from the Norwegian mainland and Svalbard.
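The cell size column follows directly from the grid name: a grid with side length K meters has cells of (K / 1000)² km2. A minimal sketch, in which the helper names `cell_size_m` and `cell_area_km2` are hypothetical and the name pattern is assumed to be `SSB` plus a number plus `m` or `km`:

```python
import re


def cell_size_m(grid_name: str) -> int:
    """Parse a name such as 'SSB250m' or 'SSB5km' into the side length K in meters."""
    match = re.fullmatch(r"SSB(\d+)(m|km)", grid_name)
    if match is None:
        raise ValueError(f"not an SSBgrid name: {grid_name!r}")
    size, unit = int(match.group(1)), match.group(2)
    return size * 1000 if unit == "km" else size


def cell_area_km2(grid_name: str) -> float:
    """Area of one grid cell in square kilometers."""
    k = cell_size_m(grid_name)
    # Multiply in integer meters first, then convert m2 to km2.
    return k * k / 1_000_000


for name in ("SSB100m", "SSB250m", "SSB1km", "SSB25km"):
    print(name, cell_area_km2(name), "km2")
```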
8 Defining the grid
Identification
The identification (ID) of a grid cell with its southwestern corner at [XC, YC] in an SSBgrid with grid cell size K is
ID = (XC + 2 000 000) · 10^7 + YC

Computational efficiency
Since a false easting is needed to compute XC and YC for locations along the western coast of Norway, the false easting used in the ID should also be used for this purpose when the ID is calculated directly from point coordinates [X, Y]:
ID = floor((X + 2 000 000) / K) · K · 10^7 + floor(Y / K) · K

Comments
Every grid cell in an SSBgrid has a unique, 14-digit ID. The first seven digits are the east coordinate of the southwestern (lower left) corner of the grid cell (as a meter coordinate in UTM33/EUREF89) with an additional false easting of 2 000 000. The false easting is added in order to avoid problems generated by negative east coordinates and is added to the standard false easting of the UTM coordinate. The last seven digits are the north coordinate of the southwestern (lower left) corner of the grid cell (as a meter coordinate in UTM33/EUREF89).

The ID is the key element of the SSBgrid system. Because of the ID, SSBgrid data can be distributed as tables instead of spatially organized "raster" data. Several datasets using the same SSBgrid can easily be joined together (using the ID as the key), which facilitates manipulation and analysis of data with standard statistical or tabular data processing tools (e.g. Excel or SPSS).
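As an illustration of the comments above, the ID can be computed from point coordinates roughly as follows. This is a sketch, not the official routine. The function names and the example coordinates are hypothetical; the layout (seven digits of easting including the additional false easting of 2 000 000 from the SSBgrid definition, followed by seven digits of northing) follows the description in the text.

```python
from __future__ import annotations

import math

SSB_FALSE_EASTING = 2_000_000  # additional false easting of the SSBgrid, in meters


def ssb_grid_id(x: float, y: float, k: int) -> int:
    """ID of the cell of size k (meters) containing the UTM33/EUREF89 point (x, y)."""
    # Add the false easting before flooring, so that points west of the
    # UTM33 zero easting do not produce negative values.
    east = math.floor((x + SSB_FALSE_EASTING) / k) * k
    north = math.floor(y / k) * k
    return east * 10**7 + north


def sw_corner(grid_id: int) -> tuple[int, int]:
    """Recover the southwestern corner [XC, YC] of a cell from its ID."""
    east, north = divmod(grid_id, 10**7)
    return east - SSB_FALSE_EASTING, north


# Hypothetical point coordinates, chosen only for illustration.
print(ssb_grid_id(262345.6, 6650789.1, 1000))   # 22620006650000
print(ssb_grid_id(-50000.5, 6700000.0, 1000))   # 19490006700000
print(sw_corner(19490006700000))                # (-51000, 6700000)
```

The second point has a negative east coordinate, yet its ID is still a positive 14-digit number, which is the purpose of the extra false easting.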
11 Defining the grid
Projection and datum
All SSBgrids are defined in UTM33/EUREF89.
Extent
SSBgrids have no definite extent.
Exchange format
SSBgrids have no particular exchange format.

Comments
Grids can be drawn in other projections, but the grid cells will then not appear as squares. The origin of the grid is the point where the 15° east meridian crosses the equator. The coordinates of this point are, however, [2 500 000, 0] (and not [0, 0]) because a false easting of 500 000 is added by the definition of the UTM system and another false easting of 2 000 000 is added by the definition of the SSBgrid in order to avoid using negative numbers in the identification of the grid cells.

To cover Norway, it is recommended to start at [ , ] and include an area reaching 1 200 000 meters eastward and 1 500 000 meters northward.

SSBgrid data are most conveniently exchanged using simple tabular formats (text files, Excel spreadsheets, SAS or SPSS data files) where each grid cell is represented as a row in the table. A dataset must include the ID of each grid cell and can contain any number of additional data columns.
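Because each grid cell is a table row keyed by its ID, two datasets on the same SSBgrid can be combined without any GIS software. The following sketch uses only the Python standard library. The column names (`ssbid`, `population`, `buildings`) and all values are made up for illustration and are not part of any SSBgrid specification.

```python
import csv
import io

# Hypothetical example tables; column names and values are invented.
population_csv = """ssbid,population
22620006650000,120
22630006650000,5
"""
buildings_csv = """ssbid,buildings
22620006650000,48
22640006650000,3
"""


def read_by_id(text, value_column):
    """Read a CSV table into a dict keyed by the grid cell ID."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["ssbid"]: int(row[value_column]) for row in reader}


def join_on_id(left, right):
    """Full outer join of two ID-keyed tables; missing values become None."""
    return {
        cell_id: (left.get(cell_id), right.get(cell_id))
        for cell_id in sorted(left.keys() | right.keys())
    }


population = read_by_id(population_csv, "population")
buildings = read_by_id(buildings_csv, "buildings")
joined = join_on_id(population, buildings)
for cell_id, (pop, bld) in joined.items():
    print(cell_id, pop, bld)
```

The ID is kept as text here, which avoids any loss of digits when the table passes through tools that store large numbers as floating point values.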
12 Organisation – responsibility and contact points
S320 Population statistics
  Paul Inge Severeide (Head of division)
  Gjermund Nygårdseter (GIS contact)
S410 Enterprise register
  Jan Ole Furseth (Head of division)
  Beate Bartsch (GIS contact)
S430 Agricultural statistics
  Ole Osvald Moss (Head of division)
  Anne Snellingen Bye (GIS contact)
S460 Construction and services statistics
  Roger Jensen (Head of division)
  Birgit Bjørnsgard (GIS contact)

Does not distinguish between land use and land cover.
Changes are often more important than status figures.
14 Confidentiality rules
Enterprises
  No lower limit
Population
  1 – 9 => 5
Buildings
Farms

Under the Statistical Act § 2-6, Statistics Norway as a main rule does not publish tables containing fewer than 3 units in a group (table cell) where the sampling method can allow identification of individuals.

Grid cells are table cells in the sense used in the Statistical Act. In a grid map of population using 1 x 1 km grid cells, a number of grid cells will have one or two observations. In Norway, selecting this grid cell size results in about 3 400 cells with 1 person and about 3 800 cells with 2 persons (each 6-7 per cent of the approximately 55 500 inhabited grid cells in Norway). In comparison, there are about 13 400 inhabited basic regional units (grunnkretser). The 7 200 cells with either 1 or 2 persons constitute 12.8 per cent of the inhabited grid cells.

These cells with 1 or 2 observations are unevenly distributed between urban and rural areas. In the sparsely populated northernmost county of Finnmark (see figure), almost every fifth inhabited cell (20%) has fewer than three persons. Only 7% of the cells have fewer than three inhabitants in the more densely populated county of Akershus.

Confidentiality issues also arise when other geographical information is combined with the grid. For example, combining the grid with digital municipal borders or a digital road network will give information on where the cells are located. If these grid cells are to be anonymous, several issues need to be considered. Must the sum of values from the grid give the correct population total for the country, or is it sufficient to display all inhabited cells without necessarily giving the exact number for every cell?
Is three observations per grid cell an acceptable lowest value in terms of maintaining anonymity, or should the threshold value be increased?

The board of confidentiality at Statistics Norway was asked (Ottestad, 2006) to consider various methods for handling confidentiality and to set criteria for disclosure control. The following methods were discussed:
  Suppression method
  Heldal’s method
  Least value method
  Larger grid cells
  Clustering method
  Average method

The board of confidentiality recognized that all of these methods might be used, but that they would produce different results. Methods which do not display all inhabited grid cells were considered to be poor solutions from a user perspective. Changing the grid size and/or shapes would also reduce user friendliness. Setting limits or cut-off values for individuals presents challenges with regard to households. When variables other than residents are used (for example households), one should consider setting a threshold for the number of inhabited addresses in each grid cell. At present we lack a satisfactory overview of households in Norway, and hence cannot set limits for the number of households per grid cell. An alternative would be to set the threshold number of individuals per cell so high that it would normally include more than one household.

Viewed as an isolated piece of information, presentation of the fact that there is one resident in a square kilometre is not in conflict with confidentiality rules. However, later publication of other socio-economic variables or information about people in grid cells where the number of persons is low would be in conflict with the confidentiality rules.

The board of confidentiality at Statistics Norway concluded that publication of population statistics (number of persons) on a one square kilometre grid should not include exact values for grid cells containing fewer than 10 persons.
Statistics Norway will therefore use the following values for grid population statistics: 0, 1-9, 10, 11, 12 and so on. For grid cells with 1-9 persons the value is set to 5.

Figure: Population by 1 square kilometre grid, Finnmark county.
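The publication rule for grid population statistics (0 stays 0, any count from 1 to 9 is published as 5, counts of 10 or more are published exactly) can be written as a small function. This is a sketch of the rule as stated in these notes, not Statistics Norway's production code; the function name and the example counts are hypothetical.

```python
def published_population(count: int) -> int:
    """Value to publish for a 1 km grid cell holding `count` residents."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if 1 <= count <= 9:
        return 5  # every cell with 1-9 persons is published as 5
    return count  # 0 and values of 10 or more are published unchanged


# Hypothetical cell counts, for illustration only.
counts = [0, 1, 2, 9, 10, 11, 250]
published = [published_population(c) for c in counts]
print(published)                    # [0, 5, 5, 5, 10, 11, 250]
print(sum(counts), sum(published))  # 283 286
```

All inhabited cells remain visible, but the published values no longer sum exactly to the true total, which is the trade-off raised in the discussion of anonymity.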
15 Metadata
Task: to transfer existing metadata to an XML document