Tim Green NEFIS Analysis of partner metadata records 15 November 2004.

3 D3: Metadata standards and Keyword Lists
NEFIS Interoperability and The Way Forward
The extraction and analysis of data from a variety of sources in order to draw conclusions for decision-making, when carried out by a human data analyst, involves a number of stages, including:
- Identification of appropriate sources
- Evaluation of those sources for relevance and reliability
- Extraction of required data
- Manipulation using tools appropriate for the required purpose
- Interpretation of the compiled results
- Presentation to the appropriate audience(s)

4 D3: Metadata standards and Keyword Lists
This knowledge generation cycle of retrieval, analysis, publication and storage remains fundamentally unchanged by the widespread use of ICTs, but the mechanisms, scale, accessibility and audience reach have become very different. Data gathered for a specific purpose for a limited audience can potentially be retrieved and used for entirely different purposes or audiences. This carries both benefits and risks: consolidation of incompatible data could lead to erroneous conclusions with unpredictable results.

5 Number of metadata records from partners (total 63)
NFI data:
- Separate metadata and data tables for each variable (3 partners), OR
- 1 metadata record for the whole NFI dataset (4 partners)
For 1 dataset there were separate metadata records for each record in the dataset (480). The same fields were filled in for all of these records, so they are treated as one metadata record in this analysis.
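The treatment of the 480 per-record metadata entries as a single record could be sketched as follows. This is only an illustration under assumptions: the field names, the `identifier` key and the sample values are hypothetical, and the real comparison may have been done by hand.

```python
from collections import defaultdict


def collapse_identical_records(records, ignore=("identifier",)):
    """Group metadata records whose fields (apart from `ignore`) are identical.

    Returns a list of groups; each group is a list of records that would be
    counted as one metadata record in the analysis.
    """
    groups = defaultdict(list)
    for record in records:
        # Build a hashable, order-independent key from the compared fields.
        key = tuple(sorted(
            (name, str(value))
            for name, value in record.items()
            if name not in ignore
        ))
        groups[key].append(record)
    return list(groups.values())


# Hypothetical sample: 480 records that differ only in their identifier,
# plus one record that really is different.
records = [
    {"identifier": f"rec-{i}", "title": "NFI plot data", "creator": "Partner A"}
    for i in range(480)
]
records.append({"identifier": "other", "title": "Forest map", "creator": "Partner B"})

groups = collapse_identical_records(records)
print(len(records), "records collapse to", len(groups), "distinct metadata records")
```

Ignoring the identifier is a design choice for this sketch: it is the one field that would necessarily differ between otherwise identical records.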

6 Mandatory elements
All metadata records had a value for every mandatory element (including the refinements of these elements).
Optional (desirable) elements
The return for optional elements ranged from 20% (relation and its refinements) to 94% (audience).
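A return rate per element, as quoted above, can be computed along the following lines. This is a minimal sketch: the record structure, the element names used as dictionary keys and the sample values are assumptions for illustration, not the NEFIS data.

```python
from collections import Counter


def fill_rates(records, elements):
    """Return the percentage of records with a non-empty value, per element."""
    total = len(records)
    if total == 0:
        return {name: 0.0 for name in elements}
    filled = Counter()
    for record in records:
        for name in elements:
            value = record.get(name)
            # Missing keys, None and whitespace-only strings count as empty.
            if value is not None and str(value).strip():
                filled[name] += 1
    return {name: 100.0 * filled[name] / total for name in elements}


# Hypothetical sample records (not taken from the partner metadata).
sample = [
    {"title": "Forest map", "audience": "researchers", "relation": ""},
    {"title": "Stumpage prices", "audience": "policy makers"},
    {"title": "Annual increment", "audience": "", "relation": "NFI report"},
    {"title": "Growing stock", "audience": "researchers", "relation": None},
]

rates = fill_rates(sample, ["title", "audience", "relation"])
for name, rate in rates.items():
    print(f"{name}: {rate:.0f}%")
```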

7 Optional elements
Identifier: Should more of the datasets have a specific identifier?
Coverage: Should more of the datasets have some description of temporal coverage?

8 Title

9 Creator
- Some datasets have both organisations and individuals as the creator
- Acronyms of organisation names

10 Subject

11 Themes and Terms: The number varied greatly, from 1 (NEFIS Terms and Nominated Terms) to 83. This depends on the dataset described, but in some cases more terms would help users find the resources.
Are the Themes, Terms and Nominated Terms used consistently? There is some misunderstanding of how to use Subject, Themes and Terms: some Titles are repeated in the Subject, and some Themes entries are not in the list. This should be clarified in the guidelines.
Nominated Terms: Do they contain any terms (or similar terms) already included in the keyword lists? Analysis needs to be done.
Organism Names: Why such a low number (2)? This refinement is not included in the Metadata template. In some cases the information was included in the NEFIS Terms or the Description, but where species data is given, the species name should be given in the refinement 'organism names'.
Classification: Not used at all. Why? Not useful? Or lack of familiarity?
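One possible way to approach the outstanding Nominated Terms analysis is sketched below: each nominated term is compared with the keyword list, first exactly (ignoring case and extra spaces) and then approximately. The function names, the 0.85 similarity cutoff and the sample terms are assumptions for illustration; approximate string matching will miss true synonyms, so it could only pre-sort terms for manual review.

```python
import difflib


def normalise(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.casefold().split())


def match_nominated_terms(nominated, keyword_list, cutoff=0.85):
    """Classify each nominated term as 'exact', 'similar' or 'new'.

    Returns a dict mapping the nominated term to (status, matched keyword).
    """
    keywords = {normalise(keyword): keyword for keyword in keyword_list}
    result = {}
    for term in nominated:
        key = normalise(term)
        if key in keywords:
            result[term] = ("exact", keywords[key])
            continue
        # Fall back to approximate matching on the normalised strings.
        close = difflib.get_close_matches(key, list(keywords), n=1, cutoff=cutoff)
        if close:
            result[term] = ("similar", keywords[close[0]])
        else:
            result[term] = ("new", None)
    return result


# Hypothetical keyword list and nominated terms.
keyword_list = ["Forest inventory", "Stumpage price", "Biodiversity"]
nominated = ["forest  inventory", "Stumpage prices", "Carbon sequestration"]

matches = match_nominated_terms(nominated, keyword_list)
for term, (status, keyword) in matches.items():
    print(f"{term!r}: {status} {keyword or ''}")
```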

12 Description

13 NEFIS Metadata Guidelines
“Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
Guidelines for creation of content: The Description is a potentially rich source of indexable terms, and care should be taken when creating the Description. It should be clearly structured and the main contents should be described in the first paragraph. Best practice recommendation for this element is to use full sentences, as Description is often used to present information to users to assist in their selection of appropriate resources from a set of search results. Descriptive information can be copied or automatically extracted from the item if there is no abstract or other structured description available. The Description should at least contain the following information: type of resource, aims, contents, background information.”

14 Description
Descriptions varied widely: some were comprehensive, others basic or very basic (adding little to the information given in the title). In some cases a basic description would be enough, but in others a more detailed description of the resource would be helpful, or even essential, to someone searching EFIS in order to evaluate, extract and interpret information in the resource.

15 Description
Comprehensive example
Title: Stumpage prices in Finnish non-industrial private forests
“The dataset present information on nominal stumpage prices, paid in sales of roundwood from Finnish non-industrial private forests. Prices for the following six major assortments are given: pine, spruce and birch logs; pine, spruce and birch pulpwood. The regional breakdown of data is forestry centres (14 in total). The price information dates back to 1995, and it is updated on a monthly basis. In Finland, stumpage sales is the dominant sales form in private forests. They account for approx. 80% of total roundwood sales.”

16 Description
Comprehensive example
Title: Annual increment for forest types and tree species
“Annual increment estimates for tree species, forest types, counties and four periods. Productive forest land. Swedish NFI data. All trees at least 1.3 m high are included.
Tree species:
  Pine - Scots pine (Pinus ssp excl P. contorta, Larix ssp)
  Spruce - Norway spruce (Picea ssp, Abies ssp)
  Contorta - Contorta pine (Pinus contorta)
  Birch - Birch (Betula pendula, B. pubescens)
  Other broadleaves - Other broadleaved trees, oaks, beech and other hardwood trees excluded. Principally aspen (Populus tremula), alder (Alnus incana/glutinosa), sallow (Salix caprea) and rowan (Sorbus aucuparia)
  Oak - Oak (Quercus robur/petraea)
  Beech - Beech (Fagus sylvatica)
  Other hardwood - Other hardwood trees defined by Swedish forestry act, principally common ash (Fraxinus excelsior), wych-elm (Ulmus glabra), lime (Tilia cordata), hornbeam (Carpinus betulus) and cherry (Prunus avium)
Forest type is defined by basal area at breast height (1.3 m above ground level) percentage.
Definition of forest types:
  Pine - At least 65 percent pine (Pinus ssp, Larix ssp)
  Spruce - At least 65 percent spruce (Picea ssp, Abies ssp)
  Mixed coniferous - At least 65 percent conifers, but not pine or spruce forest type.
  Mixed coniferous/broadleaved - Nor 65 percent conifers or broadleaved trees.
  Broadleaved - At least 65 percent broadleaved trees.
  Density 0 - Bare forest land with no trees”
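A description this detailed lets a user reproduce the forest type rule. The sketch below is one reading of the quoted definitions, not the Swedish NFI's own implementation. It assumes that all conifers fall into either the pine group or the spruce group, and it leaves to the caller how Pinus contorta is grouped, since the description does not state this explicitly for forest types. The function name and the input units are hypothetical.

```python
def classify_forest_type(pine, spruce, broadleaved, threshold=65.0):
    """Classify a stand by its basal-area shares at breast height.

    Inputs are basal areas (any consistent unit, e.g. m2/ha) for the pine
    group, the spruce group and broadleaved trees.
    """
    if min(pine, spruce, broadleaved) < 0:
        raise ValueError("basal areas must be non-negative")
    total = pine + spruce + broadleaved
    if total == 0:
        return "Density 0"  # bare forest land with no trees
    pine_pct = 100.0 * pine / total
    spruce_pct = 100.0 * spruce / total
    conifer_pct = 100.0 * (pine + spruce) / total
    broadleaved_pct = 100.0 * broadleaved / total
    if pine_pct >= threshold:
        return "Pine"
    if spruce_pct >= threshold:
        return "Spruce"
    if conifer_pct >= threshold:
        return "Mixed coniferous"
    if broadleaved_pct >= threshold:
        return "Broadleaved"
    return "Mixed coniferous/broadleaved"


# Hypothetical stands: (pine, spruce, broadleaved) basal areas.
for stand in [(20, 5, 2), (2, 10, 1), (4, 4, 2), (3, 3, 4), (1, 1, 8), (0, 0, 0)]:
    print(stand, "->", classify_forest_type(*stand))
```

The order of the checks matters: the pine and spruce tests come before the mixed coniferous test, which mirrors the "but not pine or spruce forest type" wording.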

17 Description
Basic example
Title: Forest map
“The forest map of Europe”

18 Description > Quality Report
Comprehensive examples. Many gave references to other documentation:
Guidelines for data collection and processing can be found in: Study on European forestry information and communication systems. Reports on forestry inventory and information systems. Volume 2. European Commission ISBN
Description in metadata record:
“Utilisation of international forest resources information officially published by FAO and UN-ECE/FAO. The information collected by FAO and UN-ECE/FAO is based on data questionnaire returns from designated national country correspondents. The data presented in the FAO and UN-ECE/FAO publications was transferred to electronic format and organised in an interactive Internet database …”

19 Description > Quality > Quantitative measures for NFI data
- 4 NFI datasets with 1 metadata record; 3 NFI datasets with separate metadata records for different variables (total 27 records)
- Variables reported include forest land area, standing volume, increment and number of stems (by age class, species). So although there are 27 metadata records, these measures are reported for the 4 variables.
- Some difference in understanding of requirements? 2 groups reported resampling of 1-3%. 1 group of metadata records reported resampling of 100%.

20 Description > Quality > Quantitative measures for NFI data

Field                                     NFI Record 1                  NFI Record 2
Variablename                              Total volume
Availabilitystandarderror (answer yes/no) yes                           No
Standarderror                             % (varies between regions)    No
Resampling (answer yes/no)                yes                           Yes
ResamplingPercentage                      ca. 1%                        100 % Full sample
Totalsamplesize                           65859                         100 %
samplingunit                              plot                          Forest subcompartment

21 Date
Refinements: created, valid, available, issued, modified, dateAccepted, dateCopyrighted, dateSubmitted

22 Type 18 georeferenced datasets

23 Format
18 georeferenced datasets, but only 16 entries for reference system. The information given for the reference system ranged from comprehensive to basic.
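A gap like this (18 georeferenced datasets, 16 reference system entries) is the kind of inconsistency a simple check at submission time could flag. The sketch below assumes hypothetical field names (`identifier`, `georeferenced`, `reference_system`) and sample values; the actual NEFIS record structure may differ.

```python
def missing_reference_system(records):
    """Return identifiers of georeferenced records with no reference system."""
    missing = []
    for record in records:
        if not record.get("georeferenced"):
            continue
        # Treat a missing key, None and blank strings as "not filled in".
        reference_system = str(record.get("reference_system") or "").strip()
        if not reference_system:
            missing.append(record.get("identifier", "<no identifier>"))
    return missing


# Hypothetical sample records.
records = [
    {"identifier": "map-1", "georeferenced": True, "reference_system": "WGS 84"},
    {"identifier": "map-2", "georeferenced": True, "reference_system": "  "},
    {"identifier": "map-3", "georeferenced": True},
    {"identifier": "table-1", "georeferenced": False},
]

flagged = missing_reference_system(records)
print("Georeferenced records without a reference system:", flagged)
```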

24 Coverage

25 Point and Box
Temporal: Why is temporal coverage not used more (46)?
Spatial: Often given at a very broad level (e.g. World, Europe, Finland). A more detailed listing of countries might help users find data, but would be more time-consuming to produce.
Spatial: For datasets containing information at the subnational level, should the names of the subnational areas be given?

