INTEGRATED DATA SYSTEM FOR CRITICAL ZONE OBSERVATORIES Mark Williams, University of Colorado.


1 INTEGRATED DATA SYSTEM FOR CRITICAL ZONE OBSERVATORIES Mark Williams, University of Colorado

2 The water information value ladder
Rungs: Monitoring, Collation, Quality assurance, Aggregation, Analysis, Reporting, Forecasting, Distribution
Assessment labels on the figure: Done poorly; Done poorly to moderately; Sometimes done well, by many groups, but could be vastly improved
Axis labels: >>> Increasing value >>>; Integration; Data >>> Information >>> Insight
Slide courtesy CSIRO, BOM, WMO, Ilya, Dozier

3 Provenance and transparency

4

5 CZOs as platforms for research
- Integrating satellite & ground measurements with modeling
- CZO measurements provide the basis for advances in multiple Earth sciences
- CZOs are DATA-RICH places to develop & test Earth system models

6 Challenges to CZO Data Management
Many Object & Data Types!
- Atmosphere, Biosphere, Hydrosphere, Lithosphere
- Diverse media
- Sensor-based: stationary, mobile, spectra/photos
- Sample-based: sub-samples, preparations/fractions
- Numeric & categorical
- Hillslope, catchment, watershed
- Minutes, decades, millennia, eons

7 Sample Fractions for Soil Geochemistry: Adapting SESAR IGSN for CZO (Christina River CZO example)
[Flowchart; recoverable content:]
- Ziplock (~500 g): bulk soil horizon or depth increment
- Al can (~70 g): for gamma counting (137Cs)
- DRY SIEVE, 2 mm
  - Glass vial: <2 mm fines, dry sieved
  - >2 mm: (1) pick out plant roots & detritus, rinse with DI water, oven dry, mill (SPEX?); glass vial: plant detritus, milled
  - >2 mm: (2) remaining pebbles & rocks, hard grind; glass vial: pebbles, hard ground <2 mm
- WET SIEVE, or DENSITY, or SETTLING (with or without sonication). The choice here is important. Do we want aggregates or not?
  - Glass vial: sand + small detritus
  - Glass vial: silt + clay
- Analyses attached to the fractions: EA-IRMS, FTIR, SA, XRD (marked "XRD?" on one branch), CEC, ICP-MS after Li-borate fusion, SPEX mill
- Extractions: dithionite-citrate extraction, Na pyrophosphate extraction, ammonium oxalate extraction

8 Overall Approach
- Do not reinvent the wheel! Build on
  - CUAHSI HIS, EarthChemDB, LTER, etc.
- Consistent data presentation on web
  - Metadata
  - Data values
- Central data system for data discovery
  - Harvested by SDSC (pull system)

9 CZO data principles and policies
- Each CZO will operate and be responsible for its own local data management system for collecting, organizing, quality controlling and publishing data through its web site.
  - Different philosophy than CUAHSI ODM
  - Each CZO is master of its own data
- We don't care what goes on under the hood
- Each site uses its own protocols, databases, etc.
- Allows CZO to honor site legacy data

10 CZO data principles and policies
- Each CZO publishes its data on the web in ASCII format with sufficient metadata so that the data can be unambiguously interpreted
- Metadata follows a prescribed format
  - Data managers just need rules to follow
- Easy to harvest by central portal
- Makes it simple at the site level so scientists comply
  - Addresses the chokepoint that is getting data/metadata from the scientists to data managers

11 Data Management Team
- David Tarboton, Utah State. PI on the CUAHSI Hydrologic Information System (HIS)
- Kerstin Lehnert, Columbia. PI on EarthChemDB
- Ilya Zaslavsky, Lead, SDSC Spatial Information Systems Lab; hosts CUAHSI HIS
- Mark Williams, CU-Boulder. PI Niwot Ridge LTER
- Anthony Aufdenkampe, co-I Christina River Basin CZO

12 Integrated CZO data system Synthesizing information management experience and software from CZO partners and neighboring earth science projects into a standards-based system for publishing environmental data to emphasize the “critical zone” nature of our shared data sets

13 CZO Data Publication System
[Architecture diagram; components shown:]
- Local CZO DBs (spatial, hydrologic, geophysical, geochemical, imagery, spectral…) and web site
- Standard CZO data display formats
- Archive; Harvester
- Shared vocabularies; CZO Metadata; Ontology
- CZO Data Repository and Indexing (CZO Central)
- Standard CZO Services
- CZO Data Products; CZO Web-based Data Discovery System
- CZO Desktop Applications: CZO Desktop, Matlab, R, Excel, ArcGIS, Modeling
- External cross-project registries: DataNet, NEON

14 Data Publication Process (for hydrologic time series)
[Flow diagram; components shown: CZO Display File, ODM, WaterML Service, OGC WFS Service, CZO Central Catalog, OGC CSW Service, Catalog Search Service, CZO Desktop]
- Raw display file metadata is registered with the CZO data portal, to ensure the original data are discoverable and downloadable.
- The WFS service is registered with the CZO data portal.
- The CZO Portal uses the OGC CSW (Catalog Services for the Web).
- The broader internet community accesses data using standard protocols.

15 CZO data interoperability: what does it mean
Levels of interoperability
- Find and download CZO resources: files and file collections, services, documents (organized by CZO thematic category and by type)
- Data available in compatible semantics: ontologies, controlled vocabularies
- Data available via the same service interfaces (e.g. WFS, SOS) but different information models
- Compatibility at the level of domain information models and databases
Diagram labels: Deeper integration; Wider variety of data; Well-understood data with formal information models available via standard services; Different types of data collected by CZOs
System components
- Data discovery portal
- Shared vocabularies and ontology management
- Service administration (CZOCentral)
- CZO desktop, others

16

17 Data disclaimer

18 Data Catalogue
- Biogeochemistry: including anything on C (Carbon), N (Nitrogen), P (Phosphorus), nutrients, microbes
- Climatology/Meteorology: including met tower, temps, snow
- Ecology/Biology: including microbial, land use
- Geology/Chronology: including geologic, descriptions of rocks-mineralogy, CRN ages/rates
- Geomorphology: including topography, chronological data, sediment flux, fracture space
- Geophysics: including seismic refraction etc.
- Geospatial: including GIS/RS, imagery, geologic map, Gordon Gulch and GLV cameras

19 Water Chemistry
ASCII format, metadata and comma-delimited data. Automatically harvested using WaterML and EML.
Header group (/doc):
- Title, Abstract, Investigator, Variable names, Keywords, Methods, Instrument, Citation, Publications, Comments
Header group, column information
- COL1. label=ValueAttribute, value=site
- COL2. label=ValueAttribute, value=DateTime, UTCOffset=-7, Timezone=MST, format=”YYYYMMDD hh:mm”
- COL3. label=ValueAttribute, value=pH, units=pH, SampleMedium=water, units=pH units, missing value indicator=,,methods=method1, etc.
Header group, column (series) defaults that apply to all columns (e.g. site below)
Data (/data)
GREENLAKE4,820311,6.4,18,88.51,0.40,,114.77,24.68,21.75,10.23,25.389,,58.296,83.200,,,,,,,,,,,,,,,,,,
GREENLAKE4,820422,5.7,18,90.15,2.00,,99.80,24.68,17.40,12.79,9.591,,72.870,44.928,,,,,,,,,,,,,,,,,,
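The header/data split described on this slide can be sketched as a small parser. This is only an illustration: the slide does not give the exact display-file syntax, so the standalone `/doc` and `/data` marker lines, the `Key=value` header layout, the header values in the sample, and the function name `parse_display_file` are all assumptions. Only the first three fields of the two data rows are taken from the slide.

```python
import csv
import io


def parse_display_file(text):
    """Split display-file-like text into header metadata and data rows.

    Assumed (hypothetical) layout: a line "/doc" starts the header group,
    header lines look like "Key=value", and a line "/data" starts the
    comma-delimited data rows.
    """
    header = {}
    data_lines = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ("/doc", "/data"):
            section = line
        elif section == "/doc" and "=" in line:
            key, _, value = line.partition("=")
            header[key.strip()] = value.strip()
        elif section == "/data":
            data_lines.append(line)
    # Empty fields become None so missing values stay distinguishable.
    rows = [
        [field if field != "" else None for field in row]
        for row in csv.reader(io.StringIO("\n".join(data_lines)))
    ]
    return header, rows


# Header values are hypothetical; data fields are truncated from the slide.
sample = """/doc
COL1=site
COL2=DateTime
COL3=pH
/data
GREENLAKE4,820311,6.4
GREENLAKE4,820422,5.7
"""

header, rows = parse_display_file(sample)
print(header["COL3"], rows[0])
```

Keeping empty fields as `None` rather than dropping them preserves column positions, which matters for rows like the ones on the slide where most trailing columns are blank.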

20 CZO Data Management Web Administration Interface
- CZO data managers use this web-based system to register display files, edit service metadata, initiate data retrieval, validate the data against shared vocabularies, and update hydrologic time series services
- The administration system will be extended to geochemical samples and other data
http://central.criticalzone.org

21 Services edited and validated by CZO data managers
- Data managers control how their data are annotated.
- Ingestion of display files is triggered on the server by the data manager.
- Display file ingestion log
- Editable service definitions and management interface for each CZO data service

22 CZO Central Catalog Statistics, March 24, 2011 (time series services only)
CZO Service       Sites  Variables    Values
Jemez River          14          1    154854
Boulder Creek         1         31     11834
Santa Catalina        5          6     59222
Luquillo              8         16    831098
Southern Sierra       8          4   1226330
Shale Hills           1         18    848624
Christina River      31          5   6870150
Total:               68         81  10002112

23 New Development: Central CZO Data Discovery Portal Registered data are organized by CZO thematic categories

24
- Display files from CZO web sites are registered to the data discovery portal automatically
- In addition, display files of known types are expressed as data services, which are also registered in the portal
- The portal is CSW-compliant (CSW = Catalog Services for the Web): can be federated with other catalogs including data.gov
- Supports search by location, resource type, thematic category, keywords, plus full-text abstract search
- Federation with CUAHSI HydroCatalog, to allow search of hydrologic data from ~70 networks
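Because the portal is described as CSW-compliant, a full-text search could in principle be issued as a standard GetRecords request. The sketch below only builds such a request URL using the key-value-pair parameters of CSW 2.0.2 as I understand them; the endpoint `http://example.org/csw` is a placeholder (the slides do not give the portal's CSW address), the function name is hypothetical, and individual servers may support different constraint syntax.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_csw_search_url(endpoint, text, max_records=10):
    """Build a CSW 2.0.2 GetRecords URL for a full-text search.

    `endpoint` is a placeholder for a real CSW service address.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "brief",
        "resultType": "results",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Full-text match against any indexed text field (assumed supported).
        "constraint": f"AnyText like '%{text}%'",
    }
    return f"{endpoint}?{urlencode(params)}"


url = build_csw_search_url("http://example.org/csw", "snow")
query = parse_qs(urlsplit(url).query)
print(query["constraint"][0])
```

Building the query with `urlencode` keeps the quotes and percent signs in the constraint correctly escaped, which is easy to get wrong when concatenating strings by hand.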

25 Shared Vocabulary
[Architecture diagram repeated from slide 13 with the Shared Vocabulary component highlighted; components shown:]
- Local CZO DBs (spatial, hydrologic, geophysical, geochemical, imagery, spectral…) and web site
- Standard CZO data display formats
- Archive; Harvester
- Shared vocabularies; CZO Metadata; Ontology
- CZO Data Repository and Indexing (CZO Central)
- CZO Data Products; CZO Web-based Data Discovery System
- CZO Desktop Applications: CZO Desktop, Matlab, R, Excel, ArcGIS, Modeling
- External cross-project registries: DataNet

26 CZO Shared Vocabulary System Purpose: To promote the consistent use of terminology. http://sv.critialzone.org Builds on CUAHSI HIS

27 Data Managers and SV Data Managers
[Workflow diagram with steps ❶ ❷ ❸; labels shown: SV Database, Local CZO Website, Observation Database, CSV Data File, Unknown Term Email, Request Term Web Page, XML SV List]
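The "unknown term" step in this workflow can be sketched as a check of the terms found in a CSV data file against the shared vocabulary list. This is a minimal illustration, not the CZO implementation: the function name, the case-insensitive matching rule, and the example vocabulary are assumptions.

```python
def find_unknown_terms(column_terms, vocabulary):
    """Return terms absent from the vocabulary, in first-seen order.

    Matching ignores case and surrounding whitespace (an assumption;
    the real system may require exact matches).
    """
    known = {term.strip().lower() for term in vocabulary}
    unknown = []
    seen = set()
    for term in column_terms:
        key = term.strip().lower()
        if key and key not in known and key not in seen:
            seen.add(key)
            unknown.append(term.strip())
    return unknown


# Hypothetical vocabulary and file terms, for illustration only.
shared_vocabulary = ["pH", "temperature", "discharge"]
file_terms = ["pH", "Temperature", "specific conductance", "pH",
              "specific conductance"]

unknown = find_unknown_terms(file_terms, shared_vocabulary)
print(unknown)
```

Each unknown term is reported once, so a data manager would receive a single request per new term rather than one per occurrence.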

28 Preferred vocabularies. Moderators to be designated by CZO with expertise in each category
- Variable names (extended from CUAHSI HIS)
- Units (extended from CUAHSI HIS) (e.g. m, g/L)
- Value type (from CUAHSI HIS) (e.g. field observation, derived value, model output)
- Sample type (from CUAHSI HIS) (e.g. stream water, ground water, rock, soil)
- Data type (from CUAHSI HIS) (e.g. average over interval, cumulative, continuous, sporadic)
- Data level (based on Ameriflux list) (e.g. level 0 = raw data, level 4 = fully infilled and quality controlled)
- Spatial references (extensible based on EPSG) (e.g. NAD 1983, WGS84, UTM zone 11)
KEY: CZO expands ODM controlled vocabularies to a larger audience using “preferred vocabularies”

29 Methods
1. Major problem for metadata
2. Solution: lookup table that is part of the controlled vocabulary
3. Three parts: sample collection, sample preparation, analytical procedure
4. Up and running, needs moderators
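A methods lookup table with the three parts named above might look like the following sketch. The class, the table contents, and the code `method1` mapping are hypothetical; the slides only state that such a table exists within the controlled vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """One entry of a methods lookup table (three parts, per the slide)."""
    sample_collection: str
    sample_preparation: str
    analytical_procedure: str


# Hypothetical entry; the descriptions are placeholders, not CZO methods.
METHODS = {
    "method1": Method(
        sample_collection="grab sample",
        sample_preparation="filtered",
        analytical_procedure="pH electrode",
    ),
}


def resolve_method(code, table=METHODS):
    """Look up a method code and fail loudly if it is not registered."""
    try:
        return table[code]
    except KeyError:
        raise KeyError(f"method code {code!r} is not in the lookup table") from None


print(resolve_method("method1").analytical_procedure)
```

With this shape, a data file only needs to carry a short code per column, and the full description is resolved from the shared table.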

30 CZO Spatial Data
[Architecture diagram repeated from slide 13 with the Spatial Data component highlighted; components shown:]
- Local CZO DBs (spatial, hydrologic, geophysical, geochemical, imagery, spectral…) and web site
- Standard CZO data display formats
- Archive; Harvester
- Shared vocabularies; CZO Metadata; Ontology
- CZO Data Repository and Indexing (CZO Central)
- Standard CZO Services
- CZO Web-based Data Discovery System
- CZO Desktop Applications: CZO Desktop, Matlab, R, Excel, ArcGIS, Modeling

31 Metadata and Spatial View (Guo lab, UC Merced)
Panels: Spatial View; Metadata
- Multi-file control
Spatial extent
- Ex: LiDAR flights, transects, etc.
- Point data (collected at a particular location)
- Uses Google Maps API
- KML functionality

32 Geochemical Samples (based on CZEN)
[Architecture diagram; components shown:]
- Local CZO DBs (geochemical samples) and web site
- Standard CZO data display formats
- Archive; Harvester
- Shared vocabularies; Metadata; IGSN management
- Geochemical web services, EarthChemDB
- Depth-resolved geochemistry
- CZO Web-based Geochemical DB
- EarthChem Data Engine & Portal
- CZO Desktop Applications: CZO Desktop, Matlab, R, Excel, ArcGIS, Modeling

33 CZO Chemistry Database Conceptual Model (CZO CHEM DB), Penn State lead
[Entity diagram; recoverable content:]
- Hierarchy: Location (Watershed); Sampling Site (Soil / Water); Sample (Layer/Depth); Sub-sample (1, 2, ... n); Preparation/Treatment; Analysis
- Data: Chemical, Phys., Minr, Others
- Main data: Sample, Sub-sample, Preparation/Treatment, Analysis, Time Series
- Geo-info: Loc_info/Climate, Landuse/Veg., Country/State
- Meta-data: Methods, Sources, Precision, Var-Lookup/Unit, Lab-Info, Lab
- Publication; Project; Person (contributor)

34 Progress
- Database is accessible at www.czo.psu.edu
- PSU CZO students and post-docs have used template for data entry
- Susan Melzar (Colorado State) has used template and data has been entered into database
- Published data from Muhs et al. (2001), Harden (1987), White et al. (2008)
- Current version contains 1391 records, representing 17,604 data values
- Ran webinar August 24th to show database capabilities and usage of data entry template
- 15 participated with representation from all 6 CZOs
- User guide is in progress

35 Integration with EarthChemDB (Kerstin Lehnert)
[Architecture diagram; components shown:]
- EarthChem XML; EarthChem Portal
- DB; GCDM DB; Metadata catalog
- Datasets (original data & derived products)
- External databases: USGS, NAVDAT, GEOROC
- GfG Data Entry; User Submission
- Topical Data Collections; Geochemical Resource Library

36 EarthChem Portal
[Diagram; components shown: PetDB, USGS, GEOROC, NAVDAT, Others; XML; EarthChem Data Engine Database; EarthChem Data Engine Search & Visualization]
- Partner databases encode their data & metadata in XML and send them to the EarthChem portal database in Kansas.
- Queries submitted at the EarthChem portal search the contents of the EarthChem Portal Database.
- Similar to our ODM hydrology portal

37 INTERNATIONAL GEOSAMPLE NUMBER
Purpose: Unique identification for samples and related sampling features in the Earth Sciences
- To allow unambiguous referencing of data to samples in publications and data systems
- To allow tracking samples through repositories & labs
- To allow integration of distributed data for samples

38 Geoinformatics for Geochemistry
[Diagram of parent/child sample relations; items shown:]
- Core Section 1, Core Section 2, Core Section 3
- Sample 1, Sample 2, Sample 3 (per core section)
- Rock powder, Mineral conc., Leachate, Fossil separate, Microprobe mount
- Example identifiers: IGSN:XXX000120, IGSN:XXX0065B3, IGSN:XXX9K23G6, IGSN:XXX07ST4K, IGSN:XYZ0G693M, IGSN:ABC0L98SW, IGSN:ABC0L53NW, IGSN:ABC0L653X, IGSN:ABC078HGB

39 IGSN International Organization
[Organization chart. Role labels: Managing Agent, Registration Agents, Registrants, Registrar. Entities shown: IGSN International Organization; SESAR; Near Space Observatory (invented example); ExoPlanet (invented example); CZO; Geoscience Australia; USGS; IEDA; ICDP; Repository; Analytical Lab; Investigator]

40 ADAPTING IGSN for CZO
- Register any type of sample: pedons, hand specimens, mineral concentrates, etc.
- Register any type of material: soil, rock, sediment, fluid, gas, bio…
- Register ‘sample-related features’: sites, wells, cores, dredges…
- Register relations (parent to children): e.g. site -> pedon -> mineral
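The parent-to-child relations described here can be sketched as a mapping from each registered identifier to its parent, which is enough to trace any sample back to its sampling site. The identifiers below are placeholders, not real IGSNs, and the function name is hypothetical.

```python
def lineage(identifier, parent_of):
    """Return [identifier, parent, grandparent, ...] up to the root feature."""
    chain = [identifier]
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in chain:
            raise ValueError("cycle in parent relations")
        chain.append(parent)
    return chain


# Placeholder identifiers illustrating site -> pedon -> mineral.
parent_of = {
    "PEDON-01": "SITE-01",
    "MINERAL-01": "PEDON-01",
}

chain = lineage("MINERAL-01", parent_of)
print(" <- ".join(chain))
```

Storing only the parent link per record keeps registration simple, while the full provenance chain can still be recovered on demand.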

41 Exploring A More General Data Model: ODM 2.0
- To achieve interoperability between EarthChem, CUAHSI ODM, LTER EML
- Better support for samples and unique identifiers (IGSN/SESAR)
- Extensibility to table attributes
- Better annotation and provenance
- Enable integrated web service based publication of a broader class of CZO data

42 ODM 2.0: Field Sensor
Extension to support field sensor deployments and in situ observations
- Sensor deployment details
- Attributes of sensor
- Data series from sensor

43 ODM 2.0 – Provenance and Annotations Extensions Better support for storing provenance of observational data

44 General Extensibility
Provides capability to record information (add fields) in tables that was not anticipated a priori
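One common way to let a table carry fields that were not anticipated in advance is a companion name/value table. The sketch below shows that general pattern with SQLite; it is not the ODM 2.0 schema, which the slides do not show, and the table names, column names, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        label     TEXT NOT NULL
    );
    -- Companion table: one row per extra (unanticipated) attribute.
    CREATE TABLE sample_extension (
        sample_id      INTEGER NOT NULL REFERENCES sample(sample_id),
        property_name  TEXT NOT NULL,
        property_value TEXT,
        PRIMARY KEY (sample_id, property_name)
    );
    """
)
# Hypothetical records for illustration.
conn.execute("INSERT INTO sample (sample_id, label) VALUES (1, 'soil core A')")
conn.execute("INSERT INTO sample_extension VALUES (1, 'sonicated', 'yes')")


def extended_attributes(connection, sample_id):
    """Return the extra attributes of one sample as a dict."""
    cursor = connection.execute(
        "SELECT property_name, property_value FROM sample_extension "
        "WHERE sample_id = ? ORDER BY property_name",
        (sample_id,),
    )
    return dict(cursor.fetchall())


print(extended_attributes(conn, 1))
```

The trade-off of this pattern is that extension values are untyped text, so any units or validation rules have to be handled elsewhere, for example through the shared vocabularies.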

45
[Architecture diagram; components shown:]
- CZchemDB; CZO-Central; GeoChemDB [ODM 2.0]; CZO-Services
- EarthChem Portal; USGS; NAVDAT; GEOROC; Geochemical database
- EarthChemXML; CZO Data Display Format; Geochem Services (IEDA)
- CZO Web Discovery; GeoChemDB Search; Web-based User Access; CZO Desktop; Other client systems
- GfG Data Validation & Ingest
- IEDA Long-Term Archiving Service; IEDA Data Publication Service (DataCite); SESAR Sample Registration

46 Where we are today
- Each site has a data manager
- Data sets are posted to the web
  - Consistent metadata and ASCII format in progress
- We’ve prototyped harvesting data and posting to a central data portal
- Shared vocabulary system in place
- Developed protocol for unique sample ID
- Partnering with EarthChemDB
- Expanding ODM to become more general
- Way beyond what I thought possible

47 Work plan for next two years
- Extending the CZO data publication model to geochemical and GIS data; then to other types of data (towards deeper interoperability)
- Integration based on service and information model standards (WaterML, EarthChemXML, EML, OGC services)
  - Requirements gathering from all CZOs, data modeling, display file format specification, services specification, development and validation
  - Upgrade to WaterML 2 once approved as international standard (~Q3, 2011)
- Registering more hydrologic time series data via CZO Central
  - Regularly harvesting registered files and updating CZO services; keeping provenance information
- Enhancing parameter-based search across CZOs, with a shared parameter ontology
- Making CZO central data system more robust
  - Currently a single server with 24/7 monitoring; need redundant setup
- Enhancing role of Data Managers

