Where Should the GALION Data Reside? Centrally or Distributed? Introduction to the Discussion Fiebig, M.; Fahre Vik, A. Norwegian Institute for Air Research.


1 Where Should the GALION Data Reside? Centrally or Distributed? Introduction to the Discussion Fiebig, M.; Fahre Vik, A. Norwegian Institute for Air Research

2 User Demands for GALION Data Management Data should be easy to find and accessible via one common location. Data should be searchable by location, time window, parameter, … A plotting and browsing tool should allow online comparison. Data should be downloadable in a homogeneous format, with the option for users to select among a few commonly used formats. Data should be of homogeneous high quality, including detailed documentation of processing steps for assessing comparability. Different applications require different proximity to the raw measurement. Data should include a measure of uncertainty and variability. Data should be available in near-real-time (crisis management, forecast, …) -> one location, one format! Option for aggregating datasets into climatologies. …

3 Current Strategy for Data Management in GALION At least one common point of access for the common data pool. Responsibility for QA and long-term availability remains with the contributing institutions / networks. Features of the common access portal: Holds access metadata from all contributing stations, i.e. dates, times, and type of measurements. Allows search by criteria such as network, date, location, … Browsing / quicklook of data. Link to download from original location. Tools for format conversion. Control of access rights.
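The portal idea on this slide (access metadata held centrally, data left at the original location) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `AccessRecord` fields, the `search` helper, the network and station names, and the example.org URLs are hypothetical and do not describe the actual GALION portal.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessRecord:
    """Access metadata for one dataset; the data itself stays at the source."""
    network: str
    station: str
    parameter: str
    start: date
    end: date
    download_url: str  # link to the original repository (placeholder values)


def search(records, network=None, parameter=None, window=None):
    """Return records matching every criterion given; None means 'any'."""
    hits = []
    for r in records:
        if network is not None and r.network != network:
            continue
        if parameter is not None and r.parameter != parameter:
            continue
        if window is not None:
            w_start, w_end = window
            # Keep only records whose coverage overlaps the requested window.
            if r.end < w_start or r.start > w_end:
                continue
        hits.append(r)
    return hits


# Hypothetical catalogue entries; names and URLs are placeholders.
CATALOGUE = [
    AccessRecord("NetworkA", "Station1", "backscatter",
                 date(2008, 1, 1), date(2009, 12, 31),
                 "https://example.org/a/station1"),
    AccessRecord("NetworkB", "Station2", "extinction",
                 date(2009, 6, 1), date(2010, 6, 30),
                 "https://example.org/b/station2"),
]

hits = search(CATALOGUE, parameter="extinction",
              window=(date(2010, 1, 1), date(2010, 3, 31)))
print([h.download_url for h in hits])
```

The portal in this sketch never touches the measurements; it only answers "who has what, and when" and hands back a link, which matches the division of responsibility described on the slide.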

4 Solution 1: GAWSIS as Data Discovery Portal

5 GAWSIS Features Data directory encompassing all GAW data centres, holds access metadata. Search data availability by country, network, station name, station ID, station type, and parameter. Map visualisation of availability. Station page with station metadata and a list of available datasets. Link to original repository, direct link to dataset if available. Functionality similar to a Global Information System Centre (GISC) in the WMO Information System (WIS) concept. GAWSIS plans include WIS compliance (once that is defined) and a plotting tool.

6 Solution 2: EARLINET-ASOS Database and Portal

7 EARLINET-ASOS Database Features Search all EARLINET-ASOS data by date, daytime, season, station, event category, parameter. Select and download data (NetCDF format). Plotting, browsing, comparing function. The EARLINET-ASOS database will be part of the ACTRIS distributed database, which is planned to be WIS compliant (when we know what that means). ACTRIS: EU FP7 project, will network European ground-based in situ & lidar aerosol observations, cloud property observations, and reactive trace gas observations.

8 Solution 3: GEOmon Distributed Database Data discovery portal holding access metadata. Data may be searched by parameter, station, home database, type (in situ, remote sensing, simulation), platform, matrix, geolocation, altitude, temporal availability. Portal links to individual dataset where possible, to database homepage otherwise. Will be developed into entry portal of ACTRIS distributed database.

9 Distributed Data Architecture Pros & Cons Pros: Institutions / networks keep control over data access, data quality, and long-term availability, and maintain visibility. Know-how on measurement principle and data management is combined for tailored solutions. Cons: All institutions / networks have to maintain server infrastructure (file archive, metadata server, webservice, WIS compliance, …). Well-defined formats are essential for smooth interoperability. Implementing on-the-fly conversion of dozens of formats would be a resource drain and a built-in point of failure. Near-real-time dissemination with uniform QA is almost impossible to implement. Long-term availability is not ensured.

10 Centralised Data Architecture Pros & Cons Pros: Server infrastructure needs to be maintained only once / a few times (economy of scale). Long-term availability ensured. Easy to ensure homogeneous data formatting and quality, frequent reformatting not necessary. Almost the only option for implementing an NRT service with homogeneous automated QA. Cons: Somewhat less visibility of individual institution / network. Institution(s) hosting data centre(s) need to ensure access management. Institution(s) hosting data centre(s) also need experimental expertise.

11 Well-Defined Common Data Formats are Essential for any Data Architecture Data format is more than just selecting NASA-Ames, NetCDF, … Needs to include: an implementation profile for the format standard and a defined vocabulary, i.e. which parameters / metadata are included, in what unit, and how they are named; which processing steps were conducted (all self-explaining); flags to indicate special conditions. Example EUSAAR data formats (all NASA-Ames 1001): Level 0: Annotated, instrument-specific raw data, "native" time resolution. Level 1: processed to final physical variable, original time resolution. Level 1.5: automatically aggregated to (hourly) averages, includes uncertainty for averaging period. Level 2: same as level 1.5, but manually quality assured. Well-defined common processing steps between levels establish traceability. Well-defined formats don't limit usability of data, but make routine work more efficient.
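The step from level 1 to level 1.5 (automatic aggregation to hourly averages with an uncertainty for the averaging period) might look roughly like the Python sketch below. It is an illustration only: the slide does not say how the uncertainty is computed, so the sample standard deviation within each hour is used here as an assumed stand-in, and the function name `aggregate_hourly` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def aggregate_hourly(samples):
    """Aggregate (datetime, value) pairs into per-hour (mean, spread, count).

    'spread' is the sample standard deviation within the hour, used here as
    an assumed measure of variability; it is None when only one sample exists.
    """
    buckets = defaultdict(list)
    for t, v in samples:
        # Truncate the timestamp to the start of its hour.
        buckets[t.replace(minute=0, second=0, microsecond=0)].append(v)

    out = {}
    for hour, values in sorted(buckets.items()):
        spread = stdev(values) if len(values) > 1 else None
        out[hour] = (mean(values), spread, len(values))
    return out


# Hypothetical level 1 samples at the instrument's original time resolution.
level1 = [
    (datetime(2010, 9, 1, 10, 5), 10.0),
    (datetime(2010, 9, 1, 10, 25), 12.0),
    (datetime(2010, 9, 1, 10, 45), 14.0),
    (datetime(2010, 9, 1, 11, 5), 20.0),
]

level15 = aggregate_hourly(level1)
for hour, (avg, spread, n) in level15.items():
    print(hour.isoformat(), avg, spread, n)
```

Because the aggregation rule is fixed and applied identically to every station, a level 1.5 value can be traced back to the level 1 samples it came from, which is the traceability argument made on the slide.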

12 Efficient Use of Project Resources: GAW aerosol NRT Station: auto-creates hourly data files (level 0); initiates auto-upload to NRT server. Alternatively, the station collects raw data in a custom format and transfers it to a sub-network data centre, which auto-creates hourly data files (level 0) and initiates auto-upload to the NRT server. In both cases: FTP transfer to data centre, with automatic feedback. Data Centre: checks for correct data format (level 0); checks whether data stays within specified boundaries (sanity check). Processing to level 1 -> hourly level 1 data file -> processing to level 1.5 -> hourly level 1.5 data file -> EBAS database. User access (restricted) via web interface: ebas.nilu.no. User access via machine-to-machine web service.
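The two data-centre checks named on this slide (correct format, values within specified boundaries) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout, the field names `station`, `start_time`, and `flow_rate`, and the bounds are all hypothetical and are not taken from the EBAS or GAW specifications.

```python
def check_level0(record, required_fields, bounds):
    """Return a list of problems for one record; an empty list means it passes."""
    problems = []

    # Format check: every required field must be present.
    for field in required_fields:
        if field not in record:
            problems.append(f"missing field: {field}")

    # Sanity check: each bounded value must lie within its [low, high] range.
    for name, (low, high) in bounds.items():
        value = record.get(name)
        if value is None:
            continue  # absence is already reported by the format check
        if not (low <= value <= high):
            problems.append(f"{name}={value} outside [{low}, {high}]")

    return problems


# Hypothetical field names and limits, chosen only for illustration.
REQUIRED = ("station", "start_time", "flow_rate")
BOUNDS = {"flow_rate": (0.5, 2.0)}

ok = check_level0(
    {"station": "S1", "start_time": "2010-09-01T10:00", "flow_rate": 1.0},
    REQUIRED, BOUNDS,
)
bad = check_level0({"station": "S1", "flow_rate": 5.0}, REQUIRED, BOUNDS)

print(ok)
print(bad)
```

The returned list of problems is the kind of message that could be sent back to the station as the automatic feedback mentioned on the slide.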

13 How Do You Access the Data?

14 NRT-Example: Auto-Processed DMPS data

