GFDL Data Portal: Current Status, Achievements and Future Development. NOAATECH-2006. K. Dixon, V. Balaji, S. Nikonov, GFDL, Princeton.


1 GFDL Data Portal: Current Status, Achievements and Future Development. NOAATECH-2006. K. Dixon, V. Balaji, S. Nikonov, GFDL, Princeton

2 History
- The Data Portal was launched in 1995 as a simple FTP server.
- The idea and the term “Data Portal” arose three years ago.
- Originally it served data in response to occasional requests.
- Now its main assets are IPCC data.

3 Common technical characteristics: Software
- Red Hat Linux
- Apache Web Server
- DODS Aggregation Server
- THREDDS
- LAS Server
- GrADS-DODS

4 Hardware
- Dell PowerEdge 2650 machine
- Dual-processor Intel Xeon, 2.4 GHz
- 3 GB RAM
- 7 Dell PowerVault 220S units with 14 HDs in each, 19 TB total (expansion pending up to 35 TB)
- Network bandwidth: internet, 9 Mbit/s; internet-2, 100 Mbit/s

5 Web Site Structure

6 Basic Metadata
- Model description
- Experiment description
- Institution
- Extra metadata for treating tripolar grids (including Ferret scripts for their visualization)
- Metadata is compliant with the CF standard
- Metadata accompanies each data file

7 Basic features of the GFDL LAS server
- Dynamic data presentation chosen by the user
- Spatial/time subsampling with included metadata
- Defining new variables on the fly, calculated by a given formula
- Ferret visualization

8 General Statistics, 01-Oct-2004 to 01-Oct-2005
- Total amount of CM2 Climate Model Data: 12 TB
- More than 10,000 NetCDF files; average file size: 1 GB
- Successful requests: ~62,000
- Average successful requests per day: ~200
- Distinct files requested: 5,000
- Distinct hosts served: ~850
- Data transferred: 15 TB
- Average data transferred per day: ~42 GB
- Number of journal articles submitted that include analyses of GFDL CM2 model output: > 100
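The per-day averages can be cross-checked against the yearly totals. The following is a small sketch, assuming a 365-day reporting period and binary units (1 TB = 1024 GB); neither assumption is stated on the slide.

```python
# Cross-check of the per-day averages from the yearly totals on the slide.
DAYS = 365                    # 01-Oct-2004 to 01-Oct-2005 (assumed 365 days)
GB_PER_TB = 1024              # assumption: binary units

successful_requests = 62_000  # "~62,000" on the slide
data_transferred_tb = 15      # "15 TB" on the slide

requests_per_day = successful_requests / DAYS
gb_per_day = data_transferred_tb * GB_PER_TB / DAYS

print(f"requests per day: {requests_per_day:.0f}")
print(f"GB per day: {gb_per_day:.0f}")
```

Under these assumptions the transfer volume comes to about 42 GB per day, which agrees with the slide. The request total gives roughly 170 per day, so the slide's ~200 is probably a coarser rounding.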

9 Current standard procedure of publishing data
- Climate Model Output Rewriter (CMOR) processing
  - configured manually for different models, experiments, variables
  - triggered manually
- Quality Control
  - performed by a scientist; includes checking metadata, time ranges, value ranges, etc.
- Splitting CMORized, QC-ed data into small (<2 GB) NetCDF files and pushing them outside the firewall to the Data Portal
  - the scripts doing this are configured manually
  - the scripts are started manually
- Preparing a checksum report on the Data Portal
  - done by a cron-started script
- Configuring the Aggregation Server and LAS
  - done manually
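The checksum-report step can be illustrated with a short sketch. The slides do not say which checksum algorithm or report layout the cron script uses. SHA-256, the `*.nc` file pattern, the 2 GiB reading of "<2 GB", and the function names below are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path

MAX_BYTES = 2 * 1024**3  # "<2 GB" from the slide, read here as 2 GiB (assumption)

def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checksum_report(root, pattern="*.nc"):
    """Return (relative path, size, checksum, under_size_limit) rows for files under root."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob(pattern)):
        size = path.stat().st_size
        rows.append(
            (path.relative_to(root).as_posix(), size, file_checksum(path), size < MAX_BYTES)
        )
    return rows
```

Running the same report on both sides of the firewall and comparing the rows would flag files that were truncated or altered in transfer.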

10 Current Data Portal workflow

11 Desirable Features of Data Portal
- Relational database storing metadata describing
  - model components and model configuration
  - scenarios
  - postprocessing (model output and CMOR)
  - experiments
  - variables
  - formalized rules of Quality Control
  - data locations in the archive
  - task scheduler
  - user and group accounts
- XML as data exchange format
  - for compliance with the FMS Runtime Environment (FRE)
  - working format of existing third-party software
  - well suited to hierarchical metadata description
  - widely used, easy to exchange with other Data Portals
- Publisher Control Center (PCC)
  - controls the CMOR subsystem
  - controls the Data Publisher Manager
  - controls data quality (QAC)

12 Desirable Features of Data Portal (continued)
- Climate Model Output Rewriter (CMOR) subsystem
  - prepares data consistently with specific project requirements
- Data Publisher Manager
  - transfers data to the target destination in accordance with settings from the DB
- Front-end Data Portal Software Package
  - Configuration Manager (configures Aggregation Server and Data Portal Interface)
  - Search Catalog Engine
  - Data Subsampling Engine
  - Data Computation Engine
  - Data Visualization
  - Data Delivery Manager

13 Proposed functionality schema of ‘GFDL Data Factory’

14 Standard scenario of a functioning Model Data Factory (ideal picture)
- A scientist builds a model in the existing GFDL FMS Runtime Environment (FRE) using available model components, datasets and a forcing scenario.
- FRE puts metadata about the built model, scenario and experiment into the “curator” DB and runs the experiment.
- The postprocessing subsystem extracts metadata about the postprocessing plan from the “curator” DB, executes it, and on finishing puts metadata about the processed experiment back into the DB.
- The Data Publisher (DP) regularly checks the “curator” DB for new experiments marked as “public” and, if it finds any, invokes CMOR.
- CMOR goes to the “curator” DB for metadata and processes the needed data following the metadata instructions.
- DP calls QAC and then transfers the data to Data Portal storage.
- The Configuration Manager configures the Aggregation Server and Data Portal Interface and puts records about the new public data in the “curator” DB.
- End of process; the data is ready to go.
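The Data Publisher's part of this scenario (poll for public experiments, invoke CMOR, call QAC, transfer) can be sketched as a single polling pass. This is only an illustration. An in-memory SQLite database stands in for the MySQL “curator” DB, and the `experiments` table, its columns, and the stub callbacks are hypothetical names, not taken from the actual schema.

```python
import sqlite3

def find_new_public_experiments(conn):
    """Return names of experiments marked public that are not yet published."""
    rows = conn.execute(
        "SELECT name FROM experiments "
        "WHERE status = 'public' AND published = 0 ORDER BY name"
    )
    return [name for (name,) in rows]

def publish_pending(conn, run_cmor, run_qac, transfer):
    """One Data Publisher pass: CMOR, then QAC, then transfer, then record it."""
    published = []
    for name in find_new_public_experiments(conn):
        run_cmor(name)
        if not run_qac(name):
            continue  # failed QAC: leave unpublished for a later pass
        transfer(name)
        conn.execute("UPDATE experiments SET published = 1 WHERE name = ?", (name,))
        published.append(name)
    conn.commit()
    return published

# Demo with hypothetical data and stub steps.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiments (name TEXT PRIMARY KEY, status TEXT, published INTEGER)"
)
conn.executemany(
    "INSERT INTO experiments VALUES (?, ?, ?)",
    [("exp_a", "public", 0), ("exp_b", "private", 0), ("exp_c", "public", 1)],
)
log = []
done = publish_pending(
    conn,
    run_cmor=lambda name: log.append(("cmor", name)),
    run_qac=lambda name: True,
    transfer=lambda name: log.append(("transfer", name)),
)
print(done)  # ['exp_a']
```

In the scenario described on the slide, a scheduler would run such a pass at regular intervals, and the Configuration Manager rather than the publisher would record the new public data. The `published` flag here is a simplification.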

15 Database ‘curator’ design
Database compartments:
- Model Metadata Compartment: contains models’ descriptions; allows building a coupled model of the needed configuration
- Variables Compartment: list of all related physical variables
- Workflow Compartment: contains scenarios, experiments, institutions, projects and user info
- Postprocessing Compartment: defines the postprocessing plan for conducting an experiment
- Data Portal Compartment: contains info about experiment data

16 Interaction between compartments

17 MySQL DB CURATOR

18 Model Metadata Compartment (in development)
Tables: Coupled_Models, Model_List, Component_Medias, Models. Linked tables: Experiments (Workflow Compartment), Variables (Variables Compartment).

19 Data Samples from Model Compartment
Tables: Components_Medias, Coupled_Models, Model_List, Models.

20 Variables Compartment
Tables: Variables, Variable_Bundles, Variable_Lists, Variable_List_Contents, Proj_Var_Names. Linked table: Projects (Workflow Compartment).

21 Data Sample from Variables Compartment
Tables: Variable_Lists, Variable_List_Contents, Proj_Var_Names, Variables, Variable_Bundles.

22 Workflow Compartment
Tables: Institutions, GFDL_USERS, Experiment_Status, Realization, Projects, Experiments, Scenarios.

23 Data Samples from Workflow Compartment
Tables: Experiments, Scenarios.

24 Postprocessing Compartment; Data Samples from Postprocessing Compartment
Table names on the slide: Coupled_Models, PP_Units, Post_Proc, PP_Content, Variable_Lists, Projects, GFDL_USERS, Average_Periods.

25 Data Portal Compartment
Table names on the slide: MissedData_Descriptors, Data_Grids, Data_Files, Variables, Experiments, Variable_Bundles, Coupled_Models.

26 Data Samples from Data Portal Compartment
Tables: Data_Files, Data_Grids, MissedData_Descriptors.

27 Curator DB on Data Portal stream
- Curator DB is already used on the GFDL Data Portal.
- JSP technology with servlets on the back end was applied.
- New data transferred onto the Data Portal is automatically registered in Curator DB with all accompanying metadata.
- It turned out to be the fastest way to search for data on the Data Portal: CM2.0, CM2.1
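Registering transferred files with their metadata and then searching by that metadata might look roughly like the sketch below. The portal itself uses JSP and servlets over MySQL. This Python sketch uses an in-memory SQLite stand-in, and the `data_files` columns, file paths, experiment names, and function names are hypothetical.

```python
import sqlite3

SEARCH_FIELDS = ("model", "experiment", "variable")  # hypothetical metadata columns

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_files "
    "(path TEXT PRIMARY KEY, model TEXT, experiment TEXT, variable TEXT)"
)

def register_file(conn, path, model, experiment, variable):
    """Record a newly transferred file together with its metadata."""
    conn.execute(
        "INSERT OR REPLACE INTO data_files VALUES (?, ?, ?, ?)",
        (path, model, experiment, variable),
    )

def search_files(conn, **criteria):
    """Return paths of files whose metadata match every given field=value pair."""
    unknown = set(criteria) - set(SEARCH_FIELDS)
    if unknown:
        raise ValueError(f"unknown search fields: {sorted(unknown)}")
    # Column names come from the fixed whitelist above; values are bound parameters.
    where = " AND ".join(f"{field} = ?" for field in criteria) or "1"
    query = f"SELECT path FROM data_files WHERE {where} ORDER BY path"
    return [path for (path,) in conn.execute(query, tuple(criteria.values()))]

# Hypothetical entries for the two model versions named on the slide.
register_file(conn, "CM2.0/run1/tas.nc", "CM2.0", "run1", "tas")
register_file(conn, "CM2.1/run1/tas.nc", "CM2.1", "run1", "tas")
register_file(conn, "CM2.1/run1/pr.nc", "CM2.1", "run1", "pr")

print(search_files(conn, model="CM2.1"))
```

Searching an indexed metadata table avoids opening the NetCDF files themselves, which is one plausible reason a database lookup would be the fastest search path.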

28 Other Aspects of Future Development
- Establish model metadata schema standards in the scientific community and develop an SQL metadata schema.
- Populate Curator with real metadata extracted from GFDL models.
- Couple the Curator DB with the GFDL FMS Modeling System.
- Customize the LAS server to use the Curator DB.
- Design user interfaces.

29 END. Questions? Thanks!

