NCAR Data and Grid Efforts: The Earth System Grid & The Community Data Portal. Don Middleton, NCAR Scientific Computing Division. CAS2003, September 11, 2003


1 NCAR NCAR Data and Grid Efforts: The Earth System Grid & The Community Data Portal Don Middleton NCAR Scientific Computing Division CAS2003 September 11, 2003

2 NCAR The Earth System Grid
- U.S. DOE SciDAC funded R&D effort - a “Collaboratory Pilot Project”
- Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data
- Build upon Globus Toolkit and DataGrid technologies and deploy
- Potential broad application to other areas
http://www.earthsystemgrid.org

3 NCAR ESG Team
- ANL: Ian Foster (PI), Veronika Nefedova, (John Bresnahan), (Bill Allcock)
- LBNL: Arie Shoshani, Alex Sim
- ORNL: David Bernholdt, Kasidit Chanchio, Line Pouchard
- LLNL/PCMDI: Bob Drach, Dean Williams (PI)
- USC/ISI: Anne Chervenak, Carl Kesselman, (Laura Pearlman)
- NCAR: David Brown, Luca Cinquini, Peter Fox, Jose Garcia, Don Middleton (PI), Gary Strand

4 NCAR

5 Baseline Numbers
- T42 CCSM (current, 280km): 7.5GB/yr, 100 years -> 0.75TB
- T85 CCSM (140km): 29GB/yr, 100 years -> 2.9TB
- T170 CCSM (70km): 110GB/yr, 100 years -> 11TB
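The arithmetic behind these baseline figures can be checked with a short sketch. It assumes decimal units (1 TB = 1000 GB), which is what the rounded values on the slide imply; the function and variable names are illustrative and not part of any ESG tool.

```python
# Annual model output per CCSM resolution, in GB/yr, as quoted on the slide.
ANNUAL_OUTPUT_GB = {"T42": 7.5, "T85": 29.0, "T170": 110.0}


def run_volume_tb(gb_per_year: float, years: int = 100) -> float:
    """Total output of one run in TB, assuming 1 TB = 1000 GB."""
    return gb_per_year * years / 1000.0


# Volume of a 100-year run at each resolution.
volumes = {res: run_volume_tb(gb) for res, gb in ANNUAL_OUTPUT_GB.items()}

for res, tb in volumes.items():
    print(f"{res}: {tb:.2f} TB per 100-year run")
```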

6 NCAR Capacity-related Improvements
Increased turnaround, model development, ensemble of runs
Increase by a factor of 10, linear data
- Current T42 CCSM: 7.5GB/yr, 100 years -> 0.75TB * 10 = 7.5TB

7 NCAR Capability-related Improvements
Spatial Resolution: T42 -> T85 -> T170
Increase by factor of ~10-20, linear data
Temporal Resolution: Study diurnal cycle, 3 hour data
Increase by factor of ~4, linear data
CCM3 at T170 (70km)

8 NCAR Capability-related Improvements
Quality: Improved boundary layer, clouds, convection, ocean physics, land model, river runoff, sea ice
Increase by another factor of 2-3, data flat
Scope: Atmospheric chemistry (sulfates, ozone…), biogeochemistry (carbon cycle, ecosystem dynamics), Middle Atmosphere Model…
Increase by another factor of 10+, linear data

9 NCAR Model Improvement Wishlist
Grand Total: Increase compute by a Factor O(1000-10000)
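The slide does not say how the grand total was derived. One plausible reading, sketched below, is that the individual compute factors quoted on the preceding slides are treated as independent and multiplied. That multiplication is an assumption, as are the dictionary keys and the use of 10 as a lower bound for the "10+" scope factor.

```python
from math import prod

# (low, high) compute multipliers as quoted on the preceding slides.
CAPACITY = (10, 10)
CAPABILITY = {
    "spatial resolution": (10, 20),
    "temporal resolution": (4, 4),
    "quality": (2, 3),
    "scope": (10, 10),  # slide says "10+", so 10 is only a lower bound
}


def combined(factors):
    """Multiply the low ends and the high ends of a set of (low, high) factors."""
    lows, highs = zip(*factors)
    return prod(lows), prod(highs)


# Assumption: factors are independent, so they combine multiplicatively.
capability_only = combined(CAPABILITY.values())
with_capacity = combined([CAPACITY, *CAPABILITY.values()])

print("capability only:", capability_only)
print("capability and capacity:", with_capacity)
```

Under this reading the two products bracket the O(1000-10000) range quoted on the slide, depending on whether the capacity factor is included.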

10 NCAR
Longer-term Missions - Observation of Key Earth System Interactions: Terra, Aura, Aqua, Landsat 7
Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies: GRACE, PICASSO, Cloudsat, QuikScat, EO-1, ICEsat, Jason-1, SRTM, VCL
We Will Examine Practically Every Aspect of the Earth System from Space in This Decade
Triana
Courtesy of Tim Killeen, NCAR

11 NCAR ESG Scenario
- End 2002: 1.2 million files comprising ~75TB of data at NCAR, ORNL, LANL, NERSC, and PCMDI
- End 2007: As much as 3 PB (3,000 TB) of data (!)
- Current practice is already broken – the future will be even worse if something isn’t done…
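To put the scenario numbers in perspective, the sketch below derives the average file size at the end of 2002 and the growth rate implied by reaching 3 PB five years later. It assumes decimal units and steady compound growth between the two dates; neither assumption comes from the slide, and the variable names are illustrative.

```python
# Figures quoted on the slide.
files_2002 = 1.2e6        # number of files at end of 2002
volume_2002_tb = 75.0     # ~75 TB at end of 2002
volume_2007_tb = 3000.0   # up to 3 PB at end of 2007
years = 2007 - 2002

# Average file size in MB, assuming 1 TB = 1,000,000 MB.
avg_file_mb = volume_2002_tb * 1e6 / files_2002

# Overall growth, and the yearly multiplier if growth compounds steadily.
growth = volume_2007_tb / volume_2002_tb
annual_multiplier = growth ** (1 / years)

print(f"average file size: {avg_file_mb:.1f} MB")
print(f"overall growth: {growth:.0f}x")
print(f"implied yearly growth: {annual_multiplier:.2f}x")
```

In other words, the archive would have to roughly double every year to reach the 2007 figure.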

12 NCAR ESG Scenario (cont.)
- Data
  – Different formats are converted to netCDF
  – netCDF is not standardized to the CF model
  – Different sites require knowledge of different methods of access
- Metadata
  – Most kept in online files separate from data and unsearchable unless one is “in the know”
  – Some kept in people’s brains
- Access control
  – Manual
  – Not formalized
- Data requests
  – Beginnings of a formal process (e.g., the PCMDI model)
  – Beginnings of web portals
  – Far too much done by hand
  – Logging nearly non-existent

13 NCAR ESG: Challenges
- Enabling the simulation and data management team
- Enabling the core research community in analyzing and visualizing results
- Enabling broad multidisciplinary communities to access simulation results
We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.

14 NCAR ESG: Strategies
- Move data a minimal amount, keep it close to computational point of origin when possible
  – Data access protocols, distributed analysis
- When we must move data, do it fast and with a minimum amount of human intervention
  – Storage Resource Management, fast networks
- Keep track of what we have, particularly what’s on deep storage
  – Metadata and Replica Catalogs
- Harness a federation of sites, web portals
  – Globus Toolkit -> The Earth System Grid -> The UltraDataGrid
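The "keep track of what we have" strategy can be illustrated with a toy replica catalog: a mapping from a logical file name to the physical copies held at different sites. This is a minimal sketch of the idea only, not the Globus Replica Location Service API; the class names, the selection rule, the logical file name, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str              # hosting site, e.g. "NCAR"
    url: str               # physical location (hypothetical URLs below)
    on_deep_storage: bool  # True if the copy must be staged from tape


class ReplicaCatalog:
    """Toy catalog mapping logical file names to physical replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, replica):
        self._replicas.setdefault(logical_name, []).append(replica)

    def lookup(self, logical_name):
        return list(self._replicas.get(logical_name, []))

    def best(self, logical_name, local_site):
        """Prefer a copy at the local site, then a copy that is already on disk."""
        candidates = self.lookup(logical_name)
        if not candidates:
            raise KeyError(logical_name)
        return min(candidates, key=lambda r: (r.site != local_site, r.on_deep_storage))


catalog = ReplicaCatalog()
catalog.register("ccsm/run1/TS.nc", Replica("ORNL", "gsiftp://ornl.example.org/TS.nc", False))
catalog.register("ccsm/run1/TS.nc", Replica("NCAR", "gsiftp://ncar.example.org/TS.nc", True))

print(catalog.best("ccsm/run1/TS.nc", local_site="NCAR").site)
print(catalog.best("ccsm/run1/TS.nc", local_site="LBNL").site)
```

The selection rule encodes the first strategy on the slide: a local copy wins even if it sits on deep storage, and a remote copy is used only when no local one exists.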

15 NCAR HRM Storage/Data Management
Tools for reliable staging, transport, and replication
[Diagram labels: Client (Selection, Control, Monitoring); two Servers, each with an HRM and a Tera/Peta-scale Archive]

16 NCAR HRM aka “DataMover”
- Running well across DOE/HPSS systems
- New component built that abstracts NCAR Mass Storage System
- Defining next generation of requirements with climate production group
- First “real” usage
“The bottom line is that it now works fine and is over 100 times faster than what I was doing before. As important as two orders of magnitude increase in throughput is, more importantly I can see a path that will essentially reduce my own time spent on file transfers to zero in the development of the climate model database” – Mike Wehner, LBNL

17 NCAR OPeNDAP An Open Source Project for a Network Data Access Protocol (originally DODS, the Distributed Oceanographic Data System)

18 NCAR OPeNDAP-g
- Transparency
- Performance
- Security
- Authorization
- (Processing)
[Diagram labels, by access path]
- Typical Application: Application, netCDF lib, Data (local)
- OPeNDAP via http: Application, OPeNDAP Client, OPeNDAP Server, Data (remote)
- OPeNDAP via Grid: Application, ESG client, ESG Server (ESG + DODS), Big Data (remote)
- Distributed Data Access Services: Distributed Application, data
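For readers unfamiliar with OPeNDAP over http, the sketch below shows how a client can ask a server for a subset of one variable by appending a DAP2 constraint expression to the dataset URL. It only builds the URL string and makes no network request. The helper name, the example.org host, the dataset path, and the variable name TS are hypothetical; the `[start:stride:stop]` hyperslab syntax and the `.dods` response suffix follow the DAP2 conventions.

```python
def dap2_subset_url(dataset_url, variable, slices, response="dods"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    slices is a list of (start, stride, stop) tuples, one per dimension;
    in DAP2 the stop index is inclusive.
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


# Hypothetical dataset: 12 time steps on a 64 x 128 (T42-sized) grid.
url = dap2_subset_url(
    "http://example.org/dods/ccsm/run1.nc",
    "TS",
    [(0, 1, 11), (0, 1, 63), (0, 1, 127)],
)
print(url)
```

Because only the requested slab crosses the network, this style of access supports the "move data a minimal amount" strategy from the earlier slide.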

19 NCAR ESG: NcML Core Schema
- For XML encoding of metadata (and data) of any generic netCDF file
- Objects: netCDF, dimension, variable, attribute
- Beta version reference implementation as Java Library (http://www.scd.ucar.edu/vets/luca/netcdf/extract_metadata.htm)
[Schema diagram labels: netCDF, nc:netCDFType, nc:dimension, nc:variable, nc:attribute, nc:values, nc:VariableType]
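As an illustration of what an XML encoding of netCDF metadata looks like, the sketch below builds a small document from the four object types named on the slide. It is not the Java reference implementation and does not claim to match the 2003 core schema exactly: the element and attribute names are modelled on the slide's object list, namespaces are omitted, and the file name, dimensions, and variable are made up.

```python
import xml.etree.ElementTree as ET


def describe_netcdf(location, dimensions, global_attrs, variables):
    """Build an NcML-like XML tree describing a netCDF file's structure."""
    root = ET.Element("netcdf", {"location": location})
    for name, length in dimensions.items():
        ET.SubElement(root, "dimension", {"name": name, "length": str(length)})
    for name, value in global_attrs.items():
        ET.SubElement(root, "attribute", {"name": name, "value": str(value)})
    for var in variables:
        node = ET.SubElement(
            root,
            "variable",
            {"name": var["name"], "type": var["type"], "shape": " ".join(var["shape"])},
        )
        # Per-variable attributes nest inside the variable element.
        for name, value in var.get("attributes", {}).items():
            ET.SubElement(node, "attribute", {"name": name, "value": str(value)})
    return root


# Hypothetical file description; no netCDF file is actually read.
root = describe_netcdf(
    location="example.nc",
    dimensions={"time": 12, "lat": 64, "lon": 128},
    global_attrs={"Conventions": "CF-1.0"},
    variables=[
        {
            "name": "TS",
            "type": "float",
            "shape": ["time", "lat", "lon"],
            "attributes": {"units": "K"},
        }
    ],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```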

20 NCAR
[Class diagram; recoverable labels follow]
- Object: [1] id
- Activity: [0,1] name; [0,1] description; [0,1] rights; [0,n] date type=; [0,n] note; [0,n] participant role=; [0,n] reference uri=
- Project: [0,n] topic type=; [0,1] funding
- Simulation: [0,n] simulationInput type=; [0,n] simulationHardware
- Dataset: [0,1] type; [0,1] conventions; [0,n] date type=; [0,n] format type= uri=; [0,1] timeCoverage; [0,1] spaceCoverage
- Person: [0,1] firstName; [0,1] lastName; [0,1] contact
- Institution: [0,1] name; [0,1] type; [0,1] contact
- Service: [0,1] name; [0,1] description
- Other classes: Investigation, Ensemble, Campaign, Observation, Experiment, Analysis
- Relationship labels: isA, isPartOf, hasParent, hasChild, hasSibling, generatedBy, worksFor, participant role=, serviceId
- Legend: Class, AbstractClass, inheritance, association
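A few of the classes in this diagram can be sketched as Python dataclasses to show how the cardinalities translate into fields: [1] becomes a required field, [0,1] an optional one, and [0,n] a list. The flattened transcript does not preserve which arrow connects which boxes, so the inheritance chain used here (Activity isA Object, Project isA Activity, Dataset isA Object) and the type of `generated_by` are assumptions, and only a subset of the attributes is shown. The identifiers in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataObject:
    id: str  # [1] id: the only mandatory attribute


@dataclass
class Activity(MetadataObject):
    name: Optional[str] = None                       # [0,1]
    description: Optional[str] = None                # [0,1]
    rights: Optional[str] = None                     # [0,1]
    notes: List[str] = field(default_factory=list)   # [0,n]


@dataclass
class Project(Activity):
    topics: List[str] = field(default_factory=list)  # [0,n]
    funding: Optional[str] = None                    # [0,1]


@dataclass
class Dataset(MetadataObject):
    dataset_type: Optional[str] = None               # [0,1] type
    conventions: Optional[str] = None                # [0,1]
    time_coverage: Optional[str] = None              # [0,1]
    space_coverage: Optional[str] = None             # [0,1]
    generated_by: Optional[Activity] = None          # "generatedBy" association (assumed target)


# Hypothetical records.
project = Project(id="proj-001", name="CCSM", topics=["climate"])
dataset = Dataset(id="ds-001", conventions="CF", generated_by=project)
print(dataset.generated_by.name)
```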

21 NCAR ESG Metadata Progress
- Co-developed NcML with Unidata
  – CF conventions in progress, almost done
- Developed & evaluated a prototype metadata system
- Finalized an initial schema for PCM/CCSM
  – Address interoperability with federal standards and NASA/GCMD via the generation of DIF/FGDC/ISO
  – Address interoperability with digital libraries via the creation of Dublin Core
- Testing relational and native XML databases, and OGSA-DAI
- Exploratory work for first-generation ontology
- Authoring of discovery metadata in progress

22 NCAR ESG Web Portal Demonstration

23 NCAR ESG Topology (CAS 2003)
[Diagram labels]
- Sites: LBNL, ISI, LLNL, NCAR, ORNL, ANL
- Storage: MSS, HPSS (two instances), DISK, DISK cache, each paired with HRM and RLS
- Metadata: OGSA-DAI, MySQL RDBMS
- Portal: ESG WEB PORTAL (Tomcat/Struts)
- Security: MyProxy (authenticate), CAS
- Services: GRAM GATEKEEPER (submit, execute), gridFTP SERVER, LAS SERVER (visualize)
- Edge labels: cross-update, gridFTP, query

24 NCAR Collaborations & Relationships
- CCSM Data Management Group
- The Globus Project
- Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit
- OPeNDAP/DODS (multi-agency)
- NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project)
- U.K. e-Science and British Atmospheric Data Center
- NOAA NOMADS and CEOS-grid
- Earth Science Portal group (multi-agency, intnl.)

25 NCAR Immediate Directions
- Broaden usage of DataMover and refine
- Continue building metadata catalogs
- Revisit overall security model and consider simplified approaches
- Redesign and implement user interface
- Alpha version of OPeNDAP-g
  – Test and evaluate with three client applications (ncview, CDAT, & NCL)
- Develop automation for data publishing (GT3)
- Deploy for IPCC runs

26 NCAR The Community Data Portal (CDP)
- Provide a common portal to NCAR, UCAR, and university data
- Provide cyberinfrastructure that dramatically lowers the cost of sharing data (there is large interest in this)
- Directly couple to simulation and data analysis systems
- Begin capturing rich metadata and catalog our scientific experiments for the world
- MSS -> A petascale Mass Knowledge System
- Federate internationally (ESG, THREDDS, U.K. e-Science, NOMADS, PRISM, GEON, etc.)
“The dataportal has changed my life…” Ben Kirtman, COLA

27 NCAR A Quick Tour of the CDP ‘ing Our Data

28 Community Data Portal Metadata Software
[Diagram labels]
- Metadata inputs: THREDDS catalogs, ESG metadata, DC metadata, NcML metadata
- THREDDS catalog parser application
- Relational DB (MySQL): shreds XML doc into tables
- XML native DB (Xindice): stores full XML doc
- XML viewer web application: schema-specific stylesheets
- Search & Discovery web application: simple query (SQL); future advanced query (XPath, XQuery)
- Results: list of triplets (dataset id, metadata schema, metadata URL)
- THREDDS catalogs browser web application
- Edge labels: parses, reference other metadata, displays, links to, uses
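The "simple query (SQL)" path, which returns triplets of dataset id, metadata schema, and metadata URL, can be sketched as follows. The slide names MySQL; this sketch swaps in Python's built-in sqlite3 so that it runs without a server. The table layout, column names, dataset identifiers, and example.org URLs are all hypothetical, and real shredding of XML documents into tables would populate many more columns than the single title used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (dataset, metadata document); "title" stands in for the
# searchable fields that shredding an XML document would produce.
conn.execute(
    "CREATE TABLE metadata ("
    "dataset_id TEXT, metadata_schema TEXT, metadata_url TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        ("ccsm.run1", "NcML", "http://example.org/meta/ccsm.run1.ncml", "CCSM control run"),
        ("ccsm.run1", "DC", "http://example.org/meta/ccsm.run1.dc.xml", "CCSM control run"),
        ("pcm.run7", "ESG", "http://example.org/meta/pcm.run7.esg.xml", "PCM historical run"),
    ],
)


def search(connection, keyword):
    """Return (dataset id, metadata schema, metadata URL) triplets matching a keyword."""
    cursor = connection.execute(
        "SELECT dataset_id, metadata_schema, metadata_url FROM metadata "
        "WHERE title LIKE ? ORDER BY dataset_id, metadata_schema",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()


for triplet in search(conn, "CCSM"):
    print(triplet)
```

Returning a URL rather than the document itself keeps the relational store small: the full XML stays in the native XML database, and the viewer application fetches and styles it on demand.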

29 NCAR Data->Knowledge
Mass Storage System (1.3PB)
Petascale Knowledge Repository
Establish new paradigms for managing and accessing scientific data based on semantic organization.

30 NCAR Closing Thoughts
- Building an environment for the long-term
  – Difficult, expensive, and time-consuming
  – Requires longer-term projects
- Team-building is a critical process
  – Collaboration technologies really help
- Managing all the collaborations is a challenge
  – But extremely valuable
- Good progress, first real usage

31 NCAR Managing Expectations…

32 NCAR Links
- Earth System Grid
  – www.earthsystemgrid.org
- Community Data Portal
  – dataportal.ucar.edu

33 NCAR END

