
1 NCAR The Earth System Grid (ESG) & The Community Data Portal (CDP)
(NCAR's Data & Grid Efforts)
for COMMISSION FOR BASIC SYSTEMS INFORMATION SYSTEMS and SERVICES
INTERPROGRAMME TASK TEAM ON THE FUTURE WMO INFORMATION SYSTEM
KUALA LUMPUR, OCTOBER 2003
Courtesy: Don Middleton, NCAR Scientific Computing Division

2 NCAR Atkins Report
• A new age has dawned…
"The Panel's overarching recommendation is that the National Science Foundation should establish and lead a large-scale, interagency, and internationally coordinated Advanced Cyberinfrastructure Program (ACP) to create, deploy, and apply cyberinfrastructure in ways that radically empower all scientific and engineering research and allied education. We estimate that sustained new NSF funding of $1 billion per year is needed to achieve critical mass and to leverage the coordinated co-investment from other federal agencies, universities, industry, and international sources necessary to empower a revolution. The cost of not acting quickly or at a subcritical level could be high, both in opportunities lost and in increased fragmentation and balkanization of the research."
Atkins Report, Executive Summary

3 NCAR The Earth System Grid
U.S. DOE SciDAC funded R&D effort - a Collaboratory Pilot Project
Build an Earth System Grid that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data
• Build upon Globus Toolkit and DataGrid technologies and deploy (rubber on the road)
• Potential broad application to other areas

4 NCAR ESG Team
• ANL
  – Ian Foster (PI)
  – Veronika Nefedova
  – (John Bresnahan)
  – (Bill Allcock)
• LBNL
  – Arie Shoshani
  – Alex Sim
• ORNL
  – David Bernholdt
  – Kasidit Chanchio
  – Line Pouchard
• LLNL/PCMDI
  – Bob Drach
  – Dean Williams (PI)
• USC/ISI
  – Anne Chervenak
  – Carl Kesselman
  – (Laura Pearlman)
• NCAR
  – David Brown
  – Luca Cinquini
  – Peter Fox
  – Jose Garcia
  – Don Middleton (PI)
  – Gary Strand

5 NCAR

6 Baseline Numbers
• T42 CCSM (current, 280km)
  – 7.5 GB/yr, 100 years -> 0.75 TB
• T85 CCSM (140km)
  – 29 GB/yr, 100 years -> 2.9 TB
• T170 CCSM (70km)
  – 110 GB/yr, 100 years -> 11 TB
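The totals on the Baseline Numbers slide follow from multiplying the yearly output by the run length. A minimal Python sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB); the per-year figures come from the slide, while the function and variable names are illustrative only:

```python
# Per-year output in GB for each CCSM resolution, as quoted on the slide.
GB_PER_YEAR = {"T42": 7.5, "T85": 29.0, "T170": 110.0}


def run_volume_tb(gb_per_year: float, years: int = 100) -> float:
    """Total output of one run in TB, assuming 1 TB = 1000 GB."""
    return gb_per_year * years / 1000.0


if __name__ == "__main__":
    for resolution, rate in GB_PER_YEAR.items():
        print(f"{resolution}: {run_volume_tb(rate):.2f} TB per 100-year run")
```

The same helper can be reused for other run lengths by passing a different `years` value.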

7 NCAR Capacity-related Improvements
Increased turnaround, model development, ensemble of runs
Increase by a factor of 10, linear data
• Current T42 CCSM
  – 7.5 GB/yr, 100 years -> 0.75 TB * 10 = 7.5 TB

8 NCAR Capability-related Improvements
Spatial Resolution: T42 -> T85 -> T170
Increase by factor of ~10-20, linear data
Temporal Resolution: Study diurnal cycle, 3 hour data
Increase by factor of ~4, linear data
[Figure: CCM3 at T170 (70km)]

9 NCAR Capability-related Improvements
Quality: Improved boundary layer, clouds, convection, ocean physics, land model, river runoff, sea ice
Increase by another factor of 2-3, data flat
Scope: Atmospheric chemistry (sulfates, ozone…), biogeochemistry (carbon cycle, ecosystem dynamics), Middle Atmosphere Model…
Increase by another factor of 10+, linear data

10 NCAR Model Improvement Wishlist Grand Total: Increase compute by a Factor O( )

11 NCAR ESG Scenario
• End 2002: 1.2 million files comprising ~75 TB of data at NCAR, ORNL, LANL, NERSC, and PCMDI
• End 2007: As much as 3 PB (3,000 TB) of data (!)
• Current practice is already broken – the future will be even worse if something isn't done…
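To put the two figures on this slide in perspective, the sketch below computes the yearly growth factor they imply. It assumes constant compound growth over the five years between the end of 2002 and the end of 2007, which the slide does not claim, and it treats "as much as 3 PB" as an upper bound; the function name is illustrative.

```python
def annual_growth_factor(start_tb: float, end_tb: float, years: int) -> float:
    """Constant yearly multiplier that takes start_tb to end_tb in `years` years."""
    return (end_tb / start_tb) ** (1.0 / years)


if __name__ == "__main__":
    # ~75 TB at end of 2002, up to 3,000 TB at end of 2007 (figures from the slide).
    factor = annual_growth_factor(75.0, 3000.0, 5)
    print(f"Overall growth: {3000.0 / 75.0:.0f}x, about {factor:.2f}x per year")
```

Under these assumptions the archive would slightly more than double every year.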

12 NCAR ESG Scenario (cont.)
• Data
  – Different formats are converted to netCDF
  – netCDF is not standardized to the CF model
  – Different sites require knowledge of different methods of access
• Metadata
  – Most kept in online files separate from data and unsearchable unless one is in the know
  – Some kept in people's brains
• Access control
  – Manual
  – Not formalized
• Data requests
  – Beginnings of a formal process (e.g., the PCMDI model)
  – Beginnings of web portals
  – Far too much done by hand
  – Logging nearly non-existent

13 NCAR ESG: Challenges
• Enabling the simulation and data management team
• Enabling the core research community in analyzing and visualizing results
• Enabling broad multidisciplinary communities to access simulation results
We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.

14 NCAR ESG: Strategies
• Move data a minimal amount, keep it close to computational point of origin when possible
  – Data access protocols, distributed analysis
• When we must move data, do it fast and with a minimum amount of human intervention
  – Storage Resource Management, fast networks
• Keep track of what we have, particularly what's on deep storage
  – Metadata and Replica Catalogs
• Harness a federation of sites, web portals
  – Globus Toolkit -> The Earth System Grid -> The UltraDataGrid

15 NCAR HRM Storage/Data Management
Tools for reliable staging, transport, and replication
[Diagram: a client (selection, control, monitoring) and two servers, each with an HRM in front of a tera/peta-scale archive]

16 NCAR HRM aka DataMover
• Running well across DOE/HPSS systems
• New component built that abstracts NCAR Mass Storage System
• Defining next generation of requirements with climate production group
• First real usage
"The bottom line is that it now works fine and is over 100 times faster than what I was doing before. As important as two orders of magnitude increase in throughput is, more importantly I can see a path that will essentially reduce my own time spent on file transfers to zero in the development of the climate model database"
– Mike Wehner, LBNL

17 NCAR OPeNDAP An Open Source Project for a Network Data Access Protocol (originally DODS, the Distributed Oceanographic Data System)

18 NCAR OPeNDAP-g
- Transparency
- Performance
- Security
- Authorization
- (Processing)
[Diagram: three data access paths]
Typical: Application, netCDF lib, Data (local)
OPeNDAP via http: Application, OPeNDAP client, OPeNDAP server, Data (remote)
OPeNDAP via Grid: Application, ESG client, ESG server (ESG + DODS), Distributed Data Access Services, distributed application data, Big Data (multiple remotes)
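As an illustration of what "OPeNDAP via http" looks like to a client, the sketch below builds a DAP2-style request URL: the dataset URL, a response suffix such as `.dods`, and a constraint expression that selects a hyperslab of one variable with inclusive `[start:stride:stop]` index ranges. The host name, file path, and variable name are hypothetical, the helper function is not part of any OPeNDAP library, and the Grid-enabled variant described on the slide is not covered.

```python
from typing import Iterable, Tuple


def dap2_url(dataset_url: str,
             variable: str,
             slices: Iterable[Tuple[int, int, int]],
             response: str = "dods") -> str:
    """Build a DAP2 request URL for one variable.

    Each slice is (start, stride, stop); in DAP2 the stop index is inclusive.
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]"
                        for start, stride, stop in slices)
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


if __name__ == "__main__":
    # Hypothetical server and variable: first 12 time steps,
    # every second point in latitude and longitude.
    url = dap2_url("http://example.org/dods/ccsm/psl.nc", "PSL",
                   [(0, 1, 11), (0, 2, 63), (0, 2, 127)])
    print(url)
```

Because the subset is expressed in the URL, only the requested slab travels over the network, which matches the strategy of moving data a minimal amount.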

19 NCAR ESG: NcML Core Schema
• For XML encoding of metadata (and data) of any generic netCDF file
• Objects: netCDF, dimension, variable, attribute
• Beta version reference implementation as Java Library (http://www.scd.ucar.edu/vets/luca/netcdf/extract_metadata.htm)
[Diagram: schema elements netCDF (nc:netCDFType), nc:dimension, nc:variable (nc:VariableType), nc:attribute, nc:values]
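The four NcML objects named on this slide (netCDF, dimension, variable, attribute) can be pictured with a small XML document. The sketch below builds one with Python's standard `xml.etree.ElementTree`. The element names follow the slide, but the namespace URI, the XML attribute names (`uri`, `name`, `length`, `type`, `shape`, `value`), and the sample file contents are assumptions made for illustration, not the actual schema of the beta implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the slide does not give the real one.
NS = "http://example.org/ncml"
ET.register_namespace("nc", NS)


def q(tag: str) -> str:
    """Qualify a tag name with the (assumed) NcML namespace."""
    return f"{{{NS}}}{tag}"


def describe_file() -> ET.Element:
    """Describe a made-up netCDF file with one dimension and one variable."""
    root = ET.Element(q("netCDF"), {"uri": "example.nc"})
    ET.SubElement(root, q("dimension"), {"name": "time", "length": "12"})
    ET.SubElement(root, q("attribute"),
                  {"name": "Conventions", "value": "CF-1.0"})
    var = ET.SubElement(root, q("variable"),
                        {"name": "time", "type": "double", "shape": "time"})
    ET.SubElement(var, q("attribute"),
                  {"name": "units", "value": "days since 2000-01-01"})
    return root


if __name__ == "__main__":
    print(ET.tostring(describe_file(), encoding="unicode"))
```

A document like this carries the file's structure and attributes without the data arrays, which is what makes it usable as searchable metadata.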

20 NCAR
[Diagram: ESG metadata object model. Legend: Class, AbstractClass, inheritance, association]
Classes and attributes (with cardinality):
Object: [1] id
Activity: [0,1] name, [0,1] description, [0,1] rights, [0,n] date type=, [0,n] note, [0,n] participant role=, [0,n] reference uri=
Project: [0,n] topic type=, [0,1] funding
Simulation: [0,n] simulationInput type=, [0,n] simulationHardware
Dataset: [0,1] type, [0,1] conventions, [0,n] date type=, [0,n] format type= uri=, [0,1] timeCoverage, [0,1] spaceCoverage
Person: [0,1] firstName, [0,1] lastName, [0,1] contact
Institution: [0,1] name, [0,1] type, [0,1] contact
Service: [0,1] name, [0,1] description
Other classes shown: Investigation, Ensemble, Campaign, Observation, Experiment, Analysis
Relationship labels shown: isA, isPartOf, hasParent, hasChild, hasSibling, generatedBy, worksFor, participant role=, serviceId
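The cardinality notation in this object model maps naturally onto typed fields: `[1]` is a required value, `[0,1]` an optional one, and `[0,n]` a possibly empty list. A minimal Python sketch for the Object and Dataset classes follows. It assumes that Dataset inherits from Object (the flattened diagram does not make the inheritance links recoverable), it drops the `type=` and `uri=` qualifiers for brevity, and the Python class and field names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataObject:
    """'Object' in the diagram: every entity has exactly one id ([1] id)."""
    id: str


@dataclass
class Dataset(MetadataObject):
    """'Dataset' in the diagram; inheritance from Object is an assumption."""
    type: Optional[str] = None                         # [0,1] type
    conventions: Optional[str] = None                  # [0,1] conventions
    dates: List[str] = field(default_factory=list)     # [0,n] date
    formats: List[str] = field(default_factory=list)   # [0,n] format
    time_coverage: Optional[str] = None                # [0,1] timeCoverage
    space_coverage: Optional[str] = None               # [0,1] spaceCoverage


if __name__ == "__main__":
    ds = Dataset(id="example-dataset", conventions="CF", formats=["netCDF"])
    print(ds)
```

Only `id` is mandatory, so a record can be created early and enriched as more metadata is authored.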

21 NCAR ESG Metadata Progress
• Co-developed NcML with Unidata
  – CF conventions in progress, almost done
• Developed & evaluated a prototype metadata system
• Finalized an initial schema for PCM/CCSM
  – Address interoperability with federal standards and NASA/GCMD via the generation of DIF/FGDC/ISO
  – Address interoperability with digital libraries via the creation of Dublin Core
• Testing relational and native XML databases, and OGSA-DAI
• Exploratory work for first-generation ontology
• Authoring of discovery metadata in progress

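22 NCAR ESG Topology
[Diagram: ESG components deployed across LBNL, ISI, LLNL, NCAR, ORNL, and ANL]
Storage at the sites: MSS, HPSS, and disk systems, each fronted by an HRM and an RLS (with cross-update between RLS instances); disk cache
Portal and services: ESG web portal (Tomcat/Struts), OGSA-DAI with MySQL RDBMS (query), MyProxy (authenticate), CAS, GRAM gatekeeper (submit, execute), gridFTP server, LAS server (visualize)
Transport: gridFTP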

23 NCAR Collaborations & Relationships
• CCSM Data Management Group
• The Globus Project
• Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit
• OPeNDAP/DODS (multi-agency)
• NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project)
• U.K. e-Science and British Atmospheric Data Center
• NOAA NOMADS and CEOS-grid
• Earth Science Portal group (multi-agency, international)

24 NCAR Immediate Directions
• Broaden usage of DataMover and refine
• Continue building metadata catalogs
• Revisit overall security model and consider simplified approaches
• Redesign and implement user interface
• Alpha version of OPeNDAP-g
  – Test and evaluate with client applications
• Develop automation for data publishing (GT3)
• Deploy for IPCC runs

25 NCAR The Community Data Portal (CDP)
• Provide a common portal to NCAR, UCAR, and university data
• Provide a sustainable cyberinfrastructure that dramatically lowers the cost of sharing data (there is HUGE interest in this)
• Directly couple to simulation systems and DataMonster
• Begin capturing rich metadata and catalog our scientific experiments for the world
• MSS -> A Petascale Mass Knowledge System
• Federate internationally (ESG, THREDDS, U.K. e-Science, NOMADS, PRISM, GEON, etc.)
"The dataportal has changed my life…"
Ben Kirtman, COLA

26 NCAR Foster Revolutionary Change
Mass Storage System (1.5 PB)
Petascale Knowledge Repository
Establish a new paradigm for managing and accessing scientific data based on semantic organization.

27 NCAR Community Data Portal
• Purpose:
  – Build an infrastructure using different methods for data exploration and delivery
  – Web-based retrieval and interactive analysis for MSS collections
  – Data sharing for multi-institution cooperative studies
  – Browse, select, compare, download data sets, & specify data subsets using graphical, text entry, choice of output format
• Components:
  – User interface, Live Access Server (LAS)
  – Middleware, Ferret, NCL, GrADS
  – File service, local, or DODS
• Status:
  – Pilot working (2 years), more middleware testing

28 NCAR Data Access
[Diagram: Live Access Client and Live Access Server in front of analysis engines and data collections]
Engines: Ferret, NCL, other engines
Data access: DODS
Data collections (massive data, simulation & retrospective): CSM, PCM, DSS, MM5, WRF, MICOM, CMIWG

29 NCAR Example … Data Analysis

30 NCAR Live Access Server + NCL (Grib Data)

31 NCAR Interface and Reanalysis 2 Sea Level Pressure

32 NCAR Community Data Portal architecture
[Diagram: layered architecture of dataportal.ucar.edu. Layers: hardware, core services, middleware, user interface]
Hardware: RAID disks, MSS
Services shown: catalogs parsing & metadata ingestion, data search & discovery, catalogs browsing, MSS data retrieval, data access (OPeNDAP, FTP, HTTP), data visualization (NCL, Ferret)
Servers and frameworks shown: GDS, DODS aggregation server, LAS, Struts, Tomcat
User interface: UI

33 NCAR Community Data Portal Metadata Software
[Diagram: metadata flow in the Community Data Portal]
THREDDS catalogs: reference other metadata (ESG metadata, DC metadata, NcML metadata)
THREDDS catalog parser application: parses the catalogs
Relational DB (MySQL): shreds XML doc into tables
XML native DB (Xindice): stores full XML doc
Search & Discovery web application: uses simple query (SQL); future advanced query (XPath, XQuery)
Results: list of triplets (dataset id, metadata schema, metadata URL)
XML viewer web application: displays metadata with schema-specific stylesheets
THREDDS catalogs browser web application: links to the catalogs
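The search path on this slide, a simple SQL query that returns triplets of (dataset id, metadata schema, metadata URL), can be sketched as follows. The slide names MySQL; this example swaps in Python's built-in `sqlite3` with an in-memory database so that it is self-contained. The table layout, column names, sample records, and URLs are all hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str, str]  # dataset id, metadata schema, metadata URL, title


def build_index(records: Iterable[Record]) -> sqlite3.Connection:
    """Load metadata records into an in-memory table (stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata ("
        "dataset_id TEXT, metadata_schema TEXT, metadata_url TEXT, title TEXT)"
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", records)
    return conn


def search(conn: sqlite3.Connection, keyword: str) -> List[Tuple[str, str, str]]:
    """Return (dataset id, metadata schema, metadata URL) triplets whose title matches."""
    cursor = conn.execute(
        "SELECT dataset_id, metadata_schema, metadata_url FROM metadata "
        "WHERE title LIKE ? ORDER BY dataset_id, metadata_schema",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_index([
        ("ccsm-run-1", "DC", "http://example.org/md/ccsm-run-1/dc.xml", "CCSM control run"),
        ("ccsm-run-1", "NcML", "http://example.org/md/ccsm-run-1/ncml.xml", "CCSM control run"),
        ("pcm-run-7", "DC", "http://example.org/md/pcm-run-7/dc.xml", "PCM ensemble member"),
    ])
    for triplet in search(conn, "CCSM"):
        print(triplet)
```

Returning a URL rather than the metadata itself lets the full XML document stay in the native XML store, with a viewer fetching it on demand.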

34 NCAR CDP Data/Catalog Contributors
ACD: MOZART v2.1 standard run (Louisa Emmons)
ATD: Radar almost ready for today!
CGD: CAS satellite data example (Lesley Smith)
CGD: CDAS and VEMAP data (Steve Aulenbach, Nan Rosenbloom, Dave Schimel)
CGD: CCSM 1000 year run (Lawrence Buja)
CGD: PCM 16 top datasets (Gary Strand)
SCD: DSS full data holdings (Bob Dattore, Steve Worley)
SCD: VETS example visualization catalog (Markus Stobbs, Luca Cinquini)
COLA: Jennifer Adams, Jim Kinter, Brian Doty

35 NCAR Next Steps
• Recruiting (!)
  – One student for data ingest
  – One software engineer
  – Systems
  – Expanding storage by 20 TB (SCD cosponsor)
• Ongoing publication of datasets
• Publishing documents on plans, design, how to partner, standard services, and management procedures
• Building partnerships, DMWG meeting August

36 NCAR Closing Thoughts
• Building a sustainable infrastructure for the long-term
  – Difficult, expensive, and time-consuming
  – Requires longer-term projects
• Team-building is a critical process
  – Collaboration technologies really help
• Managing all the collaborations is a challenge
  – But extremely valuable
• Good progress, first real usage

37 NCAR Links
• Earth System Grid
  – www.earthsystemgrid.org
• Community Data Portal
  – dataportal.ucar.edu

38 NCAR END

39 We Will Examine Practically Every Aspect of the Earth System from Space in This Decade
Longer-term Missions - Observation of Key Earth System Interactions: Terra, Aura, Aqua, Landsat 7
Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies: GRACE, PICASSO, Cloudsat, QuikScat, EO-1, ICEsat, Jason-1, SRTM, VCL
Also shown: Triana
Courtesy of Tim Killeen, NCAR

40 NCAR Characteristics of Infrastructure
• Essential
  – So important that it becomes ubiquitous
• Reliable
  – Example: the built environment of the Roman Empire
• Expensive
  – Nothing succeeds like excess (e.g. Interstate system)
  – Inherently one-off (often, few economies of scale)
• Clear factorization between research and practice
  – Generally deploy what provably works

41 NCAR CDP Interactions & Opportunities
• COLA
• CGD/VEMAP
• ACD, HAO/WACCM
• CGD/CCSM, CAM
• CGD/CAS
• MMM/WRF
• UCAR/JOSS
• UCAR/Unidata
• CGD, SCD, CU/GridBGC
• NOAA/NOMADS
• GODAE
• HAO/TIEGCM, MLSO
• ATD/Radar, HIAPER
• ACD/Mozart, BVOC, Aqua proposal
• BioGeo/CDAS
• SCD/DSS
• DOE/Earth System Grid
• DLESE
• GIS Initiative

