Large-Scale Data Management Challenges: Federating Climate, Water, and Weather Data. Repository/Workspace Workshop, 20-21 September 2010. Kenneth Galluppi.

Presentation transcript:

Large-Scale Data Management Challenges: Federating Climate, Water, and Weather Data
Repository/Workspace Workshop, September 2010
Kenneth Galluppi, Director, Disaster and Environmental Programs
Renaissance Computing Institute, University of North Carolina at Chapel Hill

Outline Environmental Problem Use Case – Climate and Weather – Hydrology Data Grid/Workspace Use Cases Answer Peter’s Questions

Environmental Science Problems
Enable cutting-edge, Grand Challenge multidisciplinary science through the federation of data grids of climate, water, and weather data with other geospatially and socially relevant datasets.
– Understanding of regional impacts of climate change on water availability and societal trends
– Understanding and prediction of catastrophic weather-driven events under climate change
– Communicating risk/crisis knowledge to non-specialists

Challenges of Data
Integration of Large, Multidisciplinary Datasets
– NCDC and NOAA Centers, SDSC, and others
– Discovery, access, integration, utility [not store/retrieve]
Linkage of Datasets to Computational Models
– Inputs/outputs for real-time model forecasting
– Model-to-observation comparison
– Climatic models for reanalysis and prediction
Access to Large Reference Data
– Climate Reanalysis Datasets, 1 PetaByte
– NWS DataCube for aviation and emergencies

Collaboration and Datagrids National Climatic Data Center Emergency Management Research Program Federal Agencies Academic Research 140 universities

NOAA Goals: Data Supports NOAA/NCDC Mission
NOAA Mission: To understand and predict changes in Earth’s environment and conserve and manage coastal and marine resources to meet our nation’s economic, social, and environmental needs
– Weather & Water: Serve Society’s Needs for Weather and Water Information
– Ecosystems: Protect, Restore, and Manage the Use of Coastal and Ocean Resources through an Ecosystem Approach to Management
– Climate: Understand Climate Variability and Change to Enhance Society’s Ability to Plan and Respond
– Commerce & Transportation: Support the Nation’s Commerce with Information for Safe, Efficient, and Environmentally Sound Transportation
– Mission Support: Provide Critical Support for NOAA’s Mission
National Climatic Data Center

The National Environmental Data Archive Climate Analysis RADAR Satellite Other

Comprehensive Large Array-data Stewardship System (CLASS) Storage (reanalysis) The National Environmental Data Archive

NOAA CLASS Large Structured data Proprietary Doesn’t interface with HPSS Climate Support of products and services Does well what it does

NOAA’s Data Centers Will Function in a Wider Information Landscape NCDC NGDC NODC NSOF


ORNL, ESG NSF DataNet DAPs Data Mgmt IPCC International Sources NEAAT

Climate Services using Federated DBs
– NOAA’s Data Centers will need to provide access to petabytes of data that are distributed across multiple NOAA facilities
– Be able to integrate these data with data from other disciplines (environmental, biological, social, etc.) that are distributed on other databases in both the public and private sector domains
– Export data to common data formats: Shapefile, Well-Known Text, Arc/Info ASCII GRID, Gridded and Raw NetCDF, GeoTIFF and KMZ (Google Earth)
Support: Disaster reduction, Human Health, Climate, Water Resources, Weather, Ocean Resources, Agriculture & Land-Use, Ecosystems

NOAA/NCDC Climate Services

Data Supports NOAA/NCDC Mission
NCDC will need to function in a wider information landscape with a NOAA Federated Archive (6 data centers)
– Support distributed data management and services
Interoperable with DataNet, Earth System Grid, GEO-IDE, EOSDIS, etc.
– netCDF, LDM, CF conventions, ISO
Move out of the Box and into the Cloud (networked)
– Utilize highly distributed storage and computing (RENCI, Oak Ridge National Lab)
Implement supporting technologies to enable interoperability with Designated Communities (OGC, WMS/WFS)
Institute rules-based data management to enable true federation of NOAA Centers of Data
– iRODS

NCDC-RENCI Potential Use Cases
Catastrophic Event Modeling and Observations
Climate Reanalysis Datasets
– Climate records everywhere, for 30 years
– 1 PetaByte
– Regional and local sub-setting
– Tens of thousands of users
Multi-sensed Gridded Precipitation Climatology
Extreme Event Climatology
Green Energy, physical-social science Integration

High Level View of HIS Service Oriented Architecture As of October 2009, 1,867,108 sites and 4,336,790,286 data values were available through the HIS from federal, state, and academic data providers. There have been 543,144 “GetValues” data requests from Feb 2008 to Oct Hydrology Community

CUAHSI HIS The CUAHSI Hydrologic Information System (HIS) is an internet based system to support the sharing of hydrologic data. It is comprised of hydrologic databases and servers connected through web services as well as software for data publication, discovery and access. Data Discovery and Integration platform Data Publication platform Data Synthesis and Research platform Data Services Metadata Services Metadata Search HIS Central HydroDesktop HydroServer Service registration Catalog harvesting Service and data theme metadata Data carts Water Data Services Spatial Data Services Like search portals Google, Yahoo, Bing Like browsers Like web servers Like HTML

HIS Service Oriented Architecture

Publication of Point Observations Observations Data Model (ODM) – ODM Tools – ODM Data Loader – ODM Streaming Data Loader – ODM Controlled Vocabularies WaterOneFlow web services – Data are transmitted in WaterML format
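As a rough illustration of what a client might do with a WaterOneFlow response, the sketch below parses a small hand-written XML fragment loosely modelled on WaterML. The element names, the site code, and the values are simplified assumptions made for illustration; they are not the actual WaterML schema (which uses namespaces and a richer structure) and not real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; real WaterML is namespaced and more detailed.
SAMPLE = """
<timeSeries>
  <siteCode>EXAMPLE_SITE</siteCode>
  <variableName>Discharge</variableName>
  <values>
    <value dateTime="2009-10-01T00:00:00">12.5</value>
    <value dateTime="2009-10-01T01:00:00">13.1</value>
  </values>
</timeSeries>
"""

def parse_series(xml_text):
    """Return (site, variable, [(timestamp, value), ...]) from the fragment."""
    root = ET.fromstring(xml_text)
    site = root.findtext("siteCode")
    variable = root.findtext("variableName")
    # iter("value") matches only <value> elements, not the <values> container.
    points = [(v.get("dateTime"), float(v.text)) for v in root.iter("value")]
    return site, variable, points

site, variable, points = parse_series(SAMPLE)
print(site, variable, len(points))
```

Keeping the parsing step separate from any network call makes it possible to test the client logic against stored responses.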

Dynamic Controlled Vocabulary Moderation System Local ODM Database Master ODM Controlled Vocabulary HIS CV Website ODM Controlled Vocabulary Moderator ODM Data Manager ODM Controlled Vocabulary Web Services ODM Tools Local Server XML

Horsburgh, J. S., D. G. Tarboton, D. R. Maidment and I. Zaslavsky, (2008), A Relational Model for Environmental and Water Resources Data, Water Resour. Res., 44: W05406, doi: /2007WR CUAHSI Observations Data Model

Maximize Data Access and Utility

Data and Model Integration Needed to Support Hydrologic Science Observations Hydrologic Models Weather and Climate Models Physical Data Socioeconomic Data CUAHSI HIS DFC

ODM WaterOneFlow HydroServer Capabilities Database ODM Databases and Web Services ArcGIS Server Spatial Data Services Capabilities Database Configuration Tool Spatial Services WaterOneFlow Services

ODM WOF National Dataset Water Data Services NWIS WOF ODM WOF ODM WOF ODM WOF HydroServer Distributed Water Data Services STORET WOF DAYMET WOF Metadata Catalog Ontology HydroDesktop HIS Central Ontology Services Metadata Services HydroDesktop Plug-ins Desktop Data Repository Search, Download, and Manage Data Subscriptions Visualize and Summarize (TSA) Convert Units Convert Formats and Export Import Files Data Discovery Calls Web Service Calls WaterML SNOTEL WOF … Metadata Harvesting Variable Mapping ODM WOF R MATLAB Excel Desktop Analysis Software Workflow Database API

11 WATERS Network test bed projects 16 ODM instances (some test beds have more than one ODM instance) Data from 1246 sites, of these, 167 sites are operated by WATERS investigators National Hydrologic Information Server San Diego Supercomputer Center HydroServer Implementation in WATERS Network Information System

RHESSys TOPS ADAS Meteorology, Hydrology, Ecological Models WRF RHESSYS HEC-RAS ADCIRC Scientific Research Historical Re-Analysis Disaster Planning Disaster Response Agricultural Forecasts Ag Decision Support Public Dissemination Economic Planning etc … Sensor Data Bus TOPS State Climate Office Sensor Cloud National Weather Service Department of Transportation / FAA USGS NWIS, USFS Buoys, Stream Gauges, Soil Moisture People with mobile devices etc … CHPS Enablement

Use Case: National Water Model Terrain in the Neuse River Basin, NC constructed from 390 million LiDAR measurements Flooding in the Mississippi River Basin, August 1993 observed from satellite imagery Hydrologic scientists have expressed a “grand research challenge” of building a National Water Model for flood and drought applications. Achieving this goal will require a system like DFC to handle the massive data requirements. Source: nasa.gov Source: terrain.cs.duke.edu

CUAHSI Case Study
Hydrology Grand Challenge Problem: National Water Model
– How much water is available in the Nation’s water resources?
– Currently, hydrologic models are implemented at the watershed-scale (county)
– Hydrologists plan to scale physically-based models to national level
Provide CI, Policies & Sustainability for Water Model Data
– Gathering, analysis, dissemination and preservation
– Policies for quality control, metadata harvesting, versioning and usage
– Enables the data required for real-time analysis for flood and drought modeling
– Enables integrating data from “new sources”
– Enables new science, outreach, decision making and disaster recovery
– Integration of Predictive Models, Real-time Data and Historic Data

Technical Solutions – Too many systems/solutions, home grown to programs (CUAHSI) – Standards (ODM, OGC, Virtual USA, etc.) – Federal enterprises – NOAA, CLASS general, heavy system – Oracle front end to large tape system Unique Handling large sets with limited skills Multidisciplinary, formats are not enough, but knowledge Federal – Has to work, has to preserve – Observation systems are getting more complex – Users are more sophisticated and demanding more

Data Management Large Storage Systems Compute and Servers Firewall Security HPCC Compute iRODS Workflow Data Manage DataNet Data Management, Data Grid Testbed

Diversity in the Landscape Data grids to include generic data management infrastructure – Data sharing – Digital libraries, publish and discovery – Persistent archives for preservation – Data processing pipelines – Virtualize data collections File systems Tape archives Cloud storage Institutional repositories Digital repositories

Diversity in the Landscape Policy-based Data Management – Each center has same management needs but implement different policies and procedures – Implement their own policies but leverage standard data management – Interoperate with other repositories through specific drivers that implement protocol Integrated Rule Oriented Data System (iRODS)

How to Federate? Users, services and local storage Clients – present information in context – User level file systems – Web browsers – Web services Workflow – manage processing steps Data grid – access to the repositories – Uniform name space – Properties (meta) and access (time stamp, version) – Policies – retention, disposition, authenticity, QA Storage Systems – tapes, file system, cloud
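The "uniform name space" layer can be pictured as a catalog that maps one logical name to whatever physical copies exist on tape, file system, or cloud storage. The sketch below is a minimal illustration under assumed names (`Catalog`, `Replica`, the example paths, and the storage preference order are all hypothetical); it is not how iRODS or any particular data grid implements its catalog.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    storage: str        # assumed labels: "filesystem", "cloud", "tape"
    physical_path: str

@dataclass
class Catalog:
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, replica, **metadata):
        """Attach a physical replica and descriptive metadata to a logical name."""
        entry = self.entries.setdefault(logical_name, {"replicas": [], "meta": {}})
        entry["replicas"].append(replica)
        entry["meta"].update(metadata)

    def resolve(self, logical_name, prefer=("filesystem", "cloud", "tape")):
        """Pick the replica on the most preferred storage type.

        Raises ValueError if a replica uses a storage label not listed in `prefer`.
        """
        replicas = self.entries[logical_name]["replicas"]
        return min(replicas, key=lambda r: prefer.index(r.storage))

catalog = Catalog()
name = "/grid/climate/reanalysis/1980.nc"   # hypothetical logical name
catalog.register(name, Replica("tape", "tape://archive/vol7/0001"), version=1)
catalog.register(name, Replica("filesystem", "/data/cache/1980.nc"))
print(catalog.resolve(name).physical_path)
```

Clients only ever see the logical name, so a repository can move or add copies without breaking user-level file systems, browsers, or web services that sit above the data grid.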

Safe Replication Repositories must be replicated Data grids are good at this – Making copies – Keeping track of copies – Integrity of copies – Disposition of copies (rules for retention and checking)
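The replication duties listed above (making copies, keeping track of them, and checking their integrity) can be sketched with nothing more than the Python standard library. This is a toy illustration under assumed names (`replicate`, `find_damaged`, `demo`, and the file and directory names are all hypothetical); a production data grid would also apply retention and disposition rules.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def replicate(source, target_dirs):
    """Copy source into each target directory; return a registry of the copies."""
    checksum = sha256(source)
    registry = []
    for target in target_dirs:
        dest = Path(target) / Path(source).name
        shutil.copy2(source, dest)
        registry.append({"path": dest, "checksum": checksum})
    return registry

def find_damaged(registry):
    """Return paths of copies that are missing or no longer match their checksum."""
    return [
        entry["path"]
        for entry in registry
        if not entry["path"].exists() or sha256(entry["path"]) != entry["checksum"]
    ]

def demo():
    """Replicate a small file to two directories, corrupt one copy, and audit."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "obs.csv"
        source.write_text("site,value\nA,1.0\n")
        targets = [root / "replica1", root / "replica2"]
        for target in targets:
            target.mkdir()
        registry = replicate(source, targets)
        (targets[1] / "obs.csv").write_text("corrupted")
        return [path.parent.name for path in find_damaged(registry)]

print(demo())
```

Running the audit periodically, and re-copying from a good replica when it reports damage, is one simple way to express a "rules for retention and checking" policy.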

Policy Rules for Control Actions that simplify use of data – Data sharing: access control, distribution, organizing – Publishing: Descriptive metadata, integrity, replication – Data preservation: retention, disposition, trust, ownership Data ingestion, storage, and access control
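One way to picture policy-based control is as named rules attached to actions, with each center choosing its own rule set over the same set of actions. The sketch below is a plain-Python illustration; the center names, rule names, and thresholds are hypothetical, and it does not use or imitate the iRODS rule language.

```python
# Each rule is a (name, predicate) pair; a predicate returns True when the
# data object complies with that rule.
def has_description(obj):
    return bool(obj.get("description"))

def has_checksum(obj):
    return "checksum" in obj

def retained_for(years):
    return lambda obj: obj.get("retention_years", 0) >= years

# Hypothetical centers: same actions, different local policies.
POLICIES = {
    "center_a": {
        "publish": [("description", has_description), ("checksum", has_checksum)],
        "preserve": [("retention>=10y", retained_for(10))],
    },
    "center_b": {
        "publish": [("description", has_description)],
        "preserve": [("retention>=5y", retained_for(5))],
    },
}

def violations(center, action, obj):
    """Return the names of the rules that obj fails for this center and action."""
    rules = POLICIES[center].get(action, [])
    return [name for name, predicate in rules if not predicate(obj)]

dataset = {"description": "Hourly precipitation grid", "retention_years": 7}
print(violations("center_a", "publish", dataset))
print(violations("center_b", "preserve", dataset))
```

Because the rules are data rather than code paths, two centers can share the same management software while enforcing different procedures.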

User Workspaces Needed for interim data products Track operations performed on the data – Same needs as repositories, only shorter timeframe – Individual, organization, operation processing

Processing and workspaces
Processing of petabyte collections and distributed processing
Process at local storage if processing is simple
Move the file if processing is complex or demanding
Data management views processing transparently and facilitates:
– Move files
– Manage processing and workspace
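The choice between processing at the storage site and moving the file can be framed as a comparison of estimated times. The sketch below uses a deliberately crude cost model; the function name, its parameters, and the example rates are assumptions for illustration, not measured figures from any of the systems discussed.

```python
def choose_placement(size_bytes, ops_per_byte,
                     storage_ops_per_sec, compute_ops_per_sec,
                     network_bytes_per_sec):
    """Return where to run a job under a simple time estimate.

    local  = compute time on the (slower) processor next to the storage
    remote = transfer time + compute time on the (faster) compute resource
    """
    total_ops = size_bytes * ops_per_byte
    local_time = total_ops / storage_ops_per_sec
    remote_time = size_bytes / network_bytes_per_sec + total_ops / compute_ops_per_sec
    return "process_at_storage" if local_time <= remote_time else "move_to_compute"

# Assumed rates: storage node 1e8 ops/s, compute cluster 1e10 ops/s, network 1e8 B/s.
simple = choose_placement(1e9, 1, 1e8, 1e10, 1e8)       # light work per byte
complex_ = choose_placement(1e9, 1000, 1e8, 1e10, 1e8)  # heavy work per byte
print(simple, complex_)
```

With these assumed rates, the light job stays at the storage site because the transfer alone costs as much as doing the work locally, while the heavy job is worth moving.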

Frameworks for distributed processing
iRODS – integrated Rule Oriented Data System
– Internal workflows (rules of microservices)
– External workflows (Taverna, Kepler, Pegasus)
– Data management decoupled from workflows and both can be distributed
Data interchange with workflow
– Parameter passing (microservice)
– In-memory structures (workflow and microservice)
– In-memory, but distributed
– Shared metadata, retrieved out of catalog
– Shared files