1 SAN DIEGO SUPERCOMPUTER CENTER, UCSD SciR&D
A SCALABLE SYSTEM FOR ONLINE ACCESS TO NATIONAL AND LOCAL REPOSITORIES OF HYDROLOGIC TIME SERIES
Ilya Zaslavsky, Reza Wahadj, David Valentine, Blair Jennings (San Diego Supercomputer Center, UCSD); David Maidment (CRWR, UT-Austin); and other HIS development partners from UT-Austin, Utah State U, Drexel U, and Duke U

2 The Grid is becoming the backbone for collaborative science and data sharing. Cyberinfrastructure (CI) is about RE-USING data and research resources!

3 CI Vision for Hydrologic Science
Leverage ongoing cyberinfrastructure projects such as the Geosciences Network (GEON), which shares data between Earth science disciplines and provides secure access to Grid resources, single sign-on authentication/authorization, distributed data management, data publication, search, information integration, knowledge management, scientific workflows, and archiving.
Integrate with common COTS (commercial off-the-shelf) software: Excel, ArcGIS, Matlab… and Fortran, mostly on Windows. See the interesting survey of CUAHSI partners by David Tarboton!

4 HIS User Assessment (Chapter 4 in Status Report)
Which of the four HIS goals is most important to you? Data access, science, observatory support, or education.

5 Tuning to Unique Features of Hydrology
Hydrologic observations: reliance on federally organized data collection (NWIS, STORET, Ameriflux, etc.) with huge and complex nomenclatures → simplifying access to federal repositories; → relatively lower emphasis on data ownership. Handling time in both UTC and local time; various spatial offsets. Multiple data types: time series, fields, spatial data.
Integrative discipline: interoperation with atmospheric, ocean, soils, geomorphology, and social datasets and services.
Community: organized by "natural boundaries" → natural object hierarchy → networks of relatively autonomous, self-managed data nodes. Partnership with public-sector water management ontologies.
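The requirement to handle time in both UTC and local time can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each observation carries a naive local timestamp plus a fixed UTC offset in hours; the function name local_to_utc is illustrative and not part of any HIS interface, and daylight-saving rules are deliberately not modeled.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc(local_time: datetime, utc_offset_hours: float) -> datetime:
    """Convert a naive local timestamp with a fixed UTC offset to an aware UTC datetime.

    Hypothetical helper: assumes the offset is stored alongside each observation.
    """
    if local_time.tzinfo is not None:
        raise ValueError("expected a naive local timestamp plus a separate offset")
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return local_time.replace(tzinfo=local_zone).astimezone(timezone.utc)


if __name__ == "__main__":
    # A reading taken at 08:00 local time at a site 7 hours behind UTC.
    print(local_to_utc(datetime(2005, 6, 1, 8, 0), -7))  # 2005-06-01 15:00:00+00:00
```

Storing the local time together with its offset, and deriving UTC from the pair, keeps both representations available; how daylight-saving changes are recorded would still need a separate decision.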

6 Problems
Microsoft and .NET vs. Linux and J2EE; open source vs. proprietary; free vs. not free. Open architecture, web services, well-defined interfaces.

7 Main Components
Web services for accessing hydrologic repositories; the Hydrologic Observations Data Model; the Hydrologic Data Access System + Time Series Viewer; a collection of CUAHSI nodes.
[Figure: CUAHSI Web Services connecting data sources (NWIS, Storet, NCDC, Ameriflux, NCAR Unidata, NASA) to applications (ArcGIS, Excel, Matlab, Access, SAS, Fortran, Visual Basic, C/C++)]

8 [Figure: HIS architecture. Service consumers (Excel, web browser, portal, Fortran/C/VB/Java codes) reach data nodes, sensors, and external data resources (NAWQA, STORET, ...) through numbered components: (1) core grid services: monitoring nodes, scheduling, data transfer, replication, collection management, …; (2) resource drivers; (3) sensor management services; (4) sensor data filtering; (5) hosted data services; (6) data registration/search/query rewriting & orchestration; (7) ontology source and services; (8) application services: analysis, mapping, charting, models, workflow, integration; (9) user registration/authentication/authorization; (10) web services registry and related services. Also shown: registry metadata, R Server, ArcGIS Server, conversion engine, certificate authority.]

9 Some operational services
[Figure: CUAHSI Web Services extract, transform, and load data from data sources (NWIS, Storet, NCDC, Ameriflux, NCAR Unidata, NASA) for applications (ArcGIS, Excel, Matlab, Access, SAS, Fortran, Visual Basic, C/C++)]
http://www.cuahsi.org/his/

10 Database Sizes (records, stations, and time range for EPA, NWS, and USGS; from Jon Goodall, Duke U.)
200 million; ?; 250 million; 800,000; 100 years; 1.5 million; 100 years; 19,000

11 Language for Data Representation
Concept | EPA | NWS | USGS
Unique identifier for an observation station | Station ID | COOPID | site_no
Latitude, longitude | Station Latitude, Station Longitude | LATITUDE, LONGITUDE | dec_lat_va, dec_long_va
Time of measurement | Activity Start | YEAR, MO, DA, TIME | dv_dt
Lots of semantic differences in parameter names, methods, etc.
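One way to absorb these naming differences is a per-agency lookup that renames native fields to a shared vocabulary before further processing. The sketch below uses the agency field names listed above; the common names (site_code, latitude, longitude, time) and the assumed NWS TIME format (a string such as "0700") are hypothetical.

```python
from typing import Any, Dict

# Native field name -> hypothetical common field name, per agency (native names from the slide).
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "EPA": {
        "Station ID": "site_code",
        "Station Latitude": "latitude",
        "Station Longitude": "longitude",
        "Activity Start": "time",
    },
    "NWS": {
        "COOPID": "site_code",
        "LATITUDE": "latitude",
        "LONGITUDE": "longitude",
    },
    "USGS": {
        "site_no": "site_code",
        "dec_lat_va": "latitude",
        "dec_long_va": "longitude",
        "dv_dt": "time",
    },
}


def normalize(agency: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename an agency-specific record to the common vocabulary."""
    mapping = FIELD_MAP[agency]
    out = {common: record[native] for native, common in mapping.items() if native in record}
    if agency == "NWS":
        # NWS splits the timestamp over YEAR, MO, DA and TIME; TIME is assumed to be a string like "0700".
        out["time"] = "{:04d}-{:02d}-{:02d} {}".format(
            int(record["YEAR"]), int(record["MO"]), int(record["DA"]), record["TIME"]
        )
    return out
```

Keeping the vocabulary in data rather than code means supporting another agency is a matter of adding one more dictionary entry.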

12 Typical Example of Data Retrieval National Water Information System (NWIS)

13 Core Web Services
Service | Input | Output
GetSites | Observation network, filter | Station codes in the network
GetSiteInfo | Station code | Lat/long, station name
GetVariables | Observation network or data source, filter | Variable codes
GetVariableInfo | Variable code | Description of the variable
GetValues | Station code or lat/long point, variable code, begin date, end date | A time series of values
GetChart | As for GetValues | A chart plotting the values
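The service signatures above can be sketched as a typed Python interface with a toy in-memory implementation. Everything here is an assumption for illustration: the snake_case method names, the SiteInfo record, the substring filter, and the InMemoryService class are hypothetical and are not the actual CUAHSI service definitions. GetChart is omitted, and GetValues is shown only for a station code, not a lat/long point.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Protocol, Tuple

Series = List[Tuple[date, float]]


@dataclass(frozen=True)
class SiteInfo:
    code: str
    name: str
    latitude: float
    longitude: float


class ObservationService(Protocol):
    """Hypothetical Python rendering of the core service signatures."""

    def get_sites(self, network: str, name_filter: str = "") -> List[str]: ...  # GetSites
    def get_site_info(self, site_code: str) -> SiteInfo: ...  # GetSiteInfo
    def get_variables(self, network: str, name_filter: str = "") -> List[str]: ...  # GetVariables
    def get_variable_info(self, variable_code: str) -> str: ...  # GetVariableInfo
    def get_values(self, site_code: str, variable_code: str, begin: date, end: date) -> Series: ...  # GetValues


@dataclass
class InMemoryService:
    """Toy backend that satisfies ObservationService structurally."""

    network: str
    sites: Dict[str, SiteInfo]
    variables: Dict[str, str]  # variable code -> description
    values: Dict[Tuple[str, str], Series] = field(default_factory=dict)

    def get_sites(self, network: str, name_filter: str = "") -> List[str]:
        if network != self.network:
            return []
        needle = name_filter.lower()
        return sorted(code for code, site in self.sites.items() if needle in site.name.lower())

    def get_site_info(self, site_code: str) -> SiteInfo:
        return self.sites[site_code]

    def get_variables(self, network: str, name_filter: str = "") -> List[str]:
        if network != self.network:
            return []
        needle = name_filter.lower()
        return sorted(code for code, desc in self.variables.items() if needle in desc.lower())

    def get_variable_info(self, variable_code: str) -> str:
        return self.variables[variable_code]

    def get_values(self, site_code: str, variable_code: str, begin: date, end: date) -> Series:
        # Keep only points inside [begin, end], returned in time order.
        series = self.values.get((site_code, variable_code), [])
        return sorted(point for point in series if begin <= point[0] <= end)
```

Because ObservationService is a structural Protocol, a wrapper around any real repository that provides these five methods could be swapped in without changing client code.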

14 CUAHSI Web Services (http://www.cuahsi.org/his/webservices.html)
NCEP North American Forecast Model, 12 km grid for the continental US.

15 CUAHSI Point Hydrologic Observations Data Model
A relational database stored in Access, PostgreSQL, MS SQL Server, … that stores observation data made at points, in a consistent format for observations from many different sources and of many different types: streamflow, flux tower data, precipitation & climate, groundwater levels, water quality, and soil moisture data. (D. Tarboton, USU)
Community design requirements (22 reviewers).
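The idea of one relational store for point observations of many types can be sketched with Python's built-in sqlite3 module. The three tables and their columns below are a simplified, hypothetical layout chosen for illustration; they are not the published HODM schema.

```python
import sqlite3
from typing import List, Tuple

# Simplified, hypothetical point-observation layout (not the published HODM schema).
SCHEMA = """
CREATE TABLE Sites (
    SiteID    INTEGER PRIMARY KEY,
    SiteCode  TEXT UNIQUE NOT NULL,
    SiteName  TEXT,
    Latitude  REAL NOT NULL,
    Longitude REAL NOT NULL
);
CREATE TABLE Variables (
    VariableID   INTEGER PRIMARY KEY,
    VariableCode TEXT UNIQUE NOT NULL,
    VariableName TEXT NOT NULL,  -- e.g. streamflow, groundwater level, soil moisture
    Units        TEXT NOT NULL
);
CREATE TABLE DataValues (
    ValueID       INTEGER PRIMARY KEY,
    SiteID        INTEGER NOT NULL REFERENCES Sites(SiteID),
    VariableID    INTEGER NOT NULL REFERENCES Variables(VariableID),
    LocalDateTime TEXT NOT NULL,  -- ISO 8601 text sorts chronologically
    UTCOffset     REAL NOT NULL,
    DataValue     REAL NOT NULL
);
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_value(conn: sqlite3.Connection, site_code: str, variable_code: str,
              local_dt: str, utc_offset: float, value: float) -> None:
    """Insert one observation, looking up the site and variable by their codes."""
    cur = conn.execute(
        """
        INSERT INTO DataValues (SiteID, VariableID, LocalDateTime, UTCOffset, DataValue)
        SELECT s.SiteID, v.VariableID, ?, ?, ?
        FROM Sites AS s, Variables AS v
        WHERE s.SiteCode = ? AND v.VariableCode = ?
        """,
        (local_dt, utc_offset, value, site_code, variable_code),
    )
    if cur.rowcount != 1:  # nothing inserted: unknown site or variable code
        raise LookupError(f"unknown site {site_code!r} or variable {variable_code!r}")


def get_series(conn: sqlite3.Connection, site_code: str, variable_code: str) -> List[Tuple[str, float]]:
    """Return (LocalDateTime, DataValue) pairs for one site and variable, in time order."""
    return conn.execute(
        """
        SELECT dv.LocalDateTime, dv.DataValue
        FROM DataValues AS dv
        JOIN Sites AS s ON s.SiteID = dv.SiteID
        JOIN Variables AS v ON v.VariableID = dv.VariableID
        WHERE s.SiteCode = ? AND v.VariableCode = ?
        ORDER BY dv.LocalDateTime
        """,
        (site_code, variable_code),
    ).fetchall()
```

Streamflow and groundwater levels end up in the same DataValues table and are distinguished only by their Variables row, which is what lets one query serve every data type.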

16 Schema

17 Hydrologic Observations Data Model: independent of, but coupled to, the geographic representation
[Figure: the HODM MonitoringPoint table (WaterID, HydroCode, Name, Latitude, Longitude, …) linked to Arc Hydro through a CouplingTable (WaterID (GUID), HydroID (Integer)), with 1 and 1 cardinality labels on each side]
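The coupling in the figure, where an HODM monitoring point keyed by a GUID WaterID is linked to an Arc Hydro feature keyed by an integer HydroID through a separate table, can be sketched in Python. Reading the 1 and 1 labels as a one-to-one link is an assumption, and the class and method names are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MonitoringPoint:
    """HODM-side record; field names follow the figure, types are assumed."""

    water_id: uuid.UUID  # GUID key
    hydro_code: str
    name: str
    latitude: float
    longitude: float


class CouplingTable:
    """Hypothetical one-to-one link between HODM WaterIDs and Arc Hydro HydroIDs."""

    def __init__(self) -> None:
        self._hydro_by_water: Dict[uuid.UUID, int] = {}
        self._water_by_hydro: Dict[int, uuid.UUID] = {}

    def link(self, water_id: uuid.UUID, hydro_id: int) -> None:
        # Enforce one-to-one: neither key may already be coupled.
        if water_id in self._hydro_by_water or hydro_id in self._water_by_hydro:
            raise ValueError("WaterID or HydroID is already coupled")
        self._hydro_by_water[water_id] = hydro_id
        self._water_by_hydro[hydro_id] = water_id

    def hydro_id_for(self, water_id: uuid.UUID) -> Optional[int]:
        return self._hydro_by_water.get(water_id)

    def water_id_for(self, hydro_id: int) -> Optional[uuid.UUID]:
        return self._water_by_hydro.get(hydro_id)
```

Keeping the link in its own table lets the observation database and the GIS evolve separately: either side can be queried without the other being present.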

18 Uses and Tools for HODM
HODM is central to the HIS infrastructure, but lacks tools. We are testing HODM with two types of data: federal repositories and external databases (Panola). There are personal and enterprise versions.
Mapping wizard: loads Excel observation data into an HODM database and can save mapping files for subsequent runs on similarly formatted spreadsheets. Local data analysis can be done: charts and statistics.
HDAS is an interface to HODM datasets, but should not be the only one; hence HODM is exposed as web services.
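The mapping-file idea, where a column mapping is saved once and reapplied to similarly formatted spreadsheets, can be sketched as follows. To stay within the Python standard library, the sketch reads a CSV export rather than an Excel workbook; the JSON mapping format, the function names, and the target field names (SiteCode, LocalDateTime, DataValue) are hypothetical.

```python
import csv
import io
import json
from typing import Dict, List


def save_mapping(mapping: Dict[str, str]) -> str:
    """Serialize a spreadsheet-column -> HODM-field mapping so later runs can reuse it."""
    return json.dumps(mapping, indent=2, sort_keys=True)


def load_rows(csv_text: str, mapping_json: str) -> List[Dict[str, str]]:
    """Apply a saved mapping to a CSV export of a similarly formatted spreadsheet."""
    mapping: Dict[str, str] = json.loads(mapping_json)
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []  # reading fieldnames consumes the header row
    missing = [column for column in mapping if column not in header]
    if missing:
        raise ValueError(f"spreadsheet is missing mapped columns: {missing}")
    # Rename each mapped column to its target field; unmapped columns are dropped.
    return [{target: row[column] for column, target in mapping.items()} for row in reader]
```

Checking the header before loading makes a spreadsheet whose layout has drifted fail loudly instead of silently loading the wrong columns.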

19 Hydrologic Data Access System http://river.sdsc.edu/hdas/

20 Hydrologic Data Access System

21 Cross-platform design
[Figure: a central CUAHSI HIS node (Windows: IIS web server, ASP.NET, SQL Server, ArcGIS technologies, HDAS, HODM web service, web services, web service proxies) exchanges data with a GEON data node (Linux: Apache Tomcat, GEON software stack, proxy) and with several remote CUAHSI HIS nodes (Windows) that run the same stack as the central node]
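To illustrate how a central node's web service proxies might spread one request across several remote nodes, here is a minimal Python sketch using concurrent.futures. Each node is represented by a plain callable standing in for a web service call; the names federated_search and NodeQuery are hypothetical, and a real deployment would replace the callables with actual web service clients.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a web service call to one remote node:
# takes a keyword and returns matching site codes.
NodeQuery = Callable[[str], List[str]]


def federated_search(
    nodes: Dict[str, NodeQuery], keyword: str, timeout: float = 10.0
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Query every node concurrently; collect per-node results and per-node errors.

    timeout is the number of seconds to wait on each node's result in turn.
    """
    results: Dict[str, List[str]] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = {name: pool.submit(query, keyword) for name, query in nodes.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception as exc:  # one failing or slow node should not sink the whole search
                errors[name] = repr(exc)
    # Note: leaving the with-block still waits for any node that timed out to finish.
    return results, errors
```

Returning errors alongside results lets the caller show partial answers when one remote node is down, which matters once the number of nodes grows.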

22 HIS Scalability
Adding data types and datasets, processing models and services, servers, and users and roles should not create unmanageable bottlenecks that require system re-engineering.
Designing for scalability: distilling a generic set of web service signatures; resolving semantic and structural heterogeneities; using HODM as a common generic format for time series data, for ease of coding and uniform search interfaces; designing the HDAS GUI to abstract the specifics of disparate repositories; leveraging common CI components developed in GEON.
We need to work with agencies to remove the web services bottleneck.
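The goal that adding a repository should not force re-engineering can be sketched as a registry of adapters that all share one GetValues-style signature, so that adding a data source means registering one more adapter. The register decorator, the get_values dispatcher, and the DEMO adapter with its fixed sample values are hypothetical illustrations, not HIS code.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Series = List[Tuple[date, float]]
# Generic signature every repository adapter must implement (hypothetical).
Adapter = Callable[[str, str, date, date], Series]

_REGISTRY: Dict[str, Adapter] = {}


def register(repository: str) -> Callable[[Adapter], Adapter]:
    """Decorator that adds an adapter for one repository to the registry."""
    def decorator(fn: Adapter) -> Adapter:
        if repository in _REGISTRY:
            raise ValueError(f"adapter already registered for {repository!r}")
        _REGISTRY[repository] = fn
        return fn
    return decorator


def get_values(repository: str, site_code: str, variable_code: str,
               begin: date, end: date) -> Series:
    """Dispatch to the registered adapter; clients never see repository specifics."""
    try:
        adapter = _REGISTRY[repository]
    except KeyError:
        raise LookupError(f"no adapter registered for {repository!r}") from None
    points = adapter(site_code, variable_code, begin, end)
    return sorted(p for p in points if begin <= p[0] <= end)


@register("DEMO")
def demo_adapter(site_code: str, variable_code: str, begin: date, end: date) -> Series:
    # Stand-in for a call to a real repository; returns fixed sample values.
    return [(date(2005, 1, 2), 2.0), (date(2005, 1, 1), 1.0)]
```

Client code only ever calls get_values, so a new repository changes the registry, not the callers.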

23 Future Work
Updating and standardizing web services, and adding services for additional repositories. Adopting HODM for storing time series observations, and developing tools for loading, querying, analyzing, and visualizing data in HODM. Finalizing the Windows-based CUAHSI Node and preparing it for distribution, along with documentation. "Digital Watershed" conceptualization.

