SAN DIEGO SUPERCOMPUTER CENTER NEAR REAL TIME VISUALIZATION OF USGS INSTANTANEOUS DATA: INTEGRATION OF OPEN SOURCE DATA TURBINE IN CUAHSI HIS. Thomas Whitenack, David Ryan, David Valentine, Ilya Zaslavsky, Matt Rodriguez
USGS Instantaneous Water Data Services: 15-minute intervals. 10,000+ sites (7,000+ have discharge). Up to 60 days of data available. Data provided using CUAHSI WaterML.
Open Source DataTurbine (Ring Buffered Network Bus): DataTurbine is a robust open-source streaming data middleware system designed for sensor-based systems. Co-developed by our UCSD / Calit2 colleagues. A solution for accessing both streaming and static data, from different vendor systems, via a common interface. Released under the Apache 2.0 open source license. Provides high-performance data streaming: 10+ MB/sec, 1000 frames/sec.
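The "ring buffered" part of the name can be illustrated with a minimal sketch. This shows only the general ring-buffer idea (a fixed-capacity store where new frames displace the oldest, with retrieval by time window); it is not DataTurbine's implementation, and the class and method names (`RingBuffer`, `put`, `fetch`) are hypothetical.

```python
from collections import deque


class RingBuffer:
    """Fixed-capacity store of (timestamp, value) frames; oldest frames drop first."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item when a new one is appended.
        self._frames = deque(maxlen=capacity)

    def put(self, timestamp, value):
        self._frames.append((timestamp, value))

    def fetch(self, start, duration):
        """Return frames with start <= timestamp < start + duration."""
        end = start + duration
        return [(t, v) for t, v in self._frames if start <= t < end]

    def oldest(self):
        """Timestamp of the oldest frame still held, or None if empty."""
        return self._frames[0][0] if self._frames else None


buf = RingBuffer(capacity=4)
for i in range(6):              # six frames, 900 s (15 minutes) apart
    buf.put(i * 900, float(i))

print(buf.oldest())             # the first two frames have been overwritten
print(buf.fetch(1800, 1800))    # a time-window request, as a viewer might issue
```

Requesting data by start time and duration is what makes replaying a stream at a different speed possible in principle: the client simply steps its time window faster or slower than real time.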
Open Source DataTurbine: Supported by NASA SBIR; 15 years in development. Supports multiple types of streams: real-time monitoring, video and multimedia, telemetry, instant messages, etc. Scalable: DataTurbine servers can be interconnected to handle large streams. Can manipulate the streams: fast-forward or slow-motion playback (TiVo-like).
Goal of Integrating DataTurbine with CUAHSI HIS: Get the two systems to work together. Maintain an up-to-date, in-house view of a large volume of near-real-time data. Store data locally beyond the 60 days for which it is made available. Enable viewing of the NWIS instantaneous data in the Realtime Data Viewer (RDV).
Challenges of the Project: Integrate CUAHSI HIS with DataTurbine. CUAHSI HIS perspective: consuming WaterML from a Java environment; obtaining and storing NWIS 15-minute data beyond 60 days. DataTurbine perspective: CUAHSI data presented unusual challenges: data has to be pulled, and time stamps have to be set for each value. 7,000 “channels” needed to be organized for the RDV client, which means visualizing and navigating massive volumes of data.
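The point about setting a time stamp for each value can be sketched as follows. The project itself consumed WaterML from Java; this sketch uses Python for brevity. The XML fragment is a simplified, namespace-free stand-in for a WaterML `values` block, and both the sample readings and the `parse_values` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Simplified stand-in for a WaterML response (assumption: real responses
# carry namespaces and much more metadata than shown here).
SAMPLE = """
<timeSeries>
  <values>
    <value dateTime="2009-06-01T00:00:00-07:00">12.5</value>
    <value dateTime="2009-06-01T00:15:00-07:00">12.9</value>
    <value dateTime="2009-06-01T00:30:00-07:00">13.4</value>
  </values>
</timeSeries>
"""


def parse_values(xml_text):
    """Return parallel lists: epoch-second time stamps and float readings."""
    root = ET.fromstring(xml_text.strip())
    times, data = [], []
    for node in root.iter("value"):
        # Each reading carries its own time stamp, so each one is converted
        # individually rather than assuming "now" at arrival.
        stamp = datetime.fromisoformat(node.get("dateTime"))
        times.append(stamp.timestamp())
        data.append(float(node.text))
    return times, data


times, data = parse_values(SAMPLE)
for t, v in zip(times, data):
    print(t, v)
```

Keeping times and values as parallel arrays mirrors how a streaming source would typically submit a block of readings together with explicit times, instead of letting the server stamp them on arrival.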
CUAHSI -> DataTurbine
OSDT Custom Source: Each source is a separate connection; 7,000 sources were too many for OSDT. Sources can have multiple channels and sub-channels. Sites were organized by state and county to make them navigable. 50 GB disk cache: ~1 year of 15-minute data for 7,000 sites. Cycling through 7,000+ getValues requests takes ~18 hours for the first iteration, or upon restart. Subsequent iterations can still complete in under 8 hours.
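Organizing sites by state and county might look roughly like the sketch below, which builds hierarchical channel names and groups them for browsing. The `State/County/SiteCode/variable` naming pattern, the `channel_name` helper, and the site codes are all illustrative assumptions, not the scheme actually used in the project.

```python
from collections import defaultdict


def channel_name(state, county, site_code, variable="discharge"):
    """Build a slash-separated channel path; spaces and slashes are sanitized."""
    parts = [state, county, site_code, variable]
    return "/".join(p.strip().replace("/", "_").replace(" ", "_") for p in parts)


# Placeholder sites (hypothetical codes, not real USGS station numbers).
sites = [
    ("CA", "San Diego", "00000001"),
    ("CA", "San Diego", "00000002"),
    ("CA", "Orange", "00000003"),
    ("TX", "Travis", "00000004"),
]

# state -> county -> list of channel names, so a viewer can show a tree
# instead of one flat list of thousands of channels.
tree = defaultdict(lambda: defaultdict(list))
for state, county, code in sites:
    tree[state][county].append(channel_name(state, county, code))

for state in sorted(tree):
    for county in sorted(tree[state]):
        print(state, county, tree[state][county])
```

Grouping this way keeps the number of connections small (for example, one source per state) while the individual sites remain addressable as channels beneath it.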
Realtime Data Viewer (RDV)
OSDT Custom “Sink”: Essentially a custom client connection to DataTurbine (RDV is a sink process). Pulls data and writes it to SQL batch files for batch inserts. Used to update the local ODM instance of NWIS instantaneous data.
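Writing pulled data out as SQL batches could be sketched like this. The column list is a simplified subset chosen for illustration (a real ODM `DataValues` table requires more columns), and the `sql_batches` helper, the batch size, and the sample rows are hypothetical.

```python
from datetime import datetime


def sql_batches(rows, batch_size=2, table="DataValues"):
    """Yield one multi-row INSERT statement per batch of rows.

    Each row is (value, local_datetime, site_id, variable_id).
    """
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        tuples = ",\n  ".join(
            # Values are formatted from typed Python objects, never from raw
            # strings, so nothing unescaped reaches the SQL text.
            f"({float(v)!r}, '{ts.strftime('%Y-%m-%d %H:%M:%S')}', "
            f"{int(site_id)}, {int(var_id)})"
            for v, ts, site_id, var_id in chunk
        )
        yield (
            f"INSERT INTO {table} (DataValue, LocalDateTime, SiteID, VariableID)\n"
            f"VALUES\n  {tuples};"
        )


rows = [
    (12.5, datetime(2009, 6, 1, 0, 0), 1, 1),
    (12.9, datetime(2009, 6, 1, 0, 15), 1, 1),
    (13.4, datetime(2009, 6, 1, 0, 30), 1, 1),
]
batches = list(sql_batches(rows, batch_size=2))
for statement in batches:
    print(statement)
```

Each yielded statement could be appended to a batch file and run later, which avoids one database round trip per 15-minute reading.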
Conclusions: CUAHSI HIS WaterML can be used successfully in Java / non-Windows environments. Displaying near-real-time data in RDV is very fast, and RDV is a valuable visualization tool. DataTurbine is designed to ingest much more data than this: it is capable of 10 MB/second, and we are feeding it < 1K/second. Updating data channels worked, but is well beyond what the OSDT developers had in mind when designing it. Organizing channels in a viewer display remains a challenge.
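A back-of-the-envelope check of the figures quoted in these slides, as a rough sketch. It assumes exactly 7,000 sites, one value per site every 15 minutes, decimal units (1 GB = 10^9 bytes, 1K = 1,000 bytes), and a 365-day year; the derived numbers are estimates, not measurements from the project.

```python
SITES = 7000
INTERVAL_S = 15 * 60                      # one reading per site every 15 minutes

# Ingest rate if every site reports on schedule.
values_per_second = SITES / INTERVAL_S    # about 7.8 values/second

# Volume held by the disk cache (~1 year of data in 50 GB).
values_per_site_year = 365 * 24 * 4       # 35,040 readings per site
total_values_year = SITES * values_per_site_year
bytes_per_value = 50e9 / total_values_year   # about 204 bytes, incl. overhead

# Headroom between DataTurbine's rated throughput and the actual feed.
headroom = 10e6 / 1e3                     # 10 MB/s vs 1K/s: 10,000x

# Average time per getValues request for the ~18 hour and ~8 hour passes.
seconds_per_request_slow = 18 * 3600 / SITES   # about 9.3 s
seconds_per_request_fast = 8 * 3600 / SITES    # about 4.1 s

print(round(values_per_second, 2), total_values_year, round(bytes_per_value))
print(headroom, round(seconds_per_request_slow, 2), round(seconds_per_request_fast, 2))
```

Under these assumptions the bottleneck is the per-request time of the pull cycle, not the streaming server's throughput.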