Data Provenance in Remote Environmental Monitoring
Dr. Christian Skalka, University of Vermont, USA
Data Provenance in Remote Environmental Monitoring (REM)
REM is the automated collection of data from the natural environment in remote settings. Central points: data provenance is fundamental to REM, since data source, times, and ownership are intrinsic to the data. REM hardware and software architectures (heterogeneous, distributed, low-power systems) pose unique challenges for establishing provenance.
Outline
Two REM case studies and problem statements:
1. Snowpack monitoring (SnowMAN): project summary; microcosmic provenance issues and challenges; SnowMAN provenance coping mechanisms.
2. Sagehen Creek Field Station network: overview of the project setting; macrocosmic provenance issues and challenges; possible approaches to the central challenges.
How Much Snow is Out There?
Snow/Water Equivalent (SWE) is a measurement of the water content of the snowpack. It is not the same as snow height.
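As a worked illustration (the standard depth-times-density relationship, not a formula stated in the slides), SWE is the snow height scaled by the ratio of snow density to water density, which is why height alone is not enough:

```latex
\mathrm{SWE} = h_s \,\frac{\rho_s}{\rho_w}, \qquad \rho_w \approx 1000~\mathrm{kg/m^3}
% Hypothetical example: h_s = 1000 mm, rho_s = 300 kg/m^3
% SWE = 1000 mm * 300 / 1000 = 300 mm of water
```

Two snowpacks of the same height but different densities therefore hold different amounts of water.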
How Much Snow is Out There?
Regional snowpack profiles are critically important to natural resource planning and public safety. Real-world measurement is complicated by terrain, forest canopies, wind, and exposure. Accurate real-time SWE measurement is a holy grail of REM.
The UVM SnowMAN Project
A new approach to SWE measurement: use modern computer technology for data acquisition and retrieval; a multimodal approach to SWE approximation; lightweight, low cost, robust, and adaptable; improved spatial and temporal resolution.
Multimodal Sensor Fusion
Algorithms on sensing nodes combine multiple sensing technologies of varying power cost:
1. Snow height via ultrasound (cheap)
2. Snow density via microwave absorption (moderate)
3. Snow density via gamma ray attenuation (expensive)
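A minimal sketch of power-aware fusion, assuming SWE is computed as height times snow density over water density. The power costs, the fallback density, and the preference for the costlier gamma reading (assumed here to be the more precise one) are all hypothetical choices, not SnowMAN's actual algorithm.

```python
from dataclasses import dataclass
from typing import Optional

WATER_DENSITY = 1000.0  # kg/m^3

# Hypothetical relative power costs per sample (illustrative, not measured).
POWER_COST = {"ultrasound": 1, "microwave": 5, "gamma": 20}


@dataclass
class Readings:
    height_mm: float                            # snow height from ultrasound
    density_microwave: Optional[float] = None   # kg/m^3, if sampled
    density_gamma: Optional[float] = None       # kg/m^3, if sampled


def choose_density_sensor(power_budget: int) -> str:
    """Pick the costliest density sensor the budget allows, beyond the ultrasound sample."""
    if power_budget >= POWER_COST["ultrasound"] + POWER_COST["gamma"]:
        return "gamma"
    if power_budget >= POWER_COST["ultrasound"] + POWER_COST["microwave"]:
        return "microwave"
    return "none"


def estimate_swe_mm(r: Readings, fallback_density: float = 300.0) -> float:
    """SWE = height * snow density / water density.

    Prefers gamma, then microwave, then an assumed default density.
    """
    if r.density_gamma is not None:
        density = r.density_gamma
    elif r.density_microwave is not None:
        density = r.density_microwave
    else:
        density = fallback_density
    return r.height_mm * density / WATER_DENSITY


print(choose_density_sensor(10))                                    # microwave
print(estimate_swe_mm(Readings(1000.0, density_microwave=250.0)))   # 250.0
```

The fallback lets a node still report an approximate SWE from the cheap height sensor alone when power is scarce.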
SnowMAN System Architecture
Multiple data gathering-and-processing nodes are connected via a Wireless Sensor Network (WSN). An Arduino-based on-site gateway provides datalogging via SD card and data processing. Remote data retrieval is via TCP/IP over a cellular modem.
Provenance Issues in SnowMAN
Data reported by sensors is meaningless without provenance information: the time of the sampling event, the location of the sample, and the sensor type and its ADC conversion formula. Refining the multimodal fusion algorithm also requires the history and cause of each sampling event.
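A minimal sketch of a reading that carries the provenance fields listed above, assuming a linear sensor whose ADC count maps to volts and then to engineering units. The class names, fields, and linear conversion are hypothetical, not SnowMAN's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class SensorType:
    """Hypothetical sensor description, including its ADC conversion formula."""
    name: str
    adc_bits: int   # ADC resolution in bits
    v_ref: float    # ADC reference voltage (V)
    scale: float    # engineering units per volt (assumed linear sensor)
    offset: float   # engineering-unit offset at 0 V

    def convert(self, raw: int) -> float:
        """Convert a raw ADC count to engineering units."""
        volts = raw * self.v_ref / ((1 << self.adc_bits) - 1)
        return volts * self.scale + self.offset


@dataclass(frozen=True)
class Reading:
    """A raw sample plus the provenance needed to interpret it."""
    raw: int                        # raw ADC count reported by the node
    node_id: int                    # data source
    location: Tuple[float, float]   # (latitude, longitude) of the sample
    sampled_at: datetime            # time of the sampling event (UTC)
    sensor: SensorType              # sensor type and conversion formula

    def value(self) -> float:
        return self.sensor.convert(self.raw)


# Made-up parameters: 10-bit ADC, 5 V reference, 100 mm of snow height per volt.
ultrasonic = SensorType("ultrasonic_height", adc_bits=10, v_ref=5.0, scale=100.0, offset=0.0)
r = Reading(raw=1023, node_id=7, location=(44.0, -73.0),
            sampled_at=datetime(2024, 1, 1, tzinfo=timezone.utc), sensor=ultrasonic)
print(r.value())  # 500.0
```

Without the `sensor` field, the raw count 1023 could not be turned into a height at all, which is the point of the slide.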
Provenance Challenges in SnowMAN
Low-bandwidth requirements in WSNs: messages must be small and infrequent. Volatility of low-cost devices: WSN node failures require data reliability solutions. Heterogeneous network architecture: data formats must be converted in network communications. Time synchronization.
Managing Provenance in SnowMAN
Reliability is ensured by datalogging on the gateway and replication within the WSN; this requires the data source and time to be stored with readings. Provenance information is reported with data readings as a component of the packet format, which is not onerously large. Data is converted at protocol boundaries: 802.15.4 to RS232 to TCP/IP to SQL. Time synchronization is handled by simple protocols; low precision is sufficient, and the cellular modem provides true time.
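A minimal sketch of a compact packet that carries provenance alongside the reading, plus a coarse conversion from node-local time to true time. The field layout, sizes, and single-sync-point time mapping are hypothetical, not SnowMAN's actual packet format. At 11 bytes, the payload fits easily within the 127-byte maximum IEEE 802.15.4 frame.

```python
import struct
from datetime import datetime, timedelta, timezone

# Hypothetical little-endian packet layout:
#   node_id  uint16  data source
#   sensor   uint8   sensor type code
#   seq      uint16  sequence number (helps spot lost or duplicated packets)
#   local_ts uint32  seconds on the node's own clock
#   raw      uint16  raw ADC reading
PACKET = struct.Struct("<HBHIH")  # "<" disables padding: 11 bytes in total


def encode(node_id: int, sensor: int, seq: int, local_ts: int, raw: int) -> bytes:
    """Pack one reading and its provenance into a compact radio payload."""
    return PACKET.pack(node_id, sensor, seq, local_ts, raw)


def decode(payload: bytes) -> dict:
    """Unpack a payload at the gateway, e.g. before logging it to the SD card."""
    node_id, sensor, seq, local_ts, raw = PACKET.unpack(payload)
    return {"node_id": node_id, "sensor": sensor, "seq": seq,
            "local_ts": local_ts, "raw": raw}


def to_true_time(local_ts: int, sync_local: int, sync_utc: datetime) -> datetime:
    """Map node-local seconds to UTC using one sync point.

    (sync_local, sync_utc) pairs the node clock with true time obtained at
    the gateway, for instance via the cellular modem. Low precision is
    assumed to be acceptable, so clock drift is ignored.
    """
    return sync_utc + timedelta(seconds=local_ts - sync_local)


payload = encode(node_id=7, sensor=2, seq=42, local_ts=3600, raw=512)
print(len(payload))            # 11
print(decode(payload)["raw"])  # 512
print(to_true_time(3600, 0, datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 2024-01-01 01:00:00+00:00
```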
Outstanding Provenance Issues in SnowMAN
How can we verify that data is converted properly at protocol boundaries? How should the history of multimodal readings be encoded for analysis and refinement of algorithms? How can errors in data readings due to sensors, time synchronization, or node failure be detected?
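For the last question, a minimal sketch of simple plausibility checks, assuming readings arrive as time-ordered (timestamp, value) pairs with a nominal sampling period. The check names, range limits, and gap tolerance are hypothetical; real thresholds would depend on each sensor.

```python
from typing import List, Sequence, Tuple


def check_series(samples: Sequence[Tuple[float, float]], lo: float, hi: float,
                 period_s: float, tolerance: float = 0.5) -> List[Tuple[int, str]]:
    """Flag suspicious samples in a time-ordered series of (timestamp_s, value).

    - "out_of_range": value outside the physical range [lo, hi]
      (possible sensor fault)
    - "non_monotonic_time": timestamp not after the previous one
      (possible time synchronization error)
    - "gap": interval longer than period_s * (1 + tolerance)
      (possible missed samples, e.g. a node failure)
    """
    problems: List[Tuple[int, str]] = []
    for i, (ts, value) in enumerate(samples):
        if not lo <= value <= hi:
            problems.append((i, "out_of_range"))
        if i > 0:
            gap = ts - samples[i - 1][0]
            if gap <= 0:
                problems.append((i, "non_monotonic_time"))
            elif gap > period_s * (1 + tolerance):
                problems.append((i, "gap"))
    return problems


series = [(0, 10.0), (60, 11.0), (120, 999.0), (110, 12.0), (300, 12.5)]
print(check_series(series, lo=0.0, hi=500.0, period_s=60.0))
# [(2, 'out_of_range'), (3, 'non_monotonic_time'), (4, 'gap')]
```

Such checks only flag candidates; deciding whether a flagged value is a fault or a real event still needs the provenance attached to it.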
REM in Macrocosm: Sagehen Creek Field Station
Sagehen Creek Field Station and Experimental Forest is located near Truckee, CA. It is a research and teaching facility of UC Berkeley, with 9,000 acres of undisturbed wilderness and extensive REM technology.
REM in Macrocosm: Sagehen Creek Field Station
Hundreds of sensor devices: temperature, wind, humidity; streamflow, stream temperature; snow height, SWE; video. Nine hubs with (programmable) dataloggers, power, and wireless transmission. Goal: wireless connectivity to the field house and the internet, with off-site data warehousing. Multiple user and administration groups.
Provenance Issues at Sagehen
Sagehen inherits the microcosmic issues (time, location, and sensor modality are essential to the data). Video triggering events should be reported. Group data ownership now becomes important to report, and to maintain throughout the data cycle. Sagehen provenance should be credited in the myriad end uses of the data. Diagnostics of network functionality and services are also needed.
Provenance Challenges at Sagehen
Sagehen inherits the microcosmic challenges, but adds: increased sampling rates and network traffic; much more complex time synchronization; GPS auto-location for some sensors, manual location for others; much greater diversity of devices and communication media (wired, wireless); more protocol boundaries; multimedia.
Sagehen Provenance Issues: Scalability
The Sagehen network is modeled as a source-to-sink dataflow, from sensors to end users. Sources are extensible by user groups: new sensors, sensor networks (e.g. WSNs), and new remote datalogging/replication architectures. The sink is usable by end-user groups: arbitrary visualization technologies and diverse research and education applications.
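A minimal sketch of the source-to-sink model with pluggable sources and sinks, assuming each source can be polled for records and each sink is a callback. The `Dataflow` class, its methods, and the source name are hypothetical; the point is that adding a sensor or an end-user application is a registration, not a redesign, and that every record leaves tagged with its source.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


class Dataflow:
    """Hypothetical source-to-sink pipeline with pluggable sources and sinks."""

    def __init__(self) -> None:
        self.sources: Dict[str, Callable[[], Iterable[Record]]] = {}
        self.sinks: List[Callable[[Record], None]] = []

    def add_source(self, name: str, poll: Callable[[], Iterable[Record]]) -> None:
        """Register a new sensor, WSN, or datalogger feed under a unique name."""
        self.sources[name] = poll

    def add_sink(self, deliver: Callable[[Record], None]) -> None:
        """Register an end-user consumer such as a warehouse or visualization."""
        self.sinks.append(deliver)

    def run_once(self) -> int:
        """Poll every source once and deliver each record to every sink."""
        delivered = 0
        for name, poll in self.sources.items():
            for record in poll():
                tagged = {**record, "source": name}  # attach source provenance
                for deliver in self.sinks:
                    deliver(tagged)
                delivered += 1
        return delivered


flow = Dataflow()
warehouse: List[Record] = []
flow.add_source("hub3_met", lambda: [{"temp_c": -2.5}])  # hypothetical feed name
flow.add_sink(warehouse.append)
print(flow.run_once())  # 1
print(warehouse)        # [{'temp_c': -2.5, 'source': 'hub3_met'}]
```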
Sagehen Network: The Current Reality
A data communications backbone is being established over an IEEE 802.11 wireless LAN. Limited data collection takes place over the network (one hop) via canned proprietary software. Most data collection is done manually from dataloggers. Sensors are hardwired to dataloggers; there are no WSNs in the field. There is some one-hop connectivity between hubs.
Sagehen Network: The Vision
Seamless source-to-sink dataflow, from sensors in the field to an off-site, permanent data warehouse, with data also reliably accessible on site at remote hubs. Wireless sensor network capabilities in the field. Attribution of data to source groups and Sagehen. Easy extensibility of the network at the source end, to allow the addition of new sensors (and WSNs).
Some Ideas for Supporting Provenance in the Sagehen Software Architecture
Treat data like messages on a protocol stack, with the stack defined across device (protocol) boundaries. Sensor data starts out raw and collects more provenance information as it moves towards the sink. Higher layers of provenance (time, ownership) encapsulate lower layers. This allows a compositional (principled) treatment of cross-protocol data transformation.
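A minimal sketch of the protocol-stack idea, assuming each hop wraps what it received in a new envelope that adds only its own provenance. The `Envelope` type, the layer names, and the provenance keys are hypothetical. Because lower layers are never modified, the full history can be recovered by peeling envelopes, much like decapsulating a network packet.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Envelope:
    layer: str                   # e.g. "sensor", "hub", "warehouse" (hypothetical)
    provenance: Dict[str, Any]   # provenance added at this layer only
    payload: Any                 # raw data or an inner Envelope


def wrap(layer: str, payload: Any, **provenance: Any) -> Envelope:
    """Encapsulate a lower layer, adding this layer's provenance."""
    return Envelope(layer, dict(provenance), payload)


def unwrap_all(env: Any) -> Tuple[Any, List[Tuple[str, Dict[str, Any]]]]:
    """Peel layers outermost-first; return the raw data and the layer history."""
    history: List[Tuple[str, Dict[str, Any]]] = []
    while isinstance(env, Envelope):
        history.append((env.layer, env.provenance))
        env = env.payload
    return env, history


e1 = wrap("sensor", 512, modality="ultrasound", node=7)
e2 = wrap("hub", e1, time="2024-01-01T00:00:00Z", converted="RS232->TCP/IP")
e3 = wrap("warehouse", e2, owner="group-A")
data, history = unwrap_all(e3)
print(data)                        # 512
print([l for l, _ in history])     # ['warehouse', 'hub', 'sensor']
```

Recording each conversion (here the `converted` key) in the envelope of the layer that performed it is one way to make cross-protocol transformations auditable.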
Some Ideas for Supporting Provenance in the Sagehen Software Architecture
Watermark data to establish Sagehen and group ownership. This is easily done for video media: video is retrieved only from the internet, so watermarking can be performed on a traditional platform. Watermarking sensor data is an open question: the need to preserve data values may rule out traditional techniques, and in-the-field retrieval requires in-the-field watermarking.
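To make the tension concrete, here is a minimal sketch of one classic technique, least-significant-bit (LSB) embedding, applied to integer ADC readings. It is an illustration chosen for this transcript, not a method proposed in the slides; the function names and sample values are hypothetical. The mark is recoverable, but every sample may shift by one count, which is exactly the kind of change that data preservation may not tolerate.

```python
from typing import List


def embed_lsb(samples: List[int], bits: List[int]) -> List[int]:
    """Overwrite each sample's least significant bit with a watermark bit (cycled)."""
    if not bits:
        raise ValueError("watermark must be non-empty")
    return [(s & ~1) | bits[i % len(bits)] for i, s in enumerate(samples)]


def extract_lsb(samples: List[int], n_bits: int) -> List[int]:
    """Read the watermark back from the first n_bits samples."""
    return [s & 1 for s in samples[:n_bits]]


def max_distortion(original: List[int], marked: List[int]) -> int:
    """Largest absolute change the watermark introduced, in ADC counts."""
    return max(abs(a - b) for a, b in zip(original, marked))


raw = [512, 513, 700, 701]
marked = embed_lsb(raw, [1, 0, 1, 0])
print(marked)                       # [513, 512, 701, 700]
print(extract_lsb(marked, 4))       # [1, 0, 1, 0]
print(max_distortion(raw, marked))  # 1
```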
Conclusion
Remote environmental monitoring requires provenance for correct interpretation of data. REM networks are heterogeneous, and some components are computationally weak, with power and cost restrictions. Protocol hodgepodge! Adapting to the REM environment is a unique challenge for provenance in software.
Conclusion
Two case studies: SnowMAN, lightweight, low-cost SWE monitoring; and Sagehen Creek Field Station, REM in macrocosm.
http://www.cs.uvm.edu/~skalka
http://sagehen.ucnrs.org/