Download presentation
Presentation is loading. Please wait.
Published byTrevor Knight Modified over 8 years ago
1
Tasking Sensor Networks Johannes Gehrke Cornell University Research Associate: Manuel Calimlim PhD Students:Rohit Ananthakrishna, Adina Costea, Abhinandan Das, Alexandre Evfimievski, Manpreet Singh, Yong Yao
2
Background Characteristics of future battlespace environments and homeland defense monitoring systems: l Thousands or millions of small-scale sensor nodes l Nodes combine multiple sensing and computation capabilities l Limited resources at the sensors: Network, power, CPU Application requirements: l Scalability l Complex monitoring tasks, multiple user types, multiple missions, multiple systems l Survivability under stress and under attack l High-confidence in measured events and predictions l Easy deployment and zero-overhead administration
3
Flexible Decision Support Traditional Procedural addressing of individual sensor nodes; user specifies how task executes, data is processed centrally. SensIT Complex declarative querying and tasking. User isolated from “how the network works”, in- network distributed processing.
4
Querying: Model TimeValue 1282 1383 TimeValue 1382 1583 TimeValue 1382 1584 TimeValue 1382 1583 TimeValue 1380 1683 TimeValue 1479 1583
5
Example Queries l Snapshot queries: l What is the concentration of chemical X in the northeast quadrant? SELECT AVG(R.sensor.concentration) FROM Relation R WHERE R.sensor.loc in (50,50,100,100) l In which area is the concentration of chemical X higher than the average concentration? SELECT AVG(R.sensor.concentration) FROM Relation R GROUP BY R.area HAVING AVG(R.sensor.concentration) > (SELECT AVG(R.sensor.concentration) FROM Relation R GROUP BY R.area)
6
Example Queries (Contd.) l Long-running queries l Notify me over the next hour whenever the concentration of chemical X in an area is higher than my security threshold. SELECT R.sensor.area, AVG(R.sensor.concentration) FROM Relation R WHERE R.sensor.loc in rectangle GROUP BY R.sensor.area DURATION (now,now+3600) l Notify me if a TEL is driving south on Route 13 l Notify me if a TEL and a T72 cross l Archival queries l Periodic data collection for offline analysis
7
Goals l Declarative, high-level tasking l User is shielded from network characteristics l Changes in network conditions l Changes in power availability l Node movement l System optimizes resources l High-level optimization of multiple queries l Trade accuracy versus resource usage versus timeliness of query answer
8
Technical Challenges l Scale of the system l Constraints l Power l Communication l Computation l Constant change l Distribution and decentralization l Uncertainty from sensor measurements
9
Cornell Contributions l Scalable query processing architecture l High-level complex tasking (queries!) l Declarative XQuery-related high-level query language; can be generated directly from GUI l All-XML interfaces and communication structures l Sensor query processing l In-network query processing l Data stream processing l New probabilistic data model l Fault-tolerant adaptive query processing
10
Talk Outline l Querying sensor networks l Technical discussion l Scalable query processing architectures l High-level tasking l Sensor query processing l Outlook l Conclusions
11
The Cornell Cougar System:
12
The Cornell Cougar System Cougar in the (simplified) SensIT Architecture Diffusion Routing Query Proxy Signal Processing Diffusion Routing Proxy Server Higher-level tasking and analysis Frontend Node
14
Sample User Interface
15
High-Level Complex Tasking l Query language based on XQuery allows complex declarative tasking l User is shielded from physical network properties l GUI generates declarative queries l System optimizes queries, re-optimizes queries, adapts to physical network conditions
16
Sensor Query Processing l Data model l In-network processing l Records arrive in high-speed data streams l Environmental conditions are constantly changing
17
GADT: Relational Data for Sensors l We extended relational DBMSs with a new Gaussian data type. Gaussians are now first-class values.
18
Evaluating GADT Queries l GADT instances that satisfy a query can be simply visualized as a subset of the 2D plane l Example: R.a. Prob([-0.5,0.5])>0.1 l We can use database indexing techniques to process such queries
19
GADT Data Type Implementation level: GADT Operators l Selection l Projection l Join Conceptual level: Theory l Measure-theoretic formulation of probabilistic data l New framework for probabilistic data
20
Sensor Query Processing l Data model l In-network processing l Records arrive in high-speed data streams l Environmental conditions are constantly changing
21
In-Network Processing What is distributed in-network processing? l Processing at the nodes where the data originates (the “source nodes”) l Processing at “intermediate nodes” l Processing only at relevant nodes Why is this hard? l Scale l Constantly changing conditions l Meta-data management l Fault tolerance
22
Processing at Intermediate Nodes (1) l Onto which nodes should we place query processing operators?
23
Processing at Intermediate Nodes (2) l Several new aggregation algorithms that make use of processing at the intermediate nodes l Simple spanning tree aggregation l Fault-tolerant super-node spanning tree aggregation l Simulation results and results from working implementation: l Reduces network traffic l Increases battery life of the nodes l Scales gracefully with number of queries and number of nodes
24
Distributed Processing l Switch intermediate processing based on available power.
25
In-Network Data Stream Processing l Examples: l Quantiles with limited memory What was the median concentration of chemical X in this area over the last five minutes? l Correlated aggregates with limited memory During the last five minutes, where was the concentration of chemical X in this area higher than the average?
26
In-Network Data Stream Processing l Why are aggregates with limited memory hard? Our solution: l Hierarchical, distributed algorithm, provable approximation guarantees, limited amount of memory.
27
In-Network Processing
28
Example 1: Distributed Data Streams Simple example: How many detections match? Techniques: l k-wise independent random variables l Histograms l Other statistical techniques
29
Example 2: Change Detection Approach 1: Define difference metric (“deviation”) at the data mining model level. Compare datasets through difference in the data mining models they induce. Approach 2: Mine hidden concepts from data streams. Monitor change of concepts. 0 0 1 0… 1 1 0 1… 0 1 1 1… 0 1 0 1… 1 0 0 0… 0 1 1 1… 1 0 0 1… 1 0 1 0… 0 1 1 0… 1 0 1 1… 0 0 1 1… 0 1 1 1… 1 1 0 1…
30
Talk Outline l Querying sensor networks l Technical discussion l Scalable query processing architectures l High-level tasking l Sensor query processing l Outlook l Conclusions
31
Where Do We Stand? November 2000: l Demonstrated basic query processing in November 2000 experiments l Integrated with ISI diffusion routing l Motivated major component of filters in diffusion API for in- network processing l Demonstration at Intel Continuum Computing Conference November 2001/January 2002: l Developmental demo of query processing system at 29 Palms in November 2001 (integrated with ISI diffusion) l Experimental demo for January 2002 PI meeting: Integrated with ISI diffusion routing, BAE Systems, Fantastic Data, ISI- West l Integration work for AFRL
32
Publications Since Last PI Meeting l V. Ganti, J. Gehrke, R. Ramakrishnan, and W.-Y. Loh. A Framework for Change Detection. Journal of Computer and Systems Science, 2001. l J. Gehrke, F. Korn, and D. Srivastava. On Computing Correlated Aggregates Over Continual Data Streams. 2001 ACM SIGMOD Conference. l Z. Chen, J. Gehrke, and F. Korn. Query optimization in compressed database systems. 2001 ACM SIGMOD Conference. l T. Faradjian, J. Gehrke, and P. Bonnet. GADT: A Probability Space ADT For Representing and Querying the Physical World. 2002 IEEE ICDE Conference. Under submission: l Computing Complex Aggregates over Data Streams l Which Aggregates Cannot be Approximated Well Over Data Streams? l Adaptive Query Processing in Heterogeneous Environments l A Framework for Physical Database Design l Least Expected Cost Query Optimization
33
Impact What will be the impact on national security and DoD? l Continuous intelligence gathering at several orders larger magnitude l Fast event notification l High-level programming interface (“queries”) l Establish a system infrastructure for sensor networks community Integrate query processing into system infrastructure (embedded monitoring)
34
Plans for Remainder of Contract l Participation in large-scale experimental demo l Demonstrate: l Reduced network traffic and reduced energy usage through in-network processing l Scalability with number of nodes l Scalability with number of queries
35
Outlook l Multi-query optimization l Triggers l Historical and predictive queries l Information assurance l Internetworking for homeland defense
36
Multi-Query Optimization l Scenario: Multiple related, but slightly different queries l Goal: Save power and communication l Challenge: Combining multiple queries, finding common query parts SQL Queries Query 1 Query 2 Users: Physical network structure is transparent Sensor Network: Executes user queries Cougar Data Server Front-ends
37
In-Network Geo-Spatial Triggers l Database concepts: l Condition (“I am tracking a T-80”) l Event (“It enters the northeast battle zone”) l Action (“Turn on the cameras and alert commander”) l Why is this hard? l Current database systems choke at tens of triggers l Here we will have >100,000 personal triggers l Technical Challenges: l In-network trigger management l Consistency of triggers l Scalability
38
Predictive Queries l How many vehicles went by between 0600 and 0800? l When is the vehicle going to reach the intersection? l Technical challenges: l On-node distributed query processing and storage l Efficient compression of past events l Memory management and background archival l Prediction models right at the nodes
39
Information Assurance l Sensor failure l How do we know about broken sensors? l How to we compensate for broken sensors? l Can we predict sensor replacement needs? l Zero administration l 24/7/365 system uptime l Sensor placement l Where should we place sensors? l Redundancy versus accuracy versus resource usage
40
Internetworking for Homeland Defense l Integrate the physical world with other intelligence l The loop goes both ways Open architectures, standard data and knowledge exchange (XML-based) l Technical challenges: l Seamless fixed/mobile device interaction l Data integration l Scalability – both number of nodes and amount of data collected l Knowledge discovery and data mining
41
Summary Distributed, highly scalable, fault-tolerant, energy-efficient query processing techniques that scale to large number of nodes and queries and works under tight resource constraints at the nodes.
42
Questions? http://www.cs.cornell.edu/database/cougar The Cougar Team: Manuel Calimlim (Research Associate), Rohit Ananthakrishna, Zhiyuan Chen, Abhinandan Das, Alexandre Evfimievski, Yong Yao (PhD students) SQL Queries Query 1 Query 2 Users: Physical network structure is transparent Sensor Network: Executes user queries Cougar Data Server Front-ends
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.