
1 Tasking Sensor Networks
Johannes Gehrke, Cornell University
Research Associate: Manuel Calimlim
PhD Students: Rohit Ananthakrishna, Adina Costea, Abhinandan Das, Alexandre Evfimievski, Manpreet Singh, Yong Yao

2 Background
Characteristics of future battlespace environments and homeland defense monitoring systems:
- Thousands or millions of small-scale sensor nodes
- Nodes combine multiple sensing and computation capabilities
- Limited resources at the sensors: network, power, CPU
Application requirements:
- Scalability
- Complex monitoring tasks, multiple user types, multiple missions, multiple systems
- Survivability under stress and under attack
- High confidence in measured events and predictions
- Easy deployment and zero-overhead administration

3 Flexible Decision Support
Traditional: Procedural addressing of individual sensor nodes; the user specifies how the task executes, and data is processed centrally.
SensIT: Complex declarative querying and tasking. The user is isolated from “how the network works”; processing is distributed in-network.

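4 Querying: Model
[Figure: each sensor node holds a small table of (Time, Value) readings. Per-node tables shown: (12, 82), (13, 83) | (13, 82), (15, 83) | (13, 82), (15, 84) | (13, 82), (15, 83) | (13, 80), (16, 83) | (14, 79), (15, 83)]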

5 Example Queries
- Snapshot queries:
  - What is the concentration of chemical X in the northeast quadrant?
      SELECT AVG(R.sensor.concentration)
      FROM Relation R
      WHERE R.sensor.loc IN (50,50,100,100)
  - In which area is the concentration of chemical X higher than the average concentration?
      SELECT R.area, AVG(R.sensor.concentration)
      FROM Relation R
      GROUP BY R.area
      HAVING AVG(R.sensor.concentration) >
             (SELECT AVG(R.sensor.concentration) FROM Relation R)
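To make the semantics of the two snapshot queries concrete, here is a minimal Python sketch that evaluates them over an in-memory list of readings. It is not Cougar code: the `Reading` record, its field names, and the reading of `(50,50,100,100)` as `(x_min, y_min, x_max, y_max)` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    # Hypothetical record layout; the slides do not define the schema.
    x: float
    y: float
    area: str
    concentration: float


def in_rect(r, rect):
    """True if the reading lies inside rect = (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = rect
    return x_min <= r.x <= x_max and y_min <= r.y <= y_max


def avg_in_rect(readings, rect):
    """Query 1: average concentration over the sensors inside a rectangle."""
    values = [r.concentration for r in readings if in_rect(r, rect)]
    return mean(values) if values else None


def areas_above_overall_average(readings):
    """Query 2: areas whose average exceeds the average over all readings."""
    overall = mean(r.concentration for r in readings)
    by_area = {}
    for r in readings:
        by_area.setdefault(r.area, []).append(r.concentration)
    return {a: mean(v) for a, v in by_area.items() if mean(v) > overall}


readings = [
    Reading(60, 60, "NE", 4.0),
    Reading(90, 80, "NE", 6.0),
    Reading(10, 10, "SW", 1.0),
    Reading(20, 30, "SW", 1.0),
]
print(avg_in_rect(readings, (50, 50, 100, 100)))  # 5.0
print(areas_above_overall_average(readings))      # {'NE': 5.0}
```

The second function compares each per-area average against a single overall average, which is what the scalar subquery in the HAVING clause expresses.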

6 Example Queries (Contd.)
- Long-running queries:
  - Notify me over the next hour whenever the concentration of chemical X in an area is higher than my security threshold.
      SELECT R.sensor.area, AVG(R.sensor.concentration)
      FROM Relation R
      WHERE R.sensor.loc IN rectangle
      GROUP BY R.sensor.area
      DURATION (now, now+3600)
  - Notify me if a TEL is driving south on Route 13
  - Notify me if a TEL and a T72 cross
- Archival queries:
  - Periodic data collection for offline analysis
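A long-running query differs from a snapshot query in that it is re-evaluated repeatedly until its DURATION expires. The sketch below shows one way to picture this, assuming a fixed sampling period and a simulated clock. The function names, the `period` parameter, and the synthetic `fake_sample` data source are all hypothetical and not part of the system described in the slides.

```python
def run_long_query(sample, threshold, start, duration, period):
    """Re-evaluate a per-area average every `period` seconds for `duration`
    seconds and collect (time, area, average) whenever the average exceeds
    `threshold`. `sample(t)` returns (area, concentration) pairs at time t."""
    notifications = []
    t = start
    while t < start + duration:
        by_area = {}
        for area, concentration in sample(t):
            by_area.setdefault(area, []).append(concentration)
        for area, values in sorted(by_area.items()):
            average = sum(values) / len(values)
            if average > threshold:
                notifications.append((t, area, average))
        t += period
    return notifications


def fake_sample(t):
    # Synthetic data: the concentration in "NE" rises over time, "SW" is flat.
    return [("NE", t / 600.0), ("NE", t / 600.0 + 1.0), ("SW", 0.5)]


alerts = run_long_query(fake_sample, threshold=3.0,
                        start=0, duration=3600, period=600)
for alert in alerts:
    print(alert)
```

With this synthetic input the query stays silent for the first three evaluation rounds and then reports area "NE" in each of the remaining rounds of the hour.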

7 Goals
- Declarative, high-level tasking
- User is shielded from network characteristics
  - Changes in network conditions
  - Changes in power availability
  - Node movement
- System optimizes resources
  - High-level optimization of multiple queries
  - Trade accuracy versus resource usage versus timeliness of query answer

8 Technical Challenges
- Scale of the system
- Constraints
  - Power
  - Communication
  - Computation
- Constant change
- Distribution and decentralization
- Uncertainty from sensor measurements

9 Cornell Contributions
- Scalable query processing architecture
- High-level complex tasking (queries!)
  - Declarative XQuery-related high-level query language; can be generated directly from GUI
  - All-XML interfaces and communication structures
- Sensor query processing
  - In-network query processing
  - Data stream processing
  - New probabilistic data model
  - Fault-tolerant adaptive query processing

10 Talk Outline
- Querying sensor networks
- Technical discussion
  - Scalable query processing architectures
  - High-level tasking
  - Sensor query processing
- Outlook
- Conclusions

11 The Cornell Cougar System:

12 The Cornell Cougar System
Cougar in the (simplified) SensIT Architecture
[Diagram labels: Diffusion Routing, Query Proxy, Signal Processing, Diffusion Routing, Proxy Server, Higher-level tasking and analysis, Frontend, Node]

13

14 Sample User Interface

15 High-Level Complex Tasking
- Query language based on XQuery allows complex declarative tasking
- User is shielded from physical network properties
- GUI generates declarative queries
- System optimizes queries, re-optimizes queries, adapts to physical network conditions

16 Sensor Query Processing
- Data model
- In-network processing
- Records arrive in high-speed data streams
- Environmental conditions are constantly changing

17 GADT: Relational Data for Sensors
- We extended relational DBMSs with a new Gaussian data type. Gaussians are now first-class values.

18 Evaluating GADT Queries
- GADT instances that satisfy a query can be simply visualized as a subset of the 2D plane
- Example: R.a.Prob([-0.5,0.5]) > 0.1
- We can use database indexing techniques to process such queries
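The example predicate above asks whether a Gaussian-valued attribute places more than 0.1 of its probability mass on the interval [-0.5, 0.5]. The following sketch evaluates that predicate by a plain scan. The `Gaussian` class and `select_prob` function are hypothetical names, not the GADT implementation, and the sketch assumes that each value is described by a (mean, standard deviation) pair, which is one natural reading of the "2D plane" remark.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """A Gaussian value; each instance is a point (mean, std) in the plane."""
    mean: float
    std: float

    def cdf(self, x):
        # Standard closed form of the normal CDF via the error function.
        z = (x - self.mean) / (self.std * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))

    def prob(self, lo, hi):
        """Probability mass the Gaussian assigns to the interval [lo, hi]."""
        return self.cdf(hi) - self.cdf(lo)


def select_prob(rows, lo, hi, p_min):
    """Selection: keep rows whose attribute satisfies Prob([lo, hi]) > p_min."""
    return [name for name, g in rows if g.prob(lo, hi) > p_min]


rows = [
    ("s1", Gaussian(0.0, 1.0)),   # centered on the interval, narrow
    ("s2", Gaussian(5.0, 1.0)),   # far away from the interval
    ("s3", Gaussian(0.0, 10.0)),  # centered, but too spread out
]
matches = select_prob(rows, -0.5, 0.5, 0.1)
print(matches)
```

Only `s1` qualifies here: `s2` is too far from the interval and `s3` is too diffuse. The set of qualifying (mean, std) points forms a region of the plane, which is why an index over that plane could replace the linear scan used in this sketch.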

19 GADT Data Type
Implementation level: GADT operators
- Selection
- Projection
- Join
Conceptual level: Theory
- Measure-theoretic formulation of probabilistic data
- New framework for probabilistic data

20 Sensor Query Processing
- Data model
- In-network processing
- Records arrive in high-speed data streams
- Environmental conditions are constantly changing

21 In-Network Processing
What is distributed in-network processing?
- Processing at the nodes where the data originates (the “source nodes”)
- Processing at “intermediate nodes”
- Processing only at relevant nodes
Why is this hard?
- Scale
- Constantly changing conditions
- Meta-data management
- Fault tolerance

22 Processing at Intermediate Nodes (1)
- Onto which nodes should we place query processing operators?

23 Processing at Intermediate Nodes (2)
- Several new aggregation algorithms that make use of processing at the intermediate nodes
  - Simple spanning tree aggregation
  - Fault-tolerant super-node spanning tree aggregation
- Simulation results and results from working implementation:
  - Reduces network traffic
  - Increases battery life of the nodes
  - Scales gracefully with number of queries and number of nodes
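The idea behind simple spanning tree aggregation can be illustrated with a small sketch: each node combines its own reading with the partial results of its children and forwards one partial state (sum, count) to its parent. This is a generic illustration under simplifying assumptions (a static tree, one reading per node, no failures), not the algorithms evaluated in the talk. The tree, the readings, and the function names are made up.

```python
def aggregate(tree, readings, node):
    """Return (sum, count, messages) for the subtree rooted at `node`.
    Each child sends exactly one partial-state message to its parent."""
    total, count, messages = readings[node], 1, 0
    for child in tree.get(node, []):
        c_total, c_count, c_messages = aggregate(tree, readings, child)
        total += c_total
        count += c_count
        messages += c_messages + 1  # +1 for the child's message to this node
    return total, count, messages


def centralized_messages(tree, node, depth=0):
    """Messages needed if every raw reading is relayed hop by hop to the root."""
    return depth + sum(centralized_messages(tree, child, depth + 1)
                       for child in tree.get(node, []))


# Hypothetical spanning tree: parent -> list of children.
tree = {"root": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
readings = {"root": 2.0, "a": 4.0, "b": 6.0, "c": 8.0, "d": 10.0, "e": 12.0}

total, count, in_network = aggregate(tree, readings, "root")
print("average:", total / count)
print("messages in-network:", in_network)
print("messages centralized:", centralized_messages(tree, "root"))
```

In-network aggregation sends one message per tree edge, whereas relaying raw readings costs one message per hop for every reading, so the gap widens as the tree gets deeper.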

24 Distributed Processing
- Switch intermediate processing based on available power.

25 In-Network Data Stream Processing
- Examples:
  - Quantiles with limited memory: What was the median concentration of chemical X in this area over the last five minutes?
  - Correlated aggregates with limited memory: During the last five minutes, where was the concentration of chemical X in this area higher than the average?

26 In-Network Data Stream Processing
- Why are aggregates with limited memory hard?
Our solution:
- Hierarchical, distributed algorithm, provable approximation guarantees, limited amount of memory.
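To show why limited memory forces approximation, the sketch below estimates a median from a bounded uniform sample of the stream. It uses reservoir sampling, a standard textbook technique, as a stand-in; it is not the hierarchical algorithm referred to on the slide and it gives weaker guarantees. The class name, capacity, and seed are assumptions chosen for the example.

```python
import random


class ReservoirMedian:
    """Approximate median of a stream using at most `capacity` stored values."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.sample = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, value):
        """Reservoir sampling: every value seen so far is kept with equal
        probability capacity / seen."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(value)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.sample[j] = value

    def median(self):
        """Median of the retained sample, used as an estimate for the stream."""
        ordered = sorted(self.sample)
        return ordered[len(ordered) // 2]


estimator = ReservoirMedian(capacity=200, seed=1)
for value in range(1, 10001):  # a stream of 10,000 readings
    estimator.add(value)
print("stored values:", len(estimator.sample))
print("estimated median:", estimator.median())  # true median is about 5000
```

The exact median would require keeping all 10,000 values; the sample keeps 200 and accepts an estimate whose error shrinks as the capacity grows.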

27 In-Network Processing

28 Example 1: Distributed Data Streams
Simple example: How many detections match?
Techniques:
- k-wise independent random variables
- Histograms
- Other statistical techniques
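One well-known way to use k-wise independent random variables for a question like "how many detections match?" is a small linear sketch per stream, in the style of the AMS "tug-of-war" estimator. The code below is a generic illustration of that idea, with the question read as a join-size estimate between two detection streams; that reading, the class and function names, and the parameters are assumptions. The sign functions are built from random degree-3 polynomials over a prime field and are only approximately 4-wise independent, because the low bit of a value modulo an odd prime is slightly biased.

```python
import random

P = 2_147_483_647  # the Mersenne prime 2**31 - 1


def make_sign_hash(rng, k=4):
    """Map item ids to +1/-1 using a random polynomial of degree k - 1 mod P."""
    coeffs = [rng.randrange(P) for _ in range(k)]

    def sign(x):
        acc = 0
        for c in coeffs:  # Horner evaluation of the polynomial at x
            acc = (acc * x + c) % P
        return 1 if acc & 1 else -1

    return sign


class JoinSketch:
    """Fixed-size summary of a stream of item ids."""

    def __init__(self, width, seed):
        rng = random.Random(seed)  # same seed gives the same hash functions
        self.signs = [make_sign_hash(rng) for _ in range(width)]
        self.counters = [0] * width

    def add(self, item_id):
        for i, sign in enumerate(self.signs):
            self.counters[i] += sign(item_id)

    def merge(self, other):
        """Sketches built with the same seed and width simply add up."""
        self.counters = [a + b for a, b in zip(self.counters, other.counters)]


def sketch_of(stream, width=64, seed=42):
    sketch = JoinSketch(width, seed)
    for item_id in stream:
        sketch.add(item_id)
    return sketch


def estimate_matches(s1, s2):
    """Average of counter products: estimates the number of matching pairs."""
    products = [a * b for a, b in zip(s1.counters, s2.counters)]
    return sum(products) / len(products)


node_a = sketch_of([1, 2, 3, 2])
node_b = sketch_of([2, 5])
print("estimated matching pairs:", estimate_matches(node_a, node_b))
```

Because the sketch is linear, an intermediate node can merge the sketches of its children without seeing the raw detections, which is what makes this family of techniques attractive for distributed streams.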

29 Example 2: Change Detection
Approach 1: Define difference metric (“deviation”) at the data mining model level. Compare datasets through difference in the data mining models they induce.
Approach 2: Mine hidden concepts from data streams. Monitor change of concepts.
[Figure: blocks of a binary data stream]

30 Talk Outline
- Querying sensor networks
- Technical discussion
  - Scalable query processing architectures
  - High-level tasking
  - Sensor query processing
- Outlook
- Conclusions

31 Where Do We Stand?
November 2000:
- Demonstrated basic query processing in November 2000 experiments
- Integrated with ISI diffusion routing
- Motivated major component of filters in diffusion API for in-network processing
- Demonstration at Intel Continuum Computing Conference
November 2001/January 2002:
- Developmental demo of query processing system at 29 Palms in November 2001 (integrated with ISI diffusion)
- Experimental demo for January 2002 PI meeting: Integrated with ISI diffusion routing, BAE Systems, Fantastic Data, ISI-West
- Integration work for AFRL

32 Publications Since Last PI Meeting
- V. Ganti, J. Gehrke, R. Ramakrishnan, and W.-Y. Loh. A Framework for Change Detection. Journal of Computer and System Sciences, 2001.
- J. Gehrke, F. Korn, and D. Srivastava. On Computing Correlated Aggregates Over Continual Data Streams. 2001 ACM SIGMOD Conference.
- Z. Chen, J. Gehrke, and F. Korn. Query Optimization in Compressed Database Systems. 2001 ACM SIGMOD Conference.
- T. Faradjian, J. Gehrke, and P. Bonnet. GADT: A Probability Space ADT for Representing and Querying the Physical World. 2002 IEEE ICDE Conference.
Under submission:
- Computing Complex Aggregates over Data Streams
- Which Aggregates Cannot be Approximated Well Over Data Streams?
- Adaptive Query Processing in Heterogeneous Environments
- A Framework for Physical Database Design
- Least Expected Cost Query Optimization

33 Impact
What will be the impact on national security and DoD?
- Continuous intelligence gathering at a scale several orders of magnitude larger
- Fast event notification
- High-level programming interface (“queries”)
- Establish a system infrastructure for the sensor networks community
Integrate query processing into system infrastructure (embedded monitoring)

34 Plans for Remainder of Contract
- Participation in large-scale experimental demo
- Demonstrate:
  - Reduced network traffic and reduced energy usage through in-network processing
  - Scalability with number of nodes
  - Scalability with number of queries

35 Outlook
- Multi-query optimization
- Triggers
- Historical and predictive queries
- Information assurance
- Internetworking for homeland defense

36 Multi-Query Optimization
- Scenario: Multiple related, but slightly different queries
- Goal: Save power and communication
- Challenge: Combining multiple queries, finding common query parts
[Diagram labels: SQL Queries (Query 1, Query 2); Users: Physical network structure is transparent; Sensor Network: Executes user queries; Cougar Data Server; Front-ends]

37 In-Network Geo-Spatial Triggers
- Database concepts:
  - Condition (“I am tracking a T-80”)
  - Event (“It enters the northeast battle zone”)
  - Action (“Turn on the cameras and alert commander”)
- Why is this hard?
  - Current database systems choke at tens of triggers
  - Here we will have >100,000 personal triggers
- Technical challenges:
  - In-network trigger management
  - Consistency of triggers
  - Scalability
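The condition, event, and action from the slide's example can be written down as a small event-condition-action rule. The sketch below is only an illustration of the concept: the `Trigger` class, the layout of a position update, and the zone coordinates are hypothetical, and the naive loop in `process` checks every trigger on every update, which is exactly the approach that would not scale to the trigger counts mentioned above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    event: Callable[[dict], bool]      # what happened
    condition: Callable[[dict], bool]  # does the rule apply to this object
    action: Callable[[dict], str]      # what to do about it


def in_zone(pos, zone):
    """True if pos = (x, y) lies inside zone = (x_min, y_min, x_max, y_max)."""
    x, y = pos
    x_min, y_min, x_max, y_max = zone
    return x_min <= x <= x_max and y_min <= y <= y_max


NE_ZONE = (50, 50, 100, 100)  # hypothetical coordinates for the northeast zone

trigger = Trigger(
    # Event: the object was outside the zone and is now inside it.
    event=lambda u: not in_zone(u["prev"], NE_ZONE) and in_zone(u["pos"], NE_ZONE),
    # Condition: the tracked object is a T-80.
    condition=lambda u: u["type"] == "T-80",
    # Action: stand-in for turning on cameras and alerting the commander.
    action=lambda u: f"cameras on, commander alerted for {u['id']}",
)


def process(update, triggers):
    """Naive evaluation: test every trigger against the update."""
    return [t.action(update) for t in triggers
            if t.event(update) and t.condition(update)]


updates = [
    {"id": "v1", "type": "T-80", "prev": (40, 40), "pos": (60, 60)},  # enters
    {"id": "v2", "type": "T-72", "prev": (40, 40), "pos": (60, 60)},  # wrong type
    {"id": "v1", "type": "T-80", "prev": (60, 60), "pos": (70, 70)},  # already inside
]
fired = [message for u in updates for message in process(u, [trigger])]
print(fired)
```

Only the first update fires the trigger; the second fails the condition and the third is not an entry event.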

38 Predictive Queries
- How many vehicles went by between 0600 and 0800?
- When is the vehicle going to reach the intersection?
- Technical challenges:
  - On-node distributed query processing and storage
  - Efficient compression of past events
  - Memory management and background archival
  - Prediction models right at the nodes

39 Information Assurance
- Sensor failure
  - How do we know about broken sensors?
  - How do we compensate for broken sensors?
  - Can we predict sensor replacement needs?
- Zero administration
  - 24/7/365 system uptime
- Sensor placement
  - Where should we place sensors?
  - Redundancy versus accuracy versus resource usage

40 Internetworking for Homeland Defense
- Integrate the physical world with other intelligence
- The loop goes both ways: open architectures, standard data and knowledge exchange (XML-based)
- Technical challenges:
  - Seamless fixed/mobile device interaction
  - Data integration
  - Scalability in both the number of nodes and the amount of data collected
  - Knowledge discovery and data mining

41 Summary
Distributed, highly scalable, fault-tolerant, energy-efficient query processing techniques that scale to large numbers of nodes and queries and work under tight resource constraints at the nodes.

42 Questions?
http://www.cs.cornell.edu/database/cougar
The Cougar Team: Manuel Calimlim (Research Associate), Rohit Ananthakrishna, Zhiyuan Chen, Abhinandan Das, Alexandre Evfimievski, Yong Yao (PhD students)

