Lecture 15 15-829A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture 15 Sensor Databases & Data Stream Systems Phil Gibbons.

1 Lecture 15 15-829A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture 15 Sensor Databases & Data Stream Systems Phil Gibbons March 4, 2003

2 Outline
Sensor Databases: Madden et al., “The Design of an Acquisitional Query Processor for Sensor Networks”, to appear in SIGMOD’03
Data Stream Systems: Babcock et al., “Models and Issues in Data Stream Systems”, PODS’02 survey talk

3 “The Design of an Acquisitional Query Processor for Sensor Networks”
Latest paper on the TinyDB work out of U.C. Berkeley & Intel Research Berkeley
Goal was to have you study the very latest research on sensor databases
Preliminary version. A more polished, camera-ready version will be available March 11; I will post it.
Thanks to Sam Madden for providing slides that I have adapted for use in part of this lecture. Note: He is interviewing at CMU in April.
What did you think of this paper?

4 Acquisitional Query Processing
What’s really new & different about (mote-based) sensor networks?
This paper’s answer: long-running queries on physically embedded devices that control when, where, and with what frequency data is collected, versus traditional systems where data is provided a priori
For a distributed, embedded sensing environment, ACQP provides a framework for addressing issues of: when, where, and how often data is sensed/sampled; which data is delivered

5 Context: Mica Motes
Tiny memory: 4KB RAM, 128KB program memory
Limited communication: broadcast to any mote that hears it; motes form an ad-hoc routing tree; ~ten 48-byte messages delivered per second
Power consumption: every bit of data transmitted by radio costs as much as 1000 CPU instructions; deep sleep uses 4-10 times less power than the active state
Can synchronize clocks with neighboring motes to within +/- 1 millisecond, which ensures all are awake at roughly the same time

6 Acquisitional Query Processing
How does the user control acquisition? Rates or lifetimes; event-based triggers
How should the query be processed? Sampling as an operator, power-optimal ordering; frequent events as joins
Which nodes have relevant data? Semantic Routing Tree for effective pruning; nodes that are queried together route together
Which samples should be transmitted? Pick most “valuable”? Adaptive transmission & sampling rates
Adapted from slides ©Sam Madden

7 Rate & Lifetime Queries
Rate query:
SELECT nodeid, light, temp FROM sensors SAMPLE INTERVAL 1s FOR 10s
Lifetime query:
SELECT … LIFETIME 30 days
May not be able to transmit all the data
Estimate the sampling rate that achieves this lifetime
SELECT … LIFETIME 10 days MIN SAMPLE INTERVAL 1s
Adapted from slides ©Sam Madden
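A lifetime query asks the system to turn a target lifetime into a sampling rate. The following is a minimal sketch of that idea, not TinyDB's actual formula: it assumes a fixed energy cost per sample and a constant idle drain, and the function name, parameters, and numbers are all hypothetical.

```python
def sample_interval_for_lifetime(battery_joules, lifetime_days,
                                 joules_per_sample, idle_watts=0.0,
                                 min_interval_s=None):
    """Return the shortest sample interval (seconds) that still lets the
    battery last for lifetime_days under a simple linear energy model."""
    lifetime_s = lifetime_days * 24 * 3600
    # Energy left for sampling after paying the constant idle/sleep drain.
    budget = battery_joules - idle_watts * lifetime_s
    if budget <= 0:
        raise ValueError("lifetime not achievable even without sampling")
    max_samples = budget / joules_per_sample
    interval = lifetime_s / max_samples
    # MIN SAMPLE INTERVAL caps the rate: never sample faster than this.
    if min_interval_s is not None:
        interval = max(interval, min_interval_s)
    return interval
```

Under these assumptions, a larger battery shortens the computed interval until the MIN SAMPLE INTERVAL cap takes over. A real estimate would also have to account for the issues listed on the next slide, such as transmission cost and predicate selectivity.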

8 Processing Lifetimes: Issues
Provide formulas for estimating power consumption: set maximum per-node sampling rates
What makes this difficult?
multiple sensing types (temp, accel) with different drain
estimating the selectivity of predicates
amount transmitted by a node varies widely
root is a bottleneck: all nodes’ rates must correspond to it
aggregation vs. sending individual values
conditions change: multiple queries, burstiness, message losses
What to do when a node can’t transmit all the data?
Adapted from slides ©Sam Madden

9 Lifetime Based Queries Is this experiment convincing? Adapted from slides ©Sam Madden

10 Event Based Processing
ACQP: want to initiate queries in response to events
ON EVENT bird-detect(loc): SELECT AVG(s.light), AVG(s.temp), event.loc FROM sensors AS s WHERE dist(s.loc, event.loc) < 10m SAMPLE PERIOD 2s FOR 30s
Reports the average light and temperature level at sensors near a bird nest where a bird has been detected
What are the issues here? E.g., a new query instance is generated for as long as the bird is there
Adapted from slides ©Sam Madden

11 Event Based Processing Single external interrupt Adapted from slides ©Sam Madden

12 Acquisitional Query Processing
How does the user control acquisition? Rates or lifetimes; event-based triggers
How should the query be processed? Sampling as an operator, power-optimal ordering; frequent events as joins
Which nodes have relevant data? Semantic Routing Tree for effective pruning; nodes that are queried together route together
Which samples should be transmitted? Pick most “valuable”? Adaptive transmission & sampling rates
Adapted from slides ©Sam Madden

13 Power-Optimal Operator Ordering: Interleave Sampling + Selection
SELECT light, mag FROM sensors WHERE pred1(mag) AND pred2(light) SAMPLE INTERVAL 1s
Energy cost of sampling mag >> cost of sampling light: 1500 uJ vs. 90 uJ
Candidate orderings:
1. Sample light, sample mag, apply pred1, apply pred2
2. Sample light, apply pred2, sample mag, apply pred1
3. Sample mag, apply pred1, sample light, apply pred2
Correct ordering (unless pred1 is very selective): 2
Adapted from slides ©Sam Madden
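The ordering argument can be checked with a small expected-cost calculation. This is a sketch under stated assumptions: "selectivity" here means the fraction of tuples that pass a predicate, predicates are treated as independent, and only sampling energy is counted. The per-sample costs come from the slide; the function names are hypothetical.

```python
COST_UJ = {"light": 90.0, "mag": 1500.0}  # per-sample energy from the slide

def expected_cost(order, selectivity):
    """Expected energy (uJ) per epoch when attributes are sampled in `order`
    and each attribute's predicate is applied right after sampling it."""
    cost, p_alive = 0.0, 1.0
    for attr in order:
        cost += p_alive * COST_UJ[attr]  # pay only if the tuple is still alive
        p_alive *= selectivity[attr]     # fraction that passes this predicate
    return cost

def best_order(selectivity):
    """Pick the cheaper of the two interleaved orderings (2 and 3 above)."""
    candidates = (("light", "mag"), ("mag", "light"))
    return min(candidates, key=lambda o: expected_cost(o, selectivity))
```

With both predicates passing half the tuples, sampling light first costs 90 + 0.5 × 1500 = 840 uJ against 1500 + 0.5 × 90 = 1545 uJ for mag first. Mag first only wins when pred1 rejects almost everything and pred2 rejects almost nothing.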

14 Event Query Batching
ON EVENT E(nodeid) SELECT a FROM sensors AS s WHERE s.nodeid = e.nodeid SAMPLE INTERVAL d FOR k
Problem: multiple outstanding queries (lots of samples)
Solution: rewrite as a sliding window join between sensors and the last k seconds of detected events:
SELECT s.a FROM sensors AS s, events AS e WHERE s.nodeid = e.nodeid AND e.type = E AND s.time - e.time <= k AND s.time > e.time SAMPLE INTERVAL d
If events are frequent, use the join approach.
Issues? Assumes regular occurrences; would like to handle burstiness
Adapted from slides ©Sam Madden
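The join rewrite can be illustrated outside the mote setting. The sketch below assumes the join condition is "the sample was taken on the same node, after the event, and at most k seconds later"; the tuple layouts and the function name are hypothetical, and both inputs are assumed to be sorted by time.

```python
from collections import defaultdict, deque

def window_join(samples, events, k):
    """Yield (time, nodeid, value) for each sample taken within k seconds
    after at least one event on the same node.

    samples: iterable of (time, nodeid, value), sorted by time
    events:  iterable of (time, nodeid), sorted by time
    """
    recent = defaultdict(deque)  # nodeid -> event times still in the window
    events = iter(events)
    pending = next(events, None)
    for t, node, value in samples:
        # Admit every event that happened strictly before this sample.
        while pending is not None and pending[0] < t:
            recent[pending[1]].append(pending[0])
            pending = next(events, None)
        window = recent[node]
        # Evict events that are more than k seconds old.
        while window and t - window[0] > k:
            window.popleft()
        if window:
            yield (t, node, value)
```

Unlike a literal SQL join, this emits each sample once even when several events are in the window, which is the point of batching: one sampling stream serves all outstanding events.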

15 Acquisitional Query Processing
How does the user control acquisition? Rates or lifetimes; event-based triggers
How should the query be processed? Sampling as an operator, power-optimal ordering; frequent events as joins
Which nodes have relevant data? Semantic Routing Tree for effective pruning; nodes that are queried together route together
Which samples should be transmitted? Pick most “valuable”? Adaptive transmission & sampling rates
Adapted from slides ©Sam Madden

16 Attribute Driven Topology Selection
Observation: internal queries often over local area, or some other subset of the network, e.g. regions with light value in [10,20]
Idea: build topology for those queries based on values of range-selected attributes
For range queries
Relatively static trees
Maintenance cost
Adapted from slides ©Sam Madden

17 Attribute Driven Query Propagation
(Figure: nodes 1, 2, 3, 4; intervals [1,10], [7,15], [20,40])
SELECT … WHERE a > 5 AND a < 12
Precomputed intervals = Semantic Routing Tree (SRT)
Early pruning
Adapted from slides ©Sam Madden
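Early pruning amounts to an interval-overlap test at each node. The sketch below assumes that nodes 1, 2, and 3 in the figure are children carrying the intervals [1,10], [7,15], and [20,40]; that reading of the figure, and the function names, are assumptions rather than something the transcript states.

```python
def overlaps(interval, lo, hi):
    """True if the closed interval [a, b] contains some x with lo < x < hi."""
    a, b = interval
    return a < hi and b > lo

def children_to_forward(child_intervals, lo, hi):
    """Return the children whose interval can satisfy lo < a < hi."""
    return [c for c, iv in child_intervals.items() if overlaps(iv, lo, hi)]
```

For the query `a > 5 AND a < 12`, the children with [1,10] and [7,15] receive the query and the child with [20,40] is pruned, together with its whole subtree.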

18 Attribute Driven Parent Selection
(Figure: nodes 1, 2, 3, 4; intervals [1,10], [7,15], [20,40]; new interval [3,6])
[3,6] ∩ [1,10] = [3,6]
[3,6] ∩ [7,15] = ø
[3,6] ∩ [20,40] = ø
Even without intervals, expect that sending to parent with closest value will help
Adapted from slides ©Sam Madden
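The intersections on this slide suggest a simple selection rule. The sketch below is one plausible rule, not necessarily the one TinyDB uses: prefer the candidate parent whose interval overlaps the node's own interval the most, and fall back to the nearest interval when none overlaps. The function names and the tie-breaking are hypothetical.

```python
def intersect(a, b):
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def pick_parent(own, candidates):
    """Choose a parent id from {id: interval} for a node with interval `own`."""
    def score(item):
        _, iv = item
        common = intersect(own, iv)
        if common is not None:
            return (1, common[1] - common[0])      # overlapping: longer is better
        gap = max(iv[0] - own[1], own[0] - iv[1])  # distance between the intervals
        return (0, -gap)                           # disjoint: nearer is better
    return max(candidates.items(), key=score)[0]
```

With the slide's numbers, a node with interval [3,6] picks the parent with [1,10], the only candidate with a non-empty intersection.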

19 Simulation Result Random Parent Adapted from slides ©Sam Madden

20 Acquisitional Query Processing
How does the user control acquisition? Rates or lifetimes; event-based triggers
How should the query be processed? Sampling as an operator, power-optimal ordering; frequent events as joins
Which nodes have relevant data? Semantic Routing Tree for effective pruning; nodes that are queried together route together
Which samples should be transmitted? Pick most “valuable”? Adaptive transmission & sampling rates
Adapted from slides ©Sam Madden

21 Adaptive Transmission Rates
Adaptive = 2x the % of successful transmissions
TinyDB monitors channel contention & backs off as needed
Adapted from slides ©Sam Madden

22 Prioritizing Data Delivery
Score each item
Send largest score
Out of order -> priority queue
Discard or aggregate when buffer is full
(Figure: point [1,2])
Adapted from slides ©Sam Madden
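The score, send, and discard steps can be sketched as a small bounded buffer. This is an illustration, not TinyDB code: the class name is hypothetical, scores are supplied by the caller, only the "discard" option is shown (no aggregation), and a plain list is used because the buffers in question hold only a handful of items.

```python
class PriorityBuffer:
    """Bounded buffer that sends the highest-scoring item first and
    discards the lowest-scoring item when it overflows."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []  # entries are (score, arrival_seq, item)
        self.seq = 0

    def push(self, score, item):
        self.items.append((score, self.seq, item))
        self.seq += 1
        if len(self.items) > self.capacity:
            # Drop the lowest score; among equal scores, the oldest entry.
            self.items.remove(min(self.items, key=lambda e: (e[0], e[1])))

    def pop(self):
        # Highest score first; among equal scores, the oldest entry.
        best = max(self.items, key=lambda e: (e[0], -e[1]))
        self.items.remove(best)
        return best[2]

    def __len__(self):
        return len(self.items)
```

Because items leave in score order rather than arrival order, each one needs to carry its own timestamp so the receiver can put the series back in order.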

23 Choosing Data To Send
Delta encoding
Points are written [time, value], e.g. [1,2]
Adapted from slides ©Sam Madden

24 Choosing Data To Send
Delta encoding
Points: [1,2] [2,6] [3,15] [4,1]
Scores: |2-6| = 4, |2-15| = 13, |2-4| = 2
Select which of the 3 to send
Adapted from slides ©Sam Madden

25 Choosing Data To Send
Delta encoding
Points: [1,2] [2,6] [3,15] [4,1]
Scores: |2-6| = 4, |15-4| = 11
Keep selecting until hit max delivery rate
Adapted from slides ©Sam Madden

26 Choosing Data To Send
Delta encoding
Points: [1,2] [2,6] [3,15] [4,1]
Adapted from slides ©Sam Madden

27 Choosing Data To Send
Delta encoding
Points: [1,2] [2,6] [3,15] [4,1]
If manage to send all
Adapted from slides ©Sam Madden

28 Delta + Adaptivity
8 element queue
4 motes transmitting different signals
8 samples / sec / mote
Adapted from slides ©Sam Madden

29 ACQP Summary
Lifetime & event based queries: user preferences for when data is acquired
Optimizations for: order of sampling; events vs. joins
Semantic Routing Tree: query dissemination
Runtime prioritization: adaptive rate control; which samples to send
Adapted from slides ©Sam Madden

30 Outline
Sensor Databases: Madden et al., “The Design of an Acquisitional Query Processor for Sensor Networks”, to appear in SIGMOD’03
Data Stream Systems: Babcock et al., “Models and Issues in Data Stream Systems”, PODS’02 survey talk

31 Models & Issues in Data Stream Systems
Invited survey paper to PODS 2002
Good overview of the basics & the issues, but with a definite Stanford bias
Data arrives in multiple, continuous, rapid, time-varying data streams
Can have continuous queries
Data Stream Management Systems
What did you think of this paper?
This part of the lecture does not follow the paper

32 Data Stream Systems
Introduction
Research in Synopses for Data Streams (models, algorithms, lower bounds)
Research in Data Stream Management Systems

33 Processing Data Streams: Motivation
Many applications generate streams of data:
Performance measurements in network monitoring and traffic management
Call detail records in telecommunications
Transactions in retail chains, ATM operations in banks
Log records generated by Web servers
Sensor network data
Application characteristics: massive volumes of data (several terabytes); records arrive at a rapid rate
Goal: mine patterns, process queries, and compute statistics on data streams in real time
Adapted from slides ©Rajeev Motwani

34 Example: Network Management
(Figure labels: Network Operations Center, Network, Measurements, Alarms)
Massive amounts of rapidly-arriving data at each node
Adapted from slides ©Rajeev Motwani

35 Data Stream Systems
Introduction
Research in Synopses for Data Streams (models, algorithms, lower bounds)
Research in Data Stream Management Systems

36 Data Stream Model
A data stream is a sequence of elements
Stream processing goals:
Limited memory for storing synopsis, e.g., O(√n), O(log n)
Fast synopsis update time (per element), e.g., O(1)
Fast query time, e.g., O(√n), O(log n)
(Figure: Data Stream -> Stream Processing Engine with Synopsis in Memory -> (Approximate) Answer)
Adapted from slides ©Rajeev Motwani

37 Merged Data Streams Model
(Figure: Data Streams -> Stream Processing Engine with Synopsis in Memory -> (Approximate) Answer)
Multiple data streams to a single party/agent
Arbitrary interleaving of streams
Same goals as before (per stream)
Adapted from slides ©Rajeev Motwani

38 Distributed Data Streams Model
(Figure: two data streams, each feeding its own Stream Processing Engine with a Synopsis in Memory; when a query is requested, the Analysis Front End produces the (Approximate) Answer)
+ Avoids sending streams to Analysis Front End
[G, Tirthapura, SPAA’01]

39 Adversarial Stream Inputs
Adversary controls input values and order:
No distributional assumptions on the inputs
Past may not be representative of the future
Typically, do know the input domain
Randomized algorithms:
Have oracle for uniformly random numbers
Would like to minimize the number of oracle calls
Adversary does not adapt to these random numbers

40 Coping With Memory Limitations
Many queries cannot be answered over streams, due to the memory limitations; e.g., see proofs in [Arasu et al, PODS’02]
However, often a detailed, exact answer over streams is not interesting:
Prefer summarized data (aggregates)
Prefer to focus only on recent data
Suffices to get the leading digits of aggregates correct
=> Keys to staying within the memory limitations

41 Sliding Window
Maintain the aggregate / statistic over a sliding window of the N most recent stream elements
Motivation: only the most recent data is important
Position: 1 2 … 20 21 22 23 24 25 26 27 28 29
Stream:   0 1 …  1  0  1  0  0  1  1  0  1  0
With N = 10, the window covers positions 20-29 and the number of 1’s = 5
When position 30 arrives with value 0, the window covers positions 21-30 and the number of 1’s = 4
[Datar, Gionis, Indyk, Motwani, SODA’02]
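The counts in this example can be reproduced with an exact sliding-window counter. Note that this sketch stores all N bits, so it uses O(N) memory; it illustrates the problem statement only and is not the small-space approximate technique of the cited SODA’02 paper. The class name is hypothetical.

```python
from collections import deque

class WindowOnesCounter:
    """Exact count of 1's among the N most recent bits (O(N) memory)."""

    def __init__(self, n):
        self.window = deque(maxlen=n)
        self.ones = 0

    def add(self, bit):
        """Append a new bit and return the count of 1's in the window."""
        if len(self.window) == self.window.maxlen:
            self.ones -= self.window[0]  # oldest bit is about to be evicted
        self.window.append(bit)
        self.ones += bit
        return self.ones
```

Feeding the bits at positions 20 through 29 from the slide gives a count of 5, and adding the 0 at position 30 evicts the 1 at position 20 and lowers the count to 4.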

42 New Stream Algorithms for
Histograms: Equi-Depth Histograms (Quantiles), most popular items, V-Opt Histograms
Wavelets
Data Mining: stream clustering (e.g. k-medians), decision trees
Frequency moments, Lp norms of two streams
Relational DB operators: join size estimation
Papers in STOC, FOCS, SIGMOD, VLDB, etc.
Also lower bounds

43 Data Stream Systems
Introduction
Research in Synopses for Data Streams (models, algorithms, lower bounds)
Research in Data Stream Management Systems

44 Traditional DB Management System
(Figure labels: User/Application, Query Optimizer, Query Processor, Database Management System (DBMS), Query, Result, Loader)
Adapted from slides ©Rajeev Motwani

45 Data Stream Management System
(Figure labels: User/Application, Stream Query Processor, Scratch Space (Memory and/or Disk), Data Stream Management System (DSMS), Register Query, Results)
Centralized processing, i.e., Merged Streams Model
Adapted from slides ©Rajeev Motwani

46 Related Database Technology
Triggers on Conventional Databases / Active Databases: handling stream ordering/rate, scaling/generality for triggers
Main-Memory Databases: handling ordering/rate, better for read-only/query-intensive
Publish/Subscribe Systems: handling stream ordering, event-filtering only, dissemination focus
Materialized Views: handling stream ordering, no streaming output
Sequence/Temporal/Timeseries Databases: represents time/ordering in stored relations
Realtime Databases: transactions with deadlines
Adapted from slides ©Rajeev Motwani

47 STREAM Architecture (Stanford)
Input streams; output streams
Users issue continuous and ad-hoc queries
Administrator monitors query execution and adjusts run-time parameters
Applications register continuous queries
(Figure labels: Waiting Op, Ready Op, Running Op, Synopses, Query Plans, Historical Storage)
Adapted from slides ©Rajeev Motwani

48 Stream DB Projects
Amazon/Cougar (Cornell): sensors
Aurora (Brown/MIT): sensor monitoring, dataflow
Hancock (AT&T): telecom streams
Niagara (OGI/Wisconsin): Internet XML databases
OpenCQ (Georgia): triggers, incr. view maintenance
Stream (Stanford): general-purpose DSMS
Tapestry (Xerox): pub/sub content-based filtering
Telegraph (Berkeley): adaptive engine for sensors
Tribeca (Bellcore): network monitoring
Adapted from slides ©Rajeev Motwani

49 Summary: DBMS versus DSMS
Each line gives the DBMS property, then the DSMS property:
Persistent relations / Transient streams
One-time queries / Continuous queries
Random access / Sequential access
“Unbounded” disk store / Bounded main memory
Only current state matters / History/arrival-order is critical
Passive repository / Active stores
Relatively low update rate / Possibly multi-GB arrival rate
No real-time services / Real-time requirements
Assume precise data / Data stale/imprecise
Access plan determined by optimizer, physical DB design / Unpredictable/variable data arrival and characteristics
Adapted from slides ©Rajeev Motwani

50 The Bigger Picture: When & Why Data streams?

51 Important Scenarios
1. Static / offline: preprocessing time, query time, synopsis size
2. Dynamic / online: update time for new data, query time, synopsis size
3. Data Stream: update time, query time, synopsis size; see all the data thus far, maintain synopsis, answer queries
(Figure: scenarios 1 and 2 show the full data set on disks plus memory, with new data arriving in scenario 2; scenario 3 shows only memory and new data; synopses are held in memory)
[G, Matias ‘98]

52 The Bigger Picture
Data streams?
Single, Merged, or Distributed data streams?
Continuous queries over distributed data streams?
Adversarial inputs?
Sliding windows?
Applicability to IrisNet?

53 Next Lecture
Tuesday, March 11: Adrian Perrig on key distribution & trust bootstrapping

