1 Querying the Physical World ------Cornell University Event Detection Services Using Data Service Middleware in Distributed Sensor Networks ------University.

1 Querying the Physical World (Cornell University)
Event Detection Services Using Data Service Middleware in Distributed Sensor Networks (University of Virginia)
Presented by Gary Zhou @ UVA, CS 862 Presentation

2 Comparison between these two papers
Query the physical world:
- No avi (absolute validity interval) value for each data item, so not really real-time
- Special point of interest: represents device functions
- Concentrates on individual motes
Event Detection Service:
- Each data item has an avi value, so really real-time
- Special point of interest: provides an event detection service
- Group-based robust coordination
Provide a database-like abstraction to applications

3 Outline: Querying the Physical World
- Device Networks & Their Query Processing
  - Description of Device Networks
  - Three kinds of queries
  - Two approaches
- Device Database System
  - Device & Function
  - User representation
  - Internal representation
  - Queries
- Query Processing over Device Database System
  - Performance Metrics
  - Distributed Query Execution Plans
  - Experiments
- Discussions

4 Outline: Event Detection Service
- Motivation
- Data services in sensor networks
- Data Service Middleware (DSWare)
- Main focus: Event Detection Service
- Experiments and performance
- Discussions

5 Device Networks & Their Query Processing
Description of Device Network
- The widespread deployment of sensors, actuators and mobile devices is transforming the physical world into a computing platform.
- Emerging networking techniques ensure that devices are interconnected and accessible from local- or wide-area networks.
- Using this new computing platform, users interact with portions of the physical world.

6 Three kinds of Queries
- Historical queries: typically aggregate queries over historical data obtained from the device network.
  Example: For each rainfall sensor in 1800 JPA, display the average level of rainfall for 1999.
- Snapshot queries: queries that concern the device network at a given point in time.
  Example: Retrieve the current rainfall level for all sensors in 1800 JPA.
- Long-running queries: queries that concern the device network over a time interval.
  Example: For the next 5 hours, retrieve every 30 seconds the rainfall level for all sensors in 1800 JPA.

7 Two Approaches
- Device database system: a database system that enables distributed query processing over a device network.
- The warehousing approach: data are extracted from the devices in a predefined way and stored in a centralized database system that is responsible for query processing.

8 Two Approaches: warehousing
Advantages of the warehousing approach
- It is well suited for aggregate queries over historical data.
Disadvantages of the warehousing approach
- It uses valuable resources to transfer large amounts of raw data from the devices to the database server.
- It dissociates access to the devices from the query workload.

9 Two Approaches: Device database system
- Device & Function
- User representation
- Internal representation
- Queries

10 Device & Function
Device
- Each device is a mini-server that supports a set of functions and can process portions of the queries directly at the device.
Function
- A function either (a) acquires, stores and processes data, or (b) triggers an action in the physical world.
Synchronous function
- It returns its result immediately, on demand.
- It is used to monitor continuous phenomena, for example, a function that returns the rainfall level.
Asynchronous function
- It returns its result after an arbitrary period of time.
- It is used to monitor threshold events, for example, a function that detects an abnormal rainfall level.
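
The difference between the two kinds of function can be sketched in Python as follows. This is only an illustration: the class `RainfallSensor`, its method names, and the callback mechanism are hypothetical and do not come from the paper.

```python
from typing import Callable, List


class RainfallSensor:
    """Hypothetical device exposing one synchronous and one asynchronous function."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self._level = 0.0
        self._listeners: List[Callable[[float], None]] = []

    def get_rainfall_level(self) -> float:
        # Synchronous function: returns the latest value immediately, on demand.
        return self._level

    def on_abnormal_rainfall(self, callback: Callable[[float], None]) -> None:
        # Asynchronous function: the result arrives later, whenever the
        # threshold event actually occurs, via the registered callback.
        self._listeners.append(callback)

    def measure(self, level: float) -> None:
        # Stands in for a physical measurement taken by the device.
        self._level = level
        if level > self.threshold:
            for callback in self._listeners:
                callback(level)
```

A caller polls `get_rainfall_level()` for continuous phenomena, while `on_abnormal_rainfall()` delivers a value only if and when the threshold is crossed.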

11 User representation
Devices are represented as Abstract Data Type (ADT) objects
- ADT objects are single attribute values that encapsulate a collection of related data.
- ADT objects provide controlled access to the encapsulated data through a well-defined interface.
- Example: RFSensors (Sensor, X, Y) provides Sensor.getRainfallLevel()

12 Internal representation
Device functions are represented as virtual relations
Virtual relation
- It is a tabular representation of a function.
- A record in it contains the input arguments and the output argument of the function it is associated with.
Mapping from function to relation
- Arguments of device function: a1, ..., aM
- Attributes of virtual relation: Device ADT ID, a1, ..., aM, Output value, Time stamp
Properties of a virtual relation
- It is append-only.
- It is naturally partitioned across all devices represented by the same device ADT.
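
A virtual relation with these attributes and properties might be modeled as below. This is a minimal sketch; the names `VirtualRecord`, `VirtualRelation`, and `partition` are made up for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass(frozen=True)
class VirtualRecord:
    device_id: str         # Device ADT ID
    args: Tuple[Any, ...]  # input arguments a1..aM of the device function
    value: Any             # output value of the function
    timestamp: float       # time stamp of the result


@dataclass
class VirtualRelation:
    """Append-only table of results for one device function (illustrative)."""
    function_name: str
    _records: List[VirtualRecord] = field(default_factory=list)

    def append(self, record: VirtualRecord) -> None:
        # Appending is the only mutation offered: records are never
        # updated or deleted.
        self._records.append(record)

    def partition(self, device_id: str) -> List[VirtualRecord]:
        # The relation is naturally partitioned by device.
        return [r for r in self._records if r.device_id == device_id]

    def __len__(self) -> int:
        return len(self._records)
```

In a real device database each partition would live on its own device; here all partitions sit in one list purely to keep the sketch short.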

13 Queries
Historical queries and snapshot queries
- They are naturally formulated as declarative queries in SQL.
An example of a long-running query
    SELECT R.Sensor.getRainfallLevel()
    FROM RFSensors R
    WHERE R.Sensor.getRainfallLevel() > 50 AND $every(30)
- The function $every(30) specifies that a new record is inserted every 30 seconds into the append-only virtual relation corresponding to the function RFSensor.getRainfallLevel().
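
The effect of $every(30) combined with the predicate can be imitated with a simulated clock. The sketch below is an assumption-laden toy: `long_running_query` and `fake_sensor` are hypothetical names, and time is an integer counter rather than a real timer.

```python
from typing import Callable, Iterator, Tuple


def long_running_query(read_level: Callable[[int], float],
                       every: int,
                       duration: int,
                       threshold: float) -> Iterator[Tuple[int, float]]:
    """Sample the sensor every `every` seconds for `duration` seconds and
    yield (time, level) only when the level exceeds `threshold`."""
    for t in range(0, duration, every):
        level = read_level(t)   # one new virtual record per sampling step
        if level > threshold:   # the WHERE ... > 50 predicate
            yield t, level


def fake_sensor(t: int) -> float:
    # Made-up rainfall curve that rises steadily over time.
    return 40.0 + t / 3.0


results = list(long_running_query(fake_sensor, every=30, duration=150, threshold=50))
```

With this made-up curve the samples at t = 0 and t = 30 are filtered out and the later three are returned.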

14 Query Processing over Device Database System
Performance Metrics
Traditional performance metrics
- Throughput: the average number of queries processed per unit of time.
New performance metrics
- Resource usage: the total amount of energy consumed by the devices when executing a query.
- Response time: the time needed by the system to produce all answer records to a query.
- Reaction time: the interval between the time a function called on a device returns its value and the time the corresponding answer is produced on the front-end.

15 Distributed Query Execution Plans
Query: Retrieve every 30 seconds the rainfall level if it is greater than 50 mm.
    SELECT VR.value
    FROM VRFSensorsGetRainfallLevel VR, RFSensors R
    WHERE VR.Sensor = R.Sensor AND VR.value > 50 AND $every(30)

16 Plan T
- Data extracted from the devices are materialized in the relation VR, which is located on the front-end.
- Both R and VR are on the front-end, and the join is executed on the front-end.
- Relation R is joined with relation VR using the join condition VR.Sensor = R.Sensor AND VR.value > 50.

17 Plan A
- It is a simple tree where R is joined on the front-end with relation VR, which is partitioned across a set of devices.
- The front-end asks each device to measure the rainfall level and to transfer the resulting virtual records back to the front-end.
- Each virtual record arriving on the front-end is then joined with relation R.
- Disadvantage: all devices with rainfall sensors transmit data to the front-end, while the query only concerns the sensors that measure a rainfall level greater than 50.

18 Plan B
- Define a semi-join between R and the partitions of VR located on the devices. The semi-join projects out the joining attribute from R (here the device ID Sensor) and sends it to all devices.
- On the devices, whenever the rainfall level is measured, a virtual record is generated and joined with the portion of relation R sent by the front-end (using the joining condition R.Sensor = VR.Sensor AND VR.value > 50).
- If the joining condition is satisfied, the virtual record is sent back to the front-end to be joined with the complete records from relation R.

19 Plan C
- It only pushes the selection (VR.value > 50) onto the device. Only records that satisfy the condition are sent back to the front-end, where they are joined with relation R.
- Compared to Plan B, no subset of relation R is transmitted to the devices.
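
To make the difference between Plans A, B and C concrete, the sketch below counts messages exchanged between devices and front-end for the same readings. It is a toy model, not the paper's experiment: the function names, the sample data, and the choice to charge Plan B one downlink message per device for the semi-join are all assumptions.

```python
from typing import Dict, List, Set, Tuple

THRESHOLD = 50.0
Readings = Dict[str, List[float]]          # device ID -> measured rainfall levels
Result = Tuple[int, List[Tuple[str, float]]]  # (messages sent, answer records)


def plan_a(readings: Readings, relation_r: Set[str]) -> Result:
    # Every virtual record is shipped to the front-end, which joins and filters.
    sent = sum(len(values) for values in readings.values())
    answers = [(d, x) for d, values in readings.items() for x in values
               if d in relation_r and x > THRESHOLD]
    return sent, answers


def plan_b(readings: Readings, relation_r: Set[str]) -> Result:
    # Semi-join: the join attribute of R is first sent to each device
    # (assumed cost: one message per device), then only matching records return.
    sent = len(readings)
    answers = []
    for d, values in readings.items():
        for x in values:
            if d in relation_r and x > THRESHOLD:  # evaluated on the device
                sent += 1
                answers.append((d, x))
    return sent, answers


def plan_c(readings: Readings, relation_r: Set[str]) -> Result:
    # Only the selection runs on the device; the join stays on the front-end.
    sent = 0
    answers = []
    for d, values in readings.items():
        for x in values:
            if x > THRESHOLD:        # evaluated on the device
                sent += 1
                if d in relation_r:  # evaluated on the front-end
                    answers.append((d, x))
    return sent, answers


readings = {"s1": [10, 60, 70], "s2": [20, 30, 40], "s3": [55, 5, 5]}
relation_r = {"s1", "s2", "s3"}
```

All three plans return the same answer records for this data; only the number of messages differs (9 for Plan A, 6 for Plan B, 3 for Plan C).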

20 Resource usage for sensors located outside a flood area
- With Plan A, data is sent back to the front-end whenever it is generated.
- With Plan B, a semi-join is pushed to the device. The condition on the rainfall level is checked on the device, and no data is sent back because the sensor is outside the flood area. Plan B pays the initial cost of transferring a fragment of relation R to the devices. This initial cost is amortized (compared to Plan A) over the lifespan of the long-running query.
- With Plan C, a selection is pushed to the device. The condition on the rainfall level is checked on the device, and again no data is sent back because the sensor is outside the flood area.

21 Resource usage for sensors located inside a flood area
- With all plans, data is always sent back to the front-end.
- The initial cost of Plan B is never amortized here, so the Plan B curve rises rapidly over time.
- Question: Why do Plan C and Plan A have nearly identical curves? Because the cost of performing a selection is low compared to the cost of sending data.
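
The amortization argument can be illustrated with a toy cumulative-cost model. Every number below (`SEND`, `CHECK`, `INIT_B`) is an invented energy unit chosen for illustration only; none of them is a measurement from the paper.

```python
SEND = 100    # assumed cost of transmitting one record to the front-end
CHECK = 1     # assumed cost of evaluating the predicate on the device
INIT_B = 500  # assumed one-off cost of receiving the fragment of R (Plan B)


def cumulative_cost(plan: str, samples: int, inside_flood: bool) -> int:
    """Energy spent by one device after `samples` measurements.

    Simplification: inside the flood area every sample satisfies the
    predicate; outside it, none does.
    """
    if plan not in ("A", "B", "C"):
        raise ValueError("plan must be 'A', 'B' or 'C'")
    sends = samples if (plan == "A" or inside_flood) else 0
    checks = 0 if plan == "A" else samples
    init = INIT_B if plan == "B" else 0
    return init + sends * SEND + checks * CHECK
```

Under these assumptions, outside the flood area Plan B becomes cheaper than Plan A from the sixth sample on, while inside the flood area Plan B always stays above Plan A, and Plan C tracks Plan A closely because a check costs far less than a send.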

22 Conclusion of Plans
- Pushing a selection as in Plan C is optimal. This is intuitive, since the query filters out uninteresting events generated on the devices.
- Pushing the selection allows the device database system to efficiently trade increased processing on the devices for reduced communication.

23 Discussions
- I love the idea of using virtual relations to represent device functions.
- The complete query semantics over a device database are not given here.
- There is no avi value for each data item, so the system is not really real-time.
- Individual nodes are not important, and a mote's sensor may get damaged and report wrong values, so group-based coordination should be introduced.

24 Event Detection Service

25 Motivation
- Sensor networks are data-centric and real-time: an abstraction of real-time data semantics is needed.
- Individual nodes in sensor networks are unreliable: group-based robust coordination is needed.
- Detection of some events relies on more than one type of sensor data: the relationship between them can help to increase the reliability of data decisions.

26 Data Services in sensor networks
- Queries (location, frequency, duration)
- Data/Event dissemination
- Data Aggregation
- Data-centric Storage/Caching
- Event Detection
- Data Security and Access Authorization

27 Data Service Middleware (DSWare)
Data Storage
- Map the key to a logical node
- Map a logical node to multiple physical nodes
Caching
- Spread copies along the routing path
Compare?
- Data Storage: static copies; provides reliability
- Caching: variable copies; improves performance
[Figure: Services in Data Service Middleware. Labels: Application, DSWare (database-like abstraction), Sensor nodes, Real-time Scheduling, Subscription, Event Detection, Group Management, Aggregation, Data Storage, Caching, Authorization]
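
The two-level mapping used by Data Storage (key to logical node, logical node to several physical nodes) could look roughly like this. The hashing scheme and the names `logical_node_for`, `store`, and `lookup` are assumptions made for illustration; the slides do not say how DSWare actually performs the mapping.

```python
import hashlib
from typing import Any, Dict, List, Optional


def logical_node_for(key: str, num_logical: int) -> int:
    # Assumed scheme: hash the key and reduce it to a logical node index.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_logical


def store(key: str, value: Any,
          logical_to_physical: Dict[int, List[str]],
          storage: Dict[str, Dict[str, Any]]) -> int:
    """Write `value` to every physical node behind the key's logical node."""
    logical = logical_node_for(key, len(logical_to_physical))
    for node in logical_to_physical[logical]:
        storage.setdefault(node, {})[key] = value
    return logical


def lookup(key: str,
           logical_to_physical: Dict[int, List[str]],
           storage: Dict[str, Dict[str, Any]],
           failed: Optional[List[str]] = None) -> Any:
    """Read from the first replica that has not failed (illustrates reliability)."""
    failed = failed or []
    logical = logical_node_for(key, len(logical_to_physical))
    for node in logical_to_physical[logical]:
        if node not in failed and key in storage.get(node, {}):
            return storage[node][key]
    return None
```

Because every physical node behind a logical node holds a static copy, a read still succeeds when one of the replicas has failed, which is the reliability benefit the slide attributes to Data Storage.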

28 Problems with current event detection schemes
An external node collects reports of atomic events and determines whether the compound event occurs.
[Figure: atomic event reports about an explosion are sent to a node that determines the occurrence of compound events]
- Reduces possible in-network processing and increases unnecessary concentrated traffic around the decision node
- Increases detection delay (unacceptable for some time-critical applications)

29 Event Detection Service in DSWare
Event: an activity in the environment that is of interest to the application and can be monitored or detected
- Example: an explosion detected in the area through high temperature, light intensity change, and acoustic changes
Hierarchy of events
- Atomic event: detected through a single sensor's observation, e.g. high temperature, light intensity change, acoustic change
- Compound event: consists of a set of atomic events and is detected based on the detection of those atomic events, e.g. explosion

30 Event Detection Scheme in DSWare
Confidence
- Every compound event detection report has a confidence value, which indicates the reliability of the report.
- The confidence function is designed based on data semantics:
  - Relative importance of different atomic sub-events
  - Temporal continuity of events
  - Statistical models
  - Similarity among adjacent regions
Waiting Time Window (TW)
- The time that an aggregation node waits for the arrival of all possible atomic event reports
- When the TW times out, a compound event is reported if the confidence value reaches the minimum confidence requirement of this event.
- Avoids endless waiting when messages are lost
- Enables event detection based on the partial information collected

31 A Simple Example: Explosion (E)
Sub-events: high temperature (T), special light (L), acoustic changes (A)
Confidence function: f = [0.6 * BOOL(T) + 0.3 * BOOL(L) + 0.3 * BOOL(A)] * h
- h: history factor; it increases if the explosion event has been detected in the previous waiting time window. Assume 1 ≤ h ≤ 2.
Minimum Confidence: 0.8
[Figure: timeline at the group leader showing T, L and A reports arriving across shifting time windows, one lost L report, running confidence values (f=0.3h, f=0.6h, f=0.9h, f=1.2h), and the resulting "Report E" and "No reports" outcomes]
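
The confidence function and the time-window decision from this example can be written out as a short sketch. The weights, the bounds on h, and the minimum confidence of 0.8 come from the slide; the function names and the representation of reports as (name, arrival time) pairs are assumptions.

```python
from typing import Iterable, Set, Tuple

WEIGHTS = {"T": 0.6, "L": 0.3, "A": 0.3}  # weights from the slide
MIN_CONFIDENCE = 0.8                      # minimum confidence from the slide


def confidence(received: Set[str], h: float = 1.0) -> float:
    """f = [0.6*BOOL(T) + 0.3*BOOL(L) + 0.3*BOOL(A)] * h, with 1 <= h <= 2."""
    if not 1.0 <= h <= 2.0:
        raise ValueError("history factor h must satisfy 1 <= h <= 2")
    return sum(w for name, w in WEIGHTS.items() if name in received) * h


def detect_in_window(reports: Iterable[Tuple[str, float]],
                     start: float, length: float,
                     h: float = 1.0) -> Tuple[bool, float]:
    """Decide at window timeout using only the reports that arrived in time."""
    received = {name for name, t in reports if start <= t < start + length}
    f = confidence(received, h)
    return f >= MIN_CONFIDENCE, f
```

With T and A received but L lost, f is about 0.9h, so the explosion is reported even for h = 1. With only L and A received, f = 0.6h, which reaches 0.8 only when the history factor is large enough.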

32 Some other issues in event detection
- Temporal resolution: some events last much longer than the sensing interval of a sensor, so some applications will probably report a single event repeatedly, which is unnecessary.
- Spatial resolution: if a detection group is too small compared to the event, several groups within the event's coverage may report the same event.

33 Performance in Reduction of Communication
Baseline:
- Only one report of an environment property is generated from a group during each sensing interval.
- All reports are sent to an outside node, and the entire analysis is done there.
Result: DSWare requires less communication than the baseline.

34 Performance in Differentiating Events and Event-like Factors
- How can a repeated report of an event be differentiated from an event-like factor?
- How does performance vary with different time window sizes and different minimum confidence values?

35 Discussions
- The idea of an event detection service is well developed and thoroughly discussed.
- In DSWare, data is replicated on multiple physical nodes that map to a single logical node, so consistency among these nodes is a key issue. The paper mentions "weak consistency", but what is the definition of "weak consistency" in a sensor network?
- Since multiple physical nodes already map to a single logical node, why is data caching needed? What are the different purposes of introducing both?
- The paper mentions that the application can specify the actual scheduling scheme in the sensor network based on its most important concerns. But is it a good idea for the application to do that? It does not seem to be simple work.

36 Discussions (cont.)
- What is the position of real-time scheduling in the system? How is real-time behavior provided?
- Two questions about Fig. 5:
  - How can a repeated report of an event be differentiated from an event-like factor?
  - How does performance vary with different time window sizes and different minimum confidence values?
- A small typo: in the last sentence before 5.1, "an explosion event will be reported if the Confidence_E is not less than 0.9" should be "an explosion event will be reported if the Confidence_E is no less than 0.9".

