Adaptive Cleaning for RFID Data Streams Shawn Jeffery Minos Garofalakis Michael Franklin UC Berkeley Intel Research Berkeley UC Berkeley Presented by:


1 Adaptive Cleaning for RFID Data Streams Shawn Jeffery Minos Garofalakis Michael Franklin UC Berkeley Intel Research Berkeley UC Berkeley Presented by: Hamid Haidarian Shahri

2 Where Are We? Look at the Signs!

3 Looking at Signs – Before Jumping In S. Chaudhuri, U. Dayal, "An Overview of Data Warehousing and OLAP Technology," SIGMOD Record, 1997.  800+ citations DW and information integration “Data cleaning” term publicized  Identified its importance in integration Extensive research followed

4 VLDB 2001 Session R12: DATA QUALITY & CLEANING Declarative data cleaning: language, model, and algorithms Helena Galhardas (INRIA Rocquencourt), Daniela Florescu (Propel), Dennis Shasha (NYU), Eric Simon, and Cristian-Augustin Saita (INRIA Rocquencourt) Potter's wheel: an interactive data cleaning system Vijayshankar Raman and Joseph M. Hellerstein (University of California at Berkeley) Update propagation strategies for improving the quality of data on the Web Alexandros Labrinidis and Nick Roussopoulos (University of Maryland)

5 Data Cleaning Previous Work - 2006 Hamid Haidarian Shahri, S.H. Shahri, “Eliminating Duplicates in Information Integration: An Adaptive, Extensible Framework," IEEE Intelligent Systems, Vol. 21, No. 5, 2006.

6 Putting Things into Context Data cleaning required after integration  No unified standard across sources  NOW: sensor/hardware errors inevitable; research opportunity Data modeling (Amol Deshpande)  An important use case is cleaning

7 VLDB 2006 – Three weeks ago Research Session 5: Sensor Data (dedicated to cleaning!) Title: Adaptive Cleaning for RFID Data Streams  Authors: Shawn R. Jeffery, Minos Garofalakis, Michael J. Franklin Title: A Deferred Cleansing Method for RFID Data Analytics  Authors: Jun Rao, Sangeeta Doraiswamy, Hetal Thakkar, Latha S. Colby Title: Online Outlier Detection in Sensor Data Using Non-Parametric Models  Authors: Sharmila Subramaniam, Themis Palpanas, Dimitris Papadopoulos, Vana Kalogeraki, Dimitrios Gunopulos

8 RFID: Radio Frequency IDentification

9 RFID data is dirty A simple experiment: 2 RFID-enabled shelves 10 static tags 5 mobile tags

10 RFID Data Cleaning RFID data has many dropped readings Typically, use a smoothing filter to interpolate: SELECT distinct tag_id FROM RFID_stream [RANGE '5 sec'] GROUP BY tag_id But, how to set the size of the window? (Figure: raw readings passed through the smoothing filter to give smoothed output, over time.)
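The fixed-window filter on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the query engine behind the slide: the function name `smooth_fixed_window`, the per-epoch set representation, and the sample data are all assumptions made for the example, and the window is counted in epochs rather than seconds.

```python
from collections import deque


def smooth_fixed_window(readings_per_epoch, window):
    """Report, for each epoch, every tag read in the last `window` epochs.

    readings_per_epoch: iterable of sets of tag IDs, one set per epoch
                        (hypothetical input format).
    window:             fixed smoothing window size, in epochs.
    """
    recent = deque(maxlen=window)  # tag sets of the most recent epochs
    output = []
    for tags in readings_per_epoch:
        recent.append(set(tags))
        # A tag counts as present if any epoch in the window saw it.
        output.append(set().union(*recent))
    return output


# Tag "A" is read intermittently and is never read after epoch 2.
raw = [{"A"}, set(), {"A"}, set(), set(), set()]
small = smooth_fixed_window(raw, window=2)
large = smooth_fixed_window(raw, window=4)
print(small)
print(large)
```

With the small window the dropped reading at epoch 1 is filled in and the tag disappears from the output two epochs after its last reading; with the large window the tag is still reported at the final epoch. That is the tension the window-size question on the slide points at.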

11 Window Size for RFID Smoothing (Figure: reality, raw readings, small-window output, and large-window output over time, with Fido moving and then Fido resting.)  Need to balance completeness vs. capturing tag movement

12 Truly Declarative Smoothing Problem: window size non-declarative  Application wants a clean stream of data  Window size is how to get it Solution: adapt the window size in response to data

13 Itinerary Introduction: RFID data cleaning A statistical sampling perspective SMURF  Per-tag cleaning  Multi-tag cleaning Ongoing work Conclusions

14 A Statistical Sampling Perspective Key Insight: RFID data ≈ random sample of present tags Map RFID smoothing to a sampling experiment

15 RFID’s Gory Details Read Cycle (Epoch) (For Alien readers) Tag List with columns Epoch, TagID, ReadRate: (0, 1, .9), (0, 2, .6), (0, 3, .3) (Figure: antenna & reader with Tags 1 to 4, above a timeline of epochs.)

16 RFID Smoothing to Sampling RFID → Sampling: Read cycle (epoch) → Sample trial; Reading → Single sample; Smoothing window → Repeated trials; Read rate → Probability of inclusion ($p_i$)  Now use sampling theory to drive adaptation!

17 SMURF Statistical Smoothing for Unreliable RFID Data Adapts window based on statistical properties Mechanisms for: Per-tag and multi-tag cleaning

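18 Per-Tag Smoothing: Model and Background Use a binomial sampling model: the epochs in smoothing window $w_i$ are Bernoulli trials with success probability $p_i^{avg}$ (read rate of tag i), and $S_i$ is the set of readings of tag i in the window (Figure: $p_i$ between 0 and 1 against time in epochs.)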
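A short worked form of the binomial model may help here. The symbols are read off the slide under stated assumptions: $w_i$ is the window size in epochs, $p_i^{avg}$ the average read rate of tag i, and $|S_i|$ the number of epochs in the window in which tag i was read. The numbers in the example are made up for illustration.

```latex
\[
|S_i| \sim \mathrm{Binomial}\bigl(w_i,\; p_i^{avg}\bigr), \qquad
\mathrm{E}\bigl[|S_i|\bigr] = w_i \, p_i^{avg}, \qquad
\mathrm{Var}\bigl[|S_i|\bigr] = w_i \, p_i^{avg} \bigl(1 - p_i^{avg}\bigr)
\]
% Illustrative values: w_i = 10 epochs, p_i^{avg} = 0.8
\[
\mathrm{E}\bigl[|S_i|\bigr] = 10 \times 0.8 = 8, \qquad
\mathrm{Var}\bigl[|S_i|\bigr] = 10 \times 0.8 \times 0.2 = 1.6
\]
```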

19 Per-Tag Smoothing: Completeness If the tag is there, read it with high probability  Want a large window Reading with a low $p_i$: expand the window (Figure: $p_i$ between 0 and 1 against time in epochs.)

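20 Per-Tag Smoothing: Completeness Desired window size for tag i: $w_i \ge \lceil (1/p_i^{avg}) \ln(1/\delta) \rceil$, where $1/p_i^{avg}$ is the expected number of epochs needed to read the tag and the factor $\ln(1/\delta)$ ensures a reading with probability $1-\delta$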
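The completeness requirement can be turned into a small calculation. The sketch below assumes the bound $w_i \ge \lceil \ln(1/\delta) / p_i^{avg} \rceil$, which follows from the binomial model because the chance of missing a present tag in every one of $w$ epochs is $(1-p)^w \le e^{-pw}$. The function name `completeness_window` and its parameters are hypothetical.

```python
import math


def completeness_window(p_avg, delta):
    """Smallest window (in epochs) satisfying w >= ln(1/delta) / p_avg.

    p_avg: average per-epoch read rate of the tag, in (0, 1].
    delta: tolerated probability of missing a tag that is present, in (0, 1).
    """
    if not 0 < p_avg <= 1:
        raise ValueError("p_avg must be in (0, 1]")
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    return math.ceil(math.log(1 / delta) / p_avg)


# A well-read tag needs a short window; a poorly read tag needs a long one.
w_strong = completeness_window(p_avg=0.9, delta=0.05)
w_weak = completeness_window(p_avg=0.3, delta=0.05)
print(w_strong, w_weak)

# Sanity check: probability of missing the weak tag in every epoch of its window.
miss_probability = (1 - 0.3) ** w_weak
print(miss_probability)
```

The bound is conservative: it comes from an exponential upper bound on $(1-p)^w$, so the actual miss probability at the computed window size is below $\delta$.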

21 Per-Tag Smoothing: Transitions Detect transitions as statistically significant changes in the data Flag a transition and shrink the window (Figure: $p_i$ between 0 and 1 against time in epochs; a statistically significant difference marks the point by which the tag has likely left.)

22 Per-Tag Smoothing: Transitions Is the difference between the # observed readings, $|S_i|$, and the # expected readings, $w_i p_i^{avg}$, “statistically significant”? Flag a transition when $\bigl|\,|S_i| - w_i p_i^{avg}\,\bigr| > 2\sqrt{w_i p_i^{avg}(1 - p_i^{avg})}$
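The significance test on this slide can be sketched as a comparison against two binomial standard deviations. This is an illustration under the binomial model, assuming the test compares the observed reading count with its expectation $w_i p_i^{avg}$; the function name `transition_detected`, its parameters, and the sample numbers are hypothetical.

```python
import math


def transition_detected(observed, window, p_avg):
    """Return True if the observed reading count deviates from the binomial
    expectation by more than two standard deviations.

    observed: number of epochs in the window in which the tag was read.
    window:   window size in epochs.
    p_avg:    average per-epoch read rate of the tag.
    """
    expected = window * p_avg
    std_dev = math.sqrt(window * p_avg * (1 - p_avg))
    return abs(observed - expected) > 2 * std_dev


# Window of 10 epochs, read rate 0.8: about 8 readings are expected.
steady = transition_detected(observed=7, window=10, p_avg=0.8)
dropped = transition_detected(observed=3, window=10, p_avg=0.8)
print(steady, dropped)
```

Seven readings out of ten is within normal sampling noise, so no transition is flagged; three readings is far enough below expectation that the tag has plausibly left and the window should shrink.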

23 SMURF in Action (Figure: SMURF output over time, with Fido moving and then Fido resting.)  Experiments with real and simulated data show similar results

24 Multi-tag Cleaning Some applications only need aggregates  E.g., count of items on each shelf  Don’t need to track each tag! Use statistical mechanisms for both:  Aggregate computation  Window adaptation

25 Aggregate Computation π-estimators (Horvitz-Thompson) Count: $\hat{N}_w = \sum_{i \in S_w} 1/\pi_i$ P[tag i seen in a window of size w]: $\pi_i = 1 - (1 - p_i^{avg})^w$  Use small windows to capture movement  Use the estimator to compensate for lost readings
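A Horvitz-Thompson style count can be sketched as follows. The example assumes each tag's inclusion probability over a window of $w$ epochs is $\pi_i = 1 - (1 - p_i^{avg})^w$, that is, the chance of at least one successful read; the function names and the read rates are hypothetical.

```python
def inclusion_probability(p_avg, window):
    """Probability that a tag with read rate p_avg is seen at least once
    in `window` epochs, treating epochs as independent trials."""
    return 1 - (1 - p_avg) ** window


def estimate_count(read_rates_of_seen_tags, window):
    """Horvitz-Thompson count estimate: each tag seen in the window
    contributes 1 / pi_i, so rarely read tags stand in for the similar
    tags that were missed entirely."""
    total = 0.0
    for p_avg in read_rates_of_seen_tags:
        if not 0 < p_avg <= 1:
            raise ValueError("read rates must be in (0, 1]")
        total += 1 / inclusion_probability(p_avg, window)
    return total


# Three tags were seen in a 2-epoch window, with these average read rates.
seen_rates = [0.9, 0.5, 0.2]
estimate = estimate_count(seen_rates, window=2)
print(round(estimate, 2))
```

Three tags were observed, but the estimate is a little over five: the poorly read tag (rate 0.2) is weighted up because tags like it are missed in most 2-epoch windows. This is how a small window can still yield a reasonable count.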

26 Window Adaptation Upper bound window similar to per-tag “Transition” based on variance within subwindows (Figure: count against time in epochs, comparing $N_w$ and $N_{w'}$.)

27 Multi-tag Scenario

28 Ongoing Work: Spatial Smoothing With multiple readers, more complicated Reinforcement  A? B? A ∪ B? A ∩ B? Arbitration  A? C?  All are addressed by statistical framework! (Figure: two rooms, two readers per room; readers A, B, C, D.)

29 Beyond RFID  π-estimator for other aggregates  Use SMURF for sensor networks Use SMURF in general streaming systems (e.g., TelegraphCQ)  Remove RANGE clause from CQL Other sensor data Other streaming data

30 Related Work Commercial RFID middleware  Smoothing filters: need to set smoothing window RFID-related work  Rao et al., StreamClean: complementary  Intel Seattle, HiFi, ESP: static window size BBQ, MauveDB  Heavyweight, model-based  SMURF is non-parametric, sampling-based Statistical filters (digital signal processing & DB)  Non-linear digital filters inspired SMURF design

31 Conclusions Current smoothing filters not adequate Not declarative! SMURF: Declarative smoothing filter Uses statistical sampling to adapt window size

32 Thanks! Questions?

