Published by Edgar Stock. Modified over 9 years ago.
1
LAHAR: Extracting Events from Probabilistic Streams
Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu
University of Washington
2
What is a Lahar?
Lahar -- SIGMOD 2008 -- Christopher Re
This is a lahar: May 18, 1980, ~8:27am … a few minutes later.
It’s a massive, fast stream of dirt(y data).
Our system, Lahar, processes queries on massive, dirty streams of data.
3
Event Queries
Motivating app: RFID (figure labels: A, B, C, D, E)
Event queries as in Cayuga, SASE and Snoop
Complex sequences using projections, predicates, …
Joe entered office 422 at t=8
Query: “Alert when Joe enters 422”, i.e. Joe outside 422, then inside 422
4
Challenges: Tracking Joe’s Location
Figure: 6th floor of the CS building, showing the antennas; the blue ring is Joe’s location.
5
Challenges: Tracking Joe’s Location
Figure: 6th floor of the CS building, showing the antennas; the blue ring is Joe’s location.
Two problems:
1. Missed readings
2. Granularity mismatch
Proposal: infer location, keep the probabilities, and query with Lahar.
Model-based view [Deshpande et al] of an HMM.
Lahar retains probabilities, achieves higher quality (precision/recall) and is still efficient.
6
Outline
1. RFID streams to probabilistic streams
2. Lahar queries on probabilistic streams
3. Query algorithms: Regular and Extended Regular
4. Experiments
7
Tracking Joe’s Location
Figure: 6th floor of the CS building, showing the antennas; the blue ring is ground truth.
8
Probabilities via particle filter
Figure: 6th floor of the CS building, showing the antennas; the blue ring is ground truth.
Each orange particle is a guess of Joe’s location.
Particles guess many locations per timestep, so the data are uncertain.
9
From particles to a probabilistic stream
At(tag, loc)
Tag   t   Loc     P
Joe   7   422     0.4
Joe   7   Hall3   0.4
Joe   7   Hall4   0.2
Joe   8   422     0.6
Joe   8   Hall3   0.2
Joe   8   Hall4   0.2
Sue   7   …       …
Query the particle filter output via At, a model-based view.
10
Semantics of the Model
At(tag, loc)
Tag   t   Loc     P
Joe   7   422     0.4
Joe   7   Hall3   0.4
Joe   7   Hall4   0.2
Joe   8   422     0.6
Joe   8   Hall3   0.2
Joe   8   Hall4   0.2
Sue   7   …       …
One possible stream (world):
Tag   t   Loc
Joe   7   Hall4
Joe   8   422
Sue   7   …
Prob = 0.2 * 0.6 * …
A query q returns the probability that q is true at each time t.
“Joe enters 422” @ t=8: (0.4 + 0.2) * 0.6 = 0.36, where 0.4 + 0.2 is the probability that Joe is outside 422 (in Hall3 or Hall4) at t=7.
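The arithmetic on this slide can be checked with a short Python sketch. It assumes, as the product on the slide suggests, that timesteps are independent. The function names `prob_enters` and `prob_enters_by_worlds` and the dictionary layout are illustrative choices, not part of Lahar.

```python
from itertools import product

# Marginals for Joe, copied from the At(tag, loc) table on the slide.
at = {
    ("Joe", 7): {"422": 0.4, "Hall3": 0.4, "Hall4": 0.2},
    ("Joe", 8): {"422": 0.6, "Hall3": 0.2, "Hall4": 0.2},
}

def prob_enters(at, tag, room, t):
    """P(tag outside `room` at t-1, inside at t), assuming independent timesteps."""
    outside_before = sum(p for loc, p in at[(tag, t - 1)].items() if loc != room)
    inside_now = at[(tag, t)].get(room, 0.0)
    return outside_before * inside_now

def prob_enters_by_worlds(at, tag, room, t):
    """Same quantity, summed over every possible world (one location per timestep)."""
    total = 0.0
    for (prev_loc, p_prev), (cur_loc, p_cur) in product(
        at[(tag, t - 1)].items(), at[(tag, t)].items()
    ):
        if prev_loc != room and cur_loc == room:
            total += p_prev * p_cur  # probability of this particular world
    return total

print(round(prob_enters(at, "Joe", "422", 8), 2))            # 0.36
print(round(prob_enters_by_worlds(at, "Joe", "422", 8), 2))  # 0.36
```

Both functions agree because summing over worlds factorizes when timesteps are independent; enumerating worlds explicitly is what becomes exponential for longer sequences.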
11
Outline
1. RFID streams to probabilistic streams
2. Lahar queries on probabilistic streams
3. Query algorithms: Regular and Extended Regular
4. Experiments
12
Lahar Queries by Example
Alert when Joe is in hallway 4 and later in office 422
Inspired by Cayuga [Demers et al 2006, White et al 2007]
13
Lahar Queries by Example
Alert when Joe is in hallway 4 and later in office 422
Automaton: (Joe in Hall4) -> (Joe in 422)
Inspired by Cayuga [Demers et al 2006, White et al 2007]
14
Lahar Queries by Example
Alert when Joe is in hallway 4 and later in office 422
Automaton: (Joe in Hall4) -> (Joe in 422)
Alert when Joe is in hallway 4, and immediately in office 422
Inspired by Cayuga [Demers et al 2006, White et al 2007]
15
Lahar Queries by Example
Alert when Joe is in hallway 4 and later in office 422
Automaton: (Joe in Hall4) -> (Joe in 422)
Alert when Joe is in hallway 4, and immediately in office 422
Automaton: (Joe in Hall4) -> (Joe in 422)
Inspired by Cayuga [Demers et al 2006, White et al 2007]
Challenge with probabilities: the naïve approach is exponential, and in general this is unavoidable (#P).
16
A hierarchy of Lahar queries
Regular queries (efficient, streamable): Alert when Joe enters 422
Extended regular (efficient, streamable): Alert when anyone enters 422
17
A hierarchy of Lahar queries
Regular queries (efficient, streamable): Alert when Joe enters 422
Extended regular (efficient, streamable): Alert when anyone enters 422
Safe (efficient, but not streamable)
Unsafe (inefficient)
18
Outline
1. RFID streams to probabilistic streams
2. Lahar queries on probabilistic streams
3. Query algorithms: Regular and Extended Regular
4. Experiments
19
Review: A non-probabilistic example
Alert me when Joe enters 422
Automaton: state 1 (Joe in Hall4) -> state 2 (Joe in 422, final)
First stream:
Tag   T   Loc
Joe   7   Hall 4
Joe   8   422
State sets: {} {1} {2}; accept at t = 8
Second stream:
Tag   T   Loc
Joe   7   Hall 4
Joe   8   423
State sets: {} {1} {}
20
… now with probabilities
Alert me when Joe enters 422
Automaton: state 1 (Joe in Hall4) -> state 2 (Joe in 422, final)
Tag   T   Loc     P
Joe   7   Hall4   0.5
Joe   8   423     0.3
Joe   8   422     0.6
Distribution on states:
initially: {} 1.0
after t=7: {} 0.5, {1} 0.5
after t=8: {} 0.65, {1} 0.05, {2} 0.3
Accept at t=8 with p = 0.3
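The state distributions on this slide can be reproduced with a small Python sketch that pushes a distribution over state sets through one timestep at a time. Several things here are assumptions rather than Lahar's actual compilation: timesteps are treated as independent, the probability mass not listed in the table is treated as "no matching event" (labelled `"none"`), and the transition function `step` is a hand-written guess that happens to match the numbers shown.

```python
from collections import defaultdict

# Per-timestep event marginals for Joe from the slide. "none" is the leftover
# probability mass; treating it as "no matching event" is an assumption.
stream = [
    {"Hall4": 0.5, "none": 0.5},            # t = 7
    {"423": 0.3, "422": 0.6, "none": 0.1},  # t = 8
]

def step(state, event):
    """Hypothetical transitions for 'Joe in Hall4, then immediately in 422'."""
    if state == "{2}":
        state = "{}"        # assumption: restart after an accept
    if event == "none":
        return state        # assumption: no event leaves the state unchanged
    if event == "Hall4":
        return "{1}"        # first subgoal matched
    if state == "{1}" and event == "422":
        return "{2}"        # second subgoal matched: accept
    return "{}"             # any other location resets the automaton

def run(stream):
    dist = {"{}": 1.0}      # start: certainly in the empty state set
    history = [dist]
    for events in stream:
        nxt = defaultdict(float)
        for state, p_state in dist.items():
            for event, p_event in events.items():
                nxt[step(state, event)] += p_state * p_event
        dist = dict(nxt)
        history.append(dist)
    return history

for dist in run(stream):
    print({s: round(p, 2) for s, p in sorted(dist.items())})
```

The distribution never has more entries than there are state sets, which depends only on the query; that is the property that lets the computation run in constant space per timestep.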
21
Lies in the preceding slides… (technical details)
Richer predication: “Alert when Joe enters any office”
Translate the query and the input into an alphabet.
Automaton: state 1 (Joe in Hall4) -> state 2 (Joe in 422, final)
Key technical detail: the alphabet is small in the data, so the query is streamable.
See paper for compilation.
22
Extension to Extended Regular
“Alert when anyone enters 422”
23
Extension to Extended Regular
“Alert when anyone enters 422”
(Obs 1) For each person, the query is regular.
(Obs 2) Different persons involve disjoint sets of events, hence they are probabilistically independent.
Algorithm: (Obs 1) suggests running an automaton for each person; (Obs 2) suggests multiplying to get the probability that any is true.
Space = O(# persons), not # timesteps: can stream.
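The "multiply" step can be read as the usual complement rule for independent events: the probability that at least one person's query is true is one minus the product of the probabilities that each is false. A minimal Python sketch under that reading; the per-person probabilities are hypothetical stand-ins for the outputs of the per-person automata, and `prob_any` is an illustrative name.

```python
import math

def prob_any(per_person_probs):
    """P(at least one person's event is true), assuming independence across persons."""
    return 1.0 - math.prod(1.0 - p for p in per_person_probs)

# Hypothetical per-person outputs of the regular-query automaton at one timestep.
p_enters_422 = {"Joe": 0.3, "Sue": 0.5}
print(round(prob_any(p_enters_422.values()), 2))  # 0.65
```

Only one automaton state distribution per person has to be kept between timesteps, which matches the O(# persons) space bound stated on the slide.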
24
Summary of Contributions
Regular queries (efficient, streamable): compiled to an automaton, streaming, O(1) space
Extended regular (efficient, streamable): streaming with O(m) space, i.e. # of persons
Safe (efficient, but not streamable)
Unsafe (inefficient, most #P-hard)
See paper for Markovian correlations, more sophisticated predication, complete compilation and static analysis algorithms.
25
Outline
1. RFID streams to probabilistic streams
2. Lahar queries on probabilistic streams
3. Query algorithms: Regular and Extended Regular
4. Experiments
26
Experimental Setup
Quality: how are precision and recall (P/R) affected by keeping probabilities?
52 objects, 352 locations, 10k sq. ft.
2 x 30 min trace with a 10 min break in between
Participants marked down true locations
27
Experimental Setup
Quality: how are precision and recall (P/R) affected by keeping probabilities?
52 objects, 352 locations, 10k sq. ft.
2 x 30 min trace with a 10 min break in between
Participants marked down true locations
Query: “Alert when anyone enters a coffee room”
Baseline: Most Likely Estimate (MLE), i.e. for each timestep and each person, take the most likely location
28
Quality (Realtime): Improve over MLE?
Declare an event “true” if its Pr > threshold; vary the threshold.
Plots: precision, recall and F1
10% improvement in F1
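The thresholding rule and the three metrics can be sketched in a few lines of Python. The event probabilities and ground truth below are made-up illustration values, not data from the experiment, and `precision_recall_f1` is an illustrative helper rather than the evaluation code used for the talk.

```python
def precision_recall_f1(event_probs, truth, threshold):
    """Declare an event true if its probability exceeds `threshold`, then score it."""
    predicted = {e for e, p in event_probs.items() if p > threshold}
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical query output: (person, timestep) -> probability the event is true.
event_probs = {("Joe", 8): 0.36, ("Sue", 9): 0.80, ("Bob", 9): 0.10, ("Ann", 12): 0.55}
# Hypothetical ground truth: events that really happened.
truth = {("Joe", 8), ("Sue", 9)}

for threshold in (0.2, 0.5, 0.7):
    p, r, f1 = precision_recall_f1(event_probs, truth, threshold)
    print(threshold, round(p, 2), round(r, 2), round(f1, 2))
```

Lowering the threshold trades precision for recall, which is why the slide reports results across a range of thresholds rather than at a single operating point.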
29
Performance: Is the cost too high?
Synthetic data, same query
30
Related Work
Event queries (deterministic): Cayuga, SASE, SnoopIB
Model-based views: BBQ; recently, Kanagal et al, ICDE 08
Probabilistic databases: Mystiq, Trio, MayBMS, Maryland, Purdue, MCDB
Particle filters on HMMs: Doucet, Godsill
31
Conclusion
Showed Lahar
Processed output of several inference tasks (HMMs); applies more generally than just RFID
Quality (F1) gains by keeping probabilities
Performance usable in real time: lots of concurrent tags, no indexing!
33
Overview of Regular Query Algorithm
1. Compile an event query q into:
   1. an automaton (A) over a language L
   2. a mapping (M) from events to subsets of L
2. Runtime: the input is a set of events E
   1. Map E into subsets of L via M
   2. Maintain the set of possible states of A
Deterministic case: a set of states. Probabilistic case: the algorithm stays the same, but maintains a distribution.
The size of the distribution depends only on the query, q.
NB: example to follow. For details, see paper.
34
Why are ER queries hard?
Regular queries ~ regular expressions
  Mapping is non-trivial; inspired by Cayuga [Demers et al. 06]
  Queries have #P combined complexity: encode mDNF as a regular expression
  Intuition: an n-sized automaton leads to exponential cost
Extended regular ~ 1 NFA per person
  k persons implies an O(k)-size automaton
  When ER, can avoid the blowup
35
Regular and Extended Regular
A query is regular if no variable is shared between subgoals.
A query is extended regular if any variable shared by two subgoals is shared by all subgoals.
Example annotation: p is shared between subgoals.
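The two definitions translate directly into a check over the variable sets of a query's subgoals. The Python sketch below uses a hypothetical representation (each subgoal reduced to the set of variable names it mentions) and an illustrative function name `classify`; it is not Lahar's static analysis.

```python
from itertools import combinations

def classify(subgoals):
    """Classify a query given, per subgoal, the set of variable names it uses."""
    shared = set()
    for a, b in combinations(subgoals, 2):
        shared |= a & b                      # variables shared by some pair of subgoals
    if not shared:
        return "regular"                     # no variable shared between subgoals
    if all(v in g for v in shared for g in subgoals):
        return "extended regular"            # every shared variable appears in all subgoals
    return "neither"

# "Alert when Joe enters 422": constants only, apart from an unshared location variable.
print(classify([{"l"}, set()]))              # regular
# "Alert when anyone enters 422": person variable p appears in both subgoals.
print(classify([{"p", "l"}, {"p"}]))         # extended regular
# Hypothetical three-subgoal query where p is shared by two subgoals but not the third.
print(classify([{"p"}, {"p"}, {"q"}]))       # neither
```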
36
Correlations
37
Sequencing by example
Sequencing is parameterized [Cayuga].
Figure: events on a timeline (axis: Time).
Semicolon means “the next event among those that match next goal”.
Semicolon is not “after”.
38
Compilation by example
Each goal “corresponds” to two letters:
move (m): the query should advance
accept (a): the next subgoal accepts
Any other event maps to the empty set.
Figure: automaton with a final state; edge labels “does not contain” and “does contain”.
39
Subtle example…
What about:
Any other event maps to the empty set.
Figure: automaton with a final state; edge labels “does not contain” and “does contain”.
40
CUT II
41
Motivating Apps
RFID apps
Diary and active calendar application: alert if I go to a database meeting.
Supply chain: alert if Mach 3 razors are being stolen. Many independent HMMs.
Elder care [Intel/UW]: alert if elder takes their medicine with water. Activity recognition.
Financial applications on predictive HMM: alert if head-and-shoulders market.
42
Compile Select and Filter
Intuition: each goal maps to two letters:
match (m): matches the filter
accept (a): accepted by the select
Figure: automaton with a final state; edge labels “does not contain” and “does contain”.
The language and automaton are the same for both queries.
43
Wrinkle in the language: Filter v. Selection
“Alert next time Joe is in 502 after he is in 501”
“Alert if the next place Joe is in after 501 is 502”
Figure: At events on a timeline (axis: Time), marked Yes and No.
44
Recap of Algorithms
Regular queries: compiled them to an NFA, then used image. Data complexity O(1).
Extended regular: several regulars multiplied together. Depends on the number of distinct people in the data, not the number of time steps.
47
Quality (Archived): Improve over Viterbi?
Smoothing v. Viterbi (MAP)
Lahar keeps track of Markovian correlations.
Viterbi leverages correlations for the MAP estimate.
Plots: precision, recall and F1
Approx. 30% gain in F1