Uncertain Observation Times Shaunak Chatterjee & Stuart Russell Computer Science Division University of California, Berkeley.

1 Uncertain Observation Times Shaunak Chatterjee & Stuart Russell Computer Science Division University of California, Berkeley

2 Overview Why uncertain observation times matter Scenarios considered: 1. Each event is observed: – Efficient DP algorithm 2. Missing and false events: – Practical approximation algorithm 3. Multiple asynchronous observation streams

3

4 Motivation Two types of data streams – Automatically time-stamped data traces – Human annotations for temporal events Many essential facts cannot be recorded automatically Human-generated timestamps often wrong Assuming the correctness of timestamps can lead to nonsense results

5 Example: at 16.30, nurse enters “gave phenylephrine at 16.00” Data entry time Event timestamp

6 Example: at 16.30, nurse enters “gave phenylephrine at 16.00” Data entry time Actual event time

7 Ubiquity of uncertain observation times Nurse monitoring a patient in the ICU – Hundreds of events recorded by the nurse Usually recorded after event, sometimes before Manual recording of events – Science experiments Biology, chemistry, physics – Industrial plants Multiple observation traces – Various historians’ accounts of a period – Only one underlying truth

8 Sample trace generated from model Correct chronological ordering of time stamps [Figure: time axis showing, for each event, the actual time of event (a_i), the recording time of event (d_i) and the recorded time of event (m_i)] Example: nurse gives medicine at 10:23 a.m. (a_i), records the event at 11:00 a.m. (d_i), and records the time of the event as 10:30 a.m. (m_i); the previous event's time stamp is 10:15 a.m.

9 Dynamic Bayesian networks DBNs are discrete-time multivariate stochastic process models (include HMMs and KFs) DBNs facilitate modeling of complex systems with sensor noise etc. Large-scale physiological models pursued since the 1960s, but little attention paid to nature of real data

10 Simple DBN representation [Figure: DBN with state nodes X_1 to X_8, observation nodes Y_1 to Y_8, and event nodes a_1 to a_3, m_1 to m_3 and d_1 to d_3, annotated with example values]

11 Objective To design a graphical model that allows for uncertainty in observation times Derive efficient inference algorithms – Naïve algorithm has O(M^T) complexity – Reduce to O(MT) using ordering constraints and dynamic programming

12 Key constraint assumption Person recording events gets the order right For all i, j: m_i > m_j ⇒ a_i > a_j [Figure: a valid association and an invalid association, each drawn as recorded times of events (m_i) linked to actual times of events (a_i) on a time axis]
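
The ordering constraint on this slide can be checked mechanically. The following is a minimal sketch, assuming recorded and actual times are given as two parallel lists of numbers; the function name `is_valid_association` is hypothetical and not taken from the talk.

```python
from itertools import combinations


def is_valid_association(recorded, actual):
    """Return True if m_i > m_j implies a_i > a_j for every pair of events.

    recorded[i] plays the role of m_i and actual[i] the role of a_i.
    Pairs with equal recorded times impose no constraint.
    """
    if len(recorded) != len(actual):
        raise ValueError("recorded and actual must have the same length")
    for i, j in combinations(range(len(recorded)), 2):
        if recorded[i] > recorded[j] and not actual[i] > actual[j]:
            return False
        if recorded[j] > recorded[i] and not actual[j] > actual[i]:
            return False
    return True


if __name__ == "__main__":
    print(is_valid_association([10, 20, 30], [8, 19, 25]))  # order preserved
    print(is_valid_association([10, 20, 30], [8, 26, 25]))  # events 2 and 3 swapped
```

The check is quadratic in the number of events, which is fine for validation; the point of the constraint in the talk is that it prunes the hypothesis space for inference.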

13 Pre-computation step Likelihood of the data segment between the current event time stamp (a_k) and the next hypothesized event time stamp (a_{k+1}) Pre-compute for all k, and all possible values of a_k and a_{k+1}

14 Modified Baum-Welch algorithm

15 Complexity Modified time complexity O(MS^2 T) – M: maximum size of the time window of uncertainty – S: number of states in system – T: number of time steps Space complexity – O(KM^2): storing the pre-computed segment likelihoods – O(KM): storing α, β and γ
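
To illustrate how the ordering constraint turns an exponential enumeration into a dynamic program, here is a simplified sketch. It is not the modified Baum-Welch algorithm from the talk: it assumes the segment likelihoods are already available through a hypothetical callable `seg(k, a_k, a_next)`, and it only sums the weights of all strictly increasing choices of event times, one per window. The brute-force version is included solely to check the recursion on small inputs.

```python
from itertools import product


def total_weight(windows, seg):
    """Sum over all a_1 < a_2 < ... < a_K with a_k in windows[k] of the
    product of seg(k, a_k, a_{k+1}), computed by a forward recursion.

    With K windows of at most M candidates each, this costs O(K * M^2)
    calls to seg instead of enumerating up to M^K assignments.
    """
    alpha = {a: 1.0 for a in windows[0]}
    for k in range(1, len(windows)):
        new_alpha = {}
        for a in windows[k]:
            # Only predecessors strictly earlier than a respect the ordering.
            new_alpha[a] = sum(
                weight * seg(k - 1, prev, a)
                for prev, weight in alpha.items()
                if prev < a
            )
        alpha = new_alpha
    return sum(alpha.values())


def brute_force(windows, seg):
    """Reference implementation that enumerates every assignment."""
    total = 0.0
    for choice in product(*windows):
        if all(x < y for x, y in zip(choice, choice[1:])):
            weight = 1.0
            for k in range(len(choice) - 1):
                weight *= seg(k, choice[k], choice[k + 1])
            total += weight
    return total


if __name__ == "__main__":
    windows = [[1, 2, 3], [2, 3, 4], [4, 5]]  # candidate times per event

    def toy_seg(k, a, b):
        # Made-up weight that favours short gaps; purely illustrative.
        return 1.0 / (b - a)

    print(total_weight(windows, toy_seg), brute_force(windows, toy_seg))
```

In a full model the weights would come from the state-space likelihood of the data between consecutive event times, which is where the S^2 factor in the quoted time complexity would enter.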

16 Simulation results – Increased likelihood of evidence Window of uncertainty

17 Simulation results – General accuracy of inference

18 Simulation results – Computation time vs size of uncertainty window

19 Unreported events, false reports Not all events are reported – Unobserved – Negligence Not all reports are true – Double entry of a single data point – Misinterpretation of information – Intended actions reported but not carried out

20 Missing and false reports [Figure: network over event nodes a_1 to a_4 and θ_1 to θ_4 and report nodes m_1 to m_3 and φ_1 to φ_3] Actual time of event (a_i) Recorded time of event (m_i) Event i reported? (θ_i) Index of event corresponding to report j (φ_j)

21 Missing and false reports [Figure: the same network over a_1 to a_4, θ_1 to θ_4, m_1 to m_3 and φ_1 to φ_3, with example values filled in for each node] Actual time of event (a_i) Recorded time of event (m_i) Event i reported? (θ_i) Index of event corresponding to report j (φ_j)

22 Modified DP and complexity The previous algorithm was compact because of the one-to-one correspondence between events and reports – Now have to consider all possibilities Unless there are constraints (more on this later) Chronological mapping of events’ time stamps still holds – This again leads to an efficient dynamic program

23 Computational complexity In the general case, uncertainty windows are no longer limited, since event i can be associated with any report j Time complexity: O(IJT^2) – I is the number of hypothesized events – J is the number of reports – T is the length of the temporal sequence
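
The event-to-report association with missing and false reports has the same shape as a sequence-alignment problem. The sketch below is a deliberately reduced version under strong assumptions: event times are treated as known rather than latent (so the T^2 factor disappears and the table has only (I+1)(J+1) cells), and the costs `miss_cost`, `false_cost` and the absolute time gap are hypothetical stand-ins for negative log-probabilities. It finds the single best order-preserving association rather than a posterior.

```python
def best_alignment(event_times, report_times, miss_cost=5.0, false_cost=5.0):
    """Cheapest order-preserving association between events and reports.

    Each event is either matched to one report or left unreported ("miss");
    each report is either matched to one event or treated as false.
    Returns (total_cost, list of (event_index, report_index) matches).
    """
    n_events, n_reports = len(event_times), len(report_times)
    inf = float("inf")
    cost = [[inf] * (n_reports + 1) for _ in range(n_events + 1)]
    back = [[None] * (n_reports + 1) for _ in range(n_events + 1)]
    cost[0][0] = 0.0
    for i in range(n_events + 1):
        for j in range(n_reports + 1):
            candidates = []
            if i > 0:  # event i-1 was never reported
                candidates.append((cost[i - 1][j] + miss_cost, "miss"))
            if j > 0:  # report j-1 corresponds to no event
                candidates.append((cost[i][j - 1] + false_cost, "false"))
            if i > 0 and j > 0:  # report j-1 describes event i-1
                gap = abs(event_times[i - 1] - report_times[j - 1])
                candidates.append((cost[i - 1][j - 1] + gap, "match"))
            if candidates:
                cost[i][j], back[i][j] = min(candidates)

    # Walk the back-pointers from the final cell to recover the matches.
    pairs = []
    i, j = n_events, n_reports
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "miss":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return cost[n_events][n_reports], pairs


if __name__ == "__main__":
    # Three events, two reports: the middle event goes unreported.
    print(best_alignment([10, 20, 30], [12, 31]))
```

A minimum-cost recursion is used here instead of a sum because the same association can be reached by several miss/false orderings, which would be double-counted in a naive sum.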

24 Practical assumptions – I Data entries are made in blocks – All reports in a given block (e.g., the night shift) must be for events that occurred (really or otherwise) in that block – Computational complexity is linear in T if blocks are of constant size

25 Practical assumptions – I Data entries are made in blocks – Record entered in the afternoon shift cannot correspond to an event in the morning shift [Figure: actual times (a_i) and recorded times (m_i) on a time axis divided into Morning Shift, Afternoon Shift and Night Shift] Computational complexity is linear if time blocks are of constant size
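
The block assumption amounts to a cheap filter on candidate associations. A minimal sketch follows, assuming blocks are defined by a sorted list of start times; the boundary values and the names `block_index` and `same_block` are hypothetical.

```python
from bisect import bisect_right


def block_index(t, block_starts):
    """Index of the block containing time t, given sorted block start times."""
    idx = bisect_right(block_starts, t) - 1
    if idx < 0:
        raise ValueError("time precedes the first block")
    return idx


def same_block(actual_time, recorded_time, block_starts):
    """True if an event and a report fall in the same block (e.g. shift)."""
    return block_index(actual_time, block_starts) == block_index(
        recorded_time, block_starts
    )


if __name__ == "__main__":
    shift_starts = [0.0, 8.0, 16.0]  # made-up shift boundaries, in hours
    print(same_block(9.0, 15.5, shift_starts))  # both in the second block
    print(same_block(7.5, 9.0, shift_starts))   # different blocks
```

Because associations never cross a block boundary, inference can be run block by block, which is what makes the overall cost linear in T when blocks have constant size.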

26 Practical assumptions – II When unobserved events and false reports are both rare events – We can perform approximate inference by NOT considering all possible a_i to m_j associations – The posterior distribution is highly concentrated along the “skewed diagonal” corresponding to a small number of errors – Assuming a bounded number of errors gives time complexity proportional to T
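
One way to picture the "skewed diagonal" approximation is as a banded alignment table. In the sketch below, event times are again treated as known, costs are hypothetical stand-ins for negative log-probabilities, and the parameter `band` keeps only cells with |i - j| <= band. With at most `band` missing or false reports, an association never leaves this band, so the band is a relaxation of the bounded-error assumption rather than an exact implementation of it.

```python
def banded_alignment_cost(event_times, report_times, band,
                          miss_cost=5.0, false_cost=5.0):
    """Best order-preserving association cost using only cells near the diagonal.

    Returns (cost, number_of_cells_filled). The cost is infinite when the
    final cell lies outside the band.
    """
    n_events, n_reports = len(event_times), len(report_times)
    inf = float("inf")
    cost = {(0, 0): 0.0}
    for i in range(n_events + 1):
        lo = max(0, i - band)
        hi = min(n_reports, i + band)
        for j in range(lo, hi + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:  # event i-1 unreported
                best = min(best, cost.get((i - 1, j), inf) + miss_cost)
            if j > 0:  # report j-1 is false
                best = min(best, cost.get((i, j - 1), inf) + false_cost)
            if i > 0 and j > 0:  # report j-1 matches event i-1
                gap = abs(event_times[i - 1] - report_times[j - 1])
                best = min(best, cost.get((i - 1, j - 1), inf) + gap)
            cost[(i, j)] = best
    return cost.get((n_events, n_reports), inf), len(cost)


if __name__ == "__main__":
    events, reports = [10, 20, 30], [12, 31]
    print(banded_alignment_cost(events, reports, band=1))
    print(banded_alignment_cost(events, reports, band=0))
```

The number of filled cells grows as roughly (2 * band + 1) times the number of events, instead of the product of the two sequence lengths.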

27 Simulation results – Posterior is peaked around the skewed diagonal

28 Simulation results – Hypothesizing more events leads to better recall

29 Effect of varying c

30 Multiple observation sequences Formulation – Several “sources” reporting on the same events – Key assumption Individual report sequences are independent given the actual truth (the X chain)

31 [Figure: latent trajectory over event nodes a_i, ..., a_I and θ_i, ..., θ_I, with evidence trajectories 1 through R, each consisting of report nodes m_j^(r), ..., m_J^(r) and φ_j^(r), ..., φ_J^(r)] Latent trajectory Evidence trajectory 1 Evidence trajectory R

32 Multiple observation sequences Formulation – Several “sources” reporting on the same events – Key assumption Individual report sequences are independent given the actual truth (the X chain) Inference – Similar DP algorithms apply, given the assumptions of ordering constraints, blocks, etc. – Complexity increases linearly with the number of report sequences
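
The key assumption on this slide can be written as a factorization. In the sketch below, m^{(r)} denotes the r-th report sequence (notation patterned on the node labels of the preceding figure, and an assumption here) and X denotes the latent chain.

```latex
\[
P\bigl(m^{(1)}, \dots, m^{(R)} \mid X\bigr)
  = \prod_{r=1}^{R} P\bigl(m^{(r)} \mid X\bigr)
\quad\Longrightarrow\quad
\log P\bigl(m^{(1)}, \dots, m^{(R)} \mid X\bigr)
  = \sum_{r=1}^{R} \log P\bigl(m^{(r)} \mid X\bigr)
\]
```

Each sequence contributes one term to the sum, which is consistent with the stated linear growth in complexity with the number of report sequences.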

33 Conclusions Handling uncertainty in observation times is critical for correct modeling and inference Assumptions about qualitative accuracy (e.g., order of events) can be very helpful Given such assumptions, the computational complexity of inference remains unchanged (modulo some constant factors) while handling the following cases – Noisy observation times – Missing and false reports – Multiple report sequences

34 QUESTIONS? Thank You!

