Gibbs Sampling and Hidden Markov Models in the Event Detection Problem By Marc Sobel.

1 Gibbs Sampling and Hidden Markov Models in the Event Detection Problem By Marc Sobel

2 Event Detection Problems A process such as traffic flow, crowd formation, or electronic financial transactions unfolds in time. We can monitor and observe the flow frequencies at many fixed time points. Typically, many causes influence changes in these frequencies.

3 Causes for Change Possible causes for change include: a) changes due to noise, i.e., those best modeled by, e.g., a Gaussian error distribution; b) periodic changes, i.e., those expected to recur over periodic intervals; c) changes due to neither of the above: these are usually the changes we would like to detect.

4 Examples Examples include: 1) Detecting 'events' that are not pre-planned and involve large numbers of people at a particular location. 2) Detecting fraudulent transactions: we observe a variety of electronic transactions over many time intervals and would like to detect when the number of transactions differs significantly from what is expected.

5 Model for Changes Due to Noise, Periodic Effects, or Other Causes We model changes in flow frequency due to all known causes using latent Poisson processes. The frequency count N(t) at time t is observed; N 0 (t) and N E (t) are independent latent Poisson processes. N 0 (t) denotes the frequency due to periodic and noise changes at time t; its average rate is λ(t), and we write N 0 (t) ~ Poisson(λ(t)). N E (t) denotes the frequency due to causes other than periodic and noise changes; it has rate function γ(t), and we write N E (t) ~ Poisson(γ(t)). The rate function λ(t) is regressed on a parametric function of the periodic effects, as described on the following slides.
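As a sanity check, the two latent processes can be simulated and summed. Everything below is illustrative rather than from the slides: the rates at a single time point are hypothetical, and Poisson draws use a small stdlib-only sampler.

```python
import math
import random

random.seed(0)

def poisson(lam):
    # Knuth's algorithm: count uniform multiplications until the
    # running product drops below exp(-lam)
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

lam_t, gamma_t = 4.0, 2.0   # hypothetical rates lambda(t) and gamma(t)
N0 = poisson(lam_t)          # counts due to periodic/noise causes
NE = poisson(gamma_t)        # counts due to an event
N = N0 + NE                  # the observed total is the sum of the two
```

Averaged over many draws, the total N concentrates near λ(t) + γ(t), reflecting the independence of the two processes.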

6 The Process N 0 (t) We focus on the first example above and consider the problem of modeling the frequencies of people entering a building, with the eventual purpose of detecting special 'events' connected with these frequencies. We let a) 'd' stand for day, b) 'hh' for half-hour time interval, and c) 'b' for base.

7 Rate Function Due to Periodic and Noise Changes The rate function due to periodic and noise changes factors over the periodic effects: λ(t) = λ b · λ d(t) · λ hh(t)|d(t), where d(t) is the day containing time t and hh(t) is the half-hour period containing time t.

8 Rate Function Explained This factorization makes sense because, for a time t in day d and half-hour period h, we have (by analogy with Bayes' rule) λ(t) = λ b · λ d · λ h|d, just as P(d, h) = P(d) · P(h | d). In the sequel, we assume time t has been broken into half-hour periods without re-indexing.

9 Example: Work Week Say you worked 21 hours on average per week. Then your base work rate (per week) is λ b = 21, so your daily base work rate is 21/7 = 3. Your average work rate for Sunday relative to this base is λ Sunday = (total Sunday rate)/3. The relative rates for the week sum to the number of days: λ Sunday + … + λ Saturday = 7.
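The arithmetic above can be verified with a short sketch; the per-day hour totals are hypothetical, chosen only to average 21 hours per week.

```python
# Hypothetical hours worked Sun..Sat; they sum to 21 per week.
daily_totals = [6, 0, 3, 3, 3, 3, 3]

base_weekly = sum(daily_totals)          # base weekly rate: 21 hours
base_daily = base_weekly / 7             # daily base rate: 3 hours/day
relative = [t / base_daily for t in daily_totals]

# Since each relative rate is (day total)/(weekly total / 7),
# the relative daily rates always sum to 7, the number of days.
```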

10 Modeling Occasional changes in event flow and Noise Where does the noise come in? How do we model occasional changes in the periodic rate parameters? The missing piece is (dramatic pause) ??????!!!!!!!!!

11 Priors Come to the Rescue Priors serve the purpose of modeling noise and occasional changes in the values of parameters. Thus spake the prior. The base parameter is given a gamma prior, λ base ~ π(λ) = δ^α λ^(α-1) exp(-δλ)/Γ(α). By flexibly assigning values to the hyperparameters α and δ, we can build a distribution that properly characterizes the base rate.
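A minimal sketch of drawing from this gamma prior with Python's standard library. The hyperparameter values are hypothetical; note that `random.gammavariate` takes a shape and a scale, while the slide's δ is a rate, so the scale is 1/δ.

```python
import random

random.seed(0)

alpha, delta = 2.0, 0.5   # hypothetical hyperparameters; prior mean = alpha/delta = 4

# random.gammavariate(shape, scale): the slide's rate delta becomes scale 1/delta
draws = [random.gammavariate(alpha, 1.0 / delta) for _ in range(100_000)]
mean = sum(draws) / len(draws)   # should be close to the prior mean alpha/delta
```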

12 Interpretation The λ day 's, being conditional rates, satisfy ∑ λ day = ∑ [(average day 'i' total)/λ base ] = 7. Similarly, summing over periods within day i, ∑ λ j'th period in day i = ∑ [(average j'th period frequency in day i)/λ day i ] = D, where D stands for the number of half-hour intervals in a day.

13 A Simple Example Illustrating the Issue What do these constraints mean? Assume, for simplicity, that there are a total of 2 days in a 'week': Sunday and Monday. Daily rates are measured in events per day. The base rate is the average rate for Sundays and Mondays combined. A) The Sunday and Monday relative rates add up to 2. B) Suppose we observe 10 people (total) on Sundays and 30 on Mondays, over a total of 10 'weeks'.

14 (continued) C) Maximum likelihood dictates estimating the base rate (per week) as 40/10 = 4 people per week (or 2 people per day); Sunday's relative rate is 10/(2·10) = 0.5 and Monday's relative rate is 1.5. D) But this is incorrect because (i) one week out of 10, the conditional Monday rate shoots up to 1.90 while the Sunday rate decreases to 0.10, and (ii) usually, the conditional Sunday rate is 1 rather than 0.5.

15 The Bayesian Formulation Wins Out We can build a model with this new information by assuming a beta prior for half the Sunday (and Monday) relative rate: (0.5)·λ Sunday ~ λ^(0.66-1) (1-λ)^(0.66-1) / B(0.66, 0.66). This prior has the required properties: the Sunday rate dips down to 0.10 about 10 percent of the time but averages 1 over the entire interval.
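A quick simulation illustrates the stated properties of this prior; the seed, sample size, and the 0.2 threshold used to flag low-rate weeks are all arbitrary choices for the sketch.

```python
import random

random.seed(1)

a = b = 0.66   # the slide's Beta(0.66, 0.66) hyperparameters

# (0.5) * lambda_Sunday ~ Beta(0.66, 0.66), so lambda_Sunday = 2 * Beta draw
lam_sunday = [2.0 * random.betavariate(a, b) for _ in range(100_000)]

# Beta(a, a) has mean 0.5, so the Sunday rate averages about 1
mean = sum(lam_sunday) / len(lam_sunday)

# the U-shaped density puts noticeable mass near 0, modeling the
# occasional weeks in which the Sunday rate collapses
frac_low = sum(x <= 0.2 for x in lam_sunday) / len(lam_sunday)
```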

16 The Failure of Classical Theory The MLE of λ Sunday (0.5 in the example above) ignores this prior information; the Bayes estimator of λ Sunday incorporates it. But even more importantly, the posterior distribution of the parameter provides information useful for all other inference and prediction in the problem. Medicare for classical statistics?

17 Illustration Posterior distribution for twice the Sunday frequency rate (figure).

18 Actual Priors Used For our example, we have seven rather than two days in a week. We use scaled Dirichlet priors (extensions of beta priors) for this. Smaller α's indicate smaller a priori relative frequencies; a smaller sum of α's indicates greater variance among the relative frequencies p. This provides a flexible way to model the daily rates.
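A scaled Dirichlet draw can be sketched via normalized gamma variables; the α values below are hypothetical (weekdays given slightly larger weight), and the scale of 7 matches the constraint that daily relative rates sum to the number of days.

```python
import random

random.seed(2)

def scaled_dirichlet(alphas, scale):
    # Standard construction: independent Gamma(alpha_i, 1) draws,
    # normalized to sum to 1, then multiplied by the scale.
    g = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [scale * x / s for x in g]

# Hypothetical alphas for Sun..Sat: weekdays apriori busier than weekends
alphas = [1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0]
rates = scaled_dirichlet(alphas, scale=7.0)   # daily rates summing to 7
```

Smaller total α makes each draw more variable around its mean, which is exactly the flexibility the slide describes.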

19 Events: The Process N E Events signify times during which there are higher frequencies that are not due to periodic or noise causes. We model this by setting z(t) = 1 during such events and z(t) = 0 otherwise, with transition probabilities P(z(t)=1 | z(t-1)=0) = 1 - z 00 ; P(z(t)=0 | z(t-1)=0) = z 00 ; P(z(t)=1 | z(t-1)=1) = z 11 ; P(z(t)=0 | z(t-1)=1) = 1 - z 11. That is, if there is no event at time t-1, the chance of an event at time t is 1 - z 00.

20 The Need for a Bayesian Treatment of Events This gives latent 'geometric' run lengths. Assume z 00 = 0.8 and z 11 = 0.1. Then non-events last an average of 1/0.2 = 5 half-hours, while events last an average of 1/0.9 ≈ 1.11 half-hours. Classical statistics would dictate direct estimation of the z's, but this says nothing about the tendency of events to exhibit non-average behavior, and it provides no information for prediction and estimation.
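Simulating the z chain confirms the duration arithmetic; the seed and chain length are arbitrary.

```python
import random

random.seed(3)

z00, z11 = 0.8, 0.1   # transition probabilities from the slide
T = 200_000

states, z = [], 0
for _ in range(T):
    states.append(z)
    if z == 0:
        z = 1 if random.random() < (1 - z00) else 0   # non-event -> event
    else:
        z = 1 if random.random() < z11 else 0          # event persists

# Average length of non-event (state 0) runs; geometric with success
# probability 1 - z00 = 0.2, so the mean run length is 1/0.2 = 5.
runs, cur = [], 0
for s in states:
    if s == 0:
        cur += 1
    elif cur:
        runs.append(cur)
        cur = 0
if cur:
    runs.append(cur)
mean_run = sum(runs) / len(runs)
```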

21 Priors for Event Probabilities We use beta priors for the z's: z 00 ~ z 00 ^(a 0 -1) (1-z 00 )^(b 0 -1), and analogously for z 11. These characterize the behavior of the underlying latent process; the hyperparameters a and b are chosen to model that behavior. Recall that N 0 (t) (the non-event process) characterizes periodic and noise changes, while the event process N E (t) characterizes other changes. N E (t) is 0 if z(t) = 0 and Poisson with rate γ(t) if z(t) = 1. So, if there is no event, N = N 0 (t); if there is an event, the total frequency is N = N 0 (t) + N E (t). The rate γ(t) is itself gamma with parameters a E and b E. Hence, marginally, N E (t) is negative binomial with success probability p = b E /(1+b E ).
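The gamma-Poisson mixture behind this negative binomial marginal can be checked by simulation. The hyperparameter values below are hypothetical, and Poisson draws again use a small stdlib-only sampler.

```python
import math
import random

random.seed(4)

def poisson(lam):
    # Knuth's algorithm; adequate for the moderate rates used here
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

aE, bE = 3.0, 0.5   # hypothetical hyperparameters of the gamma on gamma(t)

# Draw gamma(t) ~ Gamma(aE, rate bE), then N_E ~ Poisson(gamma(t)).
# The marginal of N_E is negative binomial with mean aE/bE = 6.
draws = [poisson(random.gammavariate(aE, 1.0 / bE)) for _ in range(50_000)]
mean = sum(draws) / len(draws)
```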

22 Gibbs Sampling Gibbs sampling works by simulating each parameter or latent variable conditional on all the rest. The λ's are parameters; the z's and N's are the latent variables. The resulting simulated values have an empirical distribution close to the true posterior distribution. Gibbs sampling works because the joint distribution of the parameters is determined by the set of all such conditional distributions.

23 Gibbs Sampling Given z(t) = 0 and the remaining parameters, put N 0 (t) = N(t) and N E (t) = 0. If z(t) = 1, simulate N E (t) as negative binomial with parameters N(t) and b E /(1+b E ), and put N 0 (t) = N(t) - N E (t). To simulate z(t), define:
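A literal sketch of this conditional step, following the slide's prescription as stated; the b E value and the stdlib-only negative binomial sampler are illustrative assumptions, not part of the slides.

```python
import math
import random

random.seed(5)

def neg_binomial(n, p):
    # Failures before the n-th success: sum of n geometric draws.
    # 1 - random.random() lies in (0, 1], so log never sees 0.
    return sum(int(math.log(1.0 - random.random()) / math.log(1.0 - p))
               for _ in range(n))

def split_counts(N_t, z_t, bE=4.0):
    # One Gibbs step: allocate the observed count N(t) between the
    # non-event process N0(t) and the event process NE(t), as the
    # slide prescribes.  bE is a hypothetical hyperparameter.
    p = bE / (1.0 + bE)
    if z_t == 0:
        return N_t, 0                  # no event: all counts go to N0(t)
    NE = neg_binomial(N_t, p)          # NE(t) ~ NegBin(N(t), bE/(1+bE))
    return N_t - NE, NE                # N0(t) = N(t) - NE(t)
```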

24 More of Gibbs Sampling Then, if the previous state was 0, we get:

25 Gibbs Sampling (Continued) Having simulated z(t), we can simulate the parameters as follows: where 'N day ' denotes the number of 'day' units in the data and 'N hh ' denotes the number of 'hh' periods in the data.

26 Gibbs Sampling (conclusion) We can simulate from the remaining conditional posterior distributions using standard MCMC techniques.

27 END – Thank You

28 Polya Tree Priors A more general methodology for introducing multiple prior levels is Polya tree priors (see Michael Lavine). For these priors, we divide the time interval (e.g., a week) into parts with relative frequencies p 1, …, p k, where p has a Dirichlet distribution. Given p, we further divide each part into sub-parts with corresponding conditional Dirichlet distributions, and we can continue subdividing until it is no longer useful.
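A two-level Polya-tree-style construction can be sketched as nested Dirichlet splits; the 7-day by 48-half-hour layout follows the slides, while the uniform α's are a hypothetical choice.

```python
import random

random.seed(6)

def dirichlet(alphas):
    # Dirichlet draw via normalized independent Gamma(alpha_i, 1) draws
    g = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

# Level 1: split the week into 7 days (hypothetical uniform alphas).
p_day = dirichlet([1.0] * 7)

# Level 2: conditionally split each day into 48 half-hour periods.
p_halfhour = [[pd * q for q in dirichlet([1.0] * 48)] for pd in p_day]

# The nested relative frequencies still sum to 1 over the whole week.
total = sum(sum(row) for row in p_halfhour)
```

Further levels would subdivide each half-hour in the same way, stopping when additional depth is no longer useful.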

