Activity Analysis in Video Spring 2005 Computational Intelligence Seminar Series Partial Review of the Paper “Discovery and Segmentation of Activities.


1 Activity Analysis in Video
Spring 2005 Computational Intelligence Seminar Series
Partial Review of the Paper “Discovery and Segmentation of Activities in Video” by Matthew Brand (MERL)
Presented by Derek Anderson

2 Topics
1. TigerPlace Project
2. Monitoring Silhouette Activity
3. Monitoring Object Activity
4. Monitoring Both (Separately or Combined)
5. Hidden Markov Models (Brief Introduction)
6. Evolutionary Computing for Structure Discovery
7. Matthew Brand’s Approach to Activity Recognition

3 Context for this Presentation
- TigerPlace Project
  - One component of our system will involve analyzing video (in real time) and recognizing an important set of “short-term” activities

4 Sensor and Video Networks
- We are doing the research for the video sensor network
  - iPAQ hx4700 series PDA with HP PhotoSmart digital cameras
- The results from the video network can be combined with other sources of information from the sensor network (gait monitor, bed sensors, …) to reduce false alarm rates and increase the overall confidence that the activities occurred
- Is this going to be handled inside the behavior reasoning component of the system (fuzzy rules)?
- Fuzzy integrals?
  - Fuzzy integral: use each of the sources of information in the sensor and video networks, taking into account how reliable each one is individually (possibly for different kinds of tasks), and assess our confidence in a particular hypothesis, i.e., an individual activity
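The slide only raises fuzzy integrals as an option. As one concrete possibility, below is a minimal sketch of the Sugeno fuzzy integral with a Sugeno λ-fuzzy measure (the Choquet integral would be an alternative). Each source reports a confidence h_i in [0, 1] that the activity occurred, and each source has a density g_i expressing its individual reliability. The function names and all numbers here are hypothetical, not part of the TigerPlace system.

```python
import math

def sugeno_lambda(densities):
    """Solve 1 + lam = prod(1 + lam * g_i) for the nonzero root lam > -1 by bisection."""
    s = sum(densities)
    if abs(s - 1.0) < 1e-12:
        return 0.0  # densities already sum to 1: the measure is additive
    f = lambda lam: math.prod(1 + lam * g for g in densities) - (1 + lam)
    if s < 1:
        lo, hi = 1e-12, 1.0  # root is positive
        while f(hi) < 0:     # widen the bracket until the sign changes
            hi *= 2
    else:
        lo, hi = -1 + 1e-12, -1e-12  # root lies in (-1, 0)
    for _ in range(200):
        mid = (lo + hi) / 2
        if (f(lo) < 0) == (f(mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sugeno_integral(confidences, densities):
    """Fuse per-source confidences h_i using reliabilities (densities) g_i."""
    lam = sugeno_lambda(densities)
    # Visit sources from most to least confident, growing the measure g(A_i).
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    g_prev, best = 0.0, 0.0
    for i in order:
        g_curr = densities[i] + g_prev + lam * densities[i] * g_prev
        best = max(best, min(confidences[i], g_curr))
        g_prev = g_curr
    return best
```

For example, `sugeno_integral([0.9, 0.6, 0.3], [0.3, 0.4, 0.2])` could stand for a video detector, a bed sensor, and a gait monitor. The result is always bounded by the smallest and largest input confidences, and a source contributes only as much as both its confidence and the reliability of the coalition it joins allow.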

5 Important Elderly Activities
- What kinds of activities should we recognize?
  - Presently, we are deciding on an initial set to study
  - A few possibilities include:
    - Total body motion
      - Falling down (and not being able to get up)
      - Someone entering and leaving their bed
      - Sitting down in and getting up from a chair
    - Partial body motion
      - Taking their medicine
      - Drinking

6 Monitoring while Ensuring Privacy
- What features should the video system use?
  - Common approach: silhouettes
  - A silhouette is an image-based representation of an individual with nearly all personal and distinguishing information removed
  - Features from silhouettes will be used to monitor an individual’s activity
  - These silhouettes will initially be extracted through image subtraction against a known, stationary background (cleaned up with binary morphology and the reconstruction operator)
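The extraction step described above can be sketched as follows, assuming uint8 frames, a fixed difference threshold (the value 30 is a hypothetical choice), and a 3x3 structuring element. Morphological reconstruction erodes the raw mask to a "marker" that keeps only sturdy regions, then regrows the marker inside the original mask, so isolated noise disappears while the surviving silhouette keeps its exact shape. This differs from a plain opening, which would also round off the silhouette's corners. The helper names are illustrative only.

```python
import numpy as np

def _shifted_views(mask):
    """Yield the nine 3x3-neighborhood shifts of a zero-padded boolean mask."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            yield p[dy:dy + h, dx:dx + w]

def _erode(mask):
    out = np.ones_like(mask)
    for view in _shifted_views(mask):
        out &= view  # pixel survives only if all 3x3 neighbors are set
    return out

def _dilate(mask):
    out = np.zeros_like(mask)
    for view in _shifted_views(mask):
        out |= view  # pixel is set if any 3x3 neighbor is set
    return out

def extract_silhouette(frame, background, threshold=30, erosions=1):
    """Background subtraction followed by reconstruction-by-dilation cleanup."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    if diff.ndim == 3:               # color input: use the largest channel change
        diff = diff.max(axis=2)
    mask = diff > threshold
    marker = mask.copy()
    for _ in range(erosions):
        marker = _erode(marker)
    while True:                      # regrow the marker, constrained by the mask
        grown = _dilate(marker) & mask
        if np.array_equal(grown, marker):
            return marker
        marker = grown
```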

7 What the Silhouettes Really Look Like (still a very ideal setting)
[Figure: conventional morphological opening of the extracted silhouette (left); morphological reconstruction of the extracted silhouette (right)]

8 Silhouette Motion over Time (identification of activity regions)
[Figure: consecutive silhouette subtraction (left) and the result after an additional erosion (right)]
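Consecutive silhouette subtraction for binary images amounts to an exclusive OR: a pixel is marked if it belongs to the silhouette in exactly one of the two frames. The extra erosion then removes the thin slivers produced by small shifts, leaving only regions of substantial motion. A minimal sketch, assuming boolean silhouette masks and a 3x3 structuring element; `motion_regions` is a hypothetical name.

```python
import numpy as np

def erode3x3(mask):
    """Binary erosion with a 3x3 square structuring element (zero-padded border)."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def motion_regions(prev_sil, curr_sil, erosions=1):
    """Pixels that changed between two silhouettes, with thin changes eroded away."""
    changed = np.logical_xor(prev_sil, curr_sil)
    for _ in range(erosions):
        changed = erode3x3(changed)
    return changed
```

A one-pixel shift of the silhouette produces only one-pixel-wide differences, which a single erosion removes entirely; a larger displacement leaves a solid activity region.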

9 New Application?
- Do not necessarily focus on the silhouettes, but rather on the objects in the environment (or the co-interaction of the two)
- Object or interesting landmark identification
  - SIFT (Scale Invariant Feature Transform)
    - Is there interesting enough texture on everything?
    - Where are the cameras placed?
    - Too complex to apply at first?
    - Will it run in real time? (present equation, Bob = NO)
  - Low-level, simple image processing techniques
    - Have to see what the resolution and quality of the images are
    - Use simpler image processing techniques to recognize particular objects
  - How to deal with some occlusion (why co-interaction might be helpful)
    - Used the YUV color space to help identify skin regions, which helped in dealing with occlusion of objects the individual would interact with (tracked the hands)
- NLM Short-Term Fellowship (Summer 2004)
  - At the end of the summer, I used Bob’s SIFT implementation to identify key points on a pill bottle (using the minimum spanning tree and a density measure)
  - Helped reduce some of the false alarms in the pill-taking activity
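The YUV skin detection mentioned above can be sketched as a color-space conversion followed by a box threshold on the chrominance channels U and V, which makes the test largely independent of brightness (Y). The conversion uses the standard BT.601 luma weights with the analog YUV scale factors; the U and V ranges below are hypothetical placeholders, not the values used during the fellowship, and would need tuning on real footage.

```python
import numpy as np

def rgb_to_yuv(rgb):
    """Convert an (..., 3) RGB array to YUV (BT.601 luma, analog U/V scaling)."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return np.stack([y, u, v], axis=-1)

def skin_mask(rgb, u_range=(-40.0, -5.0), v_range=(10.0, 60.0)):
    """Mark pixels whose chrominance falls in a (hypothetical) skin box."""
    yuv = rgb_to_yuv(rgb)
    u, v = yuv[..., 1], yuv[..., 2]
    return (u >= u_range[0]) & (u <= u_range[1]) & (v >= v_range[0]) & (v <= v_range[1])
```

Skin tones tend to have more red than blue relative to their luma, i.e., negative U and positive V, so a light and a dark skin tone both land in the box while pure blue and neutral gray do not.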

10 Activity Recognition
- I don’t think we have decided on the exact approach to use yet
- Some form of HMM looks like as good a place as any to start
  - Simple: discrete observation, continuous observation, or mixture density continuous observation HMMs (DOHMMs, COHMMs, or MDCOHMMs)
  - HHMMs (hierarchical)
    - “Learning Hierarchical Hidden Markov Models for Video Structure Discovery”
  - Entropic HMMs (structure discovery)
    - “Discovery and Segmentation of Activities in Video”

11 Temporal Pattern Recognition
- Hidden Markov Models (HMMs) are statistical methods (stochastic networks) that model the sequential patterns in a set of observation sequences believed to have come from the process of interest
- HMMs are known for their application in areas such as speech recognition, word and symbol recognition, etc.
- An HMM is a doubly embedded stochastic process: an underlying process that is not observable (hidden) and can only be observed through another set of stochastic processes that produce the sequence of observations
[Figure: state trellis with states 1, 2, …, K at each time step, emitting observations x1, x2, x3, …]
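The "doubly embedded" description can be made concrete by sampling from a discrete-observation HMM: one random process walks through the hidden states, and a second random process emits an observation from whichever state is current. Only the second sequence would be visible to a camera. A minimal sketch, assuming row-stochastic A (transitions) and B (emissions) given as Python lists; `sample_hmm` is a hypothetical helper.

```python
import random

def sample_hmm(A, B, pi, length, seed=None):
    """Draw a hidden state sequence and the observation sequence it emits."""
    rng = random.Random(seed)
    states, observations = [], []
    q = rng.choices(range(len(pi)), weights=pi)[0]                  # initial state
    for _ in range(length):
        states.append(q)
        observations.append(rng.choices(range(len(B[q])), weights=B[q])[0])  # emit
        q = rng.choices(range(len(A[q])), weights=A[q])[0]          # hidden transition
    return states, observations
```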

12 Mixture Density Continuous Observation HMM
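The slide's own equations did not survive extraction. For reference, in the standard (Rabiner-style) formulation of a mixture density continuous observation HMM, the discrete emission matrix B is replaced by a Gaussian mixture for each state j, with M components, mixture weights c_jm, means μ_jm, and covariances Σ_jm:

```latex
b_j(\mathbf{o}) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}\!\left(\mathbf{o};\, \boldsymbol{\mu}_{jm}, \boldsymbol{\Sigma}_{jm}\right),
\qquad \sum_{m=1}^{M} c_{jm} = 1, \quad c_{jm} \ge 0, \quad 1 \le j \le N
```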

13 HMM Problems
1) Given the observation sequence $O = O_1 O_2 O_3 \cdots O_T$ and a model $\lambda = (A, B, \pi)$, how do we efficiently compute $P(O \mid \lambda)$?
2) Given the observation sequence $O$ and a model $\lambda$, how do we choose a corresponding state sequence $Q = q_1 q_2 q_3 \cdots q_T$ that is optimal in some meaningful sense?
3) How do we adjust the model parameters $\lambda$ to maximize $P(O \mid \lambda)$?
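Problems 1 and 2 have standard dynamic-programming solutions: the forward algorithm sums over all state paths in O(N²T) time, and the Viterbi algorithm replaces that sum with a max and keeps back-pointers. A minimal sketch for a discrete-observation HMM, assuming NumPy arrays for A, B, and π; the forward pass is left unscaled, so long sequences would need per-step scaling or log space. The brute-force enumerator exists only to check the forward result on tiny inputs. Problem 3 is the job of Baum-Welch (EM), not shown here.

```python
import itertools
import numpy as np

def forward(A, B, pi, obs):
    """Problem 1: P(O | lambda) via the (unscaled) forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def viterbi(A, B, pi, obs):
    """Problem 2: most likely state sequence and its log probability."""
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + logA          # scores[i, j]: best path in i, then i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + logB[:, o]
    path = [int(delta.argmax())]
    for ptr in reversed(back):                  # follow back-pointers to the start
        path.append(int(ptr[path[-1]]))
    return path[::-1], float(delta.max())

def brute_force(A, B, pi, obs):
    """Sum over every state sequence; exponential, for checking tiny cases only."""
    total = 0.0
    for q in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
        total += p
    return total
```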

14 Structure Discovery
- A serious problem in deploying HMMs is how to specify or learn the model structure
- Matthew Brand has proposed an entropy-based method to learn an “optimal” model structure
- We might look for a simpler, general way to learn the model structure, independent of the HMM type, since this will be used outside a “lab” setting
- I am presently looking into using Evolutionary Computing (EC) techniques to evolve and learn the HMM structure automatically
  - The difference would be related to the “compression” aspect and to the small number of observation samples that Brand claims is sufficient

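15 EP Overview
[Figure: evolutionary programming loop for HMM structure. Parent HMMs P1, P2, P3 in generation t (states S1, S2, S3) are scored by fitness F(P_i) and mutated, e.g., by adding a state S4, to produce offspring O1, O2, O3 scored by F(O_i); selection over {P1, P2, P3, O1, O2, O3} forms generation t+1.]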

16 Walk Before We Start Running
- Initially
  - Test how well the procedure works on a fully connected DOHMM when we only mutate the states (add and remove operators)
  - Test a few different measures of complexity (the different fitness functions)
  - Each chromosome in a generation acts as a seed for the next iteration of the Baum-Welch algorithm
- Later
  - Consider a more complicated MDCOHMM model
  - Try to derive a series of equations and mutation operators that can take an initial population estimated by Baum-Welch and evolve what was found (I believe this would be a completely new technique)
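The initial experiment can be sketched as a small evolutionary loop in which an individual is reduced to its number of states, mutation adds or removes one state, and selection keeps the best half of parents plus offspring, mirroring the {P1, P2, P3, O1, O2, O3} pool on the EP overview slide. In the planned procedure, the fitness would score a DOHMM trained (seeded) with Baum-Welch and penalized by a complexity measure; one possible penalty, assumed here rather than taken from the slides, is a BIC-style term using the free-parameter count below. The fitness is passed in as a function, and the toy fitness in the test is purely illustrative.

```python
import random

def dohmm_free_parameters(n_states, n_symbols):
    """Free parameters of a fully connected discrete-observation HMM:
    transitions N(N-1) + emissions N(M-1) + initial distribution (N-1)."""
    return n_states * (n_states - 1) + n_states * (n_symbols - 1) + (n_states - 1)

def mutate(n_states, rng, min_states=1):
    """Add or remove one state, never going below min_states."""
    return max(min_states, n_states + rng.choice((-1, 1)))

def evolve(fitness, initial_population, generations, seed=0):
    """(mu + mu) evolution: each parent makes one offspring, the best half survive."""
    rng = random.Random(seed)
    parents = list(initial_population)
    for _ in range(generations):
        offspring = [mutate(p, rng) for p in parents]
        pool = parents + offspring
        pool.sort(key=fitness, reverse=True)
        parents = pool[:len(parents)]
    return parents[0]
```

Because parents compete with their own offspring, the best structure found so far is never lost; a real fitness would be expensive (one Baum-Welch run per candidate), so caching scores per state count would be a sensible refinement.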

17 Matthew Brand’s Approach
- The principle of maximum likelihood is not valid for small data sets: the training data are rarely sufficient to wash out the sampling artifacts (i.e., noise)
  - He also leaves out the obvious point: whether we have enough observations to estimate all the parameters in the network (the degrees of freedom)
  - We may have only a small number of observations, with few “reflective” sub-observation sequences
- He advocates replacing the Baum-Welch formulae with parameter estimators that minimize entropy
  - The claim is that this exploits the duality between learning and compression

18 Entropy Minimization
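The slide's content did not survive extraction. As background, Brand's entropic estimation places a prior on a multinomial parameter vector θ that favors low entropy, and combines it with evidence counts ω_i (expected counts, as produced by the E-step). The resulting log posterior is the usual log-likelihood minus the entropy of the parameters:

```latex
H(\theta) = -\sum_i \theta_i \log \theta_i,
\qquad
P_e(\theta) \propto e^{-H(\theta)} = \prod_i \theta_i^{\theta_i}

P(\theta \mid \omega) \propto \prod_i \theta_i^{\omega_i} \prod_i \theta_i^{\theta_i}
  = \prod_i \theta_i^{\omega_i + \theta_i}

\log P(\theta \mid \omega) = \sum_i \omega_i \log \theta_i \;-\; H(\theta) + \text{const.}
```

Maximizing this trades fit to the data against the entropy of the parameters, so weakly supported parameters are pushed toward zero rather than fit to sampling noise, which is the small-data argument of the previous slide.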

19 First Setup
- A variety of activities, from picking up the phone (a few seconds) to writing (which could take up to hours)
- Used a “blob” representation consisting of the parameters of an ellipse fitted to the single largest connected set of active pixels
  - Background subtraction using a statistical model of the background and an adaptive Gaussian color/location model (separating pixels that have changed from those changed by motion)
  - Cleaned up the “blob” through dilation (he refers to using a seed from the previous frame)
  - The observation vector uses high-level geometric features, calculated from the mean and eigenvectors of a 2D Gaussian fitted to the foreground pixels
- 30 minutes of data taken at random
  - Frames with no one in the video were removed
  - Roughly 21 minutes remained after this
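The geometric features described above come from fitting a 2D Gaussian to the foreground pixel coordinates: the mean gives the blob's center, and the eigen-decomposition of the covariance gives the ellipse's axis lengths and orientation. A minimal sketch, assuming the input mask already holds only the largest connected component; the exact feature vector Brand used is not given on the slide, so the returned fields are a hypothetical choice.

```python
import numpy as np

def blob_ellipse(mask):
    """Fit a 2D Gaussian to foreground pixels and return ellipse parameters."""
    rows, cols = np.nonzero(mask)
    pts = np.stack([cols, rows], axis=1).astype(float)   # (x, y) coordinates
    mean = pts.mean(axis=0)
    centered = pts - mean
    cov = centered.T @ centered / len(pts)               # population covariance
    evals, evecs = np.linalg.eigh(cov)                    # eigenvalues ascending
    major = evecs[:, 1]                                   # direction of largest spread
    return {
        "center": (float(mean[0]), float(mean[1])),
        "major_std": float(np.sqrt(evals[1])),
        "minor_std": float(np.sqrt(evals[0])),
        "angle": float(np.arctan2(major[1], major[0])),   # radians; sign of axis is arbitrary
    }
```

For a silhouette, an upright person yields a major axis near vertical, while a fall would swing it toward horizontal, which is why these few numbers carry much of the activity information.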

20 Training
- Only three sequences were used for training
  - They varied from 100 to 1,900 frames in length
- Number of states = {12, 16, 20, 25, 30}

21 Procedure 1: Model Activity

22

23 Procedure 2: Monitoring Traffic

24 Monitoring Simultaneous Processes
- HMMs are traditionally used to model a single hidden process
- Brand modified HMMs to take a varying number of observations per time step (he claims this is novel; I don’t know whether he is the first)
- The new image representation is a variable-length list of flow vectors between two subsequent images
  - Flow vectors smaller than some predefined threshold are disregarded
- The model learns the typical locations and directions of the moving pixels, and the dynamic changes of these patterns
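The variable-length flow list can be sketched as a filter over a dense flow field: every pixel whose flow magnitude reaches the threshold contributes one (x, y, dx, dy) vector, so the number of observations per frame changes with how much is moving. A minimal sketch, assuming the dense flow components dx and dy were already computed by some optical flow routine (not shown); `flow_list` and the threshold value are hypothetical.

```python
import numpy as np

def flow_list(dx, dy, min_magnitude=1.0):
    """Return an (n, 4) array of (x, y, dx, dy) for pixels with enough motion."""
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    magnitude = np.hypot(dx, dy)
    ys, xs = np.nonzero(magnitude >= min_magnitude)   # small vectors are disregarded
    return np.stack([xs, ys, dx[ys, xs], dy[ys, xs]], axis=1)
```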

25 Internals
- Brand uses a modified version of a multivariate Gaussian mixture model
- He deals with multiple observations per time step by treating each frame’s flow list as an observation sequence for a mixture model at one time step

26 Multi-Observation-Mixture+Counter (MOMC) HMM
- The first term is a distribution on the observation count
- The mixture Gaussians are 4D, observing flow vectors in (x, y, dx, dy) space
  - The mixture components model motion in particular directions and locations
- The counter variable essentially models the combined surface area of the moving objects
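The MOMC emission for one state can be sketched as a count term plus a mixture term for each flow vector in the frame, with the flow vectors treated as independent draws from the state's 4D mixture. Two simplifications here are assumptions, not details from the slides: the count distribution is taken to be Poisson, and the Gaussians have diagonal covariances. Everything is computed in log space with a log-sum-exp over components for numerical stability.

```python
import math
import numpy as np

def log_poisson(k, rate):
    """Log probability of k observations under an (assumed) Poisson count model."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def log_diag_gaussian(x, mean, var):
    """Log density of each row of x (shape (K, 4)) under a diagonal Gaussian."""
    return -0.5 * (np.sum(np.log(2 * np.pi * var)) + np.sum((x - mean) ** 2 / var, axis=1))

def momc_log_emission(flows, weights, means, variances, count_rate):
    """log b_j(O_t) = log P(count) + sum over flow vectors of log(mixture density)."""
    flows = np.asarray(flows, dtype=float).reshape(-1, 4)
    log_count = log_poisson(len(flows), count_rate)
    if len(flows) == 0:
        return log_count
    # comp[k, m] = log c_m + log N(flow_k; mu_m, var_m)
    comp = np.stack([np.log(w) + log_diag_gaussian(flows, mu, var)
                     for w, mu, var in zip(weights, means, variances)], axis=1)
    peak = comp.max(axis=1, keepdims=True)
    per_vector = peak[:, 0] + np.log(np.exp(comp - peak).sum(axis=1))
    return log_count + per_vector.sum()
```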

27

28 Any Questions?

29 HMM Links
- Hidden Markov Models (general introductions)
  - http://uirvli.ai.uiuc.edu/dugad/hmm_tut.html
  - http://www.cse.ucsc.edu/research/compbio/html_format_papers/hughkrogh96/cabios.html
- Baum-Welch algorithm and EM (simpler math derivation)
  - (Bilmes) http://citeseer.ist.psu.edu/bilmes98gentle.html
- Entropic Hidden Markov Models (Matthew Brand)
  - Discovery and Segmentation of Activities in Video (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, Aug. 2000)
- Fuzzy Hidden Markov Models (Gader and Mohamed)
  - Generalized Hidden Markov Models – Part I: Theoretical Frameworks (IEEE Transactions on Fuzzy Systems, Vol. 8, No. 1, Feb. 2000)

