A Hierarchical Deep Temporal Model for Group Activity Recognition

Name: A Hierarchical Deep Temporal Model for Group Activity Recognition
Uploaded: 2017-10-11T10:12:51+00:00
Duration: PTM11S23
Channel: Monica Underwood
Description: A Hierarchical Deep Temporal Model for Group Activity Recognition

MSc Thesis Defence
Srikanth Muralidharan
12 April 2016

Good afternoon, and welcome to my thesis talk. I am going to present our work on group activity recognition using a hierarchical deep temporal model.

Outline
Part I : Introduction to Group Activity
Part II : Description of the Model
Part III : Experimental Results and Conclusion

Part I : Introduction to Group Activity

Preview – Action Recognition
Walking

Action Recognition Datasets : A brief overview
2004 KTH dataset: 6 classes
2010 Olympic sports dataset: 16 classes
2014 Youtube 1M dataset: 480+ classes

Summary-Action Recognition
Task: Predict what a single person is doing
Difficulty: intraclass variations
Difficulty: unconstrained nature of videos

Example : A surveillance scene
We consider two types of scenarios. The first is a surveillance scene. In this example, most of the people are walking on a sidewalk, so this video could be labelled as a walking scene.

It’s a walking scene.
Person labels: Walking, Walking, Walking, Walking, Walking, Standing

Example: Rally in a Volleyball Scene
The second example is a rally in a volleyball scene. Here, the high-level activity is determined by the main activity taking place, i.e. a player on the left side spiking. Therefore, we could label this scene as left_spike.

Group activity: Left Spike
Person labels: Spiking, Waiting, Waiting, Standing, Waiting, Waiting, Moving

Challenge 1 – Context Dependency
Group Activity = Majority’s Activity
Group Activity = Key Player’s Activity
Group Activity: Right spike

Challenge 2 – High-level description
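The context dependency above can be made concrete with a small sketch. It is only an illustration of the two labelling rules, not the model from the talk: the function names and the set of "background" actions are assumptions made for this example, and the person labels are the ones shown on the two example slides.

```python
from collections import Counter

def majority_label(person_labels):
    """Group label = most frequent person label (first seen wins a tie)."""
    return Counter(person_labels).most_common(1)[0][0]

def key_player_label(person_labels, background=("waiting", "standing", "moving")):
    """Group label = first person label that is not a background action.

    The background set is a hypothetical choice for this illustration.
    Returns None when every person is doing a background action.
    """
    for label in person_labels:
        if label.lower() not in background:
            return label.lower()
    return None

# Person labels taken from the two example slides.
surveillance = ["walking"] * 5 + ["standing"]
volleyball = ["spiking", "waiting", "waiting", "standing",
              "waiting", "waiting", "moving"]

print(majority_label(surveillance))    # majority rule fits the surveillance scene
print(majority_label(volleyball))      # majority rule picks "waiting" here
print(key_player_label(volleyball))    # key-player rule recovers the spike
```

The majority rule works for the surveillance scene but returns "waiting" for the volleyball rally, which is why a single fixed rule over person labels is not enough.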

Group Activity Recognition vs Action Recognition
Walking

It’s hard!
[Diagram: Image Classifier -> Group activity label]
Be careful with the description!

Intuitive fix: Use only the foreground features
Therefore, the intuitive fix is to use only the features obtained from the foreground.

Group Activity: ????
[Diagram: a person classifier applied to each person; person labels: waiting, digging, waiting, spiking, waiting]
We cut out all the people and extract their feature representations.

Possible Solution - Hierarchical model
Stage 1 - Person feature extractor
Person labels: digging, waiting, waiting, spiking, waiting
Pool person features
We cut out all the people and extract their feature representations.

Possible Solution - Hierarchical model
Pooled person features
Stage 2: Frame Classifier
Output: Group Activity
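A minimal sketch of the pooling step between the two stages follows. The slides only say that person features are pooled; element-wise max pooling is assumed here for illustration, and the feature values are made up.

```python
def max_pool(person_features):
    """Element-wise maximum over a list of equal-length feature vectors.

    The result has the same length as one person's feature vector, no matter
    how many people appear in the frame.
    """
    if not person_features:
        raise ValueError("need at least one person feature vector")
    return [max(values) for values in zip(*person_features)]

# Three people, each described by a (made-up) 3-dimensional feature vector.
features = [
    [0.1, 0.9, 0.0],
    [0.4, 0.2, 0.3],
    [0.2, 0.5, 0.8],
]
frame_feature = max_pool(features)
print(frame_feature)
```

Because the pooled vector has a fixed size, the Stage 2 frame classifier can take it as input regardless of how many people were detected.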

Pipeline Overview
Learn People Representations
Aggregate People Representations
Learn Group Representations

From images to video clips
Given the person-level annotations, we track each person and assign the same label across the whole track.

LSTM – An Introduction
Stands for Long Short-Term Memory
A sequential neural network that learns from arbitrary-length inputs

LSTM – An Introduction
[Diagram: an LSTM unrolled over time; each LSTM step produces an output, with inputs up to x(t=T)]
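To make the unrolled picture concrete, here is a sketch of one standard LSTM cell step in plain Python, applied over a sequence of any length. It uses the common textbook gate equations, not necessarily the exact variant used in the thesis; the parameter layout (`params` mapping each gate to `(W, U, b)`) is an assumption made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell on plain Python lists.

    params maps each gate name ("i", "f", "o", "g") to a tuple (W, U, b):
    W is hidden x input, U is hidden x hidden, b has length hidden.
    """
    def pre_activation(gate):
        W, U, b = params[gate]
        return [
            sum(w * xi for w, xi in zip(W[k], x))
            + sum(u * hi for u, hi in zip(U[k], h_prev))
            + b[k]
            for k in range(len(b))
        ]

    i = [sigmoid(z) for z in pre_activation("i")]    # input gate
    f = [sigmoid(z) for z in pre_activation("f")]    # forget gate
    o = [sigmoid(z) for z in pre_activation("o")]    # output gate
    g = [math.tanh(z) for z in pre_activation("g")]  # candidate cell values
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def run_lstm(sequence, params, hidden_size):
    """Unroll the cell over a sequence of any length; one output per step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs
```

The same parameters are reused at every time step, which is what lets the network accept inputs of arbitrary length.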

We use LSTMs to build the person classification model and to extract person features.
We construct an LSTM-based frame classifier on top of the pooled LSTM features.

Stage1 : Learning Individual Activity Features
[Diagram: at each time step, Alexnet -> LSTM -> Softmax]

Stage1 : Learning Individual Activity Features
[Diagram: Person 1, Person 2, Person 3, ..., Person n each pass through an LSTM to give that person's feature representation]

Stage 2: Learning Frame Representations

Tracker details
We obtain 10-frame video clips: 5 frames before and 4 frames after an annotated frame
We use LSTMs with 10 video clips as batch size
No annotations for the tracked frames - use of unlabelled data
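The clip construction can be sketched as follows. The function names and the dictionary representation of a track are assumptions made for this example; the 5-before, 4-after window and the reuse of the annotated label across the track come from the slides.

```python
def clip_frame_indices(annotated_frame, before=5, after=4):
    """Frame indices of the clip built around one annotated frame."""
    return list(range(annotated_frame - before, annotated_frame + after + 1))

def propagate_label(annotated_frame, label, before=5, after=4):
    """Give every frame of a person's track the label of the annotated frame."""
    return {t: label for t in clip_frame_indices(annotated_frame, before, after)}

indices = clip_frame_indices(100)
print(indices)          # frames 95 through 104
print(len(indices))     # 5 before + the annotated frame + 4 after

track_labels = propagate_label(100, "spiking")
```

Only frame 100 carries a human annotation here; the other nine frames inherit its label through tracking.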

Collective Activity Dataset
Same label set for people and group activities
1925 video clips for training, 638 video clips for testing
1. Crossing
2. Queueing
3. Talking
4. Waiting
5. Walking

Experimental results on Collective Activity Dataset
Method                  Accuracy
Image Classification    63.0
Person Classification   61.8
Person - Fine tuned     66.3
Temp Model - Person     62.2
Temp Model - Image      64.2
Our Model               81.5

Experimental results on Collective Activity Dataset
Method                                       Accuracy
Contextual Model [Lan NIPS’10]               79.1
Deep Structured Model [Deng BMVC’15]         80.6
Our Model                                    81.5
Cardinality Kernel [Hajimirsadeghi CVPR’15]  83.4

Volleyball Dataset – Frame Labels
1047 images for training, 478 images for testing
1. Spiking
2. Setting
3. Passing

Volleyball Dataset – People Labels
1047 images for training, 478 images for testing
1. Waiting
2. Digging
3. Setting
4. Spiking
5. Falling
6. Blocking

Experimental results on Volleyball Dataset
Method                  Accuracy
Image Classification    46.7
Person Classification   33.1
Person - Fine tuned     35.2
Temp Model - Person     45.9
Temp Model - Image      37.4
Our Model               51.1

Visualization of results
Left set, Right pass, Right spike, Left pass, Left spike (Left pass), Right spike (Left spike)

Conclusion
A two-stage hierarchical model for group activity recognition
LSTMs as a highly effective temporal model and temporal feature source
Decent people-relation modeling with simple pooling

Future Work
Semi-supervised approaches to diversify the new datasets
Experiments under a weakly supervised setting

THANK YOU
