Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition Waqas Sultani, Imran Saleemi CVPR 2014.

Presentation transcript:

1 Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition Waqas Sultani, Imran Saleemi CVPR 2014

2 Motivation

3 Dense STIP for cross-dataset recognition
Training | Testing | Accuracy (avg)
UCF50 | UCF50 | 70 %
UCF50 | HMDB51 | 55.7 %
Olympic Sports | Olympic Sports | 71.8 %
UCF50 | Olympic Sports | 16.67 %
Training and testing are done on similar actions across the datasets.
Recognition accuracy drops across datasets!

4 Is the background responsible for this drop in accuracy? (Example frames: UCF50, HMDB51)

5 Do action classifiers learn background?

6 EXPERIMENT
Two recent challenging datasets:
UCF YouTube: 1100 videos, 11 actions
UCF Sports: 150 videos, 10 actions
Extract STIP features (HOG, HOF) at a single scale with 50% spatio-temporal overlap.
Actor bounding boxes are available for these datasets.
Foreground features: at least 50% overlap with the bounding box
Background features: less than 50% overlap with the bounding box
Dense features: all features
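
The foreground/background split on this slide can be sketched as follows. This is only an illustration under stated assumptions: the slide does not say how overlap is measured, so the sketch treats each feature as a spatial patch `(x, y, w, h)` and uses the fraction of the patch area inside the actor box. The function names and the `threshold` parameter are hypothetical.

```python
def overlap_fraction(patch, box):
    """Fraction of the patch area that lies inside the box; both are (x, y, w, h)."""
    px, py, pw, ph = patch
    bx, by, bw, bh = box
    inter_w = max(0.0, min(px + pw, bx + bw) - max(px, bx))
    inter_h = max(0.0, min(py + ph, by + bh) - max(py, by))
    return (inter_w * inter_h) / (pw * ph)


def split_features(patches, box, threshold=0.5):
    """Return (foreground_indices, background_indices) for one frame.

    Assumption: a feature counts as foreground when at least `threshold`
    of its patch overlaps the actor bounding box, otherwise as background.
    """
    foreground, background = [], []
    for index, patch in enumerate(patches):
        if overlap_fraction(patch, box) >= threshold:
            foreground.append(index)
        else:
            background.append(index)
    return foreground, background


patches = [(0, 0, 10, 10), (8, 0, 10, 10), (20, 20, 10, 10)]
actor_box = (5, 0, 20, 20)
fg, bg = split_features(patches, actor_box)
```

The dense setting simply keeps every index, so it needs no separate function.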

7 EXPERIMENT
Experimental setup:
Leave one group out for UCF YouTube
Leave one actor out for UCF Sports
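
A leave-one-group-out (or leave-one-actor-out) protocol can be sketched as below. The helper name and the group labels are hypothetical; the slide only names the protocol, so this shows the general idea rather than the authors' exact splits.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) once per distinct group.

    `groups[i]` is the group (or actor) identifier of video i.
    """
    for held_out in sorted(set(groups)):
        test = [i for i, g in enumerate(groups) if g == held_out]
        train = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train, test


# Hypothetical group labels for four videos.
video_groups = ["g1", "g2", "g1", "g3"]
splits = list(leave_one_group_out(video_groups))
```

Every video is tested exactly once, and no video from the held-out group is seen during training for that fold.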

8 Foreground Features: features with more than 50% overlap with the actor bounding box
Example frames: UCF YouTube Biking video, UCF Sports Running video
STIP Features | UCF YouTube | UCF Sports
Foreground | 71.92 % | 59.80 %

9 Dense Features: all features
Example frames: UCF YouTube Biking video, UCF Sports Running video
STIP Features | UCF YouTube | UCF Sports
Dense | 60.6 % | 75.34 %
Foreground | 71.92 % | 59.80 %

10 Background Features: features with less than 50% overlap with the actor bounding box
Example frames: UCF YouTube Biking video, UCF Sports Running video
STIP Features | UCF YouTube | UCF Sports
Foreground | 59.80 % | 71.92 %
Dense | 60.6 % | 75.34 %
Background | 55.27 % | 73.97 %
Comparable performance with only background features, without even observing the actor.

11 Background should be diverse but not discriminative!

12 As action datasets are becoming large and more complex, their background may become more discriminative!

13 Action class discriminativity using GIST
Experiment 1
Compute the GIST descriptor for the KTH, UCF50, and HMDB51 datasets.
Cluster the GIST descriptors into ‘k’ clusters using k-means clustering.
Estimate the pointwise mutual information (PMI) between each cluster and each action class.
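
The last step of Experiment 1 can be illustrated with the standard definition of pointwise mutual information, PMI(c, a) = log(p(c, a) / (p(c) p(a))), estimated from co-occurrence counts. This is a minimal sketch: the function name and the toy labels are hypothetical, and how the slides turn PMI values into distance matrices is not shown here.

```python
import math
from collections import Counter


def pmi_table(cluster_ids, class_labels):
    """PMI between each observed (cluster, action class) pair.

    cluster_ids[i] is the k-means cluster of frame i and class_labels[i]
    is its action class. Pairs that never co-occur are omitted.
    """
    n = len(cluster_ids)
    joint = Counter(zip(cluster_ids, class_labels))
    cluster_counts = Counter(cluster_ids)
    class_counts = Counter(class_labels)
    return {
        (c, a): math.log((count / n) / ((cluster_counts[c] / n) * (class_counts[a] / n)))
        for (c, a), count in joint.items()
    }


# Scene clusters that perfectly predict the action give a positive PMI ...
predictive = pmi_table([0, 0, 1, 1], ["walk", "walk", "run", "run"])
# ... while clusters that are independent of the action give a PMI of zero.
independent = pmi_table([0, 1, 0, 1], ["walk", "walk", "run", "run"])
```

A high PMI means that a scene cluster is informative about the action class, which is the sense in which a background can be called discriminative.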

14 Action class discriminativity using GIST
Experiment 1 (continued)
PMI distance matrices: KTH, UCF50, HMDB51
Dataset | 100 clusters | 200 clusters | 300 clusters
KTH | 5.12 | 8.25 | 10.4
HMDB51 | 7.15 | 11.07 | 14.06
UCF50 | 7.97 | 11.79 | 14.38
A small value means more inter-class confusion.
Based on scene information alone, KTH is harder to classify than HMDB51 and UCF50.

15 Action class discriminativity using GIST
Experiment 2
Compute the GIST descriptor for every 50th frame in the KTH, UCF50, and HMDB51 datasets; this gives the set of descriptors.
Graph connected component analysis is performed with threshold ‘E’.
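
One way to read the connected component analysis is sketched below. The edge rule is an assumption, since the slide does not state it: two frames are linked when the Euclidean distance between their descriptors is below the threshold. The function name and the toy 2-D descriptors are hypothetical (real GIST descriptors are much longer).

```python
import math


def connected_components(descriptors, threshold):
    """Group descriptor indices into connected components.

    Assumption: an edge joins two descriptors whose Euclidean distance
    is below `threshold`. Components are found with union-find.
    """
    n = len(descriptors)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(descriptors[i], descriptors[j]) < threshold:
                parent[find(i)] = find(j)

    components = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)
    return sorted(components.values())


toy_descriptors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1)]
tight = connected_components(toy_descriptors, threshold=0.5)
loose = connected_components(toy_descriptors, threshold=10.0)
```

Checking how many action classes fall into each component would then indicate how well scene appearance alone separates the classes.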

16 Action class discriminativity using GIST
Experiment 2 (continued)

17 Our Approach Foreground Focused Representation

18 Foreground Focused Representation
Action localization and binary foreground/background segmentation are very challenging, akin to introducing a new problem to solve the first.
Instead, estimate the confidence that each pixel is part of the foreground, and use it to obtain the video representation.

19 Motion Gradients Color Gradients

20 Visual Saliency
A fully connected graph is built, with an edge weight defined between every pair of pixels.
By computing the stationary distribution of a Markov chain, a new graph is built.
The equilibrium distribution of the chain is used as the per-pixel saliency.
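
The equilibrium-distribution idea can be sketched on a tiny feature map. The edge weight formula is missing from the transcript, so the one used here (feature dissimilarity times a Gaussian of spatial distance) is an assumption, as are the function name and the `sigma`, `iters`, and `eps` parameters.

```python
import math


def saliency(values, positions, sigma=1.0, iters=200, eps=1e-9):
    """Per-node saliency as the equilibrium distribution of a Markov chain."""
    n = len(values)
    # Hypothetical edge weight: dissimilarity times a Gaussian spatial falloff.
    # `eps` keeps every row non-zero, even for a perfectly uniform map.
    weights = [
        [
            abs(values[i] - values[j])
            * math.exp(-math.dist(positions[i], positions[j]) ** 2 / (2 * sigma ** 2))
            + eps
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Row-normalise the weights into transition probabilities.
    trans = [[w / sum(row) for w in row] for row in weights]
    # Lazy power iteration: averaging with the current state prevents
    # oscillation and leaves the stationary distribution unchanged.
    pi = [1.0 / n] * n
    for _ in range(iters):
        stepped = [sum(pi[i] * trans[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, stepped)]
    return pi


# A 2x2 map in which only the last pixel differs from its neighbours.
pixel_values = [0.0, 0.0, 0.0, 1.0]
pixel_positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
sal = saliency(pixel_values, pixel_positions)
```

Probability mass accumulates at the pixel that is most dissimilar to its surroundings, which is why the equilibrium distribution can serve as a saliency map.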

21 Visual Saliency
Due to camera motion, video saliency is noisy.
Graph-based image saliency (NIPS 2006)

22 Coherence of Foreground Confidence
Initial aggregate of the confidence maps.
The score is max-normalized for each frame of a video.
The quality of a labeling is given by:
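
Per-frame max-normalization of an aggregated confidence map can be sketched as below. How the individual cues are aggregated is not given in the transcript, so the plain sum used here is an assumption, and the function name and toy maps are hypothetical.

```python
def aggregate_confidence(cue_maps):
    """Combine per-cue maps for one frame and max-normalise the result.

    cue_maps is a list of equally sized 2-D lists (for example motion
    gradients, color gradients, saliency). Assumption: cues are summed.
    """
    rows, cols = len(cue_maps[0]), len(cue_maps[0][0])
    total = [[sum(m[r][c] for m in cue_maps) for c in range(cols)] for r in range(rows)]
    peak = max(max(row) for row in total)
    if peak == 0:
        return total  # nothing fired in this frame; leave it at zero
    return [[value / peak for value in row] for row in total]


motion = [[0.0, 2.0], [1.0, 1.0]]
color = [[0.0, 2.0], [0.0, 1.0]]
salient = [[1.0, 0.0], [0.0, 0.0]]
frame_confidence = aggregate_confidence([motion, color, salient])
```

After normalization the strongest pixel in every frame has confidence 1, so frames with weak overall responses remain comparable with frames that have strong ones.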

23 Coherence of Foreground Confidence (continued)
Inference
The message that node p sends to node q is given by:
The belief vector of node q is given by:
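
The message and belief equations are missing from the transcript. As a stand-in, the sketch below shows the generic min-sum form of belief propagation on a pairwise MRF; whether the authors use this exact variant is an assumption, and the cost values and function names are hypothetical.

```python
def message(data_cost_p, incoming_to_p, smoothness):
    """Min-sum message from node p to node q, one entry per label of q.

    incoming_to_p holds the messages p received from its neighbours
    other than q. smoothness(lp, lq) is the pairwise label cost.
    """
    labels = range(len(data_cost_p))
    partial = [data_cost_p[lp] + sum(m[lp] for m in incoming_to_p) for lp in labels]
    return [min(partial[lp] + smoothness(lp, lq) for lp in labels) for lq in labels]


def belief(data_cost_q, incoming_to_q):
    """Belief vector of node q: its data cost plus all incoming messages."""
    return [
        data_cost_q[label] + sum(m[label] for m in incoming_to_q)
        for label in range(len(data_cost_q))
    ]


def potts(label_a, label_b):
    """Hypothetical Potts smoothness term: unit cost when labels differ."""
    return 0.0 if label_a == label_b else 1.0


# Labels: 0 = background, 1 = foreground. Node p clearly prefers label 0.
msg_p_to_q = message([0.2, 2.0], [], potts)
belief_q = belief([1.0, 0.9], [msg_p_to_q])
best_label_q = belief_q.index(min(belief_q))
```

In this toy case node q on its own slightly prefers label 1, but the message from its confident neighbour p pulls it to label 0, which is the coherence effect the slide title refers to.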

24 Obtain the probability of each pixel being foreground using:
Motion gradients
Color gradients
Saliency
Spatio-temporal coherence using a 3D-MRF
Panels: video, final weights

25 Examples: Final foreground weights
UCF50 Pull up, HMDB51 ride-horse, HMDB51 ride-bike, Olympic Sports Pole vault, HMDB51 ride-bike, Olympic Sports Diving, UCF50 Basketball, UCF50 Golf swing

26 Traditional bag-of-words histogram: foreground words and background words (UCF YouTube Biking video)

27 To bias the codebook towards foreground features, use the feature weights during clustering: k-means becomes weighted k-means.
The confidence of each descriptor being on the foreground is given by:
The goal of clustering is to minimize the following energy function:
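
The energy function itself is missing from the transcript. A common form of weighted k-means minimizes the sum over descriptors of w_i times the squared distance to the assigned center, and the sketch below assumes that form. The function name, the fixed iteration count, and the explicit initial centers are hypothetical choices made to keep the example deterministic.

```python
def weighted_kmeans(points, weights, centers, iters=20):
    """Lloyd-style weighted k-means.

    Assumed energy: sum_i weights[i] * ||points[i] - center(assign[i])||^2.
    Assignment is by nearest center; each center is then moved to the
    weighted mean of its members, so high-weight points pull harder.
    """
    centers = [list(c) for c in centers]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, x in enumerate(points):
            assign[i] = min(
                range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])),
            )
        for k in range(len(centers)):
            members = [i for i, a in enumerate(assign) if a == k]
            total = sum(weights[i] for i in members)
            if total > 0:
                centers[k] = [
                    sum(weights[i] * points[i][d] for i in members) / total
                    for d in range(len(centers[k]))
                ]
    return centers, assign


descriptors = [(0.0,), (1.0,), (10.0,), (11.0,)]
foreground_weights = [1.0, 3.0, 1.0, 1.0]
codebook, assignment = weighted_kmeans(descriptors, foreground_weights, [(0.0,), (10.0,)])
```

The first center lands at 0.75 rather than the unweighted mean 0.5, because the descriptor with the higher foreground weight pulls the codeword towards itself.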

28 Example

29 To reduce the contribution of background features, use each feature's foreground weight during quantization: the histogram becomes a weighted histogram.
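
A weighted histogram can be sketched as follows: instead of adding 1 to a word's bin for every feature, each feature adds its foreground weight. The L1 normalization at the end and the function name are assumptions.

```python
def weighted_histogram(word_ids, weights, vocab_size):
    """Bag-of-words histogram in which each feature votes with its weight.

    word_ids[i] is the codeword of feature i and weights[i] is its
    foreground confidence in [0, 1]. The result is L1-normalised.
    """
    hist = [0.0] * vocab_size
    for word, weight in zip(word_ids, weights):
        hist[word] += weight
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Two features fall on word 0, but one of them is almost certainly background.
hist = weighted_histogram([0, 0, 1, 2], [0.9, 0.1, 0.5, 0.5], vocab_size=3)
```

With all weights set to 1 this reduces to the traditional normalized bag-of-words histogram.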

30 Weighted bag of words (weighted k-means, weighted histogram): foreground words and background words (UCF YouTube Biking video)

31 Problem
No separate foreground and background words or vocabulary.
The large number of background features can sum up to be significant.

32 Weighted bag of words (weighted k-means, weighted histogram): foreground words and background words (UCF YouTube Biking video)

33 Weighted bag of words (weighted k-means, weighted histogram): foreground words and background words (UCF YouTube Biking video)

34 Weighted bag of words (weighted k-means, weighted histogram): foreground words and background words (UCF YouTube Biking video)

35 Foreground confidence based histogram decomposition
Compute histograms for each region separately.
Regions of two videos that have the same foreground confidence are compared only with each other.
The kernel function becomes:
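
The kernel equation is missing from the transcript. Based on the description here, one plausible reading is: partition the features by foreground confidence, build one weighted histogram per partition, compare matching partitions with histogram intersection, and take a weighted sum. The sketch below follows that reading; the equal-width partitions, the unnormalized histograms, the `partition_weights` values, and all function names are assumptions.

```python
def partition_index(weight, n_partitions):
    """Map a confidence in [0, 1] to one of n equal-width partitions."""
    return min(int(weight * n_partitions), n_partitions - 1)


def decomposed_histograms(word_ids, weights, vocab_size, n_partitions):
    """One weighted histogram per confidence partition."""
    hists = [[0.0] * vocab_size for _ in range(n_partitions)]
    for word, weight in zip(word_ids, weights):
        hists[partition_index(weight, n_partitions)][word] += weight
    return hists


def histogram_intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))


def decomposition_kernel(hists_a, hists_b, partition_weights):
    """Weighted sum of intersections between matching partitions only."""
    return sum(
        alpha * histogram_intersection(ha, hb)
        for alpha, ha, hb in zip(partition_weights, hists_a, hists_b)
    )


video_a = decomposed_histograms([0, 1, 1], [0.9, 0.8, 0.1], vocab_size=2, n_partitions=2)
video_b = decomposed_histograms([0, 1], [0.7, 0.2], vocab_size=2, n_partitions=2)
# Hypothetical partition weights that favour the high-confidence partition.
similarity = decomposition_kernel(video_a, video_b, [0.25, 0.75])
```

Because low-confidence features are only ever compared with other low-confidence features, a large amount of background cannot add up and swamp the foreground match.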

36 Feature partitions based on weights (from 0 to 1); partition-based weighted histograms

37 Pipeline for a UCF50 video and an HMDB51 video: weights, partitions, weighted histograms, histogram intersection, weighted summation, final similarity

38 Experimental Results
Datasets used: UCF50, HMDB51, Olympic Sports
Features used: STIP
UCF50 vs. HMDB51: 10 common actions. We choose actions which are visually similar: Biking, Golf Swing, Pull Ups, Horse Riding, Basketball
UCF50 vs. Olympic Sports: 6 common actions: Basketball, Pole Vault, Tennis Serve, Diving, Clean and Jerk, Throw Discus

39 Qualitative Results: Biking (HMDB51 and UCF50)
Histogram Intersection = 0.1035
Weighted Histogram Intersection = 0.1142
Weighted Histogram Decomposition = 0.1295

40 Qualitative Results: Golf Swing (HMDB51 and UCF50)
Histogram Intersection = 0.1684
Weighted Histogram Intersection = 0.2740
Weighted Histogram Decomposition = 0.3089

41 Qualitative Results: Pull Ups (HMDB51 and UCF50)
Histogram Intersection = 0.2744
Weighted Histogram Intersection = 0.5454
Weighted Histogram Decomposition = 0.5586

42 Quantitative Results
Training | Testing | Unweighted | Weighted | Histogram Decomposition
UCF50 | UCF50 | 70.00 | 74.20 | 77.85
UCF50 | HMDB51 | 55.70 | 60.00 | 68.70
HMDB51 | HMDB51 | 65.30 | 69.30 | 68.00
HMDB51 | UCF50 | 63.3 | 64.00 | 68.67
Olympic Sports | Olympic Sports | 71.80 | 73.95 | 69.79
UCF50 | Olympic Sports | 31.25 | 33.33
Olympic Sports | UCF50 | 16.67 | 32.29 | 47.91

43 Quantitative Results
Confusion matrices: UCF50 classifiers on HMDB51, unweighted vs. histogram decomposition

44 Thank you

