Visual Event Recognition in Videos by Learning from Web Data


1 Visual Event Recognition in Videos by Learning from Web Data
Lixin Duan†, Dong Xu†, Ivor Tsang†, Jiebo Luo¶
† Nanyang Technological University, Singapore
¶ Kodak Research Labs, Rochester, NY, USA

2 Outline
Overview of the Event Recognition System
Similarity between Videos
  Aligned Space-Time Pyramid Matching
Cross-Domain Problem
  Adaptive Multiple Kernel Learning
Experiments
Conclusion

3 Overview
GOAL: Recognize consumer videos
Large intra-class variability; limited labeled videos
[Figure labels: Wedding, Sports, Picnic]

4 Overview
GOAL: Recognize consumer videos by leveraging a large number of loosely labeled web videos (e.g., from YouTube)
[Figure labels: Wedding, Sports, Picnic; Consumer Videos; A Large Number of Web Videos]

5 Overview
Flowchart of the system
[Figure labels: Video Database, Test video, Classifier, Output]

6 Similarity between Videos
Pyramid matching methods
  Temporally aligned pyramid matching, D. Xu and S.-F. Chang [1]
  Unaligned space-time pyramid matching, I. Laptev [2]
[Figure labels: Space-time axes, Time axis, Space axes]

7 Similarity between Videos

8 Similarity between Videos
Aligned Space-Time Pyramid Matching
[Figure labels: Level 1, Distance]
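The slide only names "Level 1" of the pyramid, so the following is a minimal sketch under an assumption that is not stated in the transcript: that level l halves the time, height and width axes l times each, which yields 8**l space-time volumes. The function names and the example video size are hypothetical.

```python
from itertools import product


def split_axis(length, parts):
    """Return `parts` half-open (start, stop) ranges that together cover range(length)."""
    return [(i * length // parts, (i + 1) * length // parts) for i in range(parts)]


def space_time_blocks(frames, height, width, level):
    """Enumerate the space-time volumes of one pyramid level.

    Assumption (not from the slides): each axis is cut into 2**level parts,
    so the level contains 8**level volumes.
    """
    parts = 2 ** level
    return list(product(split_axis(frames, parts),
                        split_axis(height, parts),
                        split_axis(width, parts)))


blocks = space_time_blocks(frames=120, height=240, width=320, level=1)
print(len(blocks))  # 8
print(blocks[0])    # ((0, 60), (0, 120), (0, 160))
```

Each tuple gives the (time, height, width) extent of one volume; a feature histogram would then be computed per volume before any matching takes place.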

9 Similarity between Videos
Distance: integer-flow Earth Mover’s Distance (EMD), Y. Rubner [3]
[Equation not recovered: the EMD objective and its constraints ("s.t.")]
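The formula for the integer-flow EMD is not in the transcript. As a hedged illustration, the sketch below assumes both videos are described by the same number of blocks, each with unit weight, in which case the EMD reduces to a one-to-one assignment: every block of one video is matched to exactly one block of the other so that the total ground distance is minimal. The Euclidean ground distance, the brute-force search and all names are choices made here for illustration, not taken from the slides.

```python
import math
from itertools import permutations


def euclidean(a, b):
    """Ground distance between two block feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def integer_flow_emd(blocks_a, blocks_b):
    """EMD between two equally sized sets of unit-weight block features.

    With unit weights the optimal flow is a permutation, so this tries every
    permutation and keeps the cheapest. Returns (distance, match), where
    match[r] is the index of the block in blocks_b assigned to block r of
    blocks_a.
    """
    n = len(blocks_a)
    if n != len(blocks_b):
        raise ValueError("both videos must have the same number of blocks")
    cost = [[euclidean(a, b) for b in blocks_b] for a in blocks_a]
    best_total, best_match = math.inf, None
    for perm in permutations(range(n)):
        total = sum(cost[r][perm[r]] for r in range(n))
        if total < best_total:
            best_total, best_match = total, perm
    # Total flow equals n, so dividing by n gives the normalized EMD.
    return best_total / n, best_match


video_a = [(0.0, 0.0), (1.0, 1.0)]
video_b = [(1.0, 1.0), (0.0, 0.0)]
print(integer_flow_emd(video_a, video_b))  # (0.0, (1, 0))
```

The brute-force loop is only practical for a handful of blocks (8 blocks already means 40,320 permutations); for larger sets an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual substitute.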


11 Cross-Domain Problem
Data distribution mismatch between consumer videos and web videos
  Consumer videos: naturally captured
  Web videos: edited; selected
Maximum Mean Discrepancy (MMD), K. M. Borgwardt [4]
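To make the MMD criterion concrete, here is a minimal sketch of the standard biased empirical estimate of the squared MMD between two samples. The RBF kernel, its `gamma` value and the toy one-dimensional "features" are assumptions chosen for illustration; the slides do not say which kernel or features were used at this step.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, kernel=rbf_kernel):
    """Biased estimate of squared MMD: E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    def mean_kernel(xs, ys):
        return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


web = [(0.0,), (0.1,), (0.2,)]        # hypothetical web-video features
consumer = [(2.0,), (2.1,), (2.2,)]   # hypothetical consumer-video features

print(mmd_squared(web, web))       # 0.0: identical samples
print(mmd_squared(web, consumer))  # clearly positive: distributions differ
```

A value near zero suggests the two domains look alike under the chosen kernel, while a large value signals the kind of mismatch the slide describes.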

12 Cross-Domain Problem
Prior information

13 Cross-Domain Problem

14 Cross-Domain Problem
Adaptive Multiple Kernel Learning (A-MKL)
[Equation not recovered; its annotated terms are the MMD and the structural risk functional]
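The objective itself did not survive in the transcript; only the names of its two terms did. One plausible way to write an objective with those two terms is sketched here. The symbols (kernel weights d_m, base kernels k_m, MMD term Ω, structural risk term J, trade-off θ) are introduced for illustration, and the exact form on the slide may differ.

```latex
% Assumed form: a convex combination of M base kernels ...
k = \sum_{m=1}^{M} d_m \, k_m,
\qquad d_m \ge 0,
\qquad \sum_{m=1}^{M} d_m = 1

% ... whose weights trade off domain mismatch against classification risk.
\min_{\mathbf{d}} \; \tfrac{1}{2}\, \Omega(\mathbf{d})^{2} + \theta \, J(\mathbf{d})
```

Here Ω(d) stands for the MMD between the web and consumer video domains measured with the combined kernel k, J(d) for the structural risk functional of the classifier trained with k, and θ > 0 for the balance between the two.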

15 Cross-Domain Problem

16 Cross-Domain Problem

17 Experiments
Data set
  195 consumer videos and 906 web videos, collected by ourselves and from the Kodak Consumer Video Benchmark Data Set [5]
  6 events: “wedding”, “birthday”, “picnic”, “parade”, “show” and “sports”
  Training data: 3 videos per event from the consumer videos, plus all web videos
  Test data: the remaining consumer videos
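The split sizes implied by these numbers can be worked out directly. The sketch below assumes that each labeled consumer training video is counted for exactly one event, which the slide does not state; the function name is hypothetical.

```python
def split_sizes(n_consumer, n_web, n_events, labeled_per_event):
    """Return (consumer training videos, total training videos, test videos)."""
    consumer_train = n_events * labeled_per_event
    if consumer_train > n_consumer:
        raise ValueError("not enough consumer videos for the requested split")
    total_train = consumer_train + n_web    # labeled consumer videos plus all web videos
    test = n_consumer - consumer_train      # every remaining consumer video
    return consumer_train, total_train, test


print(split_sizes(n_consumer=195, n_web=906, n_events=6, labeled_per_event=3))
# (18, 924, 177)
```

Under that assumption only 18 of the 924 training videos come from the consumer domain, which is what makes the cross-domain setting necessary.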

18 Experiments

19 Experiments
Aligned Space-Time Pyramid Matching (ASTPM) vs. Unaligned Space-Time Pyramid Matching (USTPM)
[Figure labels: Aligned, Unaligned]
ASTPM is better than USTPM at Level 1

20 Experiments

21 Experiments
Comparisons of cross-domain learning methods
[Figure panels: (a) SIFT features; (b) ST features; (c) SIFT features and ST features]
“parade”: 75.7% (A-MKL) vs. 62.2% (FR)

22 Experiments
Comparisons of cross-domain learning methods
Relative improvements
  SVM_T: 36.9%
  SVM_AT: 8.6%
  Feature Replication (FR) [6]: 7.6%
  Adaptive SVM (A-SVM) [8]: 49.6%
  Domain Transfer SVM (DTSVM) [7]: 9.9%
MKL-based methods
  Better fuse SIFT features and ST features
  Handle noise in the loose labels
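The slides do not define "relative improvement". A common reading, assumed here, is the gain over a baseline expressed as a percentage of the baseline score. The sketch applies that definition to the per-event "parade" figures quoted for A-MKL and FR; it does not reproduce the overall percentages listed on this slide, which are presumably computed from scores that are not in the transcript.

```python
def relative_improvement(new_score, baseline_score):
    """Gain of new_score over baseline_score, as a percentage of the baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (new_score - baseline_score) / baseline_score


# "parade": 75.7% (A-MKL) vs. 62.2% (FR)
print(round(relative_improvement(75.7, 62.2), 1))  # 21.7
```

The difference of 13.5 percentage points therefore corresponds to a relative gain of about 21.7% over FR on that single event.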

23 Conclusion
We propose a new event recognition framework for consumer videos by leveraging a large number of loosely labeled web videos.
We develop a new aligned space-time pyramid matching method.
We present a new cross-domain learning method, A-MKL, which handles the mismatch between the data distributions of the consumer video domain and the web video domain.

24 References
[1] D. Xu and S.-F. Chang. Video event recognition using kernel methods with multi-level temporal alignment. T-PAMI, 30(11):1985–1997.
[2] I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR.
[3] Y. Rubner, C. Tomasi, and L. J. Guibas. The Earth mover’s distance as a metric for image retrieval. IJCV, 40(2).
[4] K. M. Borgwardt, A. Gretton, M. J. Rasch, H.-P. Kriegel, B. Schölkopf, and A. Smola. Integrating structured biological data by kernel maximum mean discrepancy. In ISMB, 2006.

25 References
[5] F. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality and the SMO algorithm. In ICML.
[6] H. Daumé III. Frustratingly easy domain adaptation. In ACL.
[7] L. Duan, I. W. Tsang, D. Xu, and S. J. Maybank. Domain transfer SVM for video concept detection. In CVPR.
[8] J. Yang, R. Yan, and A. G. Hauptmann. Cross-domain video concept detection using adaptive SVMs. In ACM MM.
[9] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.

26 Thank you!

