Delivered By: Yuelei Xie


1 Paper Reading
Delivered by: Yuelei Xie, Vision Modeling Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences

2 Visual Event Recognition in Videos by Learning from Web Data
Best Student Paper Award, CVPR 2010. Authors: Lixin Duan, Dong Xu, and Ivor W. Tsang (Nanyang Technological University); Jiebo Luo (Kodak Research Labs)

3 Authors Lixin Duan: Best Student Paper Award, CVPR 2010.
Awarded the prestigious Microsoft Research Asia Fellowship.

4 Outline: Motivation; Contributions; Aligned Space-time Pyramid Matching;
Adaptive Multiple Kernel Learning; Experiments

5 Motivation Classifiers learned from a limited number of training samples are usually not robust and do not generalize well. Web videos (e.g., from YouTube) can be readily obtained through keyword-based search. However, the feature distributions of samples from the two domains (web domain and consumer domain) may differ considerably.

6 Main Target A visual event recognition framework for consumer-domain videos that leverages a large number of loosely labeled web videos (e.g., from YouTube). It must deal with the distribution mismatch between videos from the two domains, and learn a robust classifier for event recognition while requiring only a small number of labeled consumer videos.

7 Contributions (1/2) Extend the recent work on pyramid matching and propose a new aligned space-time pyramid matching method. Effectively measure the distances between two video clips from different domains

8 Contributions (2/2) A new cross-domain learning method, Adaptive Multiple Kernel Learning (A-MKL). Cope with variations in feature distributions between videos from the two domains. A new objective function to learn an adapted classifier based on multiple base kernels and the prelearned classifier by minimizing both the structural risk functional and mismatch of data distributions from two domains

9 Aligned Space-time Pyramid Matching (1/7)
Improved performance has been reported when information from multiple pyramid levels is fused. Unaligned space-time matching (block-to-block, volume-to-volume) [Lazebnik’06, Laptev’08]. Spatially aligned [Xu’cvpr08] and temporally aligned [Xu’pami08] pyramid matching.

10 Aligned Space-time Pyramid Matching (2/7)
At pyramid level $l$, each clip is divided into $8^l$ non-overlapping space-time volumes, and the size of each volume is set to $1/2^l$ of the original video in width, height, and temporal dimension.
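The division into space-time volumes can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_into_volumes` and the clip size used in the example are hypothetical, and the sketch only computes index bounds rather than touching pixel data.

```python
from itertools import product

def split_into_volumes(width, height, frames, level):
    """Return (x0, x1, y0, y1, t0, t1) bounds of the 8**level volumes at one pyramid level."""
    parts = 2 ** level  # each of the three dimensions is cut into 2**level parts

    def cuts(size):
        # Integer boundaries that cover the full extent without overlap.
        return [(i * size // parts, (i + 1) * size // parts) for i in range(parts)]

    return [
        (x0, x1, y0, y1, t0, t1)
        for (x0, x1), (y0, y1), (t0, t1) in product(cuts(width), cuts(height), cuts(frames))
    ]

# Hypothetical clip: 320x240 pixels, 100 frames, pyramid level 1.
volumes = split_into_volumes(width=320, height=240, frames=100, level=1)
print(len(volumes), volumes[0])
```

At level 0 the whole clip is a single volume; at level 1 each dimension is halved, which gives 8 volumes.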

11 Aligned Space-time Pyramid Matching (3/7)
Matching Stage 1: pair-wise distance $D_{rc}$ between each pair of volumes $V_i(r)$ and $V_j(c)$. Feature Type 1: STIPs. The $\chi^2$ distance is used to measure $D_{rc}$ (as in Laptev’08).
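A minimal sketch of a $\chi^2$ distance between two feature histograms, one per volume. The common symmetric form with a factor of 1/2 is assumed here; the exact normalization used in the paper is not stated on the slide, and the function name is hypothetical.

```python
def chi2_distance(h, g):
    """Symmetric chi-square distance between two histograms of equal length."""
    total = 0.0
    for a, b in zip(h, g):
        if a + b > 0:  # bins that are empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

# Two toy 4-bin histograms standing in for the features of volumes V_i(r) and V_j(c).
same = chi2_distance([0.25, 0.25, 0.5, 0.0], [0.25, 0.25, 0.5, 0.0])
disjoint = chi2_distance([1.0, 0.0], [0.0, 1.0])
print(same, disjoint)
```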

12 Aligned Space-time Pyramid Matching (4/7)
Matching Stage 1: pair-wise distance $D_{rc}$ between each pair of volumes $V_i(r)$ and $V_j(c)$. Feature Type 2: SIFT. Earth Mover’s Distance: $D_{rc} = \sum_u \sum_v \hat{f}_{uv} d_{uv} / \sum_u \sum_v \hat{f}_{uv}$, where $d_{uv}$ is the Euclidean distance between the token-frequency features of image block $u$ in volume $V_i(r)$ and image block $v$ in volume $V_j(c)$.

13 Aligned Space-time Pyramid Matching (5/7)
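Matching Stage 1: pair-wise distance $D_{rc}$ between each pair of volumes $V_i(r)$ and $V_j(c)$. Feature Type 2: SIFT. Earth Mover’s Distance: $\hat{f}_{uv}$ is the optimal flow between image blocks $u$ and $v$, obtained by solving a linear programming problem.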
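A sketch of the linear program behind the optimal flow, written in the standard EMD form. The block counts $H$ and $I$ (number of image blocks in $V_i(r)$ and $V_j(c)$) and the uniform block weights $1/H$ and $1/I$ are assumptions made for illustration, not details taken from the slide.

```latex
\hat{f}_{uv} = \arg\min_{f_{uv} \ge 0} \sum_{u=1}^{H} \sum_{v=1}^{I} f_{uv}\, d_{uv}
\quad \text{s.t.} \quad
\sum_{u=1}^{H} \sum_{v=1}^{I} f_{uv} = 1, \qquad
\sum_{v=1}^{I} f_{uv} \le \frac{1}{H} \;\; \forall u, \qquad
\sum_{u=1}^{H} f_{uv} \le \frac{1}{I} \;\; \forall v
```

Under these assumptions every image block carries the same mass, so the flow can only move mass between blocks and cannot favor one block over another.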

14 Aligned Space-time Pyramid Matching (6/7)
Matching Stage 2: explicitly align the volumes by integrating the information from different volumes with integer-flow EMD. The flow matrix $F = [F_{rc}]$ contains only binary elements, which represent unique matches between volumes $V_i(r)$ and $V_j(c)$.

15 Aligned Space-time Pyramid Matching (7/7)
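Matching Stage 2: align the volumes. The linear program always has an integer optimal solution when solved with the Simplex method. Distance between two clips at level $l$: $D^l(V_i, V_j) = \sum_r \sum_c \hat{F}_{rc} D_{rc} / \sum_r \sum_c \hat{F}_{rc}$, where $\hat{F}_{rc}$ is the optimal binary flow.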
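With binary flows and unique matches, the alignment reduces to an assignment problem over the volume-to-volume distance matrix. The sketch below solves it by brute force over permutations instead of the Simplex method, which is only practical for small numbers of volumes (8 at level 1). It assumes the level distance is the mean distance of the matched pairs; the function name and the toy matrix are hypothetical.

```python
from itertools import permutations

def aligned_distance(D):
    """Mean distance under the best one-to-one matching of rows to columns of a square matrix D."""
    n = len(D)
    best_cost, best_match = float("inf"), None
    for perm in permutations(range(n)):
        # perm[r] is the column (volume of clip j) matched to row r (volume of clip i).
        cost = sum(D[r][perm[r]] for r in range(n))
        if cost < best_cost:
            best_cost, best_match = cost, perm
    return best_cost / n, best_match

# Toy 3x3 matrix of pair-wise volume distances D_rc.
D = [[4.0, 1.0, 3.0],
     [2.0, 0.0, 5.0],
     [3.0, 2.0, 2.0]]
distance, match = aligned_distance(D)
print(distance, match)
```

An unaligned matching would simply compare volume r with volume r (the diagonal, total cost 6 here), whereas the aligned matching finds the cheaper pairing with total cost 5.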

16 Adaptive Multiple Kernel Learning (1/12)
Terminology: $D^A$: auxiliary domain (web videos, source domain). $D^T$: target domain (consumer videos). $D^T = D^T_l \cup D^T_u$, where $D^T_l$ is the labeled and $D^T_u$ the unlabeled data in $D^T$. The element-wise product between vectors $a$ and $b$ is defined as $a \circ b = [a_1 b_1, \ldots, a_n b_n]'$.

17 Adaptive Multiple Kernel Learning (2/12)
Related cross-domain learning methods: Adaptive SVM (A-SVM): the target classifier $f^T(x)$ is adapted from one (fused) auxiliary classifier $f^A(x)$. Specifically, $f^T(x) = f^A(x) + \Delta f(x)$, where $\Delta f(x)$ is the so-called perturbation function. The target classifier is learned based on only one kernel.

18 Adaptive Multiple Kernel Learning (3/12)
Related cross-domain learning methods: Domain Transfer SVM (DTSVM): simultaneously reduce the mismatch in the distributions between the two domains and learn a target decision function. The mismatch is measured by the Maximum Mean Discrepancy (MMD), which is based on the distance between the means of samples from $D^A$ and $D^T$ in a Reproducing Kernel Hilbert Space (RKHS).

19 Adaptive Multiple Kernel Learning (4/12)
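Related cross-domain learning methods: MMD in DTSVM: Define a column vector $s$ with $n_A + n_T$ entries, where the first $n_A$ entries (auxiliary samples) are $1/n_A$ and the remaining $n_T$ entries (target samples) are $-1/n_T$. Then the squared MMD is $\mathrm{tr}(KS)$, where $S = ss'$ and $K$ is the kernel matrix over the samples from both domains.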
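A minimal sketch of the MMD computation from a kernel matrix. It assumes the samples are ordered with the auxiliary domain first, weights auxiliary samples by 1/n_aux and target samples by -1/n_tgt, and evaluates the quadratic form of the kernel matrix with that weight vector. The function names are hypothetical, and a linear kernel is used only so the result can be checked by hand against the distance between the two domain means.

```python
def mmd_squared(K, n_aux, n_tgt):
    """Squared MMD from kernel matrix K, whose first n_aux rows/columns are auxiliary samples."""
    s = [1.0 / n_aux] * n_aux + [-1.0 / n_tgt] * n_tgt
    n = n_aux + n_tgt
    # Quadratic form s' K s, which equals the trace of K times the outer product of s.
    return sum(s[i] * K[i][j] * s[j] for i in range(n) for j in range(n))

def linear_kernel_matrix(X):
    """Gram matrix of plain dot products (illustration only)."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

aux = [[0.0, 0.0], [2.0, 0.0]]  # auxiliary-domain toy samples, mean (1, 0)
tgt = [[1.0, 1.0], [1.0, 3.0]]  # target-domain toy samples, mean (1, 2)
K = linear_kernel_matrix(aux + tgt)
value = mmd_squared(K, len(aux), len(tgt))
print(value)
```

With the linear kernel the feature map is the identity, so the value is the squared Euclidean distance between the two means, (0, -2), which is 4.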

20 Adaptive Multiple Kernel Learning (5/12)
Motivated by A-SVM and DTSVM, the goal of our new cross-domain learning method A-MKL is to learn a target classifier that is adapted from a set of prelearned classifiers, together with a perturbation function based on multiple base kernels $k_m$.

21 Adaptive Multiple Kernel Learning (6/12)
Prelearned classifiers: Train a set of independent classifiers for each pyramid level and each type of local feature using the training data from the two domains. Equally fuse these classifiers to obtain the average classifiers $f^{\mathrm{SIFT}}_l(x)$ and $f^{\mathrm{ST}}_l(x)$, $l = 0, 1, \ldots, L-1$. Kernel function:

22 Adaptive Multiple Kernel Learning (7/12)
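Inspired by semiparametric SVM, the target decision function is defined as $f^T(x) = \sum_p \beta_p f_p(x) + \Delta f(x)$, where $f_p(x)$ are the prelearned classifiers, $\beta_p$ their weights, and $\Delta f(x)$ is the perturbation function.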

23 Adaptive Multiple Kernel Learning (8/12)
The first objective: reduce the mismatch in data distributions between two domains:

24 Adaptive Multiple Kernel Learning (9/12)
The second objective: minimize the structural risk functional. The final optimization problem in A-MKL: where

25 Adaptive Multiple Kernel Learning (10/12)
Convert the structural risk functional to a quadratic programming problem:

26 Adaptive Multiple Kernel Learning (11/12)
The dual of the QP problem has the same form as the dual of SVM with the corresponding kernel matrix, so the problem can be solved with existing SVM solvers such as LIBSVM. Lagrangian multipliers:

27 Adaptive Multiple Kernel Learning (12/12)

28 Experiments (1/7) Six events: “wedding”, “birthday”, “picnic”, “parade”, “show”, “sports”. Consumer videos (Kodak data set and personal videos): 195 clips. YouTube videos: 906 clips. Training videos: all YouTube videos plus 3 randomly sampled clips per event from the consumer videos. Test videos: the remaining consumer clips after sampling. The sampling is repeated five times.
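The sampling protocol can be sketched as below. This is an illustration under stated assumptions: the clip identifiers and the 10 clips per event are made-up toy data (the per-event counts are not given here), the function name and seeds are hypothetical, and the web videos that join the training set in every round are omitted.

```python
import random

EVENTS = ["wedding", "birthday", "picnic", "parade", "show", "sports"]

def split_consumer_clips(clips_by_event, per_event=3, seed=0):
    """Pick per_event labeled training clips per event; the remaining clips form the test set."""
    rng = random.Random(seed)
    train, test = [], []
    for event, clips in clips_by_event.items():
        chosen = set(rng.sample(range(len(clips)), per_event))
        for idx, clip in enumerate(clips):
            (train if idx in chosen else test).append((clip, event))
    return train, test

# Toy stand-in for the consumer videos: 10 hypothetical clips per event.
clips_by_event = {e: [f"{e}_{i:03d}" for i in range(10)] for e in EVENTS}

# Five independent rounds of sampling, one seed per round.
splits = [split_consumer_clips(clips_by_event, seed=s) for s in range(5)]
for train, test in splits:
    print(len(train), len(test))
```

Results would then be averaged over the five rounds, so that the outcome does not hinge on one lucky or unlucky draw of three clips per event.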

29 Experiments (2/7) Aligned VS Unaligned Pyramid Matching
(1) Aligned matching outperforms unaligned matching. (2) SIFT features outperform ST features.

30 Experiments (4/7) Experiment settings
20 base kernels per pyramid level and feature type: four kernel types (i.e., Gaussian, Laplacian, ISD, ID) times five kernel parameters. With two pyramid levels and two types of local features, this gives 80 kernels in total. Comparison in three cases: (a) classifiers learned based on SIFT features; (b) classifiers learned based on ST features; (c) classifiers learned based on both features.
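The kernel count can be checked with a small sketch that enumerates the bank of base kernels. The four distance-based kernel forms below are assumed (ISD read as inverse square distance, ID as inverse distance), and the five `GAMMAS` values are hypothetical placeholders; neither the formulas nor the parameter values appear on the slide.

```python
import math
from itertools import product

# Assumed kernel forms on a clip-to-clip distance d with parameter g.
KERNELS = {
    "Gaussian": lambda d, g: math.exp(-g * d * d),
    "Laplacian": lambda d, g: math.exp(-math.sqrt(g) * d),
    "ISD": lambda d, g: 1.0 / (g * d * d + 1.0),       # inverse square distance
    "ID": lambda d, g: 1.0 / (math.sqrt(g) * d + 1.0),  # inverse distance
}
GAMMAS = [2.0 ** n for n in range(-2, 3)]  # five hypothetical kernel parameters
FEATURES = ["SIFT", "ST"]
LEVELS = [0, 1]

# One entry per (feature type, pyramid level, kernel type, kernel parameter).
kernel_bank = list(product(FEATURES, LEVELS, KERNELS, GAMMAS))

def evaluate(entry, distance):
    """Kernel value for one bank entry at a given clip-to-clip distance."""
    _, _, name, gamma = entry
    return KERNELS[name](distance, gamma)

sift_only = [e for e in kernel_bank if e[0] == "SIFT"]
print(len(kernel_bank), len(sift_only))
```

Restricting the bank to one feature type leaves 40 kernels, and using both feature types gives all 80.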

31 Experiments (5/7) Learning A-MKL
Four average classifiers in total: two for each of cases (a) and (b), four for case (c). 40 kernels for each of cases (a) and (b), 80 for case (c). The MMD criterion is used to measure the mismatch in data distributions. All samples from the target and auxiliary domains are used to calculate h.

32 Experiments (3/7) Comparison of cross-domain learning methods

33 Yuelei Xie, May 31, 2011. Research on Spatio-temporal Representation and Recognition of Human Actions

