Delivered By: Yuelei Xie

Paper Reading Delivered By: Yuelei Xie @Vision Modeling Group @Key Laboratory of Intelligent Information Processing @Institute of Computing Technology @Chinese Academy of Sciences

Visual Event Recognition in Videos by Learning from Web Data
Best Student Paper Award of CVPR’2010. Authors: Lixin Duan, Dong Xu, Ivor W. Tsang @Nanyang Technological University; Jiebo Luo @Kodak Research Labs

Authors Lixin Duan Best student paper of CVPR’2010
Awarded the prestigious Microsoft Research Asia Fellowship

Outline Motivation Contributions Aligned Space-time Pyramid Matching
Adaptive Multiple Kernel Learning Experiments

Motivation: Classifiers learned from a limited number of training samples are usually not robust and do not generalize well. Web videos (e.g., from YouTube) can be readily obtained by keyword-based search. However, the feature distributions of samples from the two domains (web domain and consumer domain) may differ considerably.

Main Target: A visual event recognition framework for consumer-domain videos that leverages a large number of loosely labeled web videos (e.g., from YouTube). It must deal with the distribution mismatch between videos from the two domains, and learn a robust classifier for event recognition while requiring only a small number of labeled consumer videos.

Contributions (1/2): Extend the recent work on pyramid matching and propose a new aligned space-time pyramid matching method, which effectively measures the distances between two video clips from different domains.

Contributions (2/2): A new cross-domain learning method, Adaptive Multiple Kernel Learning (A-MKL), that copes with variations in feature distributions between videos from the two domains. A new objective function learns an adapted classifier based on multiple base kernels and the prelearned classifiers by minimizing both the structural risk functional and the mismatch between the data distributions of the two domains.

Aligned Space-time Pyramid Matching (1/7)
Improved performance was reported by fusing the information from multiple pyramid levels. Unaligned space-time matching (block-to-block, volume-to-volume) [Lazebnik’06, Laptev’08]. Spatially aligned [Xu’cvpr08] and temporally aligned [Xu’pami08] pyramid matching.

At level $l$, each clip is divided into $8^l$ non-overlapping space-time volumes, and the size of each volume is set to $1/2^l$ of the original video in width, height, and temporal dimension.
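The partition described above can be sketched in a few lines. This is only an illustration: the function name `split_into_volumes`, the bounds tuple layout, and the integer rounding of the cut points are assumptions, not details taken from the slides.

```python
from itertools import product

def split_into_volumes(width, height, frames, level):
    """Return (x0, x1, y0, y1, t0, t1) bounds for the 8**level volumes of a clip."""
    parts = 2 ** level  # each dimension is cut into 2**level parts

    def cuts(size):
        # Integer boundaries that cover [0, size) without gaps or overlap.
        return [(i * size // parts, (i + 1) * size // parts) for i in range(parts)]

    return [
        (x0, x1, y0, y1, t0, t1)
        for (x0, x1), (y0, y1), (t0, t1)
        in product(cuts(width), cuts(height), cuts(frames))
    ]

# Hypothetical clip: 320x240 pixels, 100 frames.
volumes = split_into_volumes(width=320, height=240, frames=100, level=1)
print(len(volumes))  # 8
print(volumes[0])    # (0, 160, 0, 120, 0, 50)
```

Level 0 yields the whole clip as a single volume, and level 2 yields 64 volumes.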

Matching Stage 1: Pair-wise distance $D_{rc}$ between each two volumes $V_i(r)$ and $V_j(c)$. Feature Type 1: STIPs. The $\chi^2$ distance is used to measure $D_{rc}$ (as in Laptev’08).
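A minimal sketch of a chi-square distance between two histograms follows. The 1/2 factor and the handling of bins that are empty in both histograms are common conventions assumed here; the slide does not spell them out, and `chi_square_distance` is a hypothetical name.

```python
def chi_square_distance(p, q, eps=1e-12):
    """Chi-square distance between two histograms with the same number of bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(p, q):
        denom = a + b
        if denom > eps:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / denom
    return 0.5 * total

# Two toy 3-bin histograms (made-up values).
print(chi_square_distance([0.5, 0.5, 0.0], [0.5, 0.0, 0.5]))  # 0.5
```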

Matching Stage 1: Pair-wise distance $D_{rc}$ between each two volumes $V_i(r)$ and $V_j(c)$. Feature Type 2: SIFT. $D_{rc}$ is computed with the Earth Mover’s Distance (EMD). The ground distance is the Euclidean distance between the token-frequency features of image block u in volume $V_i(r)$ and image block v in volume $V_j(c)$, and the flow between image blocks is the optimal flow obtained by solving the EMD linear program.
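For reference, the Earth Mover's Distance between two sets of image blocks is usually written as a transportation linear program. The form below is a sketch under assumptions: the symbols $H$ and $I$ (numbers of image blocks in the two volumes), $d_{uv}$ (ground distance), $f_{uv}$ (flow), and the uniform block weights $1/H$ and $1/I$ are not given in this transcript.

```latex
D_{rc} = \frac{\sum_{u=1}^{H}\sum_{v=1}^{I} \hat{f}_{uv}\, d_{uv}}
              {\sum_{u=1}^{H}\sum_{v=1}^{I} \hat{f}_{uv}},
\qquad
\hat{f} = \arg\min_{f_{uv} \ge 0} \sum_{u=1}^{H}\sum_{v=1}^{I} f_{uv}\, d_{uv}
\quad \text{s.t.} \quad
\sum_{u=1}^{H}\sum_{v=1}^{I} f_{uv} = 1, \quad
\sum_{v=1}^{I} f_{uv} \le \frac{1}{H} \;\; \forall u, \quad
\sum_{u=1}^{H} f_{uv} \le \frac{1}{I} \;\; \forall v .
```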

Matching Stage 2: Explicitly align the volumes by integrating the information from different volumes with integer-flow EMD. The flow matrix contains only binary elements, which represent unique matches between volumes $V_i(r)$ and $V_j(c)$.

Matching Stage 2: Align the volumes. The problem always has an integer optimal solution when solved with the Simplex method. The distance between two clips at level $l$ is then computed from the matched volumes.
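With a binary flow matrix and unique matches, the alignment step amounts to a one-to-one assignment between the volumes of the two clips. The sketch below uses exhaustive search over permutations as a stand-in for the Simplex solver mentioned on the slide, which is only practical for small numbers of volumes (such as the 8 volumes at level 1). Normalizing by the number of matches is an assumption borrowed from the usual EMD normalization, and `aligned_distance` is a hypothetical name.

```python
from itertools import permutations

def aligned_distance(D):
    """Match volumes one-to-one; return (mean matched distance, matching).

    D[r][c] is the pairwise distance between volume r of one clip and
    volume c of the other; D must be square.
    """
    n = len(D)
    if any(len(row) != n for row in D):
        raise ValueError("D must be a square matrix")
    best_cost, best_perm = float("inf"), None
    # Exhaustive search over all one-to-one matchings (stand-in for Simplex).
    for perm in permutations(range(n)):
        cost = sum(D[r][perm[r]] for r in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost / n, list(best_perm)

# Toy 3x3 distance matrix (made-up values).
D = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
    [0.8, 0.9, 0.3],
]
distance, matching = aligned_distance(D)
print(matching)  # [1, 0, 2]: volume 0 matches 1, volume 1 matches 0, volume 2 matches 2
```

In this toy example the matched distances are 0.1, 0.2, and 0.3, so the returned clip distance is their mean, about 0.2.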

Adaptive Multiple Kernel Learning (1/12)
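Terminology: $D^A$: auxiliary domain (web videos, source domain). $D^T$: target domain (consumer videos). $D^T = D^T_l \cup D^T_u$, where $D^T_l$ and $D^T_u$ are the labeled and unlabeled data in $D^T$. The element-wise product between vectors $\mathbf{a}$ and $\mathbf{b}$ is defined as $\mathbf{a} \circ \mathbf{b} = [a_1 b_1, \ldots, a_n b_n]'$.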

Related cross-domain learning methods: Adaptive SVM (A-SVM): the target classifier $f^T(x)$ is adapted from one (fused) auxiliary classifier $f^A(x)$. Specifically, $f^T(x) = f^A(x) + \Delta f(x)$, where $\Delta f(x)$ is the so-called perturbation function. The target classifier is learned based on only one kernel.

Related cross-domain learning methods: Domain Transfer SVM (DTSVM): simultaneously reduces the mismatch between the distributions of the two domains and learns a target decision function. The mismatch is measured by the Maximum Mean Discrepancy (MMD), which is based on the means of samples from $D^A$ and $D^T$ in a Reproducing Kernel Hilbert Space (RKHS).

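Related cross-domain learning methods: MMD in DTSVM: $\mathrm{DIST}_k(D^A, D^T) = \left\| \frac{1}{n_A}\sum_{i=1}^{n_A}\varphi(x_i^A) - \frac{1}{n_T}\sum_{i=1}^{n_T}\varphi(x_i^T) \right\|_{\mathcal{H}}$. Define the column vector $\mathbf{s}$ whose first $n_A$ entries are $1/n_A$ and whose remaining $n_T$ entries are $-1/n_T$, and let $S = \mathbf{s}\mathbf{s}'$. Then the squared MMD is $\mathrm{DIST}_k^2(D^A, D^T) = \mathrm{tr}(KS)$, where $K$ is the kernel matrix over the samples of both domains.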
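The empirical MMD can be computed directly from kernel evaluations. The sketch below assumes the standard empirical estimate: a weight vector s with entries 1/n_A for auxiliary samples and -1/n_T for target samples, and squared MMD equal to s'Ks. The Gaussian kernel, its `gamma` value, and the toy data are illustrative choices, not settings from the slides.

```python
import math

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian kernel on plain Python vectors (illustrative choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def mmd_squared(aux, tgt, kernel=gaussian_kernel):
    """Empirical squared MMD between two sample sets, computed as s' K s."""
    samples = list(aux) + list(tgt)
    s = [1.0 / len(aux)] * len(aux) + [-1.0 / len(tgt)] * len(tgt)
    total = 0.0
    for si, xi in zip(s, samples):
        for sj, xj in zip(s, samples):
            total += si * sj * kernel(xi, xj)
    return total

# Toy 1-D samples (made-up values): two well-separated domains.
aux = [[0.0], [0.1]]
tgt = [[2.0], [2.1]]
print(mmd_squared(aux, tgt) > mmd_squared(aux, aux))  # True
```

Identical sample sets give a squared MMD of zero, and the value grows as the two sets move apart in feature space.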

Motivated by A-SVM and DTSVM, the goal of our new cross-domain learning method A-MKL is to learn a target classifier that is adapted from a set of prelearned classifiers as well as a perturbation function based on multiple base kernels $k_m$.

Prelearned classifiers: Train a set of independent classifiers for each pyramid level and each type of local features using the training data from the two domains. Fuse these classifiers with equal weights to obtain one average classifier per feature type at each pyramid level $l = 0, 1, \ldots, L-1$.

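Inspired by the semiparametric SVM, the target decision function is defined as $f^T(x) = \sum_{p=1}^{P} \beta_p f_p(x) + \Delta f(x)$, where the $f_p(x)$ are the prelearned classifiers with weights $\beta_p$ and $\Delta f(x)$ is the perturbation function.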

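The first objective: reduce the mismatch in data distributions between the two domains. With the kernel written as a linear combination of base kernels, $k = \sum_{m=1}^{M} d_m k_m$, the squared MMD is linear in the coefficients: $\Omega(\mathbf{d}) = \mathbf{h}'\mathbf{d}$, where $h_m$ is the squared MMD computed with base kernel $k_m$.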
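When the kernel is a weighted sum of base kernels, the squared MMD is linear in the weights, so one squared MMD value per base kernel is enough to evaluate the mismatch for any weight vector. The check below is a sketch under assumptions: the Gaussian base kernels, the weights `d`, and the toy samples are all hypothetical.

```python
import math

def mmd_squared(aux, tgt, kernel):
    """Empirical squared MMD: s' K s with s = [1/n_A, ..., -1/n_T, ...]."""
    samples = aux + tgt
    s = [1.0 / len(aux)] * len(aux) + [-1.0 / len(tgt)] * len(tgt)
    return sum(si * sj * kernel(xi, xj)
               for si, xi in zip(s, samples)
               for sj, xj in zip(s, samples))

def gaussian(gamma):
    """Return a Gaussian kernel function with the given bandwidth parameter."""
    return lambda x, y: math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

aux = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]  # toy auxiliary-domain samples
tgt = [[1.0, 1.0], [1.2, 0.9]]              # toy target-domain samples
base_kernels = [gaussian(0.5), gaussian(1.0), gaussian(2.0)]  # hypothetical base kernels
d = [0.2, 0.5, 0.3]                                           # hypothetical weights

# One squared MMD per base kernel, then a dot product with the weights.
h = [mmd_squared(aux, tgt, k) for k in base_kernels]
omega = sum(dm * hm for dm, hm in zip(d, h))

# Same quantity computed directly with the combined kernel.
def combined(x, y):
    return sum(dm * k(x, y) for dm, k in zip(d, base_kernels))

direct = mmd_squared(aux, tgt, combined)
print(abs(omega - direct) < 1e-12)  # True
```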

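The second objective: minimize the structural risk functional. The final optimization problem in A-MKL combines both objectives: $\min_{\mathbf{d}} G(\mathbf{d}) = \frac{1}{2}\Omega^2(\mathbf{d}) + \theta J(\mathbf{d})$, where $\Omega(\mathbf{d})$ measures the mismatch in data distributions, $J(\mathbf{d})$ is the structural risk functional, and $\theta > 0$ balances the two terms.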

Convert the structural risk functional to a quadratic programming problem:

The dual of the QP problem has the same form as the dual of the standard SVM with a suitably defined kernel matrix, so the problem can be solved with existing SVM solvers such as LIBSVM. The optimization variables are the Lagrangian multipliers.


Experiments (1/7): Six events: “wedding”, “birthday”, “picnic”, “parade”, “show”, “sports”. Consumer videos (Kodak data set and personal videos): 195 clips. YouTube videos: 906 clips. Training videos: all YouTube videos plus 3 randomly sampled clips for each event from the consumer videos. Test videos: the remaining consumer clips after sampling. The sampling is repeated five times.
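The sampling protocol can be sketched as follows. The clip list here is toy data (10 hypothetical clips per event, not the actual 195-clip set), and the function name, seed handling, and data layout are assumptions made for illustration.

```python
import random

EVENTS = ["wedding", "birthday", "picnic", "parade", "show", "sports"]

def split_consumer_videos(consumer, per_event=3, seed=0):
    """consumer: list of (clip_id, event) pairs. Returns (train, test) lists."""
    rng = random.Random(seed)
    train = []
    for event in EVENTS:
        clips = [c for c in consumer if c[1] == event]
        train.extend(rng.sample(clips, per_event))  # 3 labeled clips per event
    chosen = set(train)
    test = [c for c in consumer if c not in chosen]  # all remaining clips
    return train, test

# Toy consumer set: 10 hypothetical clips per event.
consumer = [(f"{event}_{i}", event) for event in EVENTS for i in range(10)]

# Repeat the sampling five times with different seeds.
for round_id in range(5):
    train, test = split_consumer_videos(consumer, seed=round_id)
    print(round_id, len(train), len(test))  # 18 training clips, 42 test clips each round
```

In the setup described above, the sampled consumer clips would be combined with all YouTube clips to form the training set.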

Experiments (2/7): Aligned vs. Unaligned Pyramid Matching
(1) Aligned matching outperforms unaligned matching. (2) SIFT features outperform ST features.

Experiments (4/7): Experimental Setting
Base kernels: four kernel types (i.e., Gaussian, Laplacian, ISD, ID) with five kernel parameters each give 20 base kernels; with two pyramid levels and two types of local features, there are 80 kernels in total. Comparison in three cases: (a) classifiers learned based on SIFT features; (b) classifiers learned based on ST features; (c) classifiers learned based on both features.
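The kernel counts above follow from simple enumeration. In the sketch below, the functional forms of the four kernels (each built from a distance value and a parameter `gamma`) and the five `gamma` values are assumptions for illustration; the slide names only the kernel types and the counts.

```python
import math
from itertools import product

# Assumed distance-based forms; ISD = inverse square distance, ID = inverse distance.
KERNELS = {
    "gaussian": lambda dist, gamma: math.exp(-gamma * dist ** 2),
    "laplacian": lambda dist, gamma: math.exp(-math.sqrt(gamma) * dist),
    "isd": lambda dist, gamma: 1.0 / (gamma * dist ** 2 + 1.0),
    "id": lambda dist, gamma: 1.0 / (math.sqrt(gamma) * dist + 1.0),
}
GAMMAS = [2.0 ** n for n in range(-2, 3)]  # five hypothetical parameter values
FEATURES = ["SIFT", "ST"]
LEVELS = [0, 1]

def kernel_value(dist, kernel_type, gamma):
    """Evaluate one base kernel on a precomputed distance between two clips."""
    return KERNELS[kernel_type](dist, gamma)

# One kernel per (feature, pyramid level, kernel type, parameter) combination.
configs = list(product(FEATURES, LEVELS, KERNELS, GAMMAS))
print(len(KERNELS) * len(GAMMAS))                  # 20 per feature type and level
print(sum(1 for c in configs if c[0] == "SIFT"))   # 40 for a single feature type
print(len(configs))                                # 80 in total
```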

Experiments (5/7): Learning A-MKL
Four average classifiers in total (one per feature type and pyramid level): two average classifiers for cases (a) and (b), four for case (c). 40 kernels for cases (a) and (b), 80 for case (c). The MMD criterion is used to measure the mismatch in data distributions. All samples from the target and auxiliary domains are used to calculate h.

Experiments (3/7) Comparison of cross-domain learning methods

Thank you! Yuelei, May 31, 2011. Research on Spatio-Temporal Representation and Recognition of Human Actions
