Learning an Attribute Dictionary for Human Action Classification


Learning an Attribute Dictionary for Human Action Classification
Presenter: Qiang Qiu
Qiang Qiu, Zhuolin Jiang, and Rama Chellappa, "Sparse Dictionary-based Representation and Recognition of Action Attributes", ICCV 2011

Action Feature Representation
Actions are represented by shape and motion features (HOG descriptors).

Action Sparse Representation
Each action feature is approximated as a sparse linear combination of atoms from an action dictionary, e.g.
y = 0.64×d1 + 0.53×d2 − 0.40×d3 + 0.35×d4
The coefficient vector (0.64, 0.53, −0.40, 0.35) is the signal's sparse code; a second signal on the slide has sparse code (0.43, 0.63, −0.33, −0.36).
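Concretely, the reconstruction is just a matrix-vector product. In the sketch below only the sparse code comes from the slide; the 2-D atoms are invented for illustration:

```python
import numpy as np

# Columns are hypothetical atoms d1..d4 (made up for illustration);
# the sparse code (0.64, 0.53, -0.40, 0.35) is the one on the slide.
D = np.array([[1.0, 0.0, 0.6, -0.2],
              [0.0, 1.0, 0.8, 0.9]])
x = np.array([0.64, 0.53, -0.40, 0.35])  # sparse code of one signal
y = D @ x  # y = 0.64*d1 + 0.53*d2 - 0.40*d3 + 0.35*d4
```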

K-SVD [1]
Input: signals Y, dictionary size, sparsity T
Output: dictionary D, sparse codes X
arg min_{D,X} ||Y − DX||² s.t. ∀i, ||x_i||_0 ≤ T
Each input signal y_i (a column of Y) is coded as a sparse combination x_i over the atoms d1, d2, d3, …
[1] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation", IEEE Trans. on Signal Processing, 2006
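A minimal sketch of the K-SVD alternation, assuming NumPy: Orthogonal Matching Pursuit for the sparse-coding step, then a rank-1 SVD update per atom. Dimensions, iteration counts, and function names here are illustrative, not the paper's settings:

```python
import numpy as np

def omp(D, y, T):
    """Greedy OMP: code y over dictionary D with at most T atoms."""
    residual = y.astype(float).copy()
    idx, x = [], np.zeros(D.shape[1])
    coeffs = np.zeros(0)
    for _ in range(T):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j not in idx:
            idx.append(j)
        coeffs, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coeffs
    x[idx] = coeffs
    return x

def ksvd(Y, k, T, n_iter=5, seed=0):
    """Alternate sparse coding and per-atom dictionary updates."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], k))
    D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
    for _ in range(n_iter):
        X = np.column_stack([omp(D, Y[:, i], T) for i in range(Y.shape[1])])
        for j in range(k):
            users = np.nonzero(X[j])[0]            # signals using atom j
            if users.size == 0:
                continue
            # Error matrix excluding atom j's contribution
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                      # best rank-1 refit
            X[j, users] = s[0] * Vt[0]
    return D, X
```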

Objective
Learn a compact and discriminative dictionary.

Probabilistic Model for Sparse Representation
Two ingredients: a Gaussian process over the sparse codes, and a dictionary class distribution.

More Views of Sparse Representation
The sparse-code matrix (columns x1…x4 for signals y1…y4, with class labels l1, l2) can also be read row-wise: row x_di collects the coefficients of atom d_i across all signals.

A Gaussian Process
Treat each atom's coefficient profile x_di (row i of the sparse-code matrix) as a sample of a Gaussian process with covariance function entries K(i, j) = cov(x_di, x_dj). Then P(X_d* | X_D*), the profile of a candidate atom given the profiles of already-selected atoms, is Gaussian with a closed-form conditional variance.
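The closed-form conditional can be sketched directly; the Schur-complement formula below is the standard Gaussian conditioning rule, and the function names are assumptions:

```python
import numpy as np

def cond_var(K, i, S):
    """Conditional variance of atom i's profile given selected atoms S:
    K[i,i] - K[i,S] K[S,S]^{-1} K[S,i] (Schur complement)."""
    if not S:
        return K[i, i]
    return K[i, i] - K[i, S] @ np.linalg.solve(K[np.ix_(S, S)], K[i, S])

def cond_entropy(K, i, S):
    """Differential entropy of a 1-D Gaussian: 0.5*log(2*pi*e*variance)."""
    return 0.5 * np.log(2 * np.pi * np.e * cond_var(K, i, S))
```

Conditioning on more atoms can only shrink the variance, so the conditional entropy is monotonically non-increasing in S.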

Dictionary Class Distribution
P(L | d_i), L ∈ [1, M]: aggregate |x_di| by class label to obtain an M-sized vector. Example for atom d5, used with coefficients 0.33 and 0.40 by class-l1 signals and 0.42 by a class-l2 signal:
P(L = l1 | d5) = (0.33 + 0.40) / (0.33 + 0.40 + 0.42) ≈ 0.63
P(L = l2 | d5) = 0.42 / (0.33 + 0.40 + 0.42) ≈ 0.37
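The aggregation can be sketched as follows (function name and array shapes are assumptions; the example reproduces the slide's numbers for d5):

```python
import numpy as np

def dict_class_distribution(X, labels, n_classes):
    """P(L | d_i): sum |x_di| over the signals of each class, then
    normalize each atom's row into a distribution over the M classes."""
    A = np.abs(X)                                  # atoms x signals
    P = np.zeros((A.shape[0], n_classes))
    for c in range(n_classes):
        P[:, c] = A[:, labels == c].sum(axis=1)
    total = P.sum(axis=1, keepdims=True)
    return P / np.where(total == 0, 1.0, total)    # guard unused atoms

# Slide example: atom d5 with coefficients -0.33, -0.40 (class l1) and 0.42 (class l2)
X5 = np.array([[-0.33, -0.40, 0.42, 0.0]])
P = dict_class_distribution(X5, np.array([0, 0, 1, 1]), 2)
# P[0] is approximately [0.63, 0.37]
```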

Dictionary Learning Approaches
- Maximization of Joint Entropy (ME)
- Maximization of Mutual Information (MMI)
  - Unsupervised Learning (MMI-1)
  - Supervised Learning (MMI-2)

Maximization of Joint Entropy (ME)
- Initialize a large dictionary Do using K-SVD.
- Start with D* = ∅; until |D*| = k, iteratively choose d* from Do \ D*:
      d* = arg max_d H(d | D*)
This greedy rule is a good approximation to the ME criterion arg max_D H(D).

Maximization of Mutual Information for Unsupervised Learning (MMI-1)
- Initialize a large dictionary Do using K-SVD.
- Start with D* = ∅; until |D*| = k, iteratively choose d* from Do \ D*:
      d* = arg max_d [ H(d | D*) − H(d | Do \ (D* ∪ d)) ]
  The first term rewards diversity, the second coverage; both have closed forms under the Gaussian-process model.
This greedy rule is a near-optimal approximation to arg max_D I(D; Do \ D), within (1 − 1/e) of the optimum.
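Since both entropy terms reduce to Gaussian conditional variances, the greedy MMI-1 selection can be sketched as below. The covariance K and the function names are assumptions for illustration:

```python
import numpy as np

def cond_var(K, i, S):
    """GP conditional variance of atom i given selected set S."""
    if not S:
        return K[i, i]
    return K[i, i] - K[i, S] @ np.linalg.solve(K[np.ix_(S, S)], K[i, S])

def mmi1_greedy(K, k):
    """Iteratively pick d* = argmax H(d|D*) - H(d|Do\(D* u d)); since
    H(d|S) = 0.5*log(2*pi*e*var) for a 1-D Gaussian, the score reduces
    to a difference of log conditional variances."""
    n = K.shape[0]
    selected = []
    while len(selected) < k:
        best_i, best_score = -1, -np.inf
        for i in range(n):
            if i in selected:
                continue
            rest = [j for j in range(n) if j != i and j not in selected]
            score = 0.5 * (np.log(cond_var(K, i, selected)) -
                           np.log(cond_var(K, i, rest)))
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return selected
```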

Revisit: Dictionary Class Distribution
As before, P(L | d_i), L ∈ [1, M], aggregates |x_di| by class label:
P(l1 | d5) = (0.33 + 0.40) / (0.33 + 0.40 + 0.42) ≈ 0.63
P(l2 | d5) = 0.42 / (0.33 + 0.40 + 0.42) ≈ 0.37
Define the label variables P(Ld) = P(L | d) for a single atom and P(LD) = P(L | D) for a set of atoms.

Maximization of Mutual Information for Supervised Learning (MMI-2)
- Initialize a large dictionary Do using K-SVD.
- Start with D* = ∅; until |D*| = k, iteratively choose d* from Do \ D*:
      d* = arg max_d [ H(d | D*) − H(d | Do \ (D* ∪ d)) ] + λ [ H(Ld | LD*) − H(Ld | LDo\(D* ∪ d)) ]
- MMI-1 is the special case of MMI-2 with λ = 0.

Keck gesture dataset

Representation Consistency [1]
[1] J. Liu and M. Shah, "Learning Human Actions via Information Maximization", CVPR 2008

Recognition Accuracy
Recognition accuracy using the initial dictionary Do: (a) 0.23, (b) 0.42, (c) 0.71.

Recognizing Realistic Actions
- 150 broadcast sports videos, 10 different actions.
- Average recognition rate: 83.6% (best previously reported result: 86.6%).