Activity Recognition Computer Vision CS 143, Brown James Hays 11/21/11 With slides by Derek Hoiem and Kristen Grauman.

Slides:



Advertisements
Similar presentations
By: Ryan Wendel.  It is an ongoing analysis in which videos are analyzed frame by frame  Most of the video recognition is pulled from 3-D graphic engines.
Advertisements

Actions in video Monday, April 25 Kristen Grauman UT-Austin.
- Recovering Human Body Configurations: Combining Segmentation and Recognition (CVPR’04) Greg Mori, Xiaofeng Ren, Alexei A. Efros and Jitendra Malik -
Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem
Lecture 31: Modern object recognition
Activity Recognition Aneeq Zia. Agenda What is activity recognition Typical methods used for action recognition “Evaluation of local spatio-temporal features.
Bangpeng Yao and Li Fei-Fei
Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA.
Intelligent Systems Lab. Recognizing Human actions from Still Images with Latent Poses Authors: Weilong Yang, Yang Wang, and Greg Mori Simon Fraser University,
2D Human Pose Estimation in TV Shows Vittorio Ferrari Manuel Marin Andrew Zisserman Dagstuhl Seminar July 2008.
Detecting Pedestrians by Learning Shapelet Features
A Robust Pedestrian Detection Approach Based on Shapelet Feature and Haar Detector Ensembles Wentao Yao, Zhidong Deng TSINGHUA SCIENCE AND TECHNOLOGY ISSNl.
Local Descriptors for Spatio-Temporal Recognition
Robust Moving Object Detection & Categorization using self- improving classifiers Omar Javed, Saad Ali & Mubarak Shah.
Recognition: A machine learning approach
Event prediction CS 590v. Applications Video search Surveillance – Detecting suspicious activities – Illegally parked cars – Abandoned bags Intelligent.
Personal Driving Diary: Constructing a Video Archive of Everyday Driving Events IEEE workshop on Motion and Video Computing ( WMVC) 2011 IEEE Workshop.
Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010 M.Sc.
Tracking Objects with Dynamics Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/21/15 some slides from Amin Sadeghi, Lana Lazebnik,
1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.
Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2005 with a lot of slides stolen from Steve Seitz and.
Recognizing and Tracking Human Action Josephine Sullivan and Stefan Carlsson.
Visual Object Recognition Rob Fergus Courant Institute, New York University
Lecture 29: Recent work in recognition CS4670: Computer Vision Noah Snavely.
A String Matching Approach for Visual Retrieval and Classification Mei-Chen Yeh* and Kwang-Ting Cheng Learning-Based Multimedia Lab Department of Electrical.
ICCV 2003UC Berkeley Computer Vision Group Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley.
Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley.
Bag of Video-Words Video Representation
Action Recognition Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/21/11.
Flow Based Action Recognition Papers to discuss: The Representation and Recognition of Action Using Temporal Templates (Bobbick & Davis 2001) Recognizing.
Action recognition with improved trajectories
Olga Zoidi, Anastasios Tefas, Member, IEEE Ioannis Pitas, Fellow, IEEE
TP15 - Tracking Computer Vision, FCUP, 2013 Miguel Coimbra Slides by Prof. Kristen Grauman.
Object Detection Sliding Window Based Approach Context Helps
Watch, Listen and Learn Sonal Gupta, Joohyun Kim, Kristen Grauman and Raymond Mooney -Pratiksha Shah.
Computer Vision CS 776 Spring 2014 Recognition Machine Learning Prof. Alex Berg.
Marcin Marszałek, Ivan Laptev, Cordelia Schmid Computer Vision and Pattern Recognition, CVPR Actions in Context.
Recognizing Human Figures and Actions Greg Mori Simon Fraser University.
Why Categorize in Computer Vision ?. Why Use Categories? People love categories!
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Yao, B., and Fei-fei, L. IEEE Transactions on PAMI(2012)
Face detection Slides adapted Grauman & Liebe’s tutorial
Visual Object Recognition
Object Detection with Discriminatively Trained Part Based Models
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
Pedestrian Detection and Localization
Image Categorization Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 03/15/11.
Template matching and object recognition. CS8690 Computer Vision University of Missouri at Columbia Matching by relations Idea: –find bits, then say object.
MSRI workshop, January 2005 Object Recognition Collected databases of objects on uniform background (no occlusions, no clutter) Mostly focus on viewpoint.
Computer Vision 776 Jan-Michael Frahm 12/05/2011 Many slides from Derek Hoiem, James Hays.
12/7/10 Looking Back, Moving Forward Computational Photography Derek Hoiem, University of Illinois Photo Credit Lee Cullivan.
Efficient Visual Object Tracking with Online Nearest Neighbor Classifier Many slides adapt from Steve Gu.
Project 3 Results.
Histograms of Oriented Gradients for Human Detection(HOG)
Grouplet: A Structured Image Representation for Recognizing Human and Object Interactions Bangpeng Yao and Li Fei-Fei Computer Science Department, Stanford.
CS 1699: Intro to Computer Vision Support Vector Machines Prof. Adriana Kovashka University of Pittsburgh October 29, 2015.
Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11.
Week 10 Emily Hand UNR.
Local features: detection and description
A REAL-TIME DEFORMABLE DETECTOR 謝汝欣 OUTLINE  Introduction  Related Work  Proposed Method  Experiments 2.
Representation in Vision Derek Hoiem CS 598, Spring 2009 Jan 22, 2009.
Data Driven Attributes for Action Detection
Tracking Objects with Dynamics
Action Recognition ECE6504 Xiao Lin.
Object detection as supervised classification
CS 1674: Intro to Computer Vision Scene Recognition
Brief Review of Recognition + Context
Human Activity Analysis
KFC: Keypoints, Features and Correspondences
Liyuan Li, Jerry Kah Eng Hoe, Xinguo Yu, Li Dong, and Xinqi Chu
Presentation transcript:

Activity Recognition Computer Vision CS 143, Brown James Hays 11/21/11 With slides by Derek Hoiem and Kristen Grauman

What is an action? Action: a transition from one state to another Who is the actor? How is the state of the actor changing? What (if anything) is being acted on? How is that thing changing? What is the purpose of the action (if any)?

Human activity in video No universal terminology, but approximately: “Actions”: atomic motion patterns -- often gesture- like, single clear-cut trajectory, single nameable behavior (e.g., sit, wave arms) “Activity”: series or composition of actions (e.g., interactions between people) “Event”: combination of activities or actions (e.g., a football game, a traffic accident) Adapted from Venu Govindaraju

How do we represent actions? Categories Walking, hammering, dancing, skiing, sitting down, standing up, jumping Poses Nouns and Predicates

What is the purpose of action recognition?

Surveillance

2011 Interfaces

2011 W. T. Freeman and C. Weissman, Television control by hand gestures, International Workshop on Automatic Face- and Gesture- Recognition, IEEE Computer Society, Zurich, Switzerland, June, 1995, pp MERL-TR94-24MERL-TR Interfaces

How can we identify actions? MotionPose Held Objects Nearby Objects

Representing Motion Bobick Davis 2001 Optical Flow with Motion History

Representing Motion Efros et al Optical Flow with Split Channels

Representing Motion Tracked Points Matikainen et al. 2009

Representing Motion Space-Time Interest Points Corner detectors in space-time Laptev 2005

Representing Motion Space-Time Interest Points Laptev 2005

Representing Motion Space-Time Volumes Blank et al. 2005

Examples of Action Recognition Systems Feature-based classification Recognition using pose and objects

Action recognition as classification Retrieving actions in moviesRetrieving actions in movies, Laptev and Perez, 2007

Remember image categorization… Training Labels Training Images Classifier Training Training Image Features Trained Classifier

Remember image categorization… Training Labels Training Images Classifier Training Training Image Features Testing Test Image Trained Classifier Outdoor Prediction

Remember spatial pyramids…. Compute histogram in each spatial bin

Features for Classifying Actions 1.Spatio-temporal pyramids (14x14x8 bins) – Image Gradients – Optical Flow

Features for Classifying Actions 2.Spatio-temporal interest points Corner detectors in space-time Descriptors based on Gaussian derivative filters over x, y, time

Classification Boosted stubs for pyramids of optical flow, gradient Nearest neighbor for STIP

Searching the video for an action 1.Detect keyframes using a trained HOG detector in each frame 2.Classify detected keyframes as positive (e.g., “drinking”) or negative (“other”)

Accuracy in searching video Without keyframe detection With keyframe detection

Learning realistic human actions from movies, Laptev et al “Talk on phone” “Get out of car”

Approach Space-time interest point detectors Descriptors – HOG, HOF Pyramid histograms (3x3x2) SVMs with Chi-Squared Kernel Interest Points Spatio-Temporal Binning

Results

Action Recognition using Pose and Objects Modeling Mutual Context of Object and Human Pose in Human-Object Interaction ActivitiesModeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities, B. Yao and Li Fei-Fei, 2010 Slide Credit: Yao/Fei-Fei

Human-Object Interaction Torso Right-arm Left-arm Right-leg Left-leg Head Human pose estimation Holistic image based classification Integrated reasoning Slide Credit: Yao/Fei-Fei

Human-Object Interaction Tennis racket Human pose estimation Holistic image based classification Integrated reasoning Object detection Slide Credit: Yao/Fei-Fei

Human-Object Interaction Human pose estimation Holistic image based classification Integrated reasoning Object detection Torso Right-arm Left-arm Right-leg Left-leg Head Tennis racket HOI activity: Tennis Forehand Slide Credit: Yao/Fei-Fei Action categorization

Felzenszwalb & Huttenlocher, 2005 Ren et al, 2005 Ramanan, 2006 Ferrari et al, 2008 Yang & Mori, 2008 Andriluka et al, 2009 Eichner & Ferrari, 2009 Difficult part appearance Self-occlusion Image region looks like a body part Human pose estimation & Object detection Human pose estimation is challenging. Slide Credit: Yao/Fei-Fei

Human pose estimation & Object detection Human pose estimation is challenging. Felzenszwalb & Huttenlocher, 2005 Ren et al, 2005 Ramanan, 2006 Ferrari et al, 2008 Yang & Mori, 2008 Andriluka et al, 2009 Eichner & Ferrari, 2009 Slide Credit: Yao/Fei-Fei

Human pose estimation & Object detection Facilitate Given the object is detected. Slide Credit: Yao/Fei-Fei

Viola & Jones, 2001 Lampert et al, 2008 Divvala et al, 2009 Vedaldi et al, 2009 Small, low-resolution, partially occluded Image region similar to detection target Human pose estimation & Object detection Object detection is challenging Slide Credit: Yao/Fei-Fei

Human pose estimation & Object detection Object detection is challenging Viola & Jones, 2001 Lampert et al, 2008 Divvala et al, 2009 Vedaldi et al, 2009 Slide Credit: Yao/Fei-Fei

Human pose estimation & Object detection Facilitate Given the pose is estimated. Slide Credit: Yao/Fei-Fei

Human pose estimation & Object detection Mutual Context Slide Credit: Yao/Fei-Fei

H A Mutual Context Model Representation More than one H for each A ; Unobserved during training. A:A: Croquet shot Volleyball smash Tennis forehand Intra-class variations Activity Object Human pose Body parts l P : location; θ P : orientation; s P : scale. Croquet mallet Volleyball Tennis racket O:O: H:H: P:P: f:f: Shape context. [Belongie et al, 2002] P1P1 Image evidence fOfO f1f1 f2f2 fNfN O P2P2 PNPN Slide Credit: Yao/Fei-Fei

Activity Classification Results Cricket shot Tennis forehand Bag-of-words SIFT+SVM Gupta et al, 2009 Our model Slide Credit: Yao/Fei-Fei

More results Independent Joint

Take-home messages Action recognition is an open problem. – How to define actions? – How to infer them? – What are good visual cues? – How do we incorporate higher level reasoning?

Take-home messages Some work done, but it is just the beginning of exploring the problem. So far… – Actions are mainly categorical – Most approaches are classification using simple features (spatial-temporal histograms of gradients or flow, s-t interest points, SIFT in images) – Just a couple works on how to incorporate pose and objects – Not much idea of how to reason about long-term activities or to describe video sequences