Tracking People by Learning Their Appearance Deva Ramanan David A. Forsuth Andrew Zisserman.

Slides:



Advertisements
Similar presentations
1 Hierarchical Part-Based Human Body Pose Estimation * Ramanan Navaratnam * Arasanathan Thayananthan Prof. Phil Torr * Prof. Roberto Cipolla * University.
Advertisements

Evidential modeling for pose estimation Fabio Cuzzolin, Ruggero Frezza Computer Science Department UCLA.
Alignment Visual Recognition “Straighten your paths” Isaiah.
Pose Estimation and Segmentation of People in 3D Movies Karteek Alahari, Guillaume Seguin, Josef Sivic, Ivan Laptev Inria, Ecole Normale Superieure ICCV.
- Recovering Human Body Configurations: Combining Segmentation and Recognition (CVPR’04) Greg Mori, Xiaofeng Ren, Alexei A. Efros and Jitendra Malik -
Introduction To Tracking
1 Video Processing Lecture on the image part (8+9) Automatic Perception Volker Krüger Aalborg Media Lab Aalborg University Copenhagen
Learning to estimate human pose with data driven belief propagation Gang Hua, Ming-Hsuan Yang, Ying Wu CVPR 05.
3D Human Body Pose Estimation from Monocular Video Moin Nabi Computer Vision Group Institute for Research in Fundamental Sciences (IPM)
Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction Atul Kanaujia, CBIM, Rutgers Cristian Sminchisescu, TTI-C Dimitris Metaxas,CBIM, Rutgers.
2D Human Pose Estimation in TV Shows Vittorio Ferrari Manuel Marin Andrew Zisserman Dagstuhl Seminar July 2008.
Multiple People Detection and Tracking with Occlusion Presenter: Feifei Huo Supervisor: Dr. Emile A. Hendriks Dr. A. H. J. Stijn Oomes Information and.
Human-Computer Interaction Human-Computer Interaction Segmentation Hanyang University Jong-Il Park.
Hierarchical Saliency Detection School of Electronic Information Engineering Tianjin University 1 Wang Bingren.
A KLT-Based Approach for Occlusion Handling in Human Tracking Chenyuan Zhang, Jiu Xu, Axel Beaugendre and Satoshi Goto 2012 Picture Coding Symposium.
Student: Yao-Sheng Wang Advisor: Prof. Sheng-Jyh Wang ARTICULATED HUMAN DETECTION 1 Department of Electronics Engineering National Chiao Tung University.
Shape and Dynamics in Human Movement Analysis Ashok Veeraraghavan.
Broadcast Court-Net Sports Video Analysis Using Fast 3-D Camera Modeling Jungong Han Dirk Farin Peter H. N. IEEE CSVT 2008.
Kinect Case Study CSE P 576 Larry Zitnick
Tracking Objects with Dynamics Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/21/15 some slides from Amin Sadeghi, Lana Lazebnik,
Segmentation and Tracking of Multiple Humans in Crowded Environments Tao Zhao, Ram Nevatia, Bo Wu IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,
Segmentation Divide the image into segments. Each segment:
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman ICCV 2003 Presented by: Indriyati Atmosukarto.
Ensemble Tracking Shai Avidan IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE February 2007.
Segmentation by Clustering Reading: Chapter 14 (skip 14.5) Data reduction - obtain a compact representation for interesting image data in terms of a set.
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman University of Oxford ICCV 2003.
1 Integration of Background Modeling and Object Tracking Yu-Ting Chen, Chu-Song Chen, Yi-Ping Hung IEEE ICME, 2006.
Recognizing and Tracking Human Action Josephine Sullivan and Stefan Carlsson.
Tracking Video Objects in Cluttered Background
Augmented Reality: Object Tracking and Active Appearance Model
Kapitel 7 “Tracking” – p. 1 Tracking  Fundamentals  Object representation  Object detection  Object tracking A. Yilmaz, O. Javed, and M. Shah Object.
Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition Oytun Akman.
A Vision-Based System that Detects the Act of Smoking a Cigarette Xiaoran Zheng, University of Nevada-Reno, Dept. of Computer Science Dr. Mubarak Shah,
Learning and Recognizing Activities in Streams of Video Dinesh Govindaraju.
Computer Vision - A Modern Approach Set: Segmentation Slides by D.A. Forsyth Segmentation and Grouping Motivation: not information is evidence Obtain a.
EE392J Final Project, March 20, Multiple Camera Object Tracking Helmy Eltoukhy and Khaled Salama.
A Scale and Rotation Invariant Approach to Tracking Human Body Part Regions in Videos Yihang BoHao Jiang Institute of Automation, CAS Boston College.
CS55 Tianfan Xue Adviser: Bo Zhang, Jianmin Li.
TP15 - Tracking Computer Vision, FCUP, 2013 Miguel Coimbra Slides by Prof. Kristen Grauman.
Shape-Based Human Detection and Segmentation via Hierarchical Part- Template Matching Zhe Lin, Member, IEEE Larry S. Davis, Fellow, IEEE IEEE TRANSACTIONS.
1 Interest Operators Harris Corner Detector: the first and most basic interest operator Kadir Entropy Detector and its use in object recognition SIFT interest.
Human-Computer Interaction Human-Computer Interaction Tracking Hanyang University Jong-Il Park.
Visual Tracking Decomposition Junseok Kwon* and Kyoung Mu lee Computer Vision Lab. Dept. of EECS Seoul National University, Korea Homepage:
A General Framework for Tracking Multiple People from a Moving Camera
Chapter 14: SEGMENTATION BY CLUSTERING 1. 2 Outline Introduction Human Vision & Gestalt Properties Applications – Background Subtraction – Shot Boundary.
CSE 185 Introduction to Computer Vision Pattern Recognition 2.
Vision-based human motion analysis: An overview Computer Vision and Image Understanding(2007)
Learning the Appearance and Motion of People in Video Hedvig Sidenbladh, KTH Michael Black, Brown University.
Motion Analysis using Optical flow CIS750 Presentation Student: Wan Wang Prof: Longin Jan Latecki Spring 2003 CIS Dept of Temple.
Vehicle Segmentation and Tracking From a Low-Angle Off-Axis Camera Neeraj K. Kanhere Committee members Dr. Stanley Birchfield Dr. Robert Schalkoff Dr.
Stable Multi-Target Tracking in Real-Time Surveillance Video
Chapter 5 Multi-Cue 3D Model- Based Object Tracking Geoffrey Taylor Lindsay Kleeman Intelligent Robotics Research Centre (IRRC) Department of Electrical.
Optical Flow. Distribution of apparent velocities of movement of brightness pattern in an image.
1 Motion Analysis using Optical flow CIS601 Longin Jan Latecki Fall 2003 CIS Dept of Temple University.
Segmentation of Vehicles in Traffic Video Tun-Yu Chiang Wilson Lau.
IEEE International Conference on Multimedia and Expo.
Visual Tracking by Cluster Analysis Arthur Pece Department of Computer Science University of Copenhagen
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman University of Oxford ICCV 2003.
Processing Images and Video for An Impressionist Effect Automatic production of “painterly” animations from video clips. Extending existing algorithms.
Motion tracking TEAM D, Project 11: Laura Gui - Timisoara Calin Garboni - Timisoara Peter Horvath - Szeged Peter Kovacs - Debrecen.
Learning Image Statistics for Bayesian Tracking Hedvig Sidenbladh KTH, Sweden Michael Black Brown University, RI, USA
Tracking Objects with Dynamics
Dynamical Statistical Shape Priors for Level Set Based Tracking
Vehicle Segmentation and Tracking in the Presence of Occlusions
COMPUTER VISION Tam Lam
Detecting Artifacts and Textures in Wavelet Coded Images
Segmentation and Grouping
Globally Optimal Generalized Maximum Multi Clique Problem (GMMCP) using Python code for Pedestrian Object Tracking By Beni Mulyana.
Online Graph-Based Tracking
Tracking Many slides adapted from Kristen Grauman, Deva Ramanan.
Presentation transcript:

Tracking People by Learning Their Appearance Deva Ramanan David A. Forsuth Andrew Zisserman

Introduction Problem: to track the articulations of people from video sequence. –Need to determine both the number of people in each frame. –Estimate their configuration. Two stage automatic system –Build a model of appearance of each person in a video. –Track by detecting those models in each frame.

Approach Under our model, the focus on tracking becomes not so much identifying where an object is, but learning what it looks like. Bottom-up approach –Look for candidate body parts in each frame, then cluster the candidates to find assemblies of parts that might be people. Top-down approach –Look for entire person in a single frame. We assume people tend to occupy certain key poses, and so we build models from those poses that are easy to detect

Temporal Pictorial Structures First-order Markov model, we replicate the standard model T times, once for each frame:

Temporal Pictorial Structures

Building Models by Clustering An important observation is that we have some a priori notion of part appearance Ci as having rectangular edges. –Detect candidate parts in each frame with an edge-based part detector. –Cluster the resulting image patches to identify body parts that look similar across time. –Prune clusters that move too fast in some frames.

Building Models by Clustering

Detecting Parts with Edges In the experiments, we used a detector threshold that was manually set between 10 and 50 (assuming edge filters are L1-normalized and images are scaled to 255).

Clustering Image Patches Mean-shift method. We create a feature vector for each candidate segment, consisting of a 512-dimensional RGB color histogram(8 bins for each color axis). We scale the feature vector by empirically-determined value to yield a unit-variance model in (3).

Enforcing a Motion Model For each cluster, we want to find a sequence of candidates that obeys our bounded velocity motion model defined in (5). –We obtain a sequence of segments, at most one per frame, where the segments are within a fixed velocity bound of one another and where all lie close to the cluster center in appearance. –Prune the sequences that are too small or that never move.

Learning Multiple Appearance Models We use the learned appearance to build better segment detectors. We search for new candidates using the medoid image patch of the valid clusters from Fig. 5c as a template. Link up those candidates that obey our velocity constraints into the final torso track in Fig. 5d.

Learning Multiple Appearance Models

Approximate Inference If the torso localization (and estimated appearance) is poor, the resulting appearance and localization estimates for the limbs will suffer. One remedy might be to continually pass messages in Fig. 9 in a loopy fashion (e.g., reestimate the torso appearance given the arm appearance).

Building Models with Stylized Detectors

Detecting Lateral Walking Poses

Discriminative Appearance Models

Track by Model Detection Multiple scales –System searches over an image pyramid. It selects the largest scale at which a person was detected. Occlusion

Track by Model Detection Spatial Smoothing( better than direct MAP) –the smoothed pose tends to be stable since nearby poses also have high posterior values. –the smoothed pose contains “sub-pixel” accuracy since it is a local average. Temporal Smoothing –By feeding the pose posterior at each frame into a formal motion model. Multiple people Multiple instances

Results - Building Models by clustering Self-starting Multiple activities

Results - Building Models by clustering Lack of background subtraction

Results - Building Models by clustering Multiple people, recovery from occlusion and error (see Fig. 18.)

Results - Building Models with a Stylized Detector Lateral-walking pose detection Appearance model detection

Discussion Comparison of model-building algorithms. We find the two model-building algorithms complementary. If we can observe people for a long time, or if we expect them to behave predictably, detecting stylized poses is likely the better approach.