Spatio-temporal constraints for recognizing 3D objects in videos Nicoletta Noceti Università degli Studi di Genova.

2 Outline of the presentation  3D object recognition View-based approaches Local descriptors for object recognition Our approach  Spatio-temporal models for 3D object recognition Modeling sequences Representation of video sequences w.r.t. the model 2-stage matching procedure  Recognizing objects: experiments and results

3 Object recognition  Localisation means determining the pose of each object relative to a sensor  Categorization means recognising the class to which an object belongs rather than recognising that particular object  The goal of recognition systems is to identify which objects are present in a scene. Unlike "merely" perceiving a shape, recognising it involves memory, that is, accessing representations of shapes seen in the past [Wittgenstein73]

4 Object recognition

5 View-based object recognition  View-based approaches to 3D object recognition gained attention as a way to deal with appearance variation [Murase et al. 95, Pontil et al. 98]: no explicit model is required  Local approaches produce relatively compact descriptions of the image content and do not suffer from the presence of cluttered background and occlusions [Mikolajczyk et al. 03]  Local object models are often inspired by text categorization [Cristianini et al. 02]  Many view-based local approaches to recognition have been proposed [Leibe et al. 04, Csurka et al. 04]

6 Our approach to recognition  We observe an object from slightly different viewpoints and exploit local features that are distinctive in space and stable in time to perform recognition  Our approach shares some similarities with codebook methods, but extends the concept to the temporal domain

7 Our approach to recognition  View-based recognition systems do not need explicit computation of 3D object models  Local approaches produce compact descriptions and do not suffer from cluttered background and occlusions  Spatial constraints improve the quality of recognition [Ferrari et al. 06]  Biological vision systems exploit motion, which provides important cues for depth perception and object recognition [Stringer et al. 06]

8 Outline of the presentation  3D object recognition View-based approaches Local descriptors for object recognition Our approach  Spatio-temporal models for 3D object recognition Modeling sequences Representation of video sequences w.r.t. the model 2-stage matching procedure  Recognizing objects: experiments and results

9 Drawbacks of locality His eyes would dart from one thing to another, picking up tiny features, individual features, as they had done with my face. A striking brightness, a colour, a shape would arrest his attention and elicit comment – but in no case did he get the scene-as-a-whole. He failed to see the whole, seeing only details, which he spotted like blips on a radar screen. He never entered into relation with the picture as a whole - never faced, so to speak, its physiognomy. He had no sense whatever of a landscape or a scene. "The Man Who Mistook His Wife for a Hat: And Other Clinical Tales", by Oliver Sacks, 1985

10 Ideas  Obtain a 3D object recognition method based on a compact description of image sequences  Exploit spatial information on the proximity of features appearing contemporaneously  Exploit temporal continuity both in training and in testing E. Delponte, N. Noceti, F. Odone and A. Verri, Spatio-temporal constraints for matching view-based descriptions of 3D objects, WIAMIS 2007

11 Recognizing objects with ST models Video sequences Keypoints detection and description Keypoints tracking Cleaning procedure Building the spatio-temporal model 2-stage matching procedure Object recognition Spatio-temporal model for training Spatio-temporal model for test

12 From sequence to spatio-temporal model

13 From sequence to spatio-temporal model  For each image of the sequence: extract Harris corners, assign them a scale and a principal direction, assign them a SIFT descriptor  Tracking of keypoints with a Kalman filter; cleaning procedure based on the length of trajectories and the robustness of descriptors  Computation of time-invariant features
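The trajectory cleaning step above can be sketched in plain Python. This is a minimal illustration, not the actual procedure: the slides do not specify the criteria, so the data layout (trajectories as lists of (frame, descriptor) pairs), the per-dimension spread measure, and the thresholds are all hypothetical.

```python
# Hypothetical sketch of the trajectory cleaning step: keep only
# trajectories that are long enough (temporal stability) and whose
# descriptors vary little along the track (appearance stability).

def descriptor_spread(descriptors):
    """Mean per-dimension standard deviation across a trajectory --
    a simple proxy for how stable the local appearance is."""
    n, dim = len(descriptors), len(descriptors[0])
    spread = 0.0
    for d in range(dim):
        vals = [desc[d] for desc in descriptors]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        spread += var ** 0.5
    return spread / dim

def clean_trajectories(trajectories, min_length=5, max_spread=0.1):
    """Filter a list of trajectories, each a list of (frame, descriptor)."""
    kept = []
    for traj in trajectories:
        if len(traj) < min_length:
            continue  # too short: not stable in time
        descs = [desc for _, desc in traj]
        if descriptor_spread(descs) <= max_spread:
            kept.append(traj)  # descriptor is robust along the track
    return kept
```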

14 From sequence to spatio-temporal model

15 Time-invariant features  We obtain a set of time-invariant features, each consisting of: a spatial appearance descriptor, i.e. the average of all the SIFT vectors along its trajectory a temporal descriptor, containing information on when the feature first appeared in the sequence and when it was last observed
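The two components described above can be computed directly from a cleaned trajectory. A minimal sketch, assuming the same hypothetical (frame, descriptor) layout as before:

```python
# Build a time-invariant feature from one trajectory:
# the averaged SIFT vector plus the temporal extent (first/last frame).

def time_invariant_feature(trajectory):
    """trajectory: list of (frame_index, descriptor) pairs.
    Returns (mean_descriptor, first_frame, last_frame)."""
    frames = [f for f, _ in trajectory]
    descs = [d for _, d in trajectory]
    dim = len(descs[0])
    mean_desc = [sum(d[i] for d in descs) / len(descs) for i in range(dim)]
    return mean_desc, min(frames), max(frames)
```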

16 The spatio-temporal model  The collection of time-invariant features constitutes a spatio-temporal model that we use to train our system  We emphasise the temporal coherence of the model and we exploit features appearing simultaneously

17 Matching spatio-temporal models 2-stage matching procedure Object recognition Spatio-temporal model for training Spatio-temporal model for test

18 Matching of sequence models  For each video sequence we compute its spatio-temporal model  Given a test sequence, we perform a two-stage matching procedure that exploits the spatial and temporal coherence of time-invariant features: we compute a first set of matches we reinforce the procedure by analysing the spatial and temporal neighbourhoods of the matches
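The two-stage procedure can be illustrated with a highly simplified sketch. Everything here is an assumption for illustration: features are (descriptor, first_frame, last_frame) triples, stage 1 is plain nearest-neighbour matching on the averaged descriptors, and stage 2 keeps only matches supported by other matches that co-occur in time in both sequences; the actual criteria and thresholds in the paper may differ.

```python
# Simplified two-stage matching between two spatio-temporal models.
# A feature is a (descriptor, first_frame, last_frame) triple.

def dist(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def overlap(f, g):
    """True if the temporal extents of two features intersect."""
    return f[1] <= g[2] and g[1] <= f[2]

def match_models(model, test, max_dist=0.5, min_support=1):
    # Stage 1: nearest-neighbour matching on averaged descriptors.
    candidates = []
    for t in test:
        best = min(model, key=lambda m: dist(m[0], t[0]))
        if dist(best[0], t[0]) <= max_dist:
            candidates.append((best, t))
    # Stage 2: keep a match only if other matches co-occur in time
    # with it in BOTH the model sequence and the test sequence.
    final = []
    for m, t in candidates:
        support = sum(1 for m2, t2 in candidates
                      if (m2, t2) != (m, t)
                      and overlap(m, m2) and overlap(t, t2))
        if support >= min_support:
            final.append((m, t))
    return final
```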

19 Matching of sequence models

20 Outline of the presentation  3D object recognition View-based approaches Local descriptors for object recognition Our approach  Spatio-temporal models for 3D object recognition Modeling sequences Representation of video sequences w.r.t. the model 2-stage matching procedure  Recognizing objects: experiments and results

21 Experiments and results  Matching assessment Illumination, scale and background changes Changes in motion Increasing the number of objects  Object recognition on a 20-object dataset  Recognition on a video stream

22 3D objects

23 Matching assessment Matches obtained on sequences with simple changes, compared against the ST models of 4 objects

24 Changing motion Matches obtained w.r.t. ST models of 4 objects Matches obtained in the first step of matching

25 Matching assessment Models of 3D objects Test sequences

26 Recognizing 20 objects Book:  Bambi: + + Dewey: О О X Dewey:  Sully: X X Box: О О Donald:  Dewey: + + Scrooge: О О Number of experiments: 840. TP = 51, FN = 13, FP = 11, TN = 765
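From the confusion counts reported on the slide (840 experiments in total), the standard performance measures follow directly:

```python
# Counts taken from the slide: 840 experiments over 20 objects.
TP, FN, FP, TN = 51, 13, 11, 765

accuracy  = (TP + TN) / (TP + TN + FP + FN)  # 816 / 840, about 0.971
precision = TP / (TP + FP)                   # 51 / 62,  about 0.823
recall    = TP / (TP + FN)                   # 51 / 64,  about 0.797
```

The high accuracy is driven largely by the many true negatives; precision and recall give a sharper picture of the positive detections.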

27 Recognition on a video stream

28 Conclusion and future work  We exploited the compactness and the expressiveness of local image descriptions to address the problem of 3D object recognition  We devised a system based on the use of spatial and temporal information, and showed that the model of a 3D object benefits from both kinds of information  The system could benefit from adding information on the image context [Tor03]

Thanks for your attention!