Knowledge-based event recognition from salient regions of activity


Knowledge-based event recognition from salient regions of activity
Nicolas Moënne-Loccoz
Viper group, Computer Vision & Multimedia Laboratory, University of Geneva
M4 Meeting, January 2004
Nicolas.Moenne-Loccoz@cui.unige.ch

Outline
Context
Salient Regions of Activity (SRA)
Learning the semantics of SRA
Visual event query language
Conclusion

Context
Retrieval of visual events based on a user query
Abstract representation of the visual content
A query language to express visual events
Approach
Region-based description of the content
Classification of the regions
Events queried as spatio-temporal constraints on the regions

Overview (system diagram)
Videos database → region extraction → salient regions of activity → classification (using domain knowledge) → labelled regions → user queries

Salient regions of activity
Regions of the image space that:
are moving in the scene
have a homogeneous colour distribution
→ moving objects, or meaningful parts of moving objects
Extraction:
from moving salient points
by an adaptive mean-shift algorithm

Salient points extraction
Scale-invariant interest points (Mikolajczyk & Schmid, 2001)
Extracted in the linear scale-space
Local maxima of the scale-normalized Harris function (in image space)
Local maxima of the scale-normalized Laplacian (in scale space)
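For illustration, a minimal NumPy/SciPy sketch of this kind of Harris-Laplace detection (not the authors' implementation; the scale values, the 5x5 non-maximum-suppression window and the threshold are assumptions):

    import numpy as np
    from scipy.ndimage import gaussian_filter, gaussian_laplace, maximum_filter

    def harris_laplace_points(img, sigmas=(1.5, 2.1, 3.0, 4.2, 6.0), k=0.04, thr=1e-4):
        """Toy Harris-Laplace detector: keep points that are spatial maxima of the
        scale-normalized Harris measure and maxima of the scale-normalized
        Laplacian along the scale axis."""
        img = img.astype(np.float64)
        harris, laplace = [], []
        for s in sigmas:
            # Scale-normalized image derivatives at scale s
            Ix = s * gaussian_filter(img, s, order=(0, 1))
            Iy = s * gaussian_filter(img, s, order=(1, 0))
            # Second-moment matrix, integrated at a slightly larger scale
            Ixx = gaussian_filter(Ix * Ix, 1.5 * s)
            Iyy = gaussian_filter(Iy * Iy, 1.5 * s)
            Ixy = gaussian_filter(Ix * Iy, 1.5 * s)
            det, tr = Ixx * Iyy - Ixy ** 2, Ixx + Iyy
            harris.append(det - k * tr ** 2)
            laplace.append(np.abs((s ** 2) * gaussian_laplace(img, s)))  # scale-normalized LoG
        harris, laplace = np.stack(harris), np.stack(laplace)
        points = []
        for i, s in enumerate(sigmas):
            # Local maxima of the Harris measure in the image plane at this scale
            spatial_max = (harris[i] == maximum_filter(harris[i], size=5)) & (harris[i] > thr)
            # Maxima of the Laplacian over neighbouring scales
            lo, hi = max(i - 1, 0), min(i + 1, len(sigmas) - 1)
            scale_max = (laplace[i] >= laplace[lo]) & (laplace[i] >= laplace[hi])
            for y, x in zip(*np.nonzero(spatial_max & scale_max)):
                points.append((x, y, s))
        return points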

Salient points extraction: example (figure showing detected interest points at their characteristic scale)

Salient points trajectories
Trajectories are used to:
find salient points moving in the scene
track salient points over time
Points are matched using local grayvalue invariants (Schmid)

Salient points trajectories
Candidate matches are compared with the Mahalanobis distance between their grayvalue-invariant descriptors, d(a, b) = sqrt((a - b)^T Σ^{-1} (a - b)).
The set of matching pairs minimizing the total distance is selected by a greedy Winner-Takes-All algorithm.
Linking matches over consecutive frames gives the set of point trajectories, from which the moving salient points are identified.
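The sketch below illustrates one plausible reading of the greedy winner-takes-all matching between two frames (the descriptor covariance `cov` and the threshold `max_dist` are assumptions, not values from the slides):

    import numpy as np

    def greedy_match(desc_prev, desc_curr, cov, max_dist=3.0):
        """Greedy winner-takes-all matching of point descriptors between two
        frames, using the Mahalanobis distance under a descriptor covariance
        estimate `cov` (covariance and threshold are illustrative)."""
        cov_inv = np.linalg.inv(cov)
        # Pairwise squared Mahalanobis distances, shape (M, N)
        diff = desc_prev[:, None, :] - desc_curr[None, :, :]
        d2 = np.einsum('ijk,kl,ijl->ij', diff, cov_inv, diff)
        matches, used_prev, used_curr = [], set(), set()
        # Winner-takes-all: repeatedly accept the globally smallest remaining distance
        for i, j in zip(*np.unravel_index(np.argsort(d2, axis=None), d2.shape)):
            if i in used_prev or j in used_curr:
                continue
            if d2[i, j] > max_dist ** 2:
                break
            matches.append((i, j))
            used_prev.add(i)
            used_curr.add(j)
        return matches  # concatenating matches across frames yields point trajectories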

Salient regions estimation
Estimate the characteristic regions around the moving salient points
Mean-shift algorithm estimates the region position, using:
the likelihood of pixels (RGB colour distribution)
an ellipsoidal Epanechnikov kernel

Salient regions estimation
Kernel adaptation step: estimates the region's shape and size
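A minimal sketch of a single mean-shift position update of this kind (the pixel-likelihood map, the parameterisation of the ellipsoidal kernel by a covariance matrix, and any stopping criterion are assumptions):

    import numpy as np

    def mean_shift_step(likelihood, center, cov):
        """One mean-shift update of the region centre: pixels are weighted by
        their colour likelihood and by an ellipsoidal Epanechnikov kernel
        K(u) ∝ max(0, 1 - u^T cov^{-1} u) centred on the current estimate."""
        h, w = likelihood.shape
        ys, xs = np.mgrid[0:h, 0:w]
        u = np.stack([xs - center[0], ys - center[1]], axis=-1).astype(float)
        cov_inv = np.linalg.inv(cov)
        # Ellipsoidal Epanechnikov profile (zero outside the ellipse)
        m2 = np.einsum('ijk,kl,ijl->ij', u, cov_inv, u)
        kernel = np.clip(1.0 - m2, 0.0, None)
        weights = kernel * likelihood
        total = weights.sum()
        if total <= 0:
            return np.asarray(center, dtype=float)
        return np.array([(weights * xs).sum() / total, (weights * ys).sum() / total])

Iterating this update until the centre stops moving, and then re-estimating `cov` from the weighted scatter of the pixels, corresponds to the position estimation and kernel adaptation steps described on the slides.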

Salient regions representation
Each salient region of activity is represented by:
its position
its ellipsoid
its colour distribution
its set of salient points
Salient regions tracking
Regions are matched by a majority vote of their salient points
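A majority vote of this sort could look like the following sketch (the data structures are assumptions; `point_to_prev_region` maps a tracked salient point to the region it belonged to in the previous frame):

    from collections import Counter

    def match_region(region_points, point_to_prev_region):
        """Assign a region in frame t to the previous-frame region that receives
        the majority of votes from its tracked salient points (None if no
        tracked point carries a previous-region label)."""
        votes = Counter(point_to_prev_region[p] for p in region_points
                        if p in point_to_prev_region)
        return votes.most_common(1)[0][0] if votes else None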

Salient regions of activity

Regions classification
To obtain an abstract description, regions are mapped to a domain-specific basic vocabulary
Meetings: {Arm, Head, Body, Noise}
SVM classifier, trained on a set of 500 annotated salient regions of activity (~200 frames)
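The slides do not state the SVM kernel or the region features; a hedged scikit-learn sketch under assumed choices (RBF kernel, fixed-length region descriptors, hypothetical file names) would be:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    # X: one fixed-length descriptor per annotated salient region of activity
    # (e.g. colour distribution, ellipse shape, motion statistics -- assumed here);
    # y: labels from the basic vocabulary {Arm, Head, Body, Noise}.
    X = np.load('region_descriptors.npy')   # hypothetical file names
    y = np.load('region_labels.npy')

    clf = SVC(kernel='rbf', C=10.0, gamma='scale')   # kernel and parameters are assumptions
    scores = cross_val_score(clf, X, y, cv=5)
    print('cross-validated accuracy: %.3f' % scores.mean())
    clf.fit(X, y)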

Regions classification
Confusion matrix over {Arm, Head, Body, Noise} (values as reported: 1.000, 0.909, 0.091, 0.052, 0.946)
Discussion:
the Noise class is ill-defined
the good results are explained by the limited number of classes

Visual event language
To express visual event queries: spatio-temporal constraints on labelled regions (LR)
To integrate domain knowledge:
as a specification of the scene layout (L)
as a set of basic events
A formula of the language is a conjunction of:
temporal relations {after, just-after} between two LRs
spatial relations {above, left} between two LRs, and {in} between an LR and a layout element L
identity relations {is} between two LRs, and {is-a} between an LR and a label

Knowledge - Meetings
Scene layout: L = {SEATS, DOOR, BOARD}

Knowledge - Meetings
Basic events: {Meeting-participant, Sitting, Standing}
Meeting-participant: actor LR; constraints: is-a(Head, LR).
Sitting: actor LR; constraints: Meeting-participant(LR), in(SEATS, LR).
Standing: actor LR; constraints: ~in(SEATS, LR).

Events queries
Examples of user queries:
Sitting-down: actors LR1, LR2; constraints: is(LR1, LR2), Sitting(LR1), Standing(LR2), just-after(LR1, LR2).
Go-to-board: actors LR1, LR2; constraints: Standing(LR1), ~in(BOARD, LR1), in(BOARD, LR2), just-after(LR2, LR1).
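The slides do not give an implementation of the query language; below is a minimal sketch of how the "Sitting-down" query could be evaluated over labelled region intervals (the data model, the 5-frame window used for just-after, and the relation semantics are assumptions):

    from dataclasses import dataclass
    from itertools import product

    @dataclass
    class LabelledRegion:
        label: str          # e.g. 'Head'
        track_id: int       # identity of the underlying region track
        t_start: int        # first frame of this state
        t_end: int          # last frame of this state
        in_layout: set      # layout elements the region lies in, e.g. {'SEATS'}

    # Illustrative relation and basic-event semantics (thresholds are assumptions)
    def is_same(r1, r2):            return r1.track_id == r2.track_id          # is(LR1, LR2)
    def just_after(r1, r2):         return 0 <= r1.t_start - r2.t_end <= 5     # just-after(LR1, LR2)
    def meeting_participant(r):     return r.label == 'Head'                   # is-a(Head, LR)
    def sitting(r):                 return meeting_participant(r) and 'SEATS' in r.in_layout
    def standing(r):                return 'SEATS' not in r.in_layout          # ~in(SEATS, LR)

    def sitting_down(regions):
        """Query 'Sitting-down': a standing interval LR2 immediately followed by
        the same track sitting (LR1)."""
        return [(r1, r2) for r1, r2 in product(regions, repeat=2)
                if is_same(r1, r2) and sitting(r1) and standing(r2)
                and just_after(r1, r2)]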

Events queries - Results
Event         Precision   Recall
Sit-down      0.43        1.00
Stand-up      0.50
Go-to-board
Enter         0.20
Leave         0.25
Discussion:
the recall validates the retrieval capability
false alarms occur because of the hard (crisp) decisions

Conclusion
Contributions:
a well-suited framework for constrained domains
a generic representation of the visual content
a paradigm to retrieve visual events from videos
Limitations:
cannot retrieve all visual events (e.g. emotions)
Ongoing work:
uncertainty handling and fuzziness
integration of other modalities (e.g. transcripts)