Knowledge-based event recognition from salient regions of activity

Nicolas Moënne-Loccoz
Viper group, Computer Vision & Multimedia Laboratory, University of Geneva
M4 – Meeting – January 2004

Outline
  Context
  Salient Regions of Activity (SRA)
  Learning the semantics of SRA
  Visual Event Query Language
  Conclusion
NML - CVML - UniGe

Context
Retrieval of visual events based on user queries
  Abstract representation of the visual content
  Query language to express visual events
Approach
  Region-based description of the content
  Classification of the regions
  Events queried as spatio-temporal constraints on the regions

Overview
[Pipeline diagram. Labels: Videos database, Region extraction, Salient regions of activity, Classification, Labelled regions, Domain knowledge, User queries]

Salient regions of activity
Regions of the image space that are:
  Moving in the scene
  Of homogeneous colour distribution
These correspond to moving objects or meaningful parts of moving objects
Extraction:
  From moving salient points
  By an adaptive mean-shift algorithm

Salient points extraction
Scale-invariant interest points (Mikolajczyk, Schmid 2001)
  Extracted in the linear scale-space
  Local maxima of the scale-normalized Harris function (image space)
  Local maxima of the scale-normalized Laplacian (scale space)

Salient points extraction
Example: scale

Salient points trajectories
Trajectories are used to:
  Find salient points moving in the scene
  Track salient points over time
Point matching uses local grayvalue invariants (Schmid)

Salient points trajectories
Mahalanobis distance:
Set of matching points minimizes:
  Greedy winner-takes-all algorithm
Set of point trajectories
Moving salient points:
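The matching step can be illustrated with a minimal sketch. It assumes that each salient point carries a descriptor vector and that an inverse covariance matrix is available. The function names, the `max_dist` threshold and the sample data are hypothetical and do not come from the slides.

```python
import math

def mahalanobis(u, v, inv_cov):
    """Mahalanobis distance between two descriptor vectors."""
    d = [a - b for a, b in zip(u, v)]
    # Quadratic form d^T * inv_cov * d
    q = sum(d[i] * inv_cov[i][j] * d[j]
            for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(q)

def greedy_match(prev_desc, curr_desc, inv_cov, max_dist):
    """Greedy winner-takes-all: accept the closest remaining pair first."""
    pairs = sorted(
        (mahalanobis(p, c, inv_cov), i, j)
        for i, p in enumerate(prev_desc)
        for j, c in enumerate(curr_desc)
    )
    used_prev, used_curr, matches = set(), set(), []
    for dist, i, j in pairs:
        if dist > max_dist:          # remaining pairs are even farther apart
            break
        if i in used_prev or j in used_curr:
            continue                 # one of the points is already matched
        used_prev.add(i)
        used_curr.add(j)
        matches.append((i, j))
    return matches

# Made-up 2-D descriptors for two consecutive frames.
prev_desc = [[0.0, 0.0], [5.0, 5.0]]
curr_desc = [[5.2, 4.9], [0.1, -0.1], [9.0, 9.0]]
inv_cov = [[1.0, 0.0], [0.0, 1.0]]   # identity: reduces to Euclidean distance
matches = greedy_match(prev_desc, curr_desc, inv_cov, max_dist=1.0)
print(matches)
```

A greedy pass is not guaranteed to minimize the total distance over all assignments; it trades optimality for simplicity and speed.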

Salient regions estimation
Estimate characteristic regions of the moving salient points
Mean-shift algorithm: estimates the position
  Likelihood of pixels (RGB colour distribution)
  Ellipsoidal Epanechnikov kernel
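The position estimate can be sketched as follows. With an Epanechnikov kernel, each mean-shift step reduces to the likelihood-weighted mean of the pixel positions inside the kernel support. This sketch assumes an axis-aligned ellipse and a precomputed per-pixel likelihood map. The function name, parameters and toy data are hypothetical.

```python
def mean_shift_position(weights, center, axes, max_iter=20, tol=1e-3):
    """Shift an elliptical window towards the local mode of `weights`.

    weights: 2-D list, weights[y][x] is the pixel likelihood
    center:  (x, y) starting position
    axes:    (a, b) semi-axes of the axis-aligned ellipse (assumption)
    """
    cx, cy = center
    a, b = axes
    for _ in range(max_iter):
        sw = sx = sy = 0.0
        for y, row in enumerate(weights):
            for x, w in enumerate(row):
                # Keep only pixels inside the elliptical support
                if ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0:
                    sw += w
                    sx += w * x
                    sy += w * y
        if sw == 0.0:
            break                      # no evidence inside the window
        nx, ny = sx / sw, sy / sw      # weighted mean = new position
        shift = ((nx - cx) ** 2 + (ny - cy) ** 2) ** 0.5
        cx, cy = nx, ny
        if shift < tol:
            break
    return cx, cy

# Toy likelihood map: a 3x3 blob of ones centred on pixel (6, 5).
weights = [[1.0 if 5 <= x <= 7 and 4 <= y <= 6 else 0.0 for x in range(10)]
           for y in range(10)]
cx, cy = mean_shift_position(weights, center=(4.0, 4.0), axes=(3.0, 3.0))
print(cx, cy)
```

Starting from (4, 4), the window drifts onto the blob and settles on its centre.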

Salient regions estimation
Kernel adaptation step: estimates shape and size
Algorithm:

Salient regions representation
Set of salient regions of activity, each represented by:
  Position
  Ellipsoid
  Colour distribution
  Set of salient points
Salient regions tracking
  Regions are matched by a majority vote of their salient points
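The majority vote can be sketched as follows. It assumes that every tracked salient point has a stable identifier and that the region it belonged to in the previous frame is known. The function and variable names are hypothetical.

```python
from collections import Counter

def match_region(region_point_ids, prev_point_to_region):
    """Return the previous region that most of the tracked points came from."""
    votes = Counter(
        prev_point_to_region[p]
        for p in region_point_ids
        if p in prev_point_to_region   # newly detected points cast no vote
    )
    if not votes:
        return None                    # no tracked point: treat as a new region
    return votes.most_common(1)[0][0]

# Made-up example: points 1-2 were in region "A", points 3-5 in region "B".
prev_point_to_region = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
winner = match_region([2, 3, 4, 9], prev_point_to_region)
print(winner)
```

Ties are broken arbitrarily in this sketch; a real tracker would need an explicit rule for them.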

Salient regions of activity

Regions classification
To obtain an abstract description:
  Map regions to a domain-specific basic vocabulary
  Meetings: {Arm, Head, Body, Noise}
SVM classifier:
  Set of 500 annotated salient regions of activity (~200 frames)

Regions classification
Confusion matrix (classes: Arm, Head, Body, Noise; surviving cell values in reading order): 1.000 0.909 0.091 0.052 0.946
Discussion:
  Noise class is ill-defined
  Good results are explained by the limited number of classes

Visual event language
To express visual event queries
  Spatio-temporal constraints on labelled regions (LR)
To integrate domain knowledge
  As specification of the layout (L)
  As a set of basic events
A formula of the language is a conjunctive form of:
  Temporal relations {after, just-after} between 2 LR
  Spatial relations {above, left} between 2 LR, and {in} between a LR and a L
  Identity relations {is} between 2 LR, and {is-a} between a LR and a label
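One possible way to evaluate such a conjunctive formula is sketched below. Everything beyond the relation names is an assumption: the `LabelledRegion` fields, the layout zones as axis-aligned boxes, the frame gap for just-after, and the reading of `rel(a, b)` as "a is above / left of / after b".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabelledRegion:
    track: int    # identity of the tracked region (hypothetical field)
    label: str    # class label, e.g. "Head"
    frame: int    # frame index of the observation
    x: float      # region centre in image coordinates (y grows downward)
    y: float

# Layout zones as boxes (x0, y0, x1, y1); the coordinates are made up.
LAYOUT = {"SEATS": (0, 60, 100, 100), "BOARD": (0, 0, 30, 50)}
JUST_AFTER_GAP = 5  # assumed maximum frame gap for "just-after"

def in_zone(zone, r):
    x0, y0, x1, y1 = LAYOUT[zone]
    return x0 <= r.x <= x1 and y0 <= r.y <= y1

RELATIONS = {
    "after":      lambda a, b: a.frame > b.frame,
    "just-after": lambda a, b: 0 < a.frame - b.frame <= JUST_AFTER_GAP,
    "above":      lambda a, b: a.y < b.y,
    "left":       lambda a, b: a.x < b.x,
    "is":         lambda a, b: a.track == b.track,
    "in":         in_zone,
    "is-a":       lambda label, r: r.label == label,
}

def holds(formula, binding):
    """Evaluate a conjunction of (relation, arg1, arg2, negated) literals."""
    for name, a, b, negated in formula:
        # Variable names are replaced by their bound region; zone names
        # and labels are passed through unchanged as constants.
        args = [binding.get(arg, arg) for arg in (a, b)]
        if RELATIONS[name](*args) == negated:
            return False
    return True

head = LabelledRegion(track=1, label="Head", frame=10, x=50, y=20)
arm = LabelledRegion(track=2, label="Arm", frame=10, x=40, y=45)
formula = [
    ("is-a", "Head", "H", False),
    ("is-a", "Arm", "A", False),
    ("above", "H", "A", False),
    ("in", "SEATS", "H", True),     # negated literal: ~in(SEATS, H)
]
result = holds(formula, {"H": head, "A": arm})
print(result)
```

Keeping the relations in a table makes it straightforward to add domain-specific ones without touching the evaluator.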

Knowledge - Meetings
Scene layout: L = {SEATS, DOOR, BOARD}

Knowledge - Meetings
Basic events: {Meeting-participant, sitting, standing}
Meeting-participant:
  actor: LR
  constraints: is-a(head, LR).
Sitting:
  actor: LR
  constraints: Meeting-participant(LR), in(SEATS, LR).
Standing:
  actor: LR
  constraints: ~in(SEATS, LR).

Event queries
Examples of user queries:
Sitting-down:
  actors: LR1, LR2
  constraints: is(LR1, LR2), sitting(LR1), standing(LR2), just-after(LR1, LR2).
Go-to-board:
  actors: LR1, LR2
  constraints: standing(LR1), ~in(BOARD, LR1), in(BOARD, LR2), just-after(LR2, LR1).
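The Sitting-down query can be run over a small list of labelled regions, as in the sketch below. It mirrors the constraints literally. The `LR` fields, the SEATS box, the frame gap and the sample observations are hypothetical, and just-after(LR1, LR2) is read as "LR1 occurs shortly after LR2".

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class LR:
    track: int    # identity of the tracked region (hypothetical field)
    label: str
    frame: int
    x: float      # region centre, image coordinates
    y: float

SEATS = (0, 60, 100, 100)   # made-up zone box (x0, y0, x1, y1)
GAP = 5                     # assumed maximum frame gap for just-after

def in_zone(zone, r):
    x0, y0, x1, y1 = zone
    return x0 <= r.x <= x1 and y0 <= r.y <= y1

# Basic events, following the definitions on the knowledge slide.
def meeting_participant(r):
    return r.label == "Head"

def sitting(r):
    return meeting_participant(r) and in_zone(SEATS, r)

def standing(r):
    return not in_zone(SEATS, r)

def same(a, b):             # the "is" identity relation
    return a.track == b.track

def just_after(a, b):
    return 0 < a.frame - b.frame <= GAP

def sitting_down(regions):
    """All (LR1, LR2) pairs satisfying the Sitting-down constraints."""
    return [(lr1, lr2) for lr1, lr2 in permutations(regions, 2)
            if same(lr1, lr2) and sitting(lr1) and standing(lr2)
            and just_after(lr1, lr2)]

regions = [
    LR(1, "Head", 10, 50, 30),   # track 1, outside SEATS
    LR(1, "Head", 12, 50, 70),   # track 1, inside SEATS two frames later
    LR(2, "Head", 12, 20, 80),   # track 2, seated throughout
    LR(3, "Arm", 11, 60, 40),
]
hits = sitting_down(regions)
print([(a.frame, b.frame) for a, b in hits])
```

Each predicate returns a hard true/false decision, so a region lying just outside the zone box flips the outcome of the whole query.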

Event queries - Results
Results (columns: Precision, Recall; only the values below survive):
  Sit-down: 0.43, 1.00
  Stand-up: 0.50
  Go-to-board:
  Enter: 0.20
  Leave: 0.25
Discussion:
  Recall validates the retrieval capability
  False alarms occur because of the hard decision
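For reference, precision and recall can be computed from the retrieved and relevant sets, as in the sketch below. The clip identifiers are invented for illustration and are not the data behind the reported figures.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: every relevant clip is found, with two false alarms.
retrieved = ["clip1", "clip2", "clip3", "clip4"]
relevant = ["clip2", "clip4"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

False alarms lower precision only, which is the pattern of full recall with modest precision.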

Conclusion
Contributions
  Well-suited framework for constrained domains
  Generic representation of the visual content
  Paradigm to retrieve visual events from videos
Limitations
  Cannot retrieve all visual events (e.g. emotion)
Ongoing work
  Uncertainty handling and fuzziness
  Integration of other modalities (e.g. transcripts)
