Tutorial: Multicamera and Distributed Video Surveillance
Third ACM/IEEE International Conference on Distributed Smart Cameras
Agenda

Presentation of ImageLab
Digital library: content-based retrieval
Computer vision for robotic automation
Multimedia: video annotation
Medical imaging
Video analysis for indoor/outdoor surveillance
Off-line video analysis for telemetry and forensics
People and vehicle surveillance
Imagelab-Softech, Lab of Computer Vision, Pattern Recognition and Multimedia
Dipartimento di Ingegneria dell'Informazione, Università di Modena e Reggio Emilia, Italy
http://imagelab.ing.unimore.it

Imagelab: recent projects in surveillance (European, international, Italian & regional, and with companies)
THIS (Transport hubs intelligent surveillance): EU JLS/CHIPS Project, 2009-2010
VIDI-Video: EU VI FP STREP (ViSOR: Video Surveillance Online Repository), 2007-2009
BE SAFE: NATO Science for Peace project, 2007-2009
Detection of infiltrated objects for security: Australian Council, 2006-2008
Behave_Lib: Regione Emilia Romagna, Tecnopolo Softech, 2010-2013
LAICA: Regione Emilia Romagna, 2005-2007
FREE_SURF: MIUR PRIN Project, 2006-2008
Building site surveillance: with Bridge-129 Italia, 2009-2010
Stopped vehicles: with Digitek Srl, 2007-2008
SmokeWave: with Bridge-129 Italia, 2007-2010
Sakbot for traffic analysis: with Traficon, 2004-2006
Mobile surveillance: with Sistemi Integrati, 2007
Home automation for disabled people (Domotica per disabili), posture detection: FCRM, 2004-2005

AD-HOC: Appearance Driven Human tracking with Occlusion Handling

Key aspects
Based on the SAKBOT system: background estimation and updating, shadow removal
Appearance-based tracking: we aim at recovering a pixel-based foreground mask, even during an occlusion
Recovery of missing parts from the background subtraction
Managing split and merge situations
Occlusion detection and classification: classify the differences as real shape changes or occlusions
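As a rough illustration of the background estimation and foreground extraction step (a minimal sketch, not the actual SAKBOT code; frame sizes, values and the threshold are invented for the toy example):

```python
import numpy as np

def median_background(frames):
    """Temporal median over a stack of grayscale frames (H x W each)."""
    return np.median(np.stack(frames, axis=0), axis=0)

def foreground_mask(frame, background, threshold=25):
    """Mark as foreground the pixels whose absolute difference from
    the background model exceeds the threshold."""
    return np.abs(frame.astype(np.int32) - background.astype(np.int32)) > threshold

# Toy demo: a flat background with one bright 2x2 "object".
bg = np.full((8, 8), 50, dtype=np.uint8)
frames = [bg.copy() for _ in range(5)]
current = bg.copy()
current[2:4, 2:4] = 200
model = median_background(frames)
mask = foreground_mask(current, model)
```

The real system additionally removes shadow pixels before the mask is passed to the tracker.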

Example 1 (from ViSOR) This simple example shows the visual output of the tracking system. In the upper right image we see the visual objects extracted from background subtraction; in the lower image we see the appearance models and probability masks. Probability masks are colored from red to blue: red means low probability, blue high probability. Finally, in the upper left image we see how pixels are assigned to the different tracks. How can we do this…

Example 2 (from PETS 2002) This test video is taken from the 2002 PETS workshop dataset. It is extremely difficult for some reasons: … In this video we see that the system can handle difficult situations of multiple-people occlusion in an outdoor environment. The problem is challenging because of segmentation errors, flattened colours and reflections due to the shop window.

Example 3 If we now take the previous example and apply the selective update, we can see that the table occlusion is handled correctly. Regions of the track classified as occluded are shown with dark colors on the probability mask. In those regions, and only in those, the model update is not performed. Therefore, even under long-lasting occlusions, both the bounding box and the centroid of the tracks are correctly estimated.
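The selective update can be sketched as a running-average appearance model that is simply frozen wherever the pixel is classified as occluded (a toy sketch under assumed values; the learning rate alpha is a placeholder, not the system's actual parameter):

```python
import numpy as np

def selective_update(model, frame, occluded, alpha=0.05):
    """Running-average update of the appearance model, skipped on
    pixels flagged as occluded so the stored appearance survives."""
    updated = (1 - alpha) * model + alpha * frame
    return np.where(occluded, model, updated)

model = np.full((4, 4), 100.0)      # stored appearance (grayscale)
frame = np.full((4, 4), 200.0)      # current observation
occ = np.zeros((4, 4), dtype=bool)
occ[0, 0] = True                    # this pixel is covered by another track
new_model = selective_update(model, frame, occ)
```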

Other experimental results Imagelab videos (available on ViSOR); PETS series. The algorithm presented is much less computationally demanding than other popular tracking algorithms. The frame rate depends on the frame size and on the number and size of the tracks, but it runs at around 10 fps on standard hardware.

Results on the PETS2006 dataset Working in real time at 10 fps!

Posture classification

Distributed surveillance with non-overlapping fields of view

Exploit the knowledge about the scene
To avoid all-to-all matches, the tracking system can exploit knowledge about the scene:
Preferential paths -> pathnodes
Border lines / exit zones
Physical constraints & forbidden zones
NVR temporal constraints
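A minimal sketch of how such scene knowledge prunes candidate matches: a pair of observations in two cameras is only considered if the cameras are connected by a path and the travel time respects the temporal constraints. The topology table and camera names below are purely illustrative, not the system's actual data:

```python
# Hypothetical camera topology: allowed transitions with
# (min, max) travel times in seconds.
TRANSITIONS = {
    ("cam1", "cam4"): (5.0, 40.0),
    ("cam1", "cam2"): (2.0, 15.0),
}

def plausible(cam_a, t_exit, cam_b, t_enter):
    """Keep a cross-camera match only if a path exists and the travel
    time falls inside the learned temporal bounds."""
    bounds = TRANSITIONS.get((cam_a, cam_b))
    if bounds is None:          # no physical path between the cameras
        return False
    lo, hi = bounds
    return lo <= (t_enter - t_exit) <= hi
```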

Tracking with pathnodes: a possible path between Camera 1 and Camera 4

Pathnodes lead particle diffusion
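One way to read "pathnodes lead particle diffusion": each particle takes a random step plus a drift toward its nearest pathnode, so hypotheses spread preferentially along the learned paths. A toy sketch (the pull weight and noise scale are invented parameters, not the system's):

```python
import numpy as np

rng = np.random.default_rng(0)

def diffuse(particles, pathnodes, sigma=1.0, pull=0.3):
    """Gaussian diffusion step biased toward the nearest pathnode.
    particles: (P, 2) positions; pathnodes: (N, 2) positions."""
    d = pathnodes[:, None, :] - particles[None, :, :]   # (N, P, 2)
    nearest = np.argmin((d ** 2).sum(-1), axis=0)       # closest node per particle
    drift = pathnodes[nearest] - particles
    return particles + pull * drift + rng.normal(0.0, sigma, particles.shape)

nodes = np.array([[0.0, 0.0], [10.0, 0.0]])
parts = np.array([[9.0, 0.5], [1.0, -0.5]])
moved = diffuse(parts, nodes, sigma=0.0, pull=0.5)      # noise off for clarity
```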

Results with PF and pathnodes
Single-camera tracking: Recall = 90.27%, Precision = 88.64%
Multicamera tracking: Recall = 84.16%, Precision = 80.00%
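For reference, the two metrics reported above are computed from true positives, false negatives and false positives; the counts below are invented for illustration and are not the experiment's actual numbers:

```python
def recall_precision(tp, fn, fp):
    """Recall = TP / (TP + FN); Precision = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)

# Hypothetical counts, for illustration only.
r, p = recall_precision(tp=90, fn=10, fp=12)
```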

"VIP: Vision tool for comparing Images of People", Lantagne et al., Vision Interface 2003 Each extracted silhouette is segmented into significant regions using the JSEG algorithm (Y. Deng, B.S. Manjunath, "Unsupervised segmentation of color-texture regions in images and video"). Colour and texture descriptors are calculated for each region. The colour descriptor is a modified version of the descriptor presented in Y. Deng et al., "Efficient color representation for image retrieval": basically an HSV histogram of the dominant colors. The texture descriptor is based on D.K. Park et al., "Efficient Use of Local Edge Histogram Descriptor": essentially it characterizes the edge density inside a region along different orientations (0°, 45°, 90° and 135°). The similarity between two regions is the weighted sum of the two descriptor similarities:
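The weighted sum can be written out as follows; the weight values are placeholders (the paper tunes its own), so this is only the shape of the formula:

```python
def region_similarity(color_sim, texture_sim, w_color=0.7, w_texture=0.3):
    """Weighted sum of the colour and texture descriptor similarities.
    The weights here are illustrative placeholders and must sum to 1."""
    assert abs(w_color + w_texture - 1.0) < 1e-9
    return w_color * color_sim + w_texture * texture_sim
```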

To compare the regions inside two silhouettes, a region-matching scheme is used, involving a modified version of the IRM algorithm presented in J.Z. Wang et al., "SIMPLIcity: Semantics-sensitive integrated matching for picture libraries". The IRM algorithm is simple and works as follows:
1) First, all the similarities between all pairs of regions are calculated.
2) The similarities are sorted in decreasing order; the first one is selected, and the areas of the respective pair of regions are compared. A weight, equal to the smallest percentage area of the two regions, is assigned to the similarity measure.
3) Then, the percentage area of the largest region is updated by subtracting the percentage area of the smallest region, so that it can be matched again. The smallest region will not be matched with any other region.
4) The process continues in decreasing order over all the similarities. In the end, the overall similarity between the two region sets is calculated as:
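The four steps above can be sketched as a greedy matching loop (a simplified reading of IRM, not the paper's exact implementation; the example similarities and area fractions are invented):

```python
def irm_similarity(sims, areas_a, areas_b):
    """Greedy IRM-style matching. sims maps region-index pairs (i, j)
    to similarities; areas_* are per-region area fractions. Each pair,
    visited in decreasing similarity order, contributes its similarity
    weighted by the smaller remaining area, which is then consumed."""
    rem_a, rem_b = list(areas_a), list(areas_b)
    total = 0.0
    for (i, j), s in sorted(sims.items(), key=lambda kv: -kv[1]):
        w = min(rem_a[i], rem_b[j])     # smallest remaining area fraction
        if w <= 0:
            continue                    # one of the regions is used up
        total += w * s
        rem_a[i] -= w                   # larger region can be matched again
        rem_b[j] -= w
    return total

# Two regions per silhouette, areas as fractions of the silhouette.
sims = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.8}
score = irm_similarity(sims, areas_a=[0.6, 0.4], areas_b=[0.5, 0.5])
```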

ViSOR: Video Surveillance Online Repository

The ViSOR video repository

Aims of ViSOR
Gather and make freely available a repository of surveillance videos
Store metadata annotations, both manually provided as ground truth and automatically generated by video surveillance tools and systems
Execute online performance evaluation and comparison
Create an open forum to exchange, compare and discuss problems and results on video surveillance

Different types of annotation
Structural annotation: video size, authors, keywords, …
Base annotation: ground truth, with concepts referred to the whole video. Annotation tool: online!
GT annotation: ground truth with frame-level annotation; concepts can refer to the whole video, to a frame interval or to a single frame. Annotation tool: ViPER-GT (offline)
Automatic annotation: output of automatic systems, shared by ViSOR users.

Video corpus set: the 14 categories

Outdoor multicamera: synchronized views

Surveillance of the entrance door of a building (about 10h!)

Videos for smoke detection with GT

Videos for shadow detection Already used by many researchers working on shadow detection; some videos with GT. A. Prati, I. Mikic, M.M. Trivedi, R. Cucchiara, "Detecting Moving Shadows: Algorithms and Evaluation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, n. 7, pp. 918-923, July 2003

Some statistics We need videos and annotations!

Action recognition: simultaneous HMM action segmentation and recognition

Probabilistic Action Classification Classical approach: given a set of training videos, each containing a single, manually labelled atomic action, and given a new video with a single action, find the most likely action. Dataset: "Actions as Space-Time Shapes" (ICCV '05), M. Blank, L. Gorelick, E. Shechtman, M. Irani, R. Basri

Classical HMM Framework
Definition of a feature set
For each frame t, computation of the feature set Ot (observations)
Given a set of training observations O = {O1 … OT} for each action, training of an HMM λ(k) for each action k
Given a new set of observations O = {O1 … OT}, find the model λ(k) which maximises P(λ(k)|O)
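The classification step can be sketched with a log-domain forward algorithm: with uniform action priors, maximising P(λ(k)|O) reduces to picking the model with the highest likelihood P(O|λ(k)). The toy models below use discrete observations and invented parameters, unlike the real 17-dim continuous features:

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log-sum-exp along the given axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def forward_loglik(log_pi, log_A, log_B_seq):
    """Forward algorithm in the log domain: returns log P(O | lambda).
    log_B_seq[t, j] is the emission log-prob of frame t in state j."""
    alpha = log_pi + log_B_seq[0]
    for t in range(1, log_B_seq.shape[0]):
        alpha = log_B_seq[t] + logsumexp(alpha[:, None] + log_A, axis=0)
    return logsumexp(alpha, axis=0)

def classify(obs, models):
    """Argmax over models of the forward log-likelihood."""
    scores = {k: forward_loglik(np.log(pi), np.log(A), np.log(E[:, obs].T))
              for k, (pi, A, E) in models.items()}
    return max(scores, key=scores.get)

# Two toy 2-state models over a binary alphabet; "action0" favours
# symbol 0, "action1" favours symbol 1 (all numbers are invented).
A = np.array([[0.9, 0.1], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
models = {
    "action0": (pi, A, np.array([[0.8, 0.2], [0.7, 0.3]])),
    "action1": (pi, A, np.array([[0.2, 0.8], [0.3, 0.7]])),
}
best = classify(np.array([0, 0, 1, 0, 0]), models)
```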

A sample 17-dim feature set, computed on the extracted blob after foreground segmentation and people tracking:

From the Rabiner tutorial

Online action recognition Given a video with a sequence of actions: which is the current action? Frame-by-frame action classification (online action recognition). When does an action finish and the next one start? (offline action segmentation) R. Vezzani, M. Piccardi, R. Cucchiara, "An efficient Bayesian framework for on-line action recognition", in Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, November 7-11, 2009

Main problem of this approach We do not know when the action starts and when it finishes: using all the observations, only the first action is recognized. A possible solution is brute force: for each action, for each starting frame and for each ending frame, compute the model likelihood and select the maximum. UNFEASIBLE

Our approach
Subsampling of the starting frames (one every 10)
Adoption of recursive formulas
Computation of the emission probability once per frame for each model (action)
Current frame as ending frame
Maximum length imposed on each action
The computational complexity is compliant with real-time requirements
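The ingredients above can be sketched as follows: one recursive forward variable per (model, start-frame) hypothesis, start frames subsampled, each emission term computed once per model per frame, and hypotheses capped at a maximum length. This is a simplified reading of the scheme, not the paper's implementation, and the single-state toy models are invented:

```python
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def online_action_labels(log_Bs, log_pis, log_As, start_step=10, max_len=60):
    """Return, for each frame, the best-scoring action hypothesis.
    log_Bs[k] is the (T x N) per-frame emission log-prob sequence
    for model k, computed once and shared by all its hypotheses."""
    T = next(iter(log_Bs.values())).shape[0]
    hyps = {}                       # (model, start) -> forward variable
    labels = []
    for t in range(T):
        if t % start_step == 0:     # spawn subsampled starting frames
            for k in log_Bs:
                hyps[(k, t)] = log_pis[k] + log_Bs[k][t]
        scored = []
        for (k, s), alpha in list(hyps.items()):
            if t - s > max_len:     # maximum action length
                del hyps[(k, s)]
                continue
            if s != t:              # recursive forward update
                alpha = log_Bs[k][t] + logsumexp(alpha[:, None] + log_As[k], axis=0)
                hyps[(k, s)] = alpha
            scored.append((logsumexp(alpha, axis=0) / (t - s + 1), k, s))
        labels.append(max(scored)[1])
    return labels

# Toy single-state models: "act_a" explains symbol 0, "act_b" symbol 1.
obs = np.array([0] * 10 + [1] * 10)
log_Bs = {
    "act_a": np.log(np.where(obs == 0, 0.9, 0.1))[:, None],
    "act_b": np.log(np.where(obs == 0, 0.1, 0.9))[:, None],
}
log_pis = {k: np.zeros(1) for k in log_Bs}
log_As = {k: np.zeros((1, 1)) for k in log_Bs}
labels = online_action_labels(log_Bs, log_pis, log_As, start_step=10)
```

Here the per-frame score is already length-normalized, which is the topic of the next slide.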

Different-length sequences Sequences with different starting frames have different lengths, which makes comparisons with the traditional HMM schema unfair. The output of each HMM is therefore normalized using the sequence length and a term related to the mean duration of the considered action. This allows classifying the current action and, at the same time, performing an online action segmentation.
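One plausible form of such a normalization (an illustrative guess at the shape of the score, not the paper's exact formula; the weight w is a placeholder): divide the log-likelihood by the sequence length and penalize deviation from the action's mean duration.

```python
import numpy as np

def normalized_score(loglik, length, mean_duration, w=0.5):
    """Per-frame log-likelihood minus a penalty on the deviation of
    the hypothesis length from the action's mean duration."""
    return loglik / length - w * abs(np.log(length / mean_duration))

# Same per-frame likelihood, but the length close to the mean
# duration (12 frames) should win.
short = normalized_score(-10.0, 10, 12.0)
long_ = normalized_score(-40.0, 40, 12.0)
```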