Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities M. S. Ryoo and J. K. Aggarwal ICCV2009.

Introduction Human activity recognition, the automated detection of ongoing activities from video, is an important problem. The technology can be applied to surveillance systems, robots, and human-computer interfaces. In surveillance systems, automatically detecting violent activities is especially important.

Introduction Spatio-temporal feature-based approaches have been proposed by many researchers. These methods have been successful on short videos containing simple actions such as walking and waving, but in real-world applications actions and activities are seldom this simple.

Related works Methods focused on tracking persons and their bodies have been developed [4,11], but their results rely on background subtraction. Approaches that analyze a 3-D XYT volume have gained particular attention in the past few years [3,5,6,9,13,16]; they extract relationships among features and train a model.

[3] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In IEEE International Workshop on VS-PETS, pages 65–72, 2005.
[4] S. Hongeng, R. Nevatia, and F. Bremond. Video-based event recognition: activity representation and probabilistic recognition methods. CVIU, 96(2):129–162, 2004.
[5] H. Jhuang, T. Serre, L. Wolf, and T. Poggio. A biologically inspired system for action recognition. In ICCV, 2007.
[6] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR, 2008.
[9] J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. IJCV, 79(3), Sep. 2008.
[11] M. S. Ryoo and J. K. Aggarwal. Semantic representation and recognition of continued and recursive human activities. IJCV, 82(1):1–24, April 2009.
[13] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: a local SVM approach. In ICPR, 2004.
[16] S.-F. Wong, T.-K. Kim, and R. Cipolla. Learning motion categories using both semantic and structural information. In CVPR, 2007.

Related works In this paper, we propose a new spatio-temporal feature-based methodology. Kernel functions are built on the relationships between features. After the training features are collected, the matching function is used to compare test data against them.

Example matching result

Spatio-temporal relationship match The method is based on matching two videos and outputs a real number as the result: K : V x V -> R, where V denotes the input videos and R the real-valued match score.
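As a rough illustration of this interface (not the authors' code), the following Python stand-in takes two videos, here reduced to histograms of their feature types, and returns a single real-valued similarity. The actual kernel additionally compares the spatio-temporal relationship structure described in the next slides; the function name and histograms are illustrative assumptions.

```python
# Toy stand-in for K : V x V -> R. Each video is summarized by a k-bin
# histogram of feature types; histogram intersection gives one real score.
# The real method also matches relationships between features.
import numpy as np

def match_kernel(hist_a: np.ndarray, hist_b: np.ndarray) -> float:
    """Histogram-intersection similarity between two videos."""
    return float(np.minimum(hist_a, hist_b).sum())

# Usage: two videos with k = 8 feature types.
hist_a = np.array([3, 0, 2, 1, 0, 4, 1, 2])
hist_b = np.array([2, 1, 2, 0, 0, 3, 2, 2])
score = match_kernel(hist_a, hist_b)   # a single real number
```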

Features and their relations A spatio-temporal feature extractor [3,14] detects interest points located at salient changes in the video.

Features and their relations Each feature is a pair f = (f_des, f_loc), where f_des is the descriptor and f_loc is the 3-D (x, y, t) coordinate. The features are clustered into k types by running k-means on f_des.
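A minimal sketch of this clustering step. The descriptor dimensionality, the codebook size k, and the random data below are illustrative assumptions, not values from the paper.

```python
# Cluster descriptors f_des into k feature types with k-means, then pair each
# type label with its 3-D location f_loc = (x, y, t).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
f_des = rng.normal(size=(500, 100))    # 500 interest-point descriptors (assumed dim 100)
f_loc = rng.uniform(size=(500, 3))     # corresponding (x, y, t) locations

k = 8                                  # codebook size (assumed)
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(f_des)
features = list(zip(kmeans.labels_.tolist(), f_loc))   # each feature: (type, location)
```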

Features and their relations A video yields n feature locations, f_loc^1, ..., f_loc^n. A set of predicate types describes the temporal relations between feature pairs (for example, before and equals).

Features and their relations Spatial relations between feature pairs are described analogously, based on the distance between their (x, y) locations.
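A hedged sketch of turning these pairwise relations into a summary on which two videos can be compared. The predicate set here (temporal: before/after/equals within a tolerance; spatial: near/far by a distance threshold) and the thresholds are illustrative assumptions; the paper's relation vocabulary is richer.

```python
# Build a histogram over (type_i, type_j, temporal relation, spatial relation)
# for all pairs of features in one video.
import itertools
from collections import Counter

T_TOL = 5.0     # frames: |t1 - t2| <= T_TOL counts as "equals" (assumed)
D_NEAR = 30.0   # pixels: distance threshold for "near" (assumed)

def temporal_relation(t1, t2):
    if abs(t1 - t2) <= T_TOL:
        return "equals"
    return "before" if t1 < t2 else "after"

def spatial_relation(p1, p2):
    dist = ((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) ** 0.5
    return "near" if dist <= D_NEAR else "far"

def relation_histogram(features):
    """features: list of (type, (x, y, t)) pairs for one video."""
    hist = Counter()
    for (ta, (xa, ya, tta)), (tb, (xb, yb, ttb)) in itertools.combinations(features, 2):
        hist[(ta, tb, temporal_relation(tta, ttb),
              spatial_relation((xa, ya), (xb, yb)))] += 1
    return hist
```

Two such histograms could then be compared with a kernel like the toy match_kernel sketched earlier; the actual method matches the relational structure of the two videos more carefully.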


Human activity recognition Our system maintains one training dataset D_alpha per activity alpha. Let D_alpha^m be the feature set extracted from the m-th training video in D_alpha; the matching function is then applied between the test video and each D_alpha^m.
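A minimal sketch of the resulting decision step, using a generic match function (for example the toy kernel above) and keeping the best-matching training video per activity. The aggregation and names here are assumptions; the scoring in the paper is more involved.

```python
# Keep one training set per activity; predict the activity whose training
# videos match the test video best.
def classify(test_repr, training_sets, match):
    """training_sets: dict mapping activity name -> list of training-video
    representations D_alpha^m; `match` is a function K(a, b) -> float."""
    best_activity, best_score = None, float("-inf")
    for activity, videos in training_sets.items():
        score = max(match(test_repr, v) for v in videos)   # best D_alpha^m
        if score > best_score:
            best_activity, best_score = activity, score
    return best_activity, best_score
```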

Localization

Hierarchical recognition We can combine low-level actions into high-level activities. For instance, a hand-shake consists of two sub-actions, arm stretching (st) and arm withdrawing (wd). Detecting a hand-shake can then be expressed as: st1 before wd1, st2 before wd2, st1 equals st2, wd1 equals wd2 (see the sketch below).
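A small illustration of that rule, assuming approximate frame indices for the detected sub-actions are already available; the tolerance used for "equals" is an assumed value.

```python
# Check the hand-shake rule: st1 before wd1, st2 before wd2,
# st1 equals st2, wd1 equals wd2 (within a tolerance).
EQ_TOL = 10   # frames (assumed)

def before(a, b):
    return a < b

def equals(a, b):
    return abs(a - b) <= EQ_TOL

def is_handshake(st1, wd1, st2, wd2):
    return (before(st1, wd1) and before(st2, wd2)
            and equals(st1, st2) and equals(wd1, wd2))

# Example: both arms stretch near frame 100 and withdraw near frame 160.
print(is_handshake(st1=100, wd1=160, st2=104, wd2=158))   # True
```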

Experiments The experiments use the UT-Interaction dataset. The actions are performed by actors; each video contains shake-hands, point, hug, push, kick, and punch.


Conclusion This method relies on the extracted features and the spatio-temporal relationships among them. It can hierarchically detect high-level activities. It may mis-detect activities when unusual feature combinations occur.