
By: Ryan Wendel

 Human activity analysis is an ongoing research area in which videos are analyzed frame by frame  Much of the recognition work borrows techniques from 3-D graphics engines

 “HAA” stands for Human Activity Analysis. Application areas include:  Surveillance systems  Patient monitoring systems  Human-computer interfaces

 We are going to take a look at methodologies that have been developed for recognizing both simple human actions and high-level activities.

 Gestures  Actions  Interactions  Group activities

 Basic movements of a person’s body parts.  For example:  Raising an arm  Lifting a leg

 A single person’s activity, which may entail multiple gestures.  For example:  Walking  Waving  Shaking the body

 Activities that involve two or more people or objects.  For example:  Two people fighting

 Activities performed by multiple people.  For example:  A group running  A group walking  A group fighting

 Can be separated into two sections ◦ Single-layered approaches: recognize human activities directly from a video feed, frame by frame ◦ Hierarchical approaches: describe high-level activities in terms of simpler sub-activities

 Main objective is to analyze simple sequences of human movements  Can be categorized into two approaches ◦ Space-time approach: treats an input video as a 3-D volume ◦ Sequential approach: interprets an input video as a sequence of observations
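The two views of the same clip can be sketched in a few lines of Python. The toy clip and the per-frame feature (mean intensity) are hypothetical, chosen only to make the contrast concrete:

```python
import numpy as np

# Hypothetical toy clip: 8 frames of 4x4 grayscale pixels.
video = np.random.default_rng(0).random((8, 4, 4))

# Space-time view: the whole clip as a single 3-D volume (frames x height x width).
volume = video

# Sequential view: the clip as an ordered sequence of per-frame observations,
# here just each frame's mean intensity as a crude 1-D feature.
observations = [float(frame.mean()) for frame in video]
```

The space-time view keeps all pixels together for volumetric matching; the sequential view discards spatial layout early in exchange for a compact observation stream.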

 Divided into three subsections based on the features used ◦ Space-time volumes ◦ Space-time trajectories ◦ Space-time features

 Captures human activities by analyzing a video as a volume (frame by frame)  Recognition measures the similarity between two space-time volumes
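One simple way to score similarity between two equally sized space-time volumes is normalized cross-correlation; published systems use more elaborate measures, so this is only an illustrative sketch:

```python
import numpy as np

def volume_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two equally sized space-time volumes."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float((a * b).mean())

rng = np.random.default_rng(1)
clip_a = rng.random((10, 8, 8))  # 10 frames of 8x8 pixels
clip_b = rng.random((10, 8, 8))  # an unrelated clip
```

A volume compared with itself scores 1.0, while two unrelated random clips score near 0, which is exactly the behavior a template-matching recognizer relies on.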

 Uses stick-figure modeling to extract a person’s joint positions at each frame
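Stacking those per-frame joint estimates over time yields one 3-D trajectory per joint. A minimal sketch, with made-up pose-estimator output:

```python
import numpy as np

n_frames, n_joints = 5, 3
rng = np.random.default_rng(2)

# Hypothetical output of a pose estimator: (x, y, z) per joint, per frame.
joints_per_frame = rng.random((n_frames, n_joints, 3))

# Regroup by joint: each entry is one joint's 3-D trajectory across all frames.
trajectories = joints_per_frame.transpose(1, 0, 2)
```

It is these per-joint trajectories, rather than raw pixels, that trajectory-based methods compare between videos.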

 Does not extract features frame by frame  Extracts features only where there is an appearance or shape change in the 3-D space-time volume
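A crude way to find such change points is to threshold the frame-to-frame intensity difference; real spatio-temporal interest-point detectors use filter responses over space and time, so treat this as an illustration only:

```python
import numpy as np

def spacetime_interest_points(video: np.ndarray, threshold: float = 0.5):
    """Return (t, y, x) locations where intensity changes sharply between frames."""
    diff = np.abs(np.diff(video, axis=0))  # change between frame t and frame t+1
    return np.argwhere(diff > threshold)

# Static synthetic clip with a single pixel that switches on at frame 2.
clip = np.zeros((4, 3, 3))
clip[2:, 1, 1] = 1.0
points = spacetime_interest_points(clip)
```

Only the single changing location is flagged; the static background produces no features, which is the whole point of this representation.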

 Space-time volumes ◦ Hard to differentiate between multiple people in the same scene.  Space-time trajectories ◦ 3-D body-part detection and tracking is still an unsolved problem, and it requires a strong low-level component that can estimate 3-D joint locations.  Space-time features ◦ Not suitable for modeling complex activities

 Divided into two subsections ◦ Exemplar-based ◦ State model-based

 Review ◦ Sequential approach: takes an input video and interprets it as a sequence of observations  Exemplar-based ◦ Represents human activities with a set of sample sequences (exemplars) of action executions
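Exemplar matching is commonly done with an elastic measure such as dynamic time warping (DTW), which tolerates speed variations between executions. A minimal sketch over 1-D observation sequences; the labels and values are invented:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D observation sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    dist = [[inf] * (m + 1) for _ in range(n + 1)]
    dist[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dist[i][j] = cost + min(dist[i - 1][j],      # skip a frame of a
                                    dist[i][j - 1],      # skip a frame of b
                                    dist[i - 1][j - 1])  # match frames
    return dist[n][m]

def classify(query, exemplars):
    """Label a query sequence by its nearest exemplar."""
    return min(exemplars, key=lambda label: dtw_distance(query, exemplars[label]))

exemplars = {"walking": [0, 1, 2, 3, 2, 1, 0], "waving": [3, 0, 3, 0, 3, 0, 3]}
```

A query that repeats a frame of the "walking" pattern still matches it at zero cost, illustrating why elastic matching suits executions of varying speed.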

 Represents a human activity as a model composed of a set of states; observation sequences are matched against the state model.
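The canonical state model here is a hidden Markov model, where the forward algorithm scores how well an observation sequence fits the model. A toy two-state, two-symbol sketch with made-up parameters:

```python
import numpy as np

def forward_likelihood(obs, pi, A, B):
    """P(observation sequence | HMM) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]          # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate through transitions, then emit
    return float(alpha.sum())

pi = np.array([0.6, 0.4])               # initial state distribution
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])              # state transition probabilities
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])              # emission probabilities over 2 symbols
```

In an activity recognizer, one such model is trained per activity and the query sequence is assigned to the model giving the highest likelihood.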

 Exemplar-based is more flexible in terms of comparing multiple sample sequences  Whereas state model-based can handle a probabilistic analysis of an activity better.

 The sequential approach is able to handle and detect more complex activities  Whereas the space-time approach handles simpler, less complex activities.  Both methods are based on some type of sequence of images

 Allows the recognition of high-level activities based on the recognition results of other, simpler activities  Advantages of the hierarchical approach ◦ Has the ability to recognize high-level activities with a more in-depth structure ◦ The amount of data required to recognize an activity is significantly less than in single-layered approaches ◦ Easier to incorporate human knowledge
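The two layers can be sketched directly: a low-level detector emits one atomic action per frame, and a high-level rule composes those into an activity label. Both the detector and the rule below are hypothetical:

```python
def detect_atomic(arm_height: float) -> str:
    """Hypothetical low-level detector: one atomic action per frame."""
    return "raise_arm" if arm_height > 0.5 else "idle"

def recognize_high_level(atomic_actions) -> str:
    """Hypothetical high-level rule: repeated arm raises read as waving."""
    return "waving" if atomic_actions.count("raise_arm") >= 2 else "unknown"

frame_features = [0.1, 0.7, 0.2, 0.8, 0.1]  # made-up per-frame arm heights
actions = [detect_atomic(h) for h in frame_features]
```

Note how the high-level layer never touches pixels; it only consumes the symbolic output of the lower layer, which is what keeps its data requirements small.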

 Statistical approach  Syntactic approach  Description-based approach

 Statistical approaches use state-based models to recognize activities  Stacking multiple layers of state-based models allows recognition of activities with sequential structure

 Human activities are recognized as strings of symbols  An activity is modeled as a set of production rules generating a string of atomic actions
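A toy example of such production rules, with a recursive recognizer for the strings they generate. The grammar is invented purely for illustration:

```python
# Production rules:  Activity -> "stretch" Activity "relax" | "exercise"
def matches_activity(actions) -> bool:
    """Recognize action strings generated by the toy grammar above."""
    if list(actions) == ["exercise"]:
        return True
    return (len(actions) >= 3
            and actions[0] == "stretch"
            and actions[-1] == "relax"
            and matches_activity(actions[1:-1]))
```

Because the rule is self-referential, it accepts arbitrarily deep nesting ("stretch stretch exercise relax relax"), something a flat string matcher cannot express; that expressive power is why grammars suit structured activities.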

 Targets human activities with complex spatio-temporal structures ◦ A spatio-temporal structure describes how an activity’s sub-events are arranged in space and time  Uses context-free grammars (CFGs) to represent activities ◦ CFGs are used to recognize high-level activities ◦ Detection extracts space-time points and local periodic motions to obtain a sparse distribution of interest points in a video

 Probability theory  Fuzzy logic  Bayesian network: ◦ Used for recognition of an activity, based on the activity’s temporal structure representation ◦ Uses a large network with over 10,000 nodes
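For a single time slice, the Bayesian-network idea reduces to Bayes' rule: combine a prior over activities with the likelihood of an observed motion cue. All the numbers below are hypothetical:

```python
prior = {"walking": 0.7, "fighting": 0.3}          # P(activity)
p_fast_given = {"walking": 0.1, "fighting": 0.8}   # P(fast motion | activity)

def posterior(fast_motion: bool) -> dict:
    """P(activity | observation) by Bayes' rule."""
    unnorm = {a: prior[a] * (p_fast_given[a] if fast_motion else 1 - p_fast_given[a])
              for a in prior}
    z = sum(unnorm.values())                        # normalizing constant
    return {a: p / z for a, p in unnorm.items()}
```

Observing fast motion flips the verdict from the "walking" prior to "fighting"; a full network chains many such nodes over the activity's temporal structure.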

 A group of people marching ◦ The images are recognized from the overall motion of the entire group  A group of people fighting ◦ Multiple videos are used to recognize the “group fighting” activity

 Recognizing interactions between humans and objects requires multiple components.  Much human-object interaction work ignores the interplay between object recognition and motion estimation  Object dependencies, motions, and human activities can also be factored in to determine the activities involved

 J. K. Aggarwal and M. S. Ryoo. Human activity analysis: A review. ACM Computing Surveys 43, 3, Article 16 (April 2011), 43 pages.
 Christopher O. Jaynes. Computer vision and artificial intelligence. Crossroads 3, 1 (September 1996).
 Zhu Li, Yun Fu, Thomas Huang, and Shuicheng Yan. Real-time human action recognition by luminance field trajectory analysis. In Proceedings of the 16th ACM International Conference on Multimedia (MM '08). ACM, New York, NY, USA.
 Paul Scovanner, Saad Ali, and Mubarak Shah. A 3-dimensional SIFT descriptor and its application to action recognition. In Proceedings of the 15th International Conference on Multimedia (MULTIMEDIA '07). ACM, New York, NY, USA.