Human Activity Analysis

Slides:

Advertisements

Similar presentations

Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities M. S. Ryoo and J. K. Aggarwal ICCV2009.

Advertisements

Antón R. Escobedo cse 252c Behavior Recognition via Sparse Spatio-Temporal Features Piotr Dollár Vincent Rabaud Garrison CottrellSerge Belongie.

Ziming Zhang, Yucheng Zhao and Yiwen Wan.  Introduction&Motivation  Problem Statement  Paper Summeries  Discussion and Conclusions.

Kien A. Hua Division of Computer Science University of Central Florida.

Machine recognition of human activities : a survey

By: Ryan Wendel.  It is an ongoing analysis in which videos are analyzed frame by frame  Most of the video recognition is pulled from 3-D graphic engines.

Machine learning continued Image source:

Actions in video Monday, April 25 Kristen Grauman UT-Austin.

An Overview of Machine Learning

Patch to the Future: Unsupervised Visual Prediction

Activity Recognition Aneeq Zia. Agenda What is activity recognition Typical methods used for action recognition “Evaluation of local spatio-temporal features.

Space-time interest points Computational Vision and Active Perception Laboratory (CVAP) Dept of Numerical Analysis and Computer Science KTH (Royal Institute.

Activity Recognition Computer Vision CS 143, Brown James Hays 11/21/11 With slides by Derek Hoiem and Kristen Grauman.

Transferable Dictionary Pair based Cross-view Action Recognition Lin Hong.

Visual Event Detection & Recognition Filiz Bunyak Ersoy, Ph.D. student Smart Engineering Systems Lab.

Robust Object Tracking via Sparsity-based Collaborative Model

Computer and Robot Vision I

Local Descriptors for Spatio-Temporal Recognition

Event prediction CS 590v. Applications Video search Surveillance – Detecting suspicious activities – Illegally parked cars – Abandoned bags Intelligent.

Personal Driving Diary: Constructing a Video Archive of Everyday Driving Events IEEE workshop on Motion and Video Computing ( WMVC) 2011 IEEE Workshop.

Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010 M.Sc.

1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.

Recent Developments in Human Motion Analysis

A Study of Approaches for Object Recognition

Processing Digital Images. Filtering Analysis –Recognition Transmission.

Human Action Recognition

Student: Hsu-Yung Cheng Advisor: Jenq-Neng Hwang, Professor

Introduction to Object Tracking Presented by Youyou Wang CS643 Texas A&M University.

An Introduction to Action Recognition/Detection Sami Benzaid November 17, 2009.

Hand Signals Recognition from Video Using 3D Motion Capture Archive Tai-Peng Tian Stan Sclaroff Computer Science Department B OSTON U NIVERSITY I. Introduction.

Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.

Information Retrieval in Practice

Learning to classify the visual dynamics of a scene Nicoletta Noceti Università degli Studi di Genova Corso di Dottorato.

Bag of Video-Words Video Representation

Tracking Pedestrians Using Local Spatio- Temporal Motion Patterns in Extremely Crowded Scenes Louis Kratz and Ko Nishino IEEE TRANSACTIONS ON PATTERN ANALYSIS.

Flow Based Action Recognition Papers to discuss: The Representation and Recognition of Action Using Temporal Templates (Bobbick & Davis 2001) Recognizing.

Action and Gait Recognition From Recovered 3-D Human Joints IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS— PART B: CYBERNETICS, VOL. 40, NO. 4, AUGUST.

Machine learning & category recognition Cordelia Schmid Jakob Verbeek.

Shape-Based Human Detection and Segmentation via Hierarchical Part- Template Matching Zhe Lin, Member, IEEE Larry S. Davis, Fellow, IEEE IEEE TRANSACTIONS.

Abstract Developing sign language applications for deaf people is extremely important, since it is difficult to communicate with people that are unfamiliar.

Probabilistic Context Free Grammars for Representing Action Song Mao November 14, 2000.

Learning and Recognizing Human Dynamics in Video Sequences Christoph Bregler Alvina Goh Reading group: 07/06/06.

Marcin Marszałek, Ivan Laptev, Cordelia Schmid Computer Vision and Pattern Recognition, CVPR Actions in Context.

Pedestrian Detection and Localization

Template matching and object recognition. CS8690 Computer Vision University of Missouri at Columbia Matching by relations Idea: –find bits, then say object.

Vision-based human motion analysis: An overview Computer Vision and Image Understanding(2007)

MSRI workshop, January 2005 Object Recognition Collected databases of objects on uniform background (no occlusions, no clutter) Mostly focus on viewpoint.

December 9, 2014Computer Vision Lecture 23: Motion Analysis 1 Now we will talk about… Motion Analysis.

Action and Gait Recognition From Recovered 3-D Human Joints IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS— PART B: CYBERNETICS, VOL. 40, NO. 4, AUGUST.

Rick Parent - CIS681 Motion Analysis – Human Figure Processing video to extract information of objects Motion tracking Pose reconstruction Motion and subject.

Jack Pinches INFO410 & INFO350 S INFORMATION SCIENCE Computer Vision I.

Human Activity Recognition at Mid and Near Range Ram Nevatia University of Southern California Based on work of several collaborators: F. Lv, P. Natarajan,

 Present by 陳群元.  Introduction  Previous work  Predicting motion patterns  Spatio-temporal transition distribution  Discerning pedestrians  Experimental.

Stochastic Grammars: Overview Representation: Stochastic grammar Representation: Stochastic grammar Terminals: object interactions Terminals: object interactions.

Machine learning & object recognition Cordelia Schmid Jakob Verbeek.

REAL-TIME DETECTOR FOR UNUSUAL BEHAVIOR

Computer vision: models, learning and inference

IMAGE PROCESSING RECOGNITION AND CLASSIFICATION

Data Driven Attributes for Action Detection

Saliency-guided Video Classification via Adaptively weighted learning

Paper Presentation: Shape and Matching

Real-Time Human Pose Recognition in Parts from Single Depth Image

Video-based human motion recognition using 3D mocap data

Object detection as supervised classification

Context-Aware Modeling and Recognition of Activities in Video

Tremor Detection Using Motion Filtering and SVM Bilge Soran, Jenq-Neng Hwang, Linda Shapiro, ICPR, /16/2018.

Brief Review of Recognition + Context

HCI/ComS 575X: Computational Perception

Week 3 Volodymyr Bobyr.

Presentation transcript:

Human Activity Analysis 吴心筱 wuxinxiao@bit.edu.cn Based on the CVPR 2011 Tutorial: Human Activity Analysis. http://cvrc.ece.utexas.edu/mryoo/cvpr2011tutorial/

Introduction

Understanding People in Video Goal Find the people Infer their poses Recognize what they do

Level of People Understanding Object-Level Understanding Locations of persons and objects Tracking-Level Understanding Object trajectories --correspondence

Level of People Understanding Pose-Level Understanding Human body parts Activity-Level Understanding Recognition of human activities and events

Object Detection Pedestrian (i.e. human) detection Detect all the persons in the video 找个行人检测的video

Object Tracking Person tracking Tracking the person in every frame 找个行人跟踪的video

Pose Estimation Human Pose Joint locations or angles of a person measured per frame Video as a sequence of poses 找个行人检测的video

Human Activity Recognition A collection of human/object movements with a particular semantic meaning Activity Recognition Finding of video segments containing such movements 找个行人检测的video

Levels of Human Activities Categorized based on their complexity Hierarchy # of participants Gestures Actions Interactions Group Activities 找个行人检测的video

Gestures Atomic components Single body-part movements 找个行人检测的video

Actions Single actor movement Composed of multiple gestures organized temporally 找个行人检测的video

Interactions Human-human interaction Human-object interaction 增加:Human-scene interaction

Group Activities Multiple persons or objects 找个行人检测的video

Applications 找个行人检测的video

Surveillance Monitor suspicious activities for real-time reactions. (e.g.,‘Fighting’, ‘stealing’) Currently, surveillance systems are mainly for recording. Activity recognition is essential for surveillance and other monitoring systems in public places 找个行人检测的video

Intelligent Environments (HCI) Intelligent home, office, and workspace Monitoring of elderly people and children. Recognition of ongoing activities and understanding of current context is essential. 找个行人检测的video

Sports Play Analysis 找个行人检测的video

Web-based video retrieval YouTube 20 hours of videos uploaded every minute Content-based search Search based on contents of the video, instead of user-attached keywords Example: search ‘kiss’ from long movies 找个行人检测的video

Challenges 找个行人检测的video

Robustness Environment variations Background Moving backgrounds Pedestrians Occlusions View-points – moving camera 找个行人检测的video

Motion Style Each person has his/her own style of executing an activity Who stretches his hand first? How long does one stay his hand stretched? 找个行人检测的video

Various Activities There are various types of activities The ultimate goal is to make computer recognize all of them reliably. 找个行人检测的video

Learning Insufficient amount of training videos Traditional setting: Supervised learning Human efforts are expensive! Unsupervised learning Interactive learning Interactive learning?

Overview 找个行人检测的video

Activity Classification Simple task of identifying videos Classify given videos into their types. Known, limited number of classes Assumes that each video contains a single activity 找个行人检测的video

找个行人检测的video

Activity Detection Search for the particular time interval <starting time, ending time> Video segment containing the activity 找个行人检测的video

Activity Detection by Classification Binary classifier Yes, Puch ! Sliding window technique Classify all possible time intervals 找个行人检测的video

Recognition Process Represent videos in terms of features Captures properties of activity videos Classify activities by comparing video representations Decision boundary 找个行人检测的video

Approaches 找个行人检测的video

Approach based taxonomy 找个行人检测的video

Single-layered vs. hierarchical Single-layered approaches Hierarchical approaches

Single-layered approaches 找个行人检测的video

Two different types Space-time approaches (data-orientated) Activities as video observations 3D space-time volume (3D XYT volume) A set of features extracted from the volume 找个行人检测的video

Two different types Sequential approaches (semantic-oriented) Activities as human movements A sequence of particular observations(feature vectors) 找个行人检测的video

Space-time approaches Space-time volumes Space-time local features Space-time trajectories 找个行人检测的video

Space-time volumes Problem: matching between two volumes 找个行人检测的video

Motion history images Bobick and J. Davis, 2001 Motion history images (MHIs) Weighted projection of a XYT foreground volume Template matching Bobick and J. Davis, The recognition of human movement using temporal templates,IEEE T PAMI 23(3),2001

Segments Ke, Suktankar, Herbert 2007 Volume matching based on its segments. Segment matching scores are combined. [ Ke, Y., Sukthankar, R., and Hebert, M., Spatio-temporal shape and flow correlation for action recognition. CVPR 2007]

Space-time local features Local descriptors / interest points From 2D to 3D ; Sparse Low-level: which local features to be extracted Mid-level: How to represent the activity using local features High-level: What method to classify activities 找个行人检测的video

Cuboid Dollar et al., Cuboid, VS-PETS 2005 2D Gaussian smoothing kernel: 1D Gabor filters applied temporally: 加入公式 [Dollar, P., Rabaud, V., Cottrell, G., and Belongie, S., Behavior recognition via sparse spatio-temporal features, VS-PETS 2005]

Cuboid Dollar et al., Cuboid, VS-PETS 2005 Appearances of local 3-D XYT volumes Raw appearance Gradients Optical flows Captures salient periodic motion. 加入公式 [Dollar, P., Rabaud, V., Cottrell, G., and Belongie, S., Behavior recognition via sparse spatio-temporal features, VS-PETS 2005]

STIP interest points Laptev and Linderberg 2003 Introduced the KTH dataset Local descriptor based on Harris corner detector Simple periodic actions 加入公式 [Schuldt, C., Laptev, I., and Caputo, B., Recognizing human actions: A local SVM approach, ICPR 2004]

STIP interest points Laptev and Linderberg 2003 加入公式 [Schuldt, C., Laptev, I., and Caputo, B., Recognizing human actions: A local SVM approach, ICPR 2004]

Bag-of-words Representation 找个行人检测的video

SVMs classification SVMs classifier Shake hands ! Multiple kernel learning 找个行人检测的video …… [Schuldt, C., Laptev, I., and Caputo, B., Recognizing human actions: A local SVM approach, ICPR 2004]

pLSA models pLSA from text recognition Probabilistic latent semantic analysis Reasoning the probability of features originated from a particular action video. 加入公式 [Niebles, J. C., Wang, H., and Fei-Fei, L., Unsupervised learning of human action categories using spatial-temporal words, BMVC 2006]

pLSA models 加入公式 [Niebles, J. C., Wang, H., and Fei-Fei, L., Unsupervised learning of human action categories using spatial-temporal words, BMVC 2006]

Space-time trajectories Trajectory patterns Yilmaz and Shah, 2005 – UCF Joint trajectories in 3-D XYT space. [[Yilmaz, A. and Shah, M., Recognizing human actions in videos acquired by uncalibrated moving cameras, ICCV 2005]

Space-time trajectories Trajectory patterns Yilmaz and Shah, 2005 – UCF Compared trajectory shapes to classify human actions. [[Yilmaz, A. and Shah, M., Recognizing human actions in videos acquired by uncalibrated moving cameras, ICCV 2005]

Space-time approaches Summary Space-time volumes A straightforward solution Difficult in handling speed and motion variations 找个行人检测的video

Space-time approaches Summary Space-time local features Robust to noise and illumination changes Recognize multiple activities without background subtraction or body-part modeling Difficult to model more complex activities 找个行人检测的video

Space-time approaches Summary Space-time trajectories Perform detailed-level analysis View-invariant in most cases Difficult to extract the trajectories 找个行人检测的video

Two different types Sequential approaches (semantic-oriented) Activities as human movements A sequence of particular observations(feature vectors) 找个行人检测的video

Sequential approaches Exemplar-based approaches State model-based approaches

Exemplar-based approaches Matching between the input sequence of feature vectors and the template sequences Problem: how to match two sequences in different styles/different rates

Dynamic Time warping Match two sequences with variations Find an optimal nonlinear match 找个行人检测的video [[Yilmaz, A. and Shah, M., Recognizing human actions in videos acquired by uncalibrated moving cameras, ICCV 2005]

State model-based approaches A human activity as a model composed of a set of states Each class has a corresponding model Measure the likelihood between the model an the input image sequence HMMs 找个行人检测的video

HMMs Given observations V (a sequence of poses), find the HMM Mi that maximizes P(V|Mi). Transition probabilities aij and observations probabilities bik are pre-trained using training data. 找个行人检测的video

HMMs for Actions Each hidden state is trained to generate a particular body posture. Each HMM produces a pose sequence: action 找个行人检测的video

HMMs for Hand Gestures HMMs for gesture recognition American Sign Language (ASL) Sequential HMMs shapes and position of hands 找个行人检测的video [Starner, T. and Pentland, A., Real-time American Sign Language recognition from video using hidden Markov models. International Symposium on Computer Vision, 1995.]

Sequential approaches summary Designed for modeling sequential dynamics Markov process Motion features are extracted per frame Limitations Feature extraction Assumes good observation models Complex human activities? Large amount of training data 找个行人检测的video

exemplar-based vs. state model-based Provide more flexibility for the recognition system: multiple sample sequences Less training data State model-based make a probabilistic analysis of the activity model more complex activity 找个行人检测的video

Hierarchical approaches 找个行人检测的video

Hierarchy Hierarchy implies decomposition into subparts 找个行人检测的video

Hierarchical approaches Statistical approaches Syntactic approaches Description-based approaches 找个行人检测的video

Statistical approaches Statistical state-based models for recognition multiple layers of state-based models low-level: a sequence of feature vectors mid-level: a sequence of atomic action labels high-level: label of activity 找个行人检测的video

Layered hidden Markov models (LHMMs) Oliver et al. 2002 Bottom layer HMMs recognize atomic actions of a single person Upper layer HMMs treat recognized atomic actions as observations …… 找个行人检测的video [Oliver, N., Horvitz, E., and Garg, A. 2002. Layered representations for human activity recognition. In Proceedings of the IEEE International Conference on Multimodal Interfaces (ICMI). IEEE, Los Alamitos, CA, 3-8.

Statistical approaches summary Reliably recognize activities in the case of noisy inputs with enough training data Inability to recognize activities with complex temporal structures (e.g., concurrent subevents) subevent A Subevent B 找个行人检测的video subevent A subevent A subevent B subevent B

Syntactic approaches Model activity as a string of symbols, each symbol corresponds to an atomic-level action Parsing techniques from the field of programming languages Context-free grammars (CFG) and stochastic context-free grammars (SCFGs)

CFG for human activities Given the recognized atomic actions Production rules

Parse tree 找个行人检测的video

Gesture analysis with CFGs 找个行人检测的video

Primitive recognition with HMMs

Parse Tree

Syntactic approaches summary Difficult in the recognition of concurrent activities Require a set of production rules fro all possible events Directions: how to learn grammar rules from observations automatically? 找个行人检测的video

Description-based approaches Represents a high-level human activity in terms of simpler activities Describe various relationships between subevents (atomic actions) temporal relationship spatial relationship logical relationship

Description-based approaches Recognize activities using semantic matching. Hand shake = “two persons do shake-action (stretches, stays stretched, withdraw) simultaneously, while touching”. Recognition by finding observations satisfying the definition.

Hierarchical approaches summary