DONG XU, MEMBER, IEEE, AND SHIH-FU CHANG, FELLOW, IEEE Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment.


Outline
1. Introduction
2. Scene-Level Concept Score Feature
3. Single-Level Earth Mover’s Distance in the Temporal Domain
4. Temporally Aligned Pyramid Matching
5. Experiments
6. Contributions and Conclusion

1. Introduction

Previous work on video event recognition can be roughly classified as either activity recognition or abnormal event recognition.

Model-based approaches
- Abnormal event recognition: Zhang et al. [1] propose a semisupervised adapted Hidden Markov Model (HMM) framework.
- Activity recognition: HMM, coupled HMM, and Dynamic Bayesian Network.

Appearance-based approaches
- Abnormal event recognition: Boiman and Irani [7].
- Activity recognition: Ke et al. [8], Efros et al. [9], and others.

Event recognition in broadcast news video
- Rich information
- Emerging applications in open-source intelligence
- Online video search

LSCOM ontology (Large-Scale Concept Ontology for Multimedia)
- Defines 56 event/activity concepts.
- Manual annotation of these event concepts has been completed for a large data set in TRECVID 2005 [15].

Challenges of events in news video
- Large variations in scenes and activities
- It is difficult to:
  - reliably track moving objects,
  - detect the salient spatiotemporal interest regions, and
  - extract spatiotemporal features.

Addressing the challenges of news video
- Ebadollahi et al. [17]
- Midlevel concept score (CS) feature
- Nonparametric approach
- Bag-of-words model

Bag-of-words model
- Represent one video clip as a bag of orderless features extracted from all of its frames.
- Earth Mover’s Distance (EMD) [21]
- Single-level EMD (SLEMD)
- Support Vector Machine (SVM)
- Temporally Aligned Pyramid Matching (TAPM)

2. Scene-Level Concept Score Feature

- Holistic features represent the content of the constituent image frames.
- A multilevel temporal alignment framework matches the temporal characteristics of various events.

Three low-level global features
- Grid Color Moment
- Gabor Texture
- Edge Direction Histogram
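Of the three, the grid color moment is the simplest to illustrate. The following is a minimal sketch that rests on several assumptions. The frame is split into a hypothetical grid_rows × grid_cols grid. Each cell contributes the mean, the standard deviation, and the cube-rooted third central moment (skewness) of every color channel. The grid size and color space actually used are not given on the slide, and the frame is taken to be a nested list of per-pixel channel tuples.

```python
import math

def grid_color_moments(image, grid_rows=2, grid_cols=2):
    """Concatenate (mean, std, skewness) per channel per grid cell.

    `image` is a list of rows, each row a list of per-pixel channel tuples.
    Grid size and moment definitions are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    feature = []
    for gr in range(grid_rows):
        for gc in range(grid_cols):
            # Pixel bounds of this grid cell
            r0, r1 = gr * height // grid_rows, (gr + 1) * height // grid_rows
            c0, c1 = gc * width // grid_cols, (gc + 1) * width // grid_cols
            pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            for ch in range(channels):
                values = [p[ch] for p in pixels]
                mean = sum(values) / len(values)
                var = sum((v - mean) ** 2 for v in values) / len(values)
                third = sum((v - mean) ** 3 for v in values) / len(values)
                # Signed cube root keeps skewness in the same units as the channel
                skew = math.copysign(abs(third) ** (1 / 3), third)
                feature.extend([mean, math.sqrt(var), skew])
    return feature
```

With a 2 × 2 grid and three channels, each frame yields 2 · 2 · 3 · 3 = 36 values.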

We used these features because they
- can be extracted efficiently over the large video corpus,
- are effective for detecting several concepts, and
- are suitable for capturing the characteristics of scenes.

3. Single-Level Earth Mover’s Distance in the Temporal Domain

One video clip P is represented as a signature $P = \{(p_1, w_{p_1}), \ldots, (p_m, w_{p_m})\}$, where m is the total number of frames, $p_i$ is the feature extracted from the i-th frame, and $w_{p_i}$ is the weight of the i-th frame. Another video clip Q is likewise represented as a signature $Q = \{(q_1, w_{q_1}), \ldots, (q_n, w_{q_n})\}$, where n is the total number of frames.

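The EMD between P and Q is

$$D(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}},$$

where $d_{ij}$ is the ground distance between $p_i$ and $q_j$. The flows $f_{ij}$ minimize the numerator subject to four constraints: $f_{ij} \ge 0$, $\sum_{j} f_{ij} \le w_{p_i}$, $\sum_{i} f_{ij} \le w_{q_j}$, and $\sum_{i}\sum_{j} f_{ij} = \min\left(\sum_i w_{p_i}, \sum_j w_{q_j}\right)$.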
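The SLEMD distance between two clips can be sketched as a transportation problem between their frames. This minimal sketch makes three assumptions:

- Frame weights are uniform ($w_{p_i} = 1/m$, $w_{q_j} = 1/n$).
- The ground distance defaults to Euclidean.
- The problem is solved with a small successive-shortest-path min-cost-flow routine, so that the example has no dependencies. This solver is swapped in for illustration and is not the solver used by the authors.

Scaling both weight sets by m·n turns them into integer supplies and demands.

```python
import math

def min_cost_flow(num_nodes, edges, source, sink):
    """Successive-shortest-path min-cost max-flow; edges are (u, v, capacity, cost)."""
    # Residual graph entries: [to, residual_capacity, cost, index_of_reverse_edge]
    graph = [[] for _ in range(num_nodes)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    total_flow, total_cost = 0, 0.0
    while True:
        # Bellman-Ford, because reverse residual edges carry negative costs
        dist = [math.inf] * num_nodes
        prev = [None] * num_nodes  # (previous node, edge index in graph[previous node])
        dist[source] = 0.0
        for _ in range(num_nodes - 1):
            updated = False
            for u in range(num_nodes):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        updated = True
            if not updated:
                break
        if dist[sink] == math.inf:
            return total_flow, total_cost  # no augmenting path remains

        # Find the bottleneck capacity on the shortest path, then push that much flow
        push, v = math.inf, sink
        while v != source:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != source:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        total_flow += push
        total_cost += push * dist[sink]

def slemd(P, Q, ground_distance=math.dist):
    """EMD between two clips given as lists of per-frame feature vectors (uniform weights)."""
    m, n = len(P), len(Q)
    source, sink = 0, m + n + 1  # nodes 1..m: frames of P; m+1..m+n: frames of Q
    edges = [(source, 1 + i, n, 0.0) for i in range(m)]   # weight 1/m scaled by m*n
    edges += [(1 + m + j, sink, m, 0.0) for j in range(n)]  # weight 1/n scaled by m*n
    edges += [(1 + i, 1 + m + j, m * n, ground_distance(P[i], Q[j]))
              for i in range(m) for j in range(n)]
    flow, cost = min_cost_flow(m + n + 2, edges, source, sink)
    return cost / flow  # normalize by total flow, as in the EMD definition
```

Because every frame of one clip may send flow to every frame of the other, the distance ignores frame order. This matches the bag-of-words view of a clip. For example, `slemd([[0.0], [10.0]], [[10.0], [0.0]])` is zero even though the two clips are time-reversed.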

SVM classification
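The slide does not spell out how the EMD enters the SVM. A common construction wraps the distance in a Gaussian-style kernel:

$$K(P, Q) = \exp\left(-\frac{D(P, Q)}{\kappa A}\right)$$

Here A is the mean distance between training clips and κ is a scaling hyperparameter. This construction is an assumption here rather than something taken from the slide. Kernels built this way from EMD are not guaranteed to be positive semidefinite, so this is a practical heuristic. The sketch below assumes a precomputed, symmetric matrix of pairwise training distances.

```python
import math

def emd_kernel_matrix(dist, kappa=1.0):
    """Gaussian-style kernel from a symmetric matrix of pairwise EMD distances.

    K[i][j] = exp(-dist[i][j] / (kappa * A)), where A is the mean off-diagonal
    training distance. Both the form and `kappa` are illustrative assumptions.
    """
    n = len(dist)
    off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    mean_dist = sum(off_diag) / len(off_diag)
    return [[math.exp(-dist[i][j] / (kappa * mean_dist)) for j in range(n)]
            for i in range(n)]
```

The resulting matrix can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.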

4. Temporally Aligned Pyramid Matching

- Spatial Pyramid Matching (SPM)
- Pyramid Match Kernel (PMK)
- Temporally Constrained Hierarchical Agglomerative Clustering (T-HAC)

T-HAC
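T-HAC groups the frames of a clip into temporally contiguous subclips. As in ordinary agglomerative clustering, the closest clusters are merged repeatedly, but only temporally adjacent clusters may merge. This minimal sketch assumes Euclidean distance between cluster centroids as the merge criterion and returns hypothetical (start, end) frame ranges. The linkage used by the authors may differ.

```python
import math

def t_hac(features, num_clusters):
    """Merge adjacent frames into `num_clusters` contiguous [start, end) subclips."""
    clusters = [[i, i + 1] for i in range(len(features))]  # one frame per cluster

    def centroid(cluster):
        rows = features[cluster[0]:cluster[1]]
        return [sum(col) / len(rows) for col in zip(*rows)]

    while len(clusters) > num_clusters:
        # Temporal constraint: only neighbours k and k+1 are merge candidates
        best = min(range(len(clusters) - 1),
                   key=lambda k: math.dist(centroid(clusters[k]), centroid(clusters[k + 1])))
        clusters[best] = [clusters[best][0], clusters[best + 1][1]]
        del clusters[best + 1]
    return [tuple(c) for c in clusters]
```

Unlike unconstrained clustering such as k-means, each resulting cluster is a contiguous run of frames, so it can be treated as a subclip.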

Alignment of Different Subclips
Principal Component Analysis (PCA)

Integer-value-constrained EMD
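At a given pyramid level, the subclips of P are matched to those of Q by an EMD whose flows are restricted to integers. This minimal sketch rests on three assumptions:

- Both clips are split into the same number of subclips.
- Each subclip has unit weight, so the 0/1 flows form a one-to-one assignment.
- The level distance is normalized by the number of subclips.

The input is a hypothetical matrix of SLEMD distances between subclip pairs. A brute-force search over permutations stands in for a proper assignment solver and is only practical for a few subclips.

```python
from itertools import permutations

def aligned_level_distance(subclip_dist):
    """Integer-constrained EMD between two clips' subclips at one level.

    subclip_dist[r][c]: SLEMD distance between subclip r of P and subclip c of Q.
    Returns (normalized matching cost, chosen alignment as a permutation).
    """
    k = len(subclip_dist)
    best = min(permutations(range(k)),
               key=lambda perm: sum(subclip_dist[r][perm[r]] for r in range(k)))
    cost = sum(subclip_dist[r][best[r]] for r in range(k))
    return cost / k, best
```

The returned permutation shows the temporal alignment. In the 2 × 2 test case, the first subclip of P is matched with the second subclip of Q, so events whose stages occur in a different order can still be aligned.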

Fusion of Information from Different Levels
h_l denotes the weight for level l.
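The slide names only the level weights h_l. This minimal sketch assumes that the levels are fused as a weighted sum of per-level kernels and that each kernel has the Gaussian-style form exp(-D_l / (κA_l)). Here D_l is the level-l distance for a pair of clips, A_l is the mean level-l training distance, and κ is a scaling hyperparameter. All of these are assumptions for illustration.

```python
import math

def fused_kernel(level_distances, level_mean_distances, level_weights, kappa=1.0):
    """Weighted sum over pyramid levels of exp(-D_l / (kappa * A_l)) for one clip pair."""
    return sum(h * math.exp(-d / (kappa * a))
               for d, a, h in zip(level_distances, level_mean_distances, level_weights))
```

For instance, the L0+L1+L2-d setting from the experiments corresponds to `level_weights=[1, 1, 2]`. Under this sketch, it gives the finest level twice the influence of the others.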

TAPM

5. Experiments

- The SLEMD algorithm versus simplistic detectors that use a single keyframe or multiple keyframes
- Multilevel TAPM versus the SLEMD method
- The midlevel CS feature versus the three low-level features

Single-Level EMD versus Keyframe-Based Algorithm
- SLEMD algorithm, i.e., TAPM at level 0
- Keyframe-based algorithm (KF-CS)
- Multiframe-based representation (MF-CS)

Multilevel Matching versus Single-Level EMD
- Level 0 (L0), level 1 (L1), and level 2 (L2)
- Combination of L0 and L1 (L0+L1): h0 = h1 = 1
- Combination of L0, L1, and L2 (L0+L1+L2): h0 = h1 = h2 = 1
- Combination of L0, L1, and L2 (L0+L1+L2-d): h0 = h1 = 1, h2 = 2

Sensitivity to Clustering Method and Boundary Precision

The Effect of Temporal Alignment

Algorithmic Complexity Analysis and Speedup

Concept Score Feature versus Low-Level Features

6. Contributions and Conclusion

The first systematic study of diverse visual event recognition in the unconstrained broadcast news domain, with clear performance improvements.