Temporal Order-Preserving Dynamic Quantization for Human Action Recognition from Multimodal Sensor Streams
Jun Ye, Kai Li, Guo-Jun Qi, Kien A. Hua
University of Central Florida

Outline
- Background: problem, existing methods, challenges
- Our algorithm
  - Dynamic Temporal Quantization
  - Multimodal Feature Fusion
- Performance study
  - MSR-Action3D
  - UTKinect-Action
  - MSR-ActionPairs
- Conclusions

Background
- Depth sensors have become affordable and popular
- New forms of human-computer interaction
  - Gesture recognition
  - Speech recognition
- Application domains: video games, education, business, healthcare

Problem and Challenges
- Key problem: modeling the temporal dynamics of 3D human actions and gestures
- Existing methods
  - Histogram-based methods do not preserve temporal order (bag-of-3D-words [5, 21], HOJ3D [16], HON4D [9])
  - Temporal modeling methods suffer from video misalignment (motion template [7, 20], temporal pyramid [9, 14])
- Challenge: temporal misalignment due to
  - Temporal translation
  - Execution rate variation
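The point about histogram-based methods losing order can be illustrated with a small sketch. The codeword names and the two action sequences below are hypothetical, chosen only to show that two sequences with the same postures in opposite order produce identical histograms.

```python
from collections import Counter

def histogram(codewords):
    """Bag-of-words representation: keeps counts only, discards temporal order."""
    return Counter(codewords)

# Hypothetical codeword sequences for two actions that differ only in order.
pick_up = ["reach", "grasp", "lift"]
put_down = ["lift", "grasp", "reach"]

# The histograms match even though the sequences do not.
same_histogram = histogram(pick_up) == histogram(put_down)
same_sequence = pick_up == put_down
print(same_histogram, same_sequence)
```

An order-preserving representation has to keep the two sequences apart, which is the motivation for the quantization scheme on the following slides.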

Dynamic Temporal Quantization Algorithm
- Objective: model the temporal patterns of 3D actions according to the transitions between sub-actions, such that
  - Frames with similar postures are clustered together (sub-action constraint)
  - The temporal order of the sequence is preserved (order-preserving constraint)

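Dynamic Temporal Quantization
- Quantization: a video with frames X1, X2, …, Xn of variable length n is mapped to quantized vectors V1, V2, …, Vm of fixed length m
- Optimal frame assignment a: assigns each frame to one of the m quantized vectors
- Objective function: (equation not preserved in the transcript)
- The optimal quantization is obtained by jointly optimizing a and V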
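The objective itself is not recoverable from the transcript. One plausible way to write an order-preserving quantization objective of this kind is sketched below. This is an assumption, not the authors' stated formula: the squared Euclidean distance and the exact form of the monotonicity constraints on the assignment a are illustrative choices.

```latex
\min_{a,\,V} \; \sum_{i=1}^{n} \bigl\lVert X_i - V_{a(i)} \bigr\rVert^2
\quad \text{s.t.} \quad
a(1) = 1, \;\; a(n) = m, \;\; a(i) \le a(i+1) \le a(i) + 1
```

Under these assumed constraints every quantized vector receives at least one frame, and frames are never assigned out of temporal order.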

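Dynamic Temporal Quantization (cont'd)
- Jointly solving for the frame assignment a and the quantized vectors V is nontrivial, so the two are optimized alternately
- Initialization: uniform partition of the frames
- Aggregation step: with the assignment a fixed, each Vj is computed by aggregating the frames assigned to it
- Assignment step: with the quantized vectors V fixed, the assignment a is updated by dynamic time warping (DTW)
- Iterate until convergence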
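The alternating procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: aggregation is taken to be the mean, the frame-to-vector cost is squared Euclidean distance, and the DTW-style assignment is restricted to monotone paths in which every quantized vector gets at least one frame. All function names are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def aggregate(frames, assign, m):
    """Aggregation step (assumed to be the mean of the frames in each segment)."""
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(m)]
    counts = [0] * m
    for x, j in zip(frames, assign):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return [[s / counts[j] for s in sums[j]] for j in range(m)]

def assign_frames(frames, vectors):
    """Assignment step: monotone frame-to-vector alignment by dynamic programming."""
    n, m = len(frames), len(vectors)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    cost[0][0] = sq_dist(frames[0], vectors[0])
    for i in range(1, n):
        for j in range(min(i, m - 1) + 1):
            stay = cost[i - 1][j]                       # frame i joins segment j
            move = cost[i - 1][j - 1] if j > 0 else inf  # frame i opens segment j
            if move < stay:
                best, back[i][j] = move, j - 1
            else:
                best, back[i][j] = stay, j
            cost[i][j] = best + sq_dist(frames[i], vectors[j])
    # Backtrack from the last frame, which must land in the last segment.
    assign = [0] * n
    j = m - 1
    for i in range(n - 1, -1, -1):
        assign[i] = j
        j = back[i][j]
    return assign

def dynamic_quantize(frames, m, max_iter=20):
    """Alternate aggregation and assignment until the assignment stops changing."""
    n = len(frames)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= number of frames")
    assign = [i * m // n for i in range(n)]  # initialization: uniform partition
    for _ in range(max_iter):
        vectors = aggregate(frames, assign, m)
        new_assign = assign_frames(frames, vectors)
        if new_assign == assign:
            break
        assign = new_assign
    return aggregate(frames, assign, m), assign

# Toy 1-D sequence with three visibly distinct "sub-actions".
frames = [[0.0], [0.1], [0.0], [5.0], [5.1], [9.0]]
vectors, assign = dynamic_quantize(frames, 3)
print(assign)
```

On the toy sequence the uniform initialization splits the frames 2/2/2, and the iterations move the segment boundaries toward the points where the values actually change.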

Hierarchical representation
- Multiple layers of dynamic quantization
  - Top layers: global temporal patterns
  - Bottom layers: local temporal patterns
- The quantized vectors of all layers are concatenated
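A rough sketch of how such a multi-layer descriptor could be assembled is given below. Two things here are assumptions not stated on the slide: the number of segments doubles from one layer to the next (1, 2, 4, ...), and a simple uniform-partition mean stands in for the quantizer so that the example stays self-contained. The function names are hypothetical.

```python
def uniform_quantize(frames, m):
    """Stand-in quantizer: mean of each of m uniform temporal segments."""
    n, dim = len(frames), len(frames[0])
    out = []
    for j in range(m):
        seg = [frames[i] for i in range(n) if i * m // n == j]
        out.append([sum(x[d] for x in seg) / len(seg) for d in range(dim)])
    return out

def hierarchical_descriptor(frames, num_layers, quantize=uniform_quantize):
    """Concatenate the quantized vectors of every layer into one descriptor."""
    descriptor = []
    for layer in range(num_layers):
        m = 2 ** layer  # assumed layer sizes: 1, 2, 4, ... segments
        if m > len(frames):
            raise ValueError("too many layers for this sequence length")
        for vec in quantize(frames, m):
            descriptor.extend(vec)
    return descriptor

frames = [[0.0], [2.0], [4.0], [6.0]]
descriptor = hierarchical_descriptor(frames, num_layers=3)
print(descriptor)
```

The first entries summarize the whole sequence and the later ones describe progressively shorter spans, matching the global-to-local reading of the layers. Any quantizer with the same `(frames, m)` signature can be passed in place of the stand-in.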

Multimodal Feature Fusion
- Multimodal features
  - Joint coordinates
  - Pairwise angles
  - Joint offsets [21]
  - Histogram of velocity components (HVC)
- Supervised learning on the quantized vectors of each feature: multiclass SVM
- Fusion by regression (softmax)
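The fusion step can be pictured as a softmax over combined per-feature classifier scores. The sketch below is a simplified illustration, not the authors' model: it assumes one scalar weight per feature and a per-class bias, both of which would be learned by softmax regression in practice, and the score values are made up.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(scores_by_feature, weights, bias):
    """Late fusion: softmax over a weighted sum of per-feature class scores."""
    logits = list(bias)
    for name, scores in scores_by_feature.items():
        for c in range(len(logits)):
            logits[c] += weights[name] * scores[c]
    return softmax(logits)

# Hypothetical per-class SVM scores for one test video (three classes).
scores = {
    "position": [1.2, -0.3, 0.1],
    "angle": [0.4, 0.9, -0.5],
    "velocity": [0.8, 0.2, -0.1],
}
weights = {"position": 1.0, "angle": 0.5, "velocity": 0.8}  # assumed, not learned here
probs = fuse(scores, weights, bias=[0.0, 0.0, 0.0])
predicted = probs.index(max(probs))
print(predicted)
```

Fusing at the score level lets each feature keep its own classifier while the final decision weighs the features against one another.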

Experiments
- Experiments on three public 3D human action datasets
  - MSR-Action3D
  - UTKinect-Action
  - MSR-ActionPairs

Experiment: dynamic quantization vs. deterministic quantization
- Dynamic quantization outperforms deterministic quantization
- Accuracy on the MSR-Action3D dataset:

Feature     Dynamic quantization    Deterministic quantization
position    81.61%                  76.24%
angle       73.95%                  71.65%
offset      68.20% (only one value survives in the transcript)
velocity    80.84%                  72.80%
fused       90.42%                  83.15%

- Similar results are observed on the other two datasets

Experiment: hierarchical representation
- MSR-Action3D dataset with the joint coordinate feature

Layers      1         2         3         4         5
Accuracy    66.28%    67.82%    71.26%    81.61%    77.39%

- More layers generally yield higher accuracy, but too many can overfit: accuracy peaks at 4 layers and drops at 5

Experiment: comparison with state-of-the-art results

MSR-Action3D dataset
Method                        Accuracy
Actionlet Ensemble [14]       88.2%
HON4D [9]                     88.89%
DCSF [15]                     89.3%
Lie Group [13]                89.48%
Super Normal Vector [18]      93.09%
Proposed method               90.42%

MSR-ActionPairs dataset
Method                        Accuracy
Actionlet Ensemble [14]       82.22%
HON4D [9]                     93.33%
HON4D + Ddisc [9]             96.67%
Super Normal Vector [18]      98.89%
Proposed method               93.71%

UTKinect-Action dataset
Method                                       Accuracy
Histogram of 3D joints [17]                  90.92%
Combined features with random forest [21]    91.9%
Lie Group [13]                               97.08%
Proposed method                              100%

Conclusions
- A novel algorithm for 3D human action sequence recognition from the perspective of dynamic temporal quantization
- Extensive experiments on three public datasets demonstrate the effectiveness of the proposed technique for temporal modeling

Thank you. Questions?