Hand Gesture Recognition Using Hidden Markov Models


Hand Gesture Recognition Using Hidden Markov Models Shuang Lu, Amir Harati and Joseph Picone Institute for Signal and Information Processing Temple University Philadelphia, Pennsylvania, USA

Abstract
Advanced gaming interfaces have generated renewed interest in hand gesture recognition as an ideal interface for human-computer interaction. In this talk, we discuss a specific application of gesture recognition: fingerspelling in American Sign Language (ASL). Signer-independent (SI) fingerspelling alphabet recognition is a very challenging task due to a number of factors, including the large number of similar gestures, hand orientation, and cluttered backgrounds. We propose a novel framework that uses a two-level hidden Markov model (HMM) to recognize each gesture as a sequence of sub-units and to perform integrated segmentation and recognition. We present results on signer-dependent (SD) and signer-independent (SI) tasks for the ASL Fingerspelling Dataset: error rates of 2.0% and 46.8%, respectively.
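The abstract's core idea of scoring a gesture as a sequence of sub-units rests on standard HMM decoding. The following is a generic, minimal sketch of discrete-observation Viterbi decoding in NumPy; it is an illustration only, not the paper's system, whose models use Gaussian mixture emissions and a two-level topology:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete-observation HMM.
    obs: observation indices; pi: initial probs (n,); A: transitions (n, n);
    B: emission probs (n, m). Works in log space for numerical stability."""
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + logA          # scores[from, to]
        back.append(scores.argmax(axis=0))      # best predecessor per state
        delta = scores.max(axis=0) + logB[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(back):                   # backtrack through pointers
        path.append(int(bp[path[-1]]))
    return path[::-1], float(delta.max())

# Toy 2-state model: state 0 mostly emits symbol 0, state 1 mostly emits symbol 1.
pi = np.array([0.9, 0.1])
A = np.array([[0.8, 0.2], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
path, loglik = viterbi([0, 0, 1, 1], pi, A, B)
# path → [0, 0, 1, 1]
```

In the paper's framework, the states at the lower level correspond to sub-gesture units, and a second level strings sub-units and background models together into whole-gesture hypotheses.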

Gesture Recognition… In The Movies…
- Emphasized the use of wireless data gloves to better localize hand positions.
- Manipulated data on a 2D screen.
- Relatively simple inventory of gestures focused on window manipulations.
- Integrated gesture and speech recognition to provide a very natural dialog.
- Introduced 3D visualization and object manipulation.
- Virtual reality-style CAD.

Gesture Recognition: Improved Sensors
Microsoft Kinect: a motion-sensing input device for the Xbox 360 (and Windows in 2014).
- 8-bit RGB camera
- 11-bit infrared depth sensor (infrared laser projector and CMOS sensor)
- Multi-array microphone
- Motorized tilt
- Frame rate of 9 to 30 Hz
- Resolution from 640x480 to 1280x1024
Depth images are useful for separating the subject from the background. Wireframe modeling and other on-board signal processing provide high-quality image tracking.

Gesture Recognition: American Sign Language (ASL)
- Primary mode of communication for over 500,000 people in North America alone.
- Approximately 6,000 words have unique signs; additional words are spelled using the fingerspelling alphabet.
- In typical communication, 10% to 15% of the words are signed by fingerspelling.
- Similar to written English, the one-handed Latin alphabet in ASL consists of 26 hand gestures.
- The objective of our work is to classify 24 ASL alphabet signs from a static 2D image (we exclude "J" and "Z" because they are dynamic hand gestures).

Gesture Recognition: ASL Is Still a Challenging Problem
- Similar shapes (e.g., "r" vs. "u")
- Separation of the hand from the background:
  - hand and background are similar in color
  - hand and arm are similar in color
  - background occurs within a hand shape
- Left-handed vs. right-handed signers
- Rotation, magnification, perspective, lighting, complex backgrounds, skin color, …
- Signer-independent (SI) vs. signer-dependent (SD) recognition

Architecture: Two-Level Hidden Markov Model

Architecture: Histogram of Oriented Gradients (HOG)
Gradient intensity and orientation: in every window, separate the orientation A(x, y) (ranging from 0 to 2π) into 9 regions and sum all gradient magnitudes G(x, y) that fall within the same region. The features inside each block are then normalized.
Benefits:
- Illumination invariance, due to the normalization of the intensity gradients within a window.
- Emphasizes edges through the use of an intensity gradient calculation.
- Less sensitive to background details, because the features use a distribution rather than a spatially-organized signal.
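The computation described above can be sketched in NumPy for a single window. This is a simplified illustration under stated assumptions: one histogram per window, unsigned magnitudes, and L2 normalization; the paper's exact frame/window/block geometry is not reproduced here.

```python
import numpy as np

def hog_window(window, n_bins=9):
    """Orientation histogram for one image window: bin the gradient
    orientation A(x, y) over [0, 2*pi) into n_bins regions, sum the
    gradient magnitude G(x, y) in each bin, then L2-normalize."""
    gy, gx = np.gradient(window.astype(float))
    G = np.hypot(gx, gy)                       # gradient magnitude G(x, y)
    A = np.arctan2(gy, gx) % (2 * np.pi)       # orientation A(x, y) in [0, 2*pi)
    bins = (A / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=G.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)                # normalization => illumination invariance
    return hist / norm if norm > 0 else hist

# Toy example: a vertical edge concentrates energy in horizontal-gradient bins.
window = np.zeros((8, 8))
window[:, 4:] = 1.0
h = hog_window(window)
```

Note that scaling the window intensities by a constant leaves `h` unchanged, which is exactly the illumination-invariance benefit listed above.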

Architecture: Two Levels of Hidden Markov Models

Experiments: ASL Fingerspelling Corpus
- 24 static gestures (excluding the letters "J" and "Z")
- 5 subsets from 4 subjects
- More than 500 images per sign per subject
- A total of 60,000 images
Sources of variability: similar gestures, different image sizes, face occlusion, changes in illumination, variations in signers, and sign rotation.

Experiments: Parameter Tuning

Performance as a function of the frame/window size:

  Frame (N)   Window (M)   % Overlap   Error (%)
      5           20          75%         7.1
      5           30          83%         4.4
     10           20          50%         5.1
     10           30          67%         5.0
     10           60          83%         8.0

Performance as a function of the number of mixture components:

  No. Mixtures   Error Rate (%)
       1              9.9
       2              6.8
       4              4.4
       8              2.9
      16              2.0

An overview of the optimal system parameters:

  System Parameter                       Value
  Frame Size (pixels)                    5
  Window Size (pixels)                   30
  No. HOG Bins                           9
  No. Sub-gesture Segments               11
  No. States Per Sub-gesture Model       21
  No. States Long Background (LB)        —
  No. States Short Background (SB)       1
  No. Gaussian Mixtures (SG models)      16
  No. Gaussian Mixtures (LB/SB models)   32

- Parameters were sequentially optimized and then jointly varied to test for optimality.
- Optimal settings are a function of the amount of data and the magnification of the image.
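The % Overlap values in the tuning table are consistent with overlap = (M − N) / M, where N is the frame (step) size and M is the window size. A quick check with a hypothetical helper (the function name is ours, not from the paper):

```python
def overlap(frame_n, window_m):
    """Fraction of each analysis window shared with its neighbor when a
    window of M pixels advances by one frame of N pixels at a time."""
    return (window_m - frame_n) / window_m

# Reproduces the table: (5, 20) -> 75%, (10, 20) -> 50%, (5, 30) -> ~83%
assert overlap(5, 20) == 0.75
assert overlap(10, 20) == 0.50
assert round(overlap(5, 30), 2) == 0.83
```

This also explains why the (10, 60) row shares the 83% overlap of the (5, 30) row yet performs worse: the overlap ratio is the same, but the absolute window is larger relative to the image detail.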

Experiments: SD vs. SI Recognition
- Performance is relatively constant as a function of the cross-validation set.
- There is greater variation as a function of the subject.
- SD performance is significantly better than SI performance.
- "Shared" is a closed-subject test where 50% of the data is used for training and the other 50% is used for testing.
- HMM performance doesn't improve dramatically with depth.

  System                      SD      Shared   SI
  Pugeault (Color Only)       N/A     27.0%    65.0%
  Pugeault (Color + Depth)    N/A     25.0%    53.0%
  HMM (Color Only)            2.0%    7.8%     46.8%

Experiments: Error Analysis
- Gestures with a high confusion error rate.
- Images with significant variations in background and hand rotation.
- The "SB" model is not reliably detecting background.
- Possible solution: transcribed data?

Analysis: ASL Fingerspelling Corpus (figure: region of interest and recognition result)

Summary and Future Directions
- A two-level HMM-based ASL fingerspelling alphabet recognition system that trains gesture and background noise models automatically.
- Five essential parameters were tuned by cross-validation.
- Our best system configuration achieved a 2.0% error rate on the SD task and a 46.8% error rate on the SI task.
- We are currently developing new architectures that perform improved segmentation; both supervised and unsupervised methods will be employed. We expect performance to be significantly better on the SI task.
- All scripts, models, and data related to these experiments are available from our project web site: http://www.isip.piconepress.com/projects/asl_fs.

Brief Bibliography of Related Research
[1] Lu, S., & Picone, J. (2013). Fingerspelling Gesture Recognition Using a Two-Level Hidden Markov Model. Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition. Las Vegas, USA.
[2] Pugeault, N., & Bowden, R. (2011). Spelling It Out: Real-time ASL Fingerspelling Recognition. Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 1114–1119). Dataset available at http://info.ee.surrey.ac.uk/Personal/N.Pugeault/index.php?section=FingerSpellingDataset.
[3] Vieriu, R., Goras, B., & Goras, L. (2011). On HMM Static Hand Gesture Recognition. Proceedings of the International Symposium on Signals, Circuits and Systems (pp. 1–4). Iasi, Romania.
[4] Kaaniche, M., & Bremond, F. (2009). Tracking HOG Descriptors for Gesture Recognition. Proceedings of the Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance (pp. 140–145). Genova, Italy.
[5] Wachs, J. P., Kölsch, M., Stern, H., & Edan, Y. (2011). Vision-based Hand-gesture Applications. Communications of the ACM, 54(2), 60–71.