Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.

Slides:



Advertisements
Similar presentations
Shape Context and Chamfer Matching in Cluttered Scenes
Advertisements

DDDAS: Stochastic Multicue Tracking of Objects with Many Degrees of Freedom PIs: D. Metaxas, A. Elgammal and V. Pavlovic Dept of CS, Rutgers University.
Chamfer Distance for Handshape Detection and Recognition CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.
Recovering Human Body Configurations: Combining Segmentation and Recognition Greg Mori, Xiaofeng Ren, and Jitentendra Malik (UC Berkeley) Alexei A. Efros.
MIT CSAIL Vision interfaces Towards efficient matching with random hashing methods… Kristen Grauman Gregory Shakhnarovich Trevor Darrell.
Detection, Segmentation, and Pose Recognition of Hands in Images by Christopher Schwarz Thesis Chair: Dr. Niels da Vitoria Lobo.
Object Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition l Panoramas,
Low Complexity Keypoint Recognition and Pose Estimation Vincent Lepetit.
University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi Object recognition.
International Conference on Automatic Face and Gesture Recognition, 2006 A Layered Deformable Model for Gait Analysis Haiping Lu, K.N. Plataniotis and.
Cambridge, Massachusetts Pose Estimation in Heavy Clutter using a Multi-Flash Camera Ming-Yu Liu, Oncel Tuzel, Ashok Veeraraghavan, Rama Chellappa, Amit.
Silhouette Lookup for Automatic Pose Tracking N ICK H OWE.
Segmentation-Free, Area-Based Articulated Object Tracking Daniel Mohr, Gabriel Zachmann Clausthal University, Germany ISVC.
Instructor: Mircea Nicolescu Lecture 13 CS 485 / 685 Computer Vision.
IBBT – Ugent – Telin – IPI Dimitri Van Cauwelaert A study of the 2D - SIFT algorithm Dimitri Van Cauwelaert.
Object Recognition with Invariant Features n Definition: Identify objects or scenes and determine their pose and model parameters n Applications l Industrial.
Recognition using Regions CVPR Outline Introduction Overview of the Approach Experimental Results Conclusion.
Recognition of Traffic Lights in Live Video Streams on Mobile Devices
Active Calibration of Cameras: Theory and Implementation Anup Basu Sung Huh CPSC 643 Individual Presentation II March 4 th,
Multi video camera calibration and synchronization.
A Study of Approaches for Object Recognition
Object Recognition with Invariant Features n Definition: Identify objects or scenes and determine their pose and model parameters n Applications l Industrial.
1 Face Tracking in Videos Gaurav Aggarwal, Ashok Veeraraghavan, Rama Chellappa.
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman ICCV 2003 Presented by: Indriyati Atmosukarto.
3D Hand Pose Estimation by Finding Appearance-Based Matches in a Large Database of Training Views
Chamfer Matching & Hausdorff Distance Presented by Ankur Datta Slides Courtesy Mark Bouts Arasanathan Thayananthan.
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman University of Oxford ICCV 2003.
Cindy Song Sharena Paripatyadar. Use vision for HCI Determine steps necessary to incorporate vision in HCI applications Examine concerns & implications.
Recognizing and Tracking Human Action Josephine Sullivan and Stefan Carlsson.
Multiple Object Class Detection with a Generative Model K. Mikolajczyk, B. Leibe and B. Schiele Carolina Galleguillos.
Sebastian Thrun CS223B Computer Vision, Winter Stanford CS223B Computer Vision, Winter 2005 Lecture 3 Advanced Features Sebastian Thrun, Stanford.
Face Alignment Using Cascaded Boosted Regression Active Shape Models
Shape Recognition and Pose Estimation for Mobile Augmented Reality Author : N. Hagbi, J. El-Sana, O. Bergig, and M. Billinghurst Date : Speaker.
Abstract Developing sign language applications for deaf people is extremely important, since it is difficult to communicate with people that are unfamiliar.
Landing a UAV on a Runway Using Image Registration Andrew Miller, Don Harper, Mubarak Shah University of Central Florida ICRA 2008.
Object Tracking/Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition.
Gesture Recognition CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.
Local invariant features Cordelia Schmid INRIA, Grenoble.
Hands segmentation Pat Jangyodsuk. Motivation Alternative approach of finding hands Instead of finding bounding box, classify each pixel whether they’re.
A Statistical Approach to Speed Up Ranking/Re-Ranking Hong-Ming Chen Advisor: Professor Shih-Fu Chang.
Automated Construction of Parameterized Motions Lucas Kovar Michael Gleicher University of Wisconsin-Madison.
Template-Based Face Detection CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.
80 million tiny images: a large dataset for non-parametric object and scene recognition CS 4763 Multimedia Systems Spring 2008.
A Two-level Pose Estimation Framework Using Majority Voting of Gabor Wavelets and Bunch Graph Analysis J. Wu, J. M. Pedersen, D. Putthividhya, D. Norgaard,
Tracking CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.
Efficient Visual Object Tracking with Online Nearest Neighbor Classifier Many slides adapt from Steve Gu.
Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.
Looking at people and Image-based Localisation Roberto Cipolla Department of Engineering Research team
Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11.
Lecture 9 Feature Extraction and Motion Estimation Slides by: Michael Black Clark F. Olson Jean Ponce.
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman University of Oxford ICCV 2003.
Multi-view Synchronization of Human Actions and Dynamic Scenes Emilie Dexter, Patrick Pérez, Ivan Laptev INRIA Rennes - Bretagne Atlantique
Tracking Hands with Distance Transforms Dave Bargeron Noah Snavely.
Over the recent years, computer vision has started to play a significant role in the Human Computer Interaction (HCI). With efficient object tracking.
Nearest Neighbor Classifiers
Signal and Image Processing Lab
SIFT Scale-Invariant Feature Transform David Lowe
Lecture 13 Gesture Recognition – Part 2
Reflectance Function Approximation
Nearest-neighbor matching to feature database
Lecture 26 Hand Pose Estimation Using a Database of Hand Images
Video Google: Text Retrieval Approach to Object Matching in Videos
Color-Texture Analysis for Content-Based Image Retrieval
Detecting Shapes in Cluttered Images
Object recognition Prof. Graeme Bailey
Nearest-neighbor matching to feature database
Rob Fergus Computer Vision
PRAKASH CHOCKALINGAM, NALIN PRADEEP, AND STAN BIRCHFIELD
Liyuan Li, Jerry Kah Eng Hoe, Xinguo Yu, Li Dong, and Xinqi Chu
Video Google: Text Retrieval Approach to Object Matching in Videos
Presentation transcript:

Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

2 Static Gestures (Hand Poses) Input image Articulated hand model Given a hand model, and a single image of a hand, estimate: –3D hand shape (joint angles). –3D hand orientation. Joints

3 Static Gestures Input image Articulated hand model Given a hand model, and a single image of a hand, estimate: –3D hand shape (joint angles). –3D hand orientation.

4 Goal: Hand Tracking Initialization Given the 3D hand pose in the previous frame, estimate it in the current frame. –Problem: no good way to automatically initialize a tracker. Rehg et al. (1995), Heap et al. (1996), Shimada et al. (2001), Wu et al. (2001), Stenger et al. (2001), Lu et al. (2003), …

5 Assumptions in Our Approach A few tens of distinct hand shapes. –All 3D orientations should be allowed. –Motivation: American Sign Language.

6 Assumptions in Our Approach A few tens of distinct hand shapes. –All 3D orientations should be allowed. –Motivation: American Sign Language. Input: single image, bounding box of hand.

7 Assumptions in Our Approach We do not assume precise segmentation! –No clean contour extracted. input image skin detection segmented hand

8 Approach: Database Search Over 100,000 computer-generated images. –Known hand pose. input

9 Why? We avoid direct estimation of 3D info. –With a database, we only match 2D to 2D. We can find all plausible estimates. –Hand pose is often ambiguous. input

10 Building the Database 26 hand shapes

images are generated for each hand shape. Total: 107,328 images. Building the Database

12 Features: Edge Pixels We use edge images. –Easy to extract. –Stable under illumination changes. inputedge image

13 Chamfer Distance input model Overlaying input and model How far apart are they?

14 Input: two sets of points. –red, green. c(red, green): –Average distance from each red point to nearest green point. Directed Chamfer Distance

15 Input: two sets of points. –red, green. c(red, green): –Average distance from each red point to nearest green point. c(green, red): –Average distance from each red point to nearest green point. Directed Chamfer Distance

16 Input: two sets of points. –red, green. c(red, green): –Average distance from each red point to nearest green point. c(green, red): –Average distance from each red point to nearest green point. Chamfer Distance Chamfer distance: C(red, green) = c(red, green) + c(green, red)

17 Evaluating Retrieval Accuracy A database image is a correct match for the input if: –the hand shapes are the same, –3D hand orientations differ by at most 30 degrees. input correct matches incorrect matches

18 Evaluating Retrieval Accuracy An input image has correct matches among the 107,328 database images. –Ground truth for input images is estimated by humans. input correct matches incorrect matches

19 Evaluating Retrieval Accuracy Retrieval accuracy measure: what is the rank of the highest ranking correct match? input correct matches incorrect matches

20 Evaluating Retrieval Accuracy input rank 1rank 2rank 3rank 5rank 6rank 4 … … highest ranking correct match

21 Results on 703 Real Hand Images Rank of highest ranking correct match Percentage of test images 115% % %

22 Results on 703 Real Hand Images Rank of highest ranking correct match Percentage of test images 115% % % Results are better on “nicer” images: –Dark background. –Frontal view. –For half the images, top match was correct.

23 Examples initial image segmented hand edge image correct match rank: 1

24 Examples initial image segmented hand edge image correct match rank: 644

25 Examples initial image segmented hand edge image incorrect match rank: 1

26 Examples initial image segmented hand edge image correct match rank: 1

27 Examples initial image segmented hand edge image correct match rank: 33

28 Examples initial image segmented hand edge image incorrect match rank: 1

29 Examples “hard” case segmented hand edge image “easy” case segmented hand edge image

30 Research Directions More accurate similarity measures. Better tolerance to segmentation errors. –Clutter. –Incorrect scale and translation. Verifying top matches. Registration.

31 Efficiency of the Chamfer Distance Computing chamfer distances is slow. –For images with d edge pixels, O(d log d) time. –Comparing input to entire database takes over 4 minutes. Must measure 107,328 distances. input model

32 The Nearest Neighbor Problem database

33 The Nearest Neighbor Problem query Goal: –find the k nearest neighbors of query q. database

34 The Nearest Neighbor Problem Goal: –find the k nearest neighbors of query q. Brute force time is linear to: –n (size of database). –time it takes to measure a single distance. query database

35 Goal: –find the k nearest neighbors of query q. Brute force time is linear to: –n (size of database). –time it takes to measure a single distance. The Nearest Neighbor Problem query database