Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science

Object Recognition on Mobile Platforms Using Mixture of Global Image Features
Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science National Chengchi University, Taipei, TAIWAN August 30, 2012

Object Recognition ‘Scene’ Recognition
?

Outline
- Objective and system architecture
- Global vs. local descriptors
- Proposed global image descriptors: weighted gist feature, average effective number of neighbors
- Experimental results: Taiwan landmarks, Oxford dataset
- Conclusion

Objective
Design a computationally efficient framework for automatic scene recognition on mobile platforms.
Constraints: limited resources (CPU, communication bandwidth, storage).

System architecture (diagram)
Front-end: image, feature extraction, feature compression
Network: query data
Back-end: matching, post-processing, feature database, image database, identification

Global vs. local descriptors
Global features:
- Provide a succinct description of the scene structure
- Faster to compute
- Examples: histogram, gist
Local features:
- Characterize distinct points/parts in an image
- Robust features require substantial computational resources
- Examples: SIFT, SURF, or HOG

The proposed global features
- Weighted gist descriptor: based on the ‘gist’ descriptor, but weighted by a saliency measure of each image region.
- Average effective number of neighbors (AENN): captures the overall structure of the scene from the distribution of edge pixels.

The original gist feature
Computed by convolving an oriented filter with the image at several different orientations and scales. The scores for the filter convolution at each orientation and scale are stored in an array, which is the gist feature for that image.
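The computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the talk: the filter shapes, the center frequencies, the default orientation counts, and the names `oriented_filters` and `gist_descriptor` are all assumptions made for the example. The filters are applied in the frequency domain, and each response is averaged over a grid of blocks.

```python
import numpy as np

def oriented_filters(shape, orientations_per_scale):
    """Build oriented band-pass filters in the frequency domain.
    The Gaussian radial/angular profiles are illustrative assumptions."""
    h, w = shape
    fy, fx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    radius = np.hypot(fx, fy)
    angle = np.arctan2(fy, fx)
    filters = []
    for scale, n_orient in enumerate(orientations_per_scale):
        centre = 0.3 / (2 ** scale)  # assumed: centre frequency halves per scale
        radial = np.exp(-((radius - centre) ** 2) / (2 * (0.35 * centre) ** 2))
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            # wrapped angular distance between each frequency and the orientation
            diff = np.angle(np.exp(1j * (angle - theta)))
            angular = np.exp(-(diff ** 2) / (2 * (np.pi / n_orient) ** 2))
            filters.append(radial * angular)
    return np.stack(filters)

def gist_descriptor(gray, orientations_per_scale=(8, 8, 4), grid=4):
    """Average the magnitude of each filter response over grid x grid blocks."""
    h, w = gray.shape
    bh, bw = h // grid, w // grid
    spectrum = np.fft.fft2(gray)
    features = []
    for f in oriented_filters(gray.shape, orientations_per_scale):
        response = np.abs(np.fft.ifft2(spectrum * f))
        cropped = response[: bh * grid, : bw * grid]
        blocks = cropped.reshape(grid, bh, grid, bw).mean(axis=(1, 3))
        features.append(blocks.ravel())
    return np.concatenate(features)

image = np.random.default_rng(0).random((64, 64))  # stand-in for a grayscale image
print(gist_descriptor(image).shape)
```

With the illustrative defaults (20 filters, 4x4 blocks) the array holds one averaged score per filter and block.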

Illustration 4x4 blocks, 6 orientations and 5 frequencies (Oliva and Torralba, 2001)

Saliency map
Graph-based visual saliency (J. Harel et al.)

Weighted gist descriptor

Effective Number of Neighbors (ENN)
Number of neighbors: (L) 8, (R) 8
Effective number of neighbors: (L) 8, (R) 4/2 + 4/4 = 3
5x5 window

Computing Average ENN
Step 1: Edge detection
Step 2: Keep the top q percent of the edge pixels
Step 3: Compute ENN using a DxD neighborhood
Step 4: Average the ENN in each image block to form the feature vector
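The four steps can be sketched as follows. Several details are assumptions, because the transcript does not spell them out: the edge detector is a plain gradient magnitude, each edge neighbor is weighted by the inverse of its Chebyshev distance to the center pixel, and the block average is taken over edge pixels only. The function names and default values are hypothetical.

```python
import numpy as np

def edge_strength(gray):
    """Step 1: gradient magnitude (a stand-in for any edge detector)."""
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy)

def top_edges(strength, q=10.0):
    """Step 2: keep the strongest q percent of pixels as edge pixels."""
    threshold = np.percentile(strength, 100.0 - q)
    return (strength >= threshold) & (strength > 0)

def enn_map(edges, d=7):
    """Step 3: weighted count of edge pixels in a d x d window.
    ASSUMPTION: a neighbor at Chebyshev distance k contributes 1/k."""
    r = d // 2
    h, w = edges.shape
    padded = np.pad(edges.astype(float), r)
    enn = np.zeros((h, w))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            dist = max(abs(dy), abs(dx))
            if dist == 0:
                continue  # the center pixel is not its own neighbor
            enn += padded[r + dy : r + dy + h, r + dx : r + dx + w] / dist
    return enn * edges  # ENN is only kept at edge pixels

def aenn_descriptor(gray, q=10.0, d=7, grid=8):
    """Step 4: average ENN over the edge pixels of each block."""
    edges = top_edges(edge_strength(gray), q)
    enn = enn_map(edges, d)
    bh, bw = gray.shape[0] // grid, gray.shape[1] // grid
    feature = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            rows, cols = slice(i * bh, (i + 1) * bh), slice(j * bw, (j + 1) * bw)
            count = edges[rows, cols].sum()
            feature[i, j] = enn[rows, cols].sum() / count if count else 0.0
    return feature.ravel()

image = np.random.default_rng(0).random((64, 64))  # stand-in for a grayscale image
print(aenn_descriptor(image).shape)
```

Under this assumed weighting, a pixel whose eight immediate neighbors are all edge pixels scores 8. Other weightings would change the values but not the structure of the four steps.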

Average ENN descriptor: some examples

Parameter Settings (I)
For weighted gist:
- Partition the image into 4x4 blocks.
- Use 8 orientation channels at each of two frequencies and 4 orientation channels at a third frequency, totaling 20 coefficients per block.
- Feature dimension: 4x4x20 = 320 per color channel.
- For color images: 960 dimensions.
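The dimension count can be checked with a few lines of arithmetic. The variable names are illustrative, and the factor of three assumes a three-channel color image, which is what the 960 figure implies.

```python
orientations_per_frequency = (8, 8, 4)   # two frequencies with 8, one with 4
coefficients_per_block = sum(orientations_per_frequency)
blocks = 4 * 4
dim_per_channel = blocks * coefficients_per_block
dim_color = 3 * dim_per_channel          # assumes three color channels

print(coefficients_per_block, dim_per_channel, dim_color)
```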

Parameter Settings (II)
For AENN:
- Keep the top 10% strongest edges.
- Partition the image into 8x8 blocks.
- Neighborhood size for computing ENN: 7x7.

Classifier
- Support vector machine (SVM) with probability estimate output.
- Information fusion using a linear combination.
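A linear combination of per-class probability estimates might look like the sketch below. It assumes one probability matrix per descriptor (rows are query images, columns are classes) and a single mixing weight `alpha`. The function name, the weight value, and the toy numbers are hypothetical, and the talk does not say how the weight was chosen.

```python
import numpy as np

def fuse_probabilities(prob_a, prob_b, alpha=0.5):
    """Blend two classifiers' class probabilities and pick the best class."""
    fused = alpha * prob_a + (1.0 - alpha) * prob_b
    return fused, fused.argmax(axis=1)

# Toy probability outputs for two query images and two classes.
prob_gist = np.array([[0.6, 0.4], [0.3, 0.7]])
prob_aenn = np.array([[0.2, 0.8], [0.4, 0.6]])
fused, labels = fuse_probabilities(prob_gist, prob_aenn, alpha=0.5)
print(labels)
```

Because both inputs are probability distributions and the weights sum to one, each fused row still sums to one.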

Experiment I: Taiwan Landmarks
Data collection
- A total of 9530 images from 50 landmarks were gathered. On average, each landmark has about 190 images.
Test dataset
- Randomly select 20 images from each category (20x50 = 1000).
Training dataset
- The remaining 8530 images.
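The per-category hold-out can be sketched as below. The dictionary layout, the file names, and the function name are hypothetical. The toy data only reproduces the counts from the slide (50 categories, 9530 images) and not the actual per-landmark sizes, which the transcript does not give.

```python
import random

def split_per_category(images_by_category, n_test=20, seed=0):
    """Hold out n_test random images per category; the rest is training data."""
    rng = random.Random(seed)
    train, test = [], []
    for category, images in images_by_category.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        test += [(name, category) for name in shuffled[:n_test]]
        train += [(name, category) for name in shuffled[n_test:]]
    return train, test

# Toy stand-in: 30 categories with 191 images and 20 with 190 (9530 in total).
toy = {
    c: [f"landmark{c:02d}_{i:03d}.jpg" for i in range(191 if c < 30 else 190)]
    for c in range(50)
}
train, test = split_per_category(toy)
print(len(train), len(test))
```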

Experiment I: Results

Comparison of individual and hybrid approaches

Experiment II: Oxford buildings dataset
- Publicly available dataset with ground truth
- Contains 5062 high-resolution images
- 11 different categories
- Image quality labels: Good, OK, Bad, Junk

Recognition rate using Oxford dataset
- Retain only Good and OK images.
- Query data: randomly select 3 images from each category.
- Training data: the remaining images.

Other approaches using Oxford dataset(1/3)
Local configuration of SIFT-like features by a shape context
- Image feature: PCA-SIFT local feature
- Uses a shape context to describe the local configuration
- Parameters: scale of the shape context in pixels, number of segments in scale and angle

Other approaches using Oxford dataset (2/3)
Improving bag-of-features for large scale image search
- Uses the Hessian-Affine detector and SIFT
- Employs the bag-of-features (BOF) framework to search for approximate nearest neighbors
- Methods tested include the original BOF, Hamming embedding (HE), weak geometric consistency constraints (WGC), and multiple assignment (MA)

Other approaches using Oxford dataset(3/3)

Conclusion
- Proposed a framework for scene recognition on mobile platforms.
- Formulated two global image descriptors for recognition tasks.
- Experimental results and comparative analysis have demonstrated the efficacy of the proposed strategy.

Thank you Q & A
