
Learning Models for Object Recognition from Natural Language Descriptions Presenters: Sagardeep Mahapatra – 108771077 Keerti Korrapati - 108694316.


1 Learning Models for Object Recognition from Natural Language Descriptions Presenters: Sagardeep Mahapatra – 108771077 Keerti Korrapati - 108694316

2. Goal
Learning models for visual object recognition from natural language descriptions alone.
Why learn models from natural language?
- Manually collecting and labeling large image sets is difficult.
- A new training set must be created for each new category.
- Finding images for fine-grained object categories (e.g. species of plants and animals) is hard.
- Detailed visual descriptions, however, may be readily available.

3. Outline
- Datasets for training and testing
- Natural language processing methods
- Template filling
- Extraction of visual attributes from test images
- Scoring an image against the learnt template models
- Results
- Observations

4. Dataset
- Text descriptions of ten butterfly species from the eNature guide are used to construct the template models.
- Butterflies were chosen because they have distinctive visual features such as wing colors and spots.
- Images downloaded from Google for each of the ten butterfly categories form the test set.
- The ten species: Danaus plexippus, Heliconius charitonius, Heliconius erato, Junonia coenia, Lycaena phlaeas, Nymphalis antiopa, Papilio cresphontes, Pieris rapae, Vanessa atalanta, Vanessa cardui.

5. Natural Language Processing
Goal: convert the factual but unstructured data in the text descriptions into structured templates via information extraction.

6. Template Filling
Pipeline: tokenization → part-of-speech tagging → custom transformation → chunking → template filling.
- The text is tokenized into words.
- Tokens are tagged with parts of speech (using the C&C tagger).
- Custom transformations correct known mistakes; this is required because the eNature guide tends to suppress some information.
- Chunks of text matching pre-defined tag sequences are extracted, e.g. noun phrases ('wings have blue spots') and adjective phrases ('wings are black').
- Extracted phrases are filtered through lists of colors, patterns and positions to fill the template slots.
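The pipeline above can be sketched in a few lines. This is a minimal stand-in, not the paper's implementation: instead of a real POS tagger (the slides use the C&C tagger), it matches two simple word patterns and filters hits through toy lexicons; the COLORS, PATTERNS and POSITIONS lists here are invented examples standing in for the real ones.

```python
import re

# Toy lexicons standing in for the slides' lists of colors, patterns and
# positions (the real lists are derived from the eNature descriptions).
COLORS = {"black", "blue", "orange", "white", "red"}
PATTERNS = {"spot", "spots", "band", "bands", "stripes"}
POSITIONS = {"wing", "wings", "forewing", "hindwing", "margin"}

def fill_template(description):
    """Tiny sketch of the tokenize -> tag -> chunk -> fill pipeline.

    Matches adjective phrases like 'wings are black' and noun phrases
    like 'blue spots', then fills the corresponding template slots.
    """
    template = {"dominant_colors": set(), "spots": []}
    tokens = re.findall(r"[a-z]+", description.lower())  # tokenization
    for i, tok in enumerate(tokens[:-1]):
        nxt = tokens[i + 1]
        # adjective phrase, e.g. 'wings are black'
        if tok in POSITIONS and nxt == "are" and i + 2 < len(tokens) \
                and tokens[i + 2] in COLORS:
            template["dominant_colors"].add(tokens[i + 2])
        # noun phrase, e.g. 'blue spots'
        if tok in COLORS and nxt in PATTERNS:
            template["spots"].append(tok)
    return template

print(fill_template("Wings are black with rows of blue spots near the margin"))
```

A real implementation would replace the pattern matching with POS tagging and tag-sequence chunking, but the slot-filling step at the end is the same idea.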

7. Visual Processing
Based on two visual attributes of butterflies: dominant wing color and colored spots.
1) Image segmentation
- Variation in the background can pose challenges during image classification.
- Hence, the butterfly is segmented from the background using the 'star shape' graph-cut approach.

8.
2) Spot detection (using a spot classifier)
- Hand-marked butterfly images with no prior class information form the training set for the spot classifier.
- Candidate regions likely to be spots are extracted with a Difference-of-Gaussians interest point operator.
- Image descriptors (SIFT features) are extracted around each candidate to classify it as spot or non-spot.
3) Color modelling
- Required to connect the color names of dominant wing colors and spot colors in the learnt templates to image observations.
- For each color name c_i, a probability distribution p(z | c_i) is learnt from training butterfly images, where z is a pixel color observation in the L*a*b* color space.
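The color-modelling step can be sketched as a histogram model: for each color name c, learn p(z | c) as a normalized histogram over quantized pixel values. This is a minimal sketch, not the slides' actual model; pixels are taken as plain (L, a, b) tuples (RGB-to-L*a*b* conversion omitted), and the bin size and smoothing constant are assumptions.

```python
from collections import Counter, defaultdict

BIN = 10  # assumed quantization step for each L*a*b* channel

def quantize(pixel):
    return tuple(v // BIN for v in pixel)

def learn_color_models(labelled_pixels):
    """labelled_pixels: list of (color_name, (L, a, b)) training pairs.

    Returns, per color name, a function z -> p(z | color) backed by a
    Laplace-smoothed histogram over quantized pixel values.
    """
    counts = defaultdict(Counter)
    for name, pix in labelled_pixels:
        counts[name][quantize(pix)] += 1
    models = {}
    for name, hist in counts.items():
        total = sum(hist.values())
        # smoothing so unseen bins keep a small nonzero probability
        models[name] = lambda z, h=hist, t=total: (h[quantize(z)] + 1) / (t + 1000)
    return models

# Tiny invented training set: two 'blue' pixels, one 'orange' pixel.
train = [("blue", (40, -5, -40)), ("blue", (42, -4, -38)),
         ("orange", (60, 40, 55))]
models = learn_color_models(train)
# A bluish query pixel should score higher under the 'blue' model.
print(models["blue"]((41, -5, -39)) > models["orange"]((41, -5, -39)))
```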

9. Generative Model
Given an input image I, the probability of the image under a butterfly category B_i is written as a product over the spot and wing observations.
[Equation shown on slide; not reproduced in transcript. Its terms include a spot color name prior (equal priors over all spot colors) and a dominant color name prior.]
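The scoring step can be sketched as follows: each observation is marginalized over the template's color names, with equal priors over spot colors as the slide states, and the per-observation probabilities are multiplied (summed in log space). The likelihood table and observation labels here are hypothetical stand-ins for the learnt color models p(z | c).

```python
import math

def log_score(template, spot_obs, wing_obs, likelihood):
    """log p(I | B_i) as a product over spot and wing observations.

    template: {'spot_colors': [...], 'dominant_colors': [...]}
    likelihood(z, c): stand-in for the learnt color model p(z | c).
    """
    score = 0.0
    for z in spot_obs:
        # equal prior over the template's spot color names
        p = sum(likelihood(z, c) for c in template["spot_colors"])
        score += math.log(p / len(template["spot_colors"]))
    for z in wing_obs:
        p = sum(likelihood(z, c) for c in template["dominant_colors"])
        score += math.log(p / len(template["dominant_colors"]))
    return score

# Hypothetical likelihoods for two color names and two observations.
TABLE = {("z_blue", "blue"): 0.8, ("z_blue", "black"): 0.1,
         ("z_dark", "blue"): 0.1, ("z_dark", "black"): 0.9}
lik = lambda z, c: TABLE.get((z, c), 0.05)

t = {"spot_colors": ["blue"], "dominant_colors": ["black"]}
print(log_score(t, ["z_blue"], ["z_dark"], lik))
```

Classification then amounts to evaluating this score against each of the ten category templates and picking the highest.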

10. Experimental Results
Two sets of experiments were performed:
- Performance of humans in recognizing butterflies from textual descriptions, since this may reasonably be considered an upper bound.
- Performance of the proposed method.

11. Human Performance

12. Performance of the Proposed Method

13. Observations
- The accuracy of the proposed method was comparable to that of non-native English speakers.
- The proposed method exceeded 80 percent accuracy for four of the categories.
- 'Heliconius charitonius' was the hardest to classify, both for humans and with the ground-truth and learnt templates.
- Performance with ground-truth templates was comparable to that with the learnt templates, so errors introduced into the templates by the NLP methods did not have much impact.

14. Thank You

