Yansong Feng and Mirella Lapata

1 Yansong Feng and Mirella Lapata
How Many Words is a Picture Worth? Automatic Caption Generation for News Images
Yansong Feng and Mirella Lapata
Ashish Bagate

2 What this paper is about
Explore the feasibility of automatic caption generation for images in the news domain
Why the news domain in particular: training data is easily and abundantly available

3 Why
Lots of digital images available on the Web
Improved searching
Analysis of the image
Keyword-only searches are ambiguous
Targeted queries using longer search strings
Web accessibility

4 General Approach
Two-step process
Analyze the image and build a representation of it
Run the text generation engine on the image representation to produce a natural language representation

5 Related Work
Hede et al.: not practical because of the controlled data set and manual database creation
Yao et al.: based on just the image
Elzer et al.: what the graphic depicts, little emphasis on graphics generation
These methods use some background information/terminologies

6 Problem Formulation
For a given image I and document D, generate a caption C
Training data contains document-image-caption tuples
Caption generation is a difficult task even for humans
A good caption must be succinct and informative, clearly identify the subject of the picture, and draw the reader to the article

7 Overview of the method
Similar to the headline generation task
Get the training data (it would be noisy)
Follows a two-stage approach
Get the keywords from the image (image annotation model)
Generate the caption from the given image words
Use of image features for a faithful and meaningful description of the images

8 Image Annotation
Probabilistic model: well suited for noisy data
Calculate SIFT descriptors of images
Visual words by K-means clustering
Get the keywords by LDA
dmix: bag of words representing image, document, and caption
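
The "visual words" step can be illustrated with a minimal sketch. It assumes the K-means centroids have already been learned and that each image comes with a list of local descriptors; the toy 2-D vectors stand in for real 128-dimensional SIFT descriptors, and the function names are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to the descriptor (Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, centroid in enumerate(centroids):
        dist = math.dist(descriptor, centroid)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def visual_word_histogram(descriptors, centroids):
    """Map each local descriptor to a visual word id and count the occurrences."""
    return Counter(nearest_centroid(d, centroids) for d in descriptors)


# Toy 2-D stand-ins for SIFT descriptors and two assumed cluster centres.
centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.5), (9.0, 11.0), (0.2, 0.1)]
histogram = visual_word_histogram(descriptors, centroids)
```

The resulting counts treat the image as a bag of discrete "words", which is what lets a text-oriented topic model such as LDA handle image content alongside document and caption words.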

9 Extractive Caption Generation
Not much linguistic analysis is needed
The caption is the sentence from the document that is maximally similar to the description keywords
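
Extractive selection amounts to an argmax over the document's sentences. The sketch below assumes the image annotation step has already produced a keyword list, and it uses a raw shared-word count as a stand-in similarity; the tokenizer, function names, and example sentences are all illustrative assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def shared_word_count(sentence_words, keywords):
    """Number of distinct words the sentence shares with the keyword list."""
    return len(set(sentence_words) & set(keywords))


def extract_caption(sentences, keywords, similarity=shared_word_count):
    """Return the sentence that scores highest against the image keywords."""
    return max(sentences, key=lambda s: similarity(tokenize(s), keywords))


# Made-up document sentences and keywords, for illustration only.
document = [
    "The minister arrived in London on Monday.",
    "Floods damaged hundreds of homes in the north.",
    "Rescue teams used boats to reach flooded homes.",
]
keywords = ["rescue", "boats", "homes", "flood"]
caption = extract_caption(document, keywords)
```

Passing the similarity function as a parameter keeps the selection logic unchanged when a different measure is swapped in.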

10 Types of Similarities
Word overlap
Cosine similarity
Probabilistic similarity
KL divergence: similarity between an image and a sentence is measured by the extent to which they share the same topic distributions
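
The three measures can be sketched as follows. This is a minimal illustration under stated assumptions: word overlap is normalised Jaccard-style, cosine similarity is computed over raw word counts, and the topic distributions passed to the KL function are assumed to be plain lists of probabilities of equal length. The slides do not give the exact definitions, so none of these choices should be read as the authors' formulation.

```python
import math
from collections import Counter


def word_overlap(a, b):
    """Shared words divided by all distinct words (Jaccard-style, an assumption)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two topic distributions; lower means more similar."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


keywords = ["rescue", "boats", "homes"]
sentence = ["boats", "reach", "homes"]
overlap = word_overlap(keywords, sentence)
cosine = cosine_similarity(keywords, sentence)
divergence = kl_divergence([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
```

Note that KL divergence is a dissimilarity and is not symmetric, so a selection step built on it would pick the sentence with the smallest value rather than the largest.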

11 Issues with Extractive Caption Generation
No single sentence can represent the image
Selected caption sentences might be longer than the average sentence length
May not be catchy

12 Abstractive Caption Generation
Word-based model
Adapted from headline generation
Caption = the sequence of words that maximizes P
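
The slide's formula for P did not survive in the transcript, so the following is only a sketch in the spirit of headline generation, not the paper's model. It assumes the score of a word sequence combines a content-selection probability for each word with a bigram transition probability, and it searches with a small beam. The probability tables, the floor value for unseen events, and all names are hypothetical; a length term is omitted for brevity.

```python
import math


def caption_score(words, content_prob, bigram_prob, floor=1e-6):
    """Log score: content-selection terms plus bigram transition terms."""
    score, previous = 0.0, "<s>"
    for word in words:
        score += math.log(content_prob.get(word, floor))
        score += math.log(bigram_prob.get((previous, word), floor))
        previous = word
    return score


def beam_search(vocabulary, content_prob, bigram_prob, length, beam_width=3):
    """Grow captions one word at a time, keeping the best few partial sequences."""
    beams = [[]]
    for _ in range(length):
        candidates = [beam + [w] for beam in beams for w in vocabulary if w not in beam]
        if not candidates:
            break  # vocabulary exhausted before reaching the requested length
        candidates.sort(
            key=lambda c: caption_score(c, content_prob, bigram_prob), reverse=True
        )
        beams = candidates[:beam_width]
    return beams[0]


# Toy probability tables, invented for illustration.
content_prob = {"police": 0.9, "arrest": 0.8, "suspect": 0.7, "the": 0.1}
bigram_prob = {
    ("<s>", "police"): 0.6,
    ("police", "arrest"): 0.5,
    ("arrest", "suspect"): 0.5,
}
best = beam_search(list(content_prob), content_prob, bigram_prob, length=3)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the beam keeps the search tractable compared with scoring every possible word sequence.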

13 Abstractive Caption Generation
Phrase-based model
Caption = the sequence of words that maximizes P

14 Evaluation…

15 Evaluation…

16 Evaluation

17 Thanks!
