
1 Word sense disambiguation with pictures Kobus Barnard, Matthew Johnson presented by Milan Iliev

2 Overview
● Why disambiguate?
● Where to disambiguate?
● How to disambiguate?

3 Why disambiguate? Word sense disambiguation is one of the biggest open problems in natural language processing, and natural language processing itself underpins a wide range of applications.

4 Why disambiguate with pictures?
● Contextual disambiguation often does little better than always choosing the most common sense.
● Contextual disambiguation is sometimes difficult, even impossible: in 'She sat down by the bank', the text alone cannot tell a river bank from a financial institution.

5 Where could picture disambiguation be used?
● In image-containing websites
● In vision-capable robots [catch the ball]
● In text supplemented with media, such as encyclopedias

6 Modes of disambiguation
● Image-only: standalone image disambiguation with no contextual information available.
● Image-enhanced textual disambiguation: document text and/or additional domain text combined with image data.
● Text-only fallback: when no image data is available, fall back on textual disambiguation alone.

7 The Core: Image-based Word Prediction Algorithm A new method for predicting words from images, based on a statistical model of the joint probability distribution of words and image region features. The model is trained on large sets of images with associated text. Caveat: a large vocabulary and fuzzy region features make for a high error rate.

8 The Core, continued However: when disambiguating, we can restrict the candidate words to the senses of the word being disambiguated. With only a small number of word choices, the error rate is low. Also: where do we get all these images with associated words? Why, the Corel image database, of course.

9 Textual context disambiguation Assumption: the word to be disambiguated is semantically linked to other words in its context. Approach: statistical analysis of co-occurrences. For example, 'flop' meaning 'fail' often occurs near the words 'attempt', 'disaster', 'genius', etc.; 'flop' meaning 'floating point operations per second' occurs near 'gigahertz', 'PowerPC', 'transistor', etc.
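
A toy sketch of this co-occurrence scoring in Python appears below; the sense labels and counts are invented for illustration and are not statistics from the paper.

```python
from collections import Counter

# Invented co-occurrence counts: for each sense, how often it appears
# near each context word in some training corpus (numbers are made up).
cooccurrence = {
    "flop_fail":  Counter({"attempt": 40, "disaster": 25, "genius": 10}),
    "flop_flops": Counter({"gigahertz": 30, "PowerPC": 20, "transistor": 15}),
}

def score_senses(context_words):
    """Score each sense by summing its co-occurrence counts with the context."""
    return {sense: sum(counts[w] for w in context_words)
            for sense, counts in cooccurrence.items()}

print(score_senses(["the", "attempt", "was", "a", "disaster"]))
# {'flop_fail': 65, 'flop_flops': 0}
```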

10 Who's got all that data? WordNet. WordNet is a machine-readable dictionary covering a large portion of the English language (152,000 words), organized into synsets that are most commonly linked by 'hypernym' relationships ('A is a B'). It also provides sense numbers indicating which sense is most frequently intended, which is obviously very useful for disambiguation.
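
WordNet's sense inventory can be explored programmatically; the snippet below is a quick sketch using NLTK's WordNet interface (it assumes nltk is installed and the WordNet corpus has been fetched with nltk.download('wordnet')).

```python
from nltk.corpus import wordnet as wn

# Synsets are listed roughly in order of sense frequency, so the first
# entries correspond to the most common senses of 'bank'.
for synset in wn.synsets("bank"):
    print(synset.name(), "-", synset.definition())

# Hypernyms give the 'A is a B' structure mentioned above.
print(wn.synset("bank.n.01").hypernyms())
```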

11 Linking Images to Words

12 Images to Words: Criteria
● Size (pixel percentage over image)
● Position (region center of mass, relative to image)
● Color (average and std. dev. over each of R, G, and B)
● Texture (average and variance of filter responses; Gaussian filters are involved)
● Shape (area/perimeter, center of mass/MOI, convex ratio)
● Color context (adjacent colors, 90 degrees)
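
The sketch below computes rough versions of a few of these features (size, position, and color statistics) for a single region mask with NumPy; it only approximates the kinds of descriptors listed and is not the paper's actual feature pipeline.

```python
import numpy as np

def simple_region_features(image, mask):
    """image: H x W x 3 RGB array; mask: H x W boolean array marking one region."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)

    size = mask.sum() / (h * w)                # pixel percentage of the image
    position = (ys.mean() / h, xs.mean() / w)  # center of mass, relative to image
    pixels = image[mask]                       # N x 3 array of region pixels
    color_mean = pixels.mean(axis=0)           # average R, G, B
    color_std = pixels.std(axis=0)             # std. dev. per channel

    return {"size": size, "position": position,
            "color_mean": color_mean, "color_std": color_std}

# Example on a random image with a rectangular region.
img = np.random.rand(64, 64, 3)
region = np.zeros((64, 64), dtype=bool)
region[10:30, 20:40] = True
print(simple_region_features(img, region))
```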

13 The Formula The hidden variables are called 'concepts'; they generate both words and blobs (image regions), so multiple concepts can be present in one image. P(word, blob) = Σ over concepts of [ FrequencyTable(word, concept) × GaussianDistribution(blob, concept) × PriorP(concept) ]. Words and blobs are assumed independent given the concept.
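
A minimal numerical sketch of this mixture is given below; the concepts, word tables, and Gaussian parameters are made up purely to show how the sum over concepts is evaluated.

```python
import numpy as np

concepts = ["water", "money"]
prior = {"water": 0.5, "money": 0.5}            # PriorP(concept)

word_freq = {                                   # FrequencyTable(word, concept)
    "water": {"bank": 0.2, "river": 0.5, "boat": 0.3},
    "money": {"bank": 0.6, "loan": 0.3, "teller": 0.1},
}

gaussians = {                                   # GaussianDistribution(blob, concept)
    "water": (np.array([0.1, 0.3, 0.8]), 0.2),  # (mean feature vector, std)
    "money": (np.array([0.7, 0.7, 0.6]), 0.2),
}

def gaussian_density(x, mean, std):
    d = x - mean
    norm = (2 * np.pi * std ** 2) ** (len(x) / 2)
    return np.exp(-0.5 * np.dot(d, d) / std ** 2) / norm

def p_word_blob(word, blob_features):
    """Joint probability of a word and a blob, summed over the hidden concepts."""
    return sum(word_freq[c].get(word, 0.0)
               * gaussian_density(blob_features, *gaussians[c])
               * prior[c]
               for c in concepts)

blue_blob = np.array([0.1, 0.3, 0.8])           # features of a 'watery' region
print(p_word_blob("river", blue_blob), p_word_blob("loan", blue_blob))
```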

14 Finally, The Point: Image Disambiguation Assumptions: humans have a disambiguated vocabulary. For every word w (like 'bank') in our 'normal' vocabulary W, there are a number of senses s1, s2, etc. (bank_1, bank_2, etc.) in our 'disambiguated' vocabulary S. We get a posterior probability P(s | w, B), where s is the sense, w the word, and B the image context. On demand, this is combined with the textual-context posterior P(s | w, W): P(s | w, B, W) = c · P(s | w, B) · P(s | w, W).
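
Below is a small sketch of the combination step, assuming the two sense posteriors are already available as dictionaries; the numbers are invented, and the constant c shows up as the final renormalization.

```python
def combine_posteriors(p_image, p_text):
    """p_image, p_text: dicts mapping sense -> P(sense | word, evidence)."""
    joint = {s: p_image[s] * p_text[s] for s in p_image}
    norm = sum(joint.values())                 # plays the role of the constant c
    return {s: p / norm for s, p in joint.items()}

p_image = {"bank_1": 0.7, "bank_2": 0.3}       # the picture favors the river sense
p_text  = {"bank_1": 0.4, "bank_2": 0.6}       # the text mildly favors the other one
print(combine_posteriors(p_image, p_text))
# {'bank_1': 0.6087, 'bank_2': 0.3913} (approximately)
```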

15 And then, they trained: Building ImCor The Corel image DB was not very ambiguous, so the researchers built ImCor, a database linking images to text passages, much like illustrated news articles or websites.
● Modified the SMUaw / SenseLearner textual algorithm for softer, more fluid sense assignments
● Added image data
● Asked humans to rate the appropriateness of images for passages of text
● Marked similar passages with the same keywords

16 Results The performance tests indicated that, for a small, friendly domain, pure image-based disambiguation exceeded the performance of two text-based algorithms. Combined image-and-text disambiguation provided a further improvement.

