Unsupervised Learning of Visual Sense Models for Polysemous Words Kate Saenko Trevor Darrell Deepak.



Polysemy
– Polysemy is the ambiguity of an individual word or phrase that can be used in different contexts to express two or more meanings.
– E.g. "present": right now; "present": a gift.
– Visual polysemy refers to different meanings of a word that are visually distinct.

E.g. "mouse": the rodent or the computer pointing device.

– Unsupervised learning of object classifiers suffers because of polysemy.
– Existing approaches try to filter out unrelated images, either by bootstrapping an object classifier or by clustering the images into coherent components, but they do not take the polysemy of words into account.
– This paper proposes an unsupervised method that takes word sense into account.

Idea behind the paper
– Input: a list of words and their dictionary meanings.
– Learn a text model of each word sense.
– Use that model to retrieve images of a specific sense.
– Use these re-ranked images as training data for an object classifier.
– This classifier can predict the correct sense of the word associated with an image.

Model
Three main steps:
– Discovering latent dimensions
– Learning probabilistic models of dictionary senses
– Using the above sense models to construct sense-specific image classifiers

Latent Text Space

– Create a dataset of text-only webpages returned by a regular web search.
– An LDA (latent Dirichlet allocation) model is learned on this dataset.
– This is done because using only the words directly surrounding the images causes overfitting.

Dictionary Sense Model
– Relate each dictionary sense to the topics formed in the previous step.
– Dictionary senses are obtained from WordNet.
– From the sense and the topics, the likelihood of a particular sense given any topic, P(s|z=j), is computed.
– Using this, the probability of a particular sense in a document, P(s|d), is computed.
– This gives the sense for each image, so the images can be grouped according to sense.
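The step from P(s|z=j) to P(s|d) can be sketched as a sum over topics. This is a minimal illustration, assuming P(s|d) is obtained by weighting P(s|z=j) with the document's topic proportions P(z=j|d); the function names and all probability values below are made up for the example and are not taken from the slides.

```python
def sense_given_document(p_sense_given_topic, p_topic_given_doc):
    """Return P(s|d) for every sense s.

    Assumption: P(s|d) = sum_j P(s|z=j) * P(z=j|d).
    p_sense_given_topic maps each sense to a list of P(s|z=j), one per topic.
    p_topic_given_doc is the list of P(z=j|d) for one document.
    """
    return {
        sense: sum(p_sz * p_zd for p_sz, p_zd in zip(per_topic, p_topic_given_doc))
        for sense, per_topic in p_sense_given_topic.items()
    }


def group_by_sense(doc_topics, p_sense_given_topic):
    """Assign each document (image) to its most probable sense."""
    groups = {}
    for doc_id, p_topic_given_doc in doc_topics.items():
        scores = sense_given_document(p_sense_given_topic, p_topic_given_doc)
        best_sense = max(scores, key=scores.get)
        groups.setdefault(best_sense, []).append(doc_id)
    return groups


# Toy values for the word "mouse" with two topics (hypothetical numbers).
p_sense_given_topic = {"rodent": [0.9, 0.1], "device": [0.1, 0.9]}
doc_topics = {"img1": [0.8, 0.2], "img2": [0.3, 0.7]}

img1_scores = sense_given_document(p_sense_given_topic, doc_topics["img1"])
groups = group_by_sense(doc_topics, p_sense_given_topic)
```

Here `img1` is dominated by the first topic and therefore lands in the "rodent" group, while `img2` lands in the "device" group.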

Visual Sense Model
– Uses the sense model obtained in the two steps above to generate training data for an image-based classifier.
– Uses a discriminative classifier (SVM).
– Re-ranks the images according to the probability of the sense.
– From the re-ranked images, selects the N highest-ranked examples as positive training data for the SVM.
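The re-ranking and selection step can be sketched as follows. This is only an illustration, assuming each image already has a score P(s|d) for the target sense; the function name and the scores are hypothetical, and the SVM training itself is left out.

```python
def select_positive_examples(sense_probability, n):
    """Re-rank images by P(s|d), highest first, and keep the top n."""
    ranked = sorted(sense_probability, key=sense_probability.get, reverse=True)
    return ranked[:n]


# Hypothetical P(s|d) scores for one sense of a query word.
scores = {"img_a": 0.15, "img_b": 0.92, "img_c": 0.60, "img_d": 0.81}
positives = select_positive_examples(scores, n=2)
```

The selected image identifiers would then serve as the positive examples when training the sense-specific classifier.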

Datasets
Three datasets were used:
– The first dataset consists of images for bass, face, mouse, speaker and watch from Yahoo Image Search; the returned images were annotated as unrelated, partial or good.
– The second dataset was collected using sense-specific search terms.
– The third dataset was a text-only dataset collected using regular web search.

Features
– Words in webpages are extracted with HTML tags removed, then tokenized, stop words removed, and stemmed.
– For text associated with an image, the words surrounding the image link are extracted, using a window of 100 words.
– Image features are obtained by first resizing the images to 300 pixels, converting them to grayscale, and then extracting edge features and scale-invariant salient points.
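The text feature pipeline can be sketched in a few lines. This is a rough illustration under several assumptions: the stop-word list is a toy one, `crude_stem` is a simplistic suffix stripper standing in for whatever stemmer was actually used, and the 100-word window is assumed to be centered on the image link (the slides do not say how the window is placed).

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "is", "to"}  # toy list


def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(html):
    """Remove HTML tags, tokenize, drop stop words, and stem."""
    text = re.sub(r"<[^>]+>", " ", html)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def window_around(tokens, position, size=100):
    """Return up to `size` tokens centered on `position` (the image link)."""
    half = size // 2
    start = max(0, position - half)
    return tokens[start : position + half]


tokens = preprocess("<p>The mouse is running on the keyboards</p>")
context = window_around(list(range(300)), position=150, size=100)
```

Near the start of a page the window is simply truncated, so fewer than 100 words may be returned.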