BMVC 2010 Sung Ju Hwang and Kristen Grauman University of Texas at Austin.


Learning the Relative Importance of Objects from Tagged Images for Retrieval and Cross-Modal Search. International Journal of Computer Vision, 2011. Sung Ju Hwang and Kristen Grauman.

Relative importance of objects: An image can contain many different objects, but some are more "important" than others. (Image labels: sky, water, mountain, architecture, bird, cow.)

Relative importance of objects: Some objects are background.

Relative importance of objects: Some objects are less salient.

Relative importance of objects: Some objects are more prominent, or perceptually define the scene.

Our goal: Retrieve those images that share important objects with the query image. How can we learn a representation that accounts for this?

Idea: image tags as an importance cue. The order in which a person assigns tags (e.g., TAGS: Cow, Birds, Architecture, Water, Sky) provides implicit cues about each object's importance to the scene.

Approach overview: Building the image database. From tagged training images (e.g., "Cow, Grass"; "Horse, Grass"; "Car, House, Grass, Sky"), extract visual and tag-based features, then learn projections from each feature space into a common "semantic space".

Approach overview: Retrieval from the database. Three tasks: image-to-image retrieval (untagged query image → retrieved images), tag-to-image retrieval (tag-list query, e.g. "Cow, Tree, Grass" → retrieved images), and image-to-tag auto-annotation (untagged query image → retrieved tag-list, e.g. "Cow, Tree").

Visual features: a color histogram captures the HSV color distribution; Gist [Torralba et al.] captures the total scene structure; visual words (k-means on DoG+SIFT) capture local appearance.
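The visual-words feature above can be sketched as follows: cluster local descriptors from a training pool into a small vocabulary, then describe each image by a normalized histogram of its descriptors' nearest clusters. This is a minimal stand-in, not the authors' exact pipeline; the random 128-D vectors below stand in for real DoG+SIFT descriptors, and the tiny Lloyd's k-means is assumed in place of a production clusterer.

```python
import numpy as np

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means (stand-in for the vocabulary-building step)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantize an image's descriptors against the vocabulary, L1-normalize."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    counts = np.bincount(d2.argmin(1), minlength=len(centers)).astype(float)
    return counts / counts.sum()

# toy "SIFT-like" 128-D descriptors, standing in for DoG+SIFT output
rng = np.random.default_rng(1)
train_desc = rng.normal(size=(500, 128))
vocab = kmeans(train_desc, k=10)
hist = bow_histogram(rng.normal(size=(60, 128)), vocab)
```

The resulting histogram is one of the three visual views that get concatenated or kernelized alongside the color histogram and Gist.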

Tag features: Word Frequency. Traditional bag-of-(text)words. Example (tag: count): Cow: 1, Bird: 1, Water: 1, Architecture: 1, Mountain: 1, Sky: 1, Car: 0, Person: 0.

Tag features: Absolute Rank. Based on each word's absolute rank in this image's tag-list. Example (tag: value): Cow: 1, Bird: 0.63, Water: 0.50, Architecture: 0.43, Mountain: 0.39, Sky: 0.36, Car: 0, Person: 0.

Tag features: Relative Rank. Percentile rank obtained from the rank distribution of that word in all tag-lists. Example (tag: value): Cow: 0.9, Bird: 0.6, Water: 0.8, Architecture: 0.5, Mountain: 0.8, Sky: 0.8, Car: 0, Person: 0.
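The three tag encodings can be sketched as below. The vocabulary and helper names are illustrative; the 1/log2(1+rank) discount for the absolute-rank feature is an assumption, chosen because it reproduces the example values on the slide (1, 0.63, 0.50, 0.43, 0.39, 0.36 for ranks 1 through 6), and the relative-rank feature is approximated here as the fraction of corpus occurrences where the tag appears at this rank or later.

```python
import numpy as np

VOCAB = ["Cow", "Bird", "Water", "Architecture", "Mountain", "Sky", "Car", "Person"]

def word_frequency(tag_list, vocab=VOCAB):
    """Bag-of-words: 1 if the tag is present in the image's tag-list, else 0."""
    return np.array([1.0 if t in tag_list else 0.0 for t in vocab])

def absolute_rank(tag_list, vocab=VOCAB):
    """Discounted absolute rank, assumed 1/log2(1 + r) for rank r (matches
    the slide's example values for ranks 1..6); 0 for absent tags."""
    ranks = {t: r + 1 for r, t in enumerate(tag_list)}
    return np.array([1.0 / np.log2(1 + ranks[t]) if t in ranks else 0.0
                     for t in vocab])

def relative_rank(tag_list, all_tag_lists, vocab=VOCAB):
    """Percentile rank: for each present tag, the fraction of its occurrences
    across all tag-lists where it appears no earlier than it does here."""
    feat = np.zeros(len(vocab))
    for i, t in enumerate(vocab):
        if t not in tag_list:
            continue
        r = tag_list.index(t) + 1
        corpus_ranks = [tl.index(t) + 1 for tl in all_tag_lists if t in tl]
        feat[i] = np.mean([r <= o for o in corpus_ranks])
    return feat
```

A tag named early that is usually named late (e.g. "Water" first) then scores high, capturing that it is unusually important in this image.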

Learning mappings to semantic space: Canonical Correlation Analysis (CCA) chooses projection directions that maximize the correlation of the two views (visual and tag) projected from the same instance. The semantic space is the resulting common feature space.

Kernel Canonical Correlation Analysis [Akaho 2001; Fyfe et al. 2001; Hardoon et al. 2004]. Linear CCA: given paired data $\{(x_i, y_i)\}_{i=1}^n$, select directions $w_x, w_y$ so as to maximize the correlation $\rho = \max_{w_x, w_y} \frac{w_x^\top C_{xy} w_y}{\sqrt{(w_x^\top C_{xx} w_x)\,(w_y^\top C_{yy} w_y)}}$, where $C_{xx}, C_{yy}$ are the within-view covariances and $C_{xy}$ is the cross-view covariance. Kernel CCA: same objective, but the projections live in kernel space, given a pair of kernel functions $k_x(\cdot,\cdot)$ and $k_y(\cdot,\cdot)$ for the two views.
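A minimal linear-CCA sketch of this objective (the paper's system uses the kernelized version; this linear variant is shown only to make the objective concrete). It solves the regularized problem by whitening each view and taking the SVD of the whitened cross-covariance; all variable names are illustrative.

```python
import numpy as np

def linear_cca(X, Y, reg=1e-3):
    """Linear CCA: directions w_x, w_y maximizing corr(X w_x, Y w_y).
    Whiten each view, then SVD the cross-covariance; singular values
    are the canonical correlations."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # regularized covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Wx_half, Wy_half = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx_half @ Cxy @ Wy_half)
    # columns of Wx, Wy are paired canonical directions; s holds correlations
    return Wx_half @ U, Wy_half @ Vt.T, s

# two "views" driven by a shared latent signal (toy stand-ins for the
# visual and tag feature spaces)
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.outer(z, rng.normal(size=5)) + 0.1 * rng.normal(size=(200, 5))
Y = np.outer(z, rng.normal(size=4)) + 0.1 * rng.normal(size=(200, 4))
Wx, Wy, corrs = linear_cca(X, Y)
```

Because both views share one latent factor, the first canonical correlation comes out close to 1; projecting each view onto its first canonical direction gives the common "semantic space" coordinate.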

Recap: Building the image database. Both the visual feature space and the tag feature space are projected into the common semantic space.

Experiments: We compare the retrieval performance of our KCCA semantic space with two baselines: a Visual-Only baseline and a Words+Visual baseline [Hardoon et al. 2004; Yakhenenko et al. 2009]. (For each method, we inspect the query image and its 1st retrieved image.)

Evaluation: We use Normalized Discounted Cumulative Gain at top K (NDCG@K) to evaluate retrieval performance [Kekalainen & Jarvelin, 2002]. Doing well in the top ranks is more important, so the reward term $s(p)$ (the score for the $p$-th retrieved example) is discounted by rank, and the result is normalized by the best achievable sum of scores: $\mathrm{NDCG@K} = \frac{1}{Z} \sum_{p=1}^{K} \frac{s(p)}{\log_2(p+1)}$, where $Z$ is the normalization term.

Evaluation: We present the score using two different reward terms: (1) object presence/scale, which rewards similarity between the query's objects and scales and those in the retrieved image(s); and (2) ordered tag similarity, which rewards similarity between the query's ground-truth tag ranks (relative and absolute) and those in the retrieved image(s). (Example tag-lists: "Cow, Tree, Grass, Person" vs. "Cow, Tree, Fence, Grass".)
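The NDCG@K measure above can be sketched directly; the log2 rank discount is the standard form and an assumption here, and the reward values passed in would come from either of the two reward terms (object presence/scale, or ordered tag similarity).

```python
import numpy as np

def dcg_at_k(rewards, k):
    """Discounted cumulative gain: top ranks count more (log2(p+1) discount)."""
    rewards = np.asarray(rewards, dtype=float)[:k]
    discounts = np.log2(np.arange(2, len(rewards) + 2))  # log2(p+1), p = 1..k
    return float((rewards / discounts).sum())

def ndcg_at_k(rewards, k):
    """NDCG@K: DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(rewards, reverse=True), k)
    return dcg_at_k(rewards, k) / ideal if ideal > 0 else 0.0
```

A ranking that places the highest-reward examples first scores exactly 1.0; misordering the top ranks costs more than misordering the tail, which is the behavior the slide motivates.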

Datasets. LabelMe:  6352 images  Database: 3799 images  Query: 2553 images  ~23 tags/image. Pascal:  9963 images  Database: 5011 images  Query: 4952 images  ~5.5 tags/image.

Image-to-image retrieval: We want to retrieve the images most similar to the given query image in terms of object importance. (An untagged query image is matched against the image database, built from the visual and tag-list kernel spaces, to produce the retrieved images.)

Image-to-image retrieval results (qualitative): example query images shown with the first retrievals from our method, the Words+Visual baseline, and the Visual-only baseline.

Image-to-image retrieval results: Our method better retrieves images that share the query's important objects, by both measures: retrieval accuracy measured by object+scale similarity, and by ordered tag-list similarity (39% improvement).

Tag-to-image retrieval: We want to retrieve the images that are best described by the given tag-list (e.g., the query tags "Cow, Person, Tree, Grass" are projected into the semantic space and matched against the image database).

Tag-to-image retrieval results: Our method better respects the importance cues implied by the user's keyword query (31% improvement).

Image-to-tag auto-annotation: We want to annotate a query image with ordered tags that best describe the scene (example output tag-lists: "Cow, Tree, Grass"; "Cow, Grass, Field"; "Cow, Fence").
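One simple way to realize this step, sketched under assumptions: project the untagged query into the semantic space, find its k nearest database images there, and rank candidate tags by how often (and how early) they appear in the neighbors' tag-lists. The rank-discounted voting scheme and all names below are illustrative, not the authors' exact procedure.

```python
import numpy as np
from collections import Counter

def annotate(query_vec, db_vecs, db_tag_lists, k=3):
    """Rank candidate tags by frequency among the k nearest database images
    in the semantic space, weighting earlier (more important) tags higher."""
    dists = np.linalg.norm(db_vecs - query_vec, axis=1)
    neighbors = np.argsort(dists)[:k]
    scores = Counter()
    for i in neighbors:
        for rank, tag in enumerate(db_tag_lists[i]):
            scores[tag] += 1.0 / (rank + 1)  # earlier tags weigh more
    return [t for t, _ in scores.most_common()]

# toy 2-D semantic-space coordinates for three database images
db_vecs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
db_tags = [["Cow", "Grass"], ["Cow", "Tree"], ["Boat", "Water"]]
predicted = annotate(np.array([0.05, 0.0]), db_vecs, db_tags, k=2)
```

Here the query lands near the two cow images, so "Cow" tops the output tag-list while the boat scene's tags never enter the vote.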

Image-to-tag auto-annotation results: example output tag-lists include "Boat, Person, Water, Sky, Rock"; "Bottle, Knife, Napkin, Light, Fork"; "Person, Tree, Car, Chair, Window"; and "Tree, Boat, Grass, Water, Person" (k = number of nearest neighbors used).