Words & Pictures Clustering and Bag of Words Representations Many slides adapted from Svetlana Lazebnik, Fei-Fei Li, Rob Fergus, and Antonio Torralba.

Presentation transcript:

Announcements HW1 due Thurs, Sept 12pm – By to No need to include shopping image – Write-up can be webpage or pdf.

Document Vectors: Represent each document as a “bag of words”

Origin: Bag-of-words models Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)

Origin: Bag-of-words models US Presidential Speeches Tag Cloud Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)
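
As a minimal sketch of the orderless representation described above, the following Python counts how often each dictionary word occurs in a document. The function name `bag_of_words`, the four-word dictionary, and the two toy documents are illustrative assumptions, not part of the original slides; tokenization is a plain whitespace split.

```python
from collections import Counter

def bag_of_words(text, dictionary):
    """Return one count per dictionary word; word order in `text` is ignored."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    counts = Counter(t for t in tokens if t in dictionary)
    # Fixed-length vector, one entry per dictionary word, in dictionary order.
    return [counts[w] for w in dictionary]

# Hypothetical dictionary and documents, for illustration only.
dictionary = ["freedom", "people", "war", "peace"]
doc_a = "the people want peace and the people want freedom"
doc_b = "want freedom the people and peace want the people"  # same words, shuffled

vec_a = bag_of_words(doc_a, dictionary)
vec_b = bag_of_words(doc_b, dictionary)
print(vec_a, vec_b)
```

Because the two documents contain the same words in a different order, they map to the same vector, which is what "orderless" means here.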

Bag-of-features models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Bags of features for image classification
1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”

1. Feature extraction
Regular grid
– Vogel & Schiele, 2003
– Fei-Fei & Perona, 2005
Interest point detector
– Csurka et al.
– Fei-Fei & Perona, 2005
– Sivic et al.
Other methods
– Random sampling (Vidal-Naquet & Ullman, 2002)
– Segmentation-based patches (Barnard et al. 2003)

1. Feature extraction
– Detect patches [Mikolajczyk and Schmid ’02] [Matas, Chum, Urban & Pajdla, ’02] [Sivic & Zisserman, ’03]
– Normalize patch
– Compute SIFT descriptor [Lowe ’99]
Slide credit: Josef Sivic

2. Learning the visual vocabulary
[Figure: the extracted feature descriptors are clustered; the resulting clusters form the visual vocabulary.]
Slide credit: Josef Sivic

Clustering
– The assignment of objects into groups (called clusters) so that objects from the same cluster are more similar to each other than objects from different clusters.
– Often similarity is assessed according to a distance measure.
– Clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.

Any of the similarity metrics we talked about before (SSD, angle between vectors)
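
The two measures named above can be sketched in a few lines of Python. The function names `ssd` and `angle` are illustrative choices; note that both are really dissimilarities (smaller means more similar).

```python
import math

def ssd(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def angle(a, b):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos)

print(ssd([1, 2], [4, 6]))    # depends on vector magnitudes
print(angle([1, 0], [0, 1]))  # depends only on direction
```

SSD is sensitive to the overall magnitude of the vectors, while the angle ignores it, which matters when comparing histograms built from different numbers of features.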

Feature Clustering
Clustering is the process of grouping a set of features into clusters of similar features.
– Features within a cluster should be similar.
– Features from different clusters should be dissimilar.

source: Dan Klein

K-means clustering
Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k
source: Svetlana Lazebnik
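
One common way to reduce this objective is the alternating scheme usually called Lloyd's algorithm: assign every point to its nearest center, then move each center to the mean of its points. The sketch below assumes that scheme; the slides do not say which optimizer is intended, and the initialization from random data points, the `seed` parameter, and the toy data are illustrative assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k random data points
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center in place
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Toy 2-D data with two well-separated groups (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Each step can only lower (or keep) the sum of squared distances, so the loop converges, but in general only to a local minimum that depends on the initialization.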

source: Dan Klein

Source: Hinrich Schutze

Hierarchical clustering strategies
Agglomerative clustering
– Start with each point in a separate cluster
– At each iteration, merge two of the “closest” clusters
Divisive clustering
– Start with all points grouped into a single cluster
– At each iteration, split the “largest” cluster
source: Svetlana Lazebnik
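
The agglomerative strategy can be sketched as follows. The slide leaves “closest” open; this sketch assumes single linkage (the distance between two clusters is the distance between their closest pair of points), which is only one of several common choices. The function name and the toy points are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerative(points, num_clusters):
    """Merge clusters bottom-up until `num_clusters` remain (single linkage)."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance of the closest pair across the two clusters.
                d = min(sq_dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
result = agglomerative(points, num_clusters=3)
print(result)
```

This brute-force version recomputes all pairwise cluster distances at every merge, so it is only suitable for small inputs; it is deterministic because no random initialization is involved.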

source: Dan Klein

Divisive Clustering
– Top-down (instead of bottom-up as in Agglomerative Clustering)
– Start with all docs in one big cluster
– Then recursively split clusters
– Eventually each node forms a cluster on its own.
Source: Hinrich Schutze

Flat or hierarchical clustering?
– For high efficiency, use flat clustering (e.g. k-means)
– For deterministic results: hierarchical clustering
– When a hierarchical structure is desired: hierarchical algorithm
– Hierarchical clustering can also be applied if K cannot be predetermined (can start without knowing K)
Source: Hinrich Schutze

From clustering to vector quantization
Clustering is a common method for learning a visual vocabulary or codebook
– Unsupervised learning process
– Each cluster center produced by k-means becomes a codebook entry
– Codebook can be learned on separate training set
– Provided the training set is sufficiently representative, the codebook will be “universal”
The codebook is used for quantizing features
– A vector quantizer takes a feature vector and maps it to the index of the nearest entry in the codebook
– Codebook = visual vocabulary
– Codebook entry = visual word
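
A vector quantizer as described above fits in a few lines. In this sketch the codebook is a hand-written list of three 2-D centers standing in for k-means output, and nearness is measured by squared Euclidean distance; both are assumptions for illustration (real descriptors such as SIFT have far more dimensions).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(feature, codebook):
    """Map a feature vector to the index of its nearest codebook entry."""
    return min(range(len(codebook)), key=lambda k: sq_dist(feature, codebook[k]))

# Hypothetical codebook: in practice these would be cluster centers from k-means.
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

words = [quantize(f, codebook) for f in [(1.0, 2.0), (9.0, 8.0), (2.0, 9.0)]]
print(words)  # one visual-word index per feature
```

The linear scan over the codebook costs one distance computation per entry, which is the cost that tree-structured vocabularies aim to reduce.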

Example visual vocabulary Fei-Fei et al. 2005

Image patch examples of visual words Sivic et al. 2005

Visual vocabularies: Issues
How to choose vocabulary size?
– Too small: visual words not representative of all patches
– Too large: quantization artifacts, overfitting
Computational efficiency
– Vocabulary trees (Nister & Stewenius, 2006)

3. Image representation
[Figure: histogram of codeword frequencies for an image; axes: codewords, frequency]
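
A minimal sketch of this representation: quantize every feature of an image against the codebook and count how often each visual word occurs. The function names, the toy codebook, and the toy features are illustrative assumptions; the normalization to relative frequencies is optional and shown as a separate step.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(feature, codebook):
    """Index of the nearest codebook entry (visual word)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(feature, codebook[k]))

def bag_of_features(features, codebook):
    """Count how many of an image's features fall on each visual word."""
    hist = [0] * len(codebook)
    for f in features:
        hist[quantize(f, codebook)] += 1
    return hist

def normalize(hist):
    """Convert counts to relative frequencies (unchanged if the image has no features)."""
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Hypothetical codebook and image features, for illustration only.
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
features = [(1.0, 2.0), (0.0, 1.0), (9.0, 8.0), (2.0, 9.0)]

counts = bag_of_features(features, codebook)
freqs = normalize(counts)
print(counts, freqs)
```

Normalizing makes images with different numbers of detected features comparable; the resulting fixed-length vector is what a classifier would consume.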

Image classification (next) Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

Clustering in Action

Names and Faces: Who’s in the picture? T.L. Berg, A.C. Berg, J. Edwards, D.A. Forsyth
President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters

Intuition George Bush

500k News Corpora
Producer and director Bruce Paltrow has died at the age of 58 in Rome, Italy, the U.S. Consulate said on October 3, Paltrow had suffered from throat cancer for several years, but the cause of his death was not immediately known. He is seen with his daughter actress Gwyneth Paltrow after the Academy Awards in Los Angeles in March 21, 1999 file photo. (Fred Prouser/Reuters)
Actress Winona Ryder (news) reacts to remarks by prosecutor Ann Rundle during the sentencing hearing in her felony shoplifting case Friday, Dec. 6, 2002 at the Beverly Hills, Calif., courthouse. At right is Ryder's attorney Mark Geragos. Ryder was sentenced to three years of probation and was ordered to perform 480 hours of community service. (AP Photo/Steve Grayson, POOL)

Name & Face Extraction
President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters
Detected Names: President George W. Bush, Defense Donald Rumsfeld, Saddam Hussein.
Detected Faces

Goal
Each name in the dataset is a potential cluster. Want to simultaneously:
1.) Learn image model for each person.
2.) Learn depiction model across names.
Achieve both of these by considering a big assignment (clustering) problem.

Assignment Problem

Language indicates Depiction
President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters
Cues: POS tags before and after name, location in caption, distance to closest: ( ) (L) (C) (R) left right center shown pictured above
P(Depicted | Context): Yes/No, multiple independent cues

Method
1.) Update assignments
2.) Update:
– appearance model for each person.
– language model of depiction across names.
Iterate 1-2
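
The alternation above has the same shape as k-means with a constraint: a face may only be assigned to a name that appears in its own caption. The sketch below is a heavily simplified illustration of that idea, not the authors' method. It assumes each face is a small feature vector, models each person's appearance as the mean of the faces currently assigned to that name, and omits the language model of depiction entirely. All names, vectors, and function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(vectors) for c in zip(*vectors))

def assign_names(faces, candidates, iters=20):
    """faces[i] is a feature vector; candidates[i] lists the names in its caption."""
    # Assumption: initialize every face with the first name in its caption.
    assignment = [names[0] for names in candidates]
    for _ in range(iters):
        # Update appearance model: one mean vector per currently used name.
        models = {
            name: mean([f for f, a in zip(faces, assignment) if a == name])
            for name in set(assignment)
        }
        # Update assignments: nearest model among the names in the face's caption.
        new_assignment = []
        for face, names in zip(faces, candidates):
            known = [n for n in names if n in models]
            new_assignment.append(min(known, key=lambda n: sq_dist(face, models[n])))
        if new_assignment == assignment:  # nothing changed: converged
            break
        assignment = new_assignment
    return assignment

# Toy data: two groups of similar faces, captions with one or two names each.
faces = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
candidates = [["Bush"], ["Bush", "Rumsfeld"], ["Rumsfeld"], ["Bush", "Rumsfeld"]]
names = assign_names(faces, candidates)
print(names)
```

Captions with a single name act as anchors: they fix what that person looks like, which then resolves the ambiguous captions on later iterations.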

Results
British director Sam Mendes and his partner actress Kate Winslet arrive at the London premiere of 'The Road to Perdition', September 18, The film stars Tom Hanks as a Chicago hit man who has a separate family life and co-stars Paul Newman and Jude Law. REUTERS/Dan Chung
World number one Lleyton Hewitt of Australia hits a return to Nicolas Massu of Chile at the Japan Open tennis championships in Tokyo October 3, REUTERS/Eriko Sugita

Results
US President George W. Bush (L) makes remarks while Secretary of State Colin Powell (R) listens before signing the US Leadership Against HIV/AIDS, Tuberculosis and Malaria Act of 2003 at the Department of State in Washington, DC. The five-year plan is designed to help prevent and treat AIDS, especially in more than a dozen African and Caribbean nations (AFP/Luke Frazza)
German supermodel Claudia Schiffer gave birth to a baby boy by Caesarian section January 30, 2003, her spokeswoman said. The baby is the first child for both Schiffer, 32, and her husband, British film producer Matthew Vaughn, who was at her side for the birth. Schiffer is seen on the German television show 'Bet It...?!' ('Wetten Dass...?!') in Braunschweig, on January 26, (Alexandra Winkler/Reuters)

Without – CEO Summit; With – Martha Stewart
Without – James Bond; With – Pierce Brosnan
Without – Dick Cheney; With – George W. Bush

Model                          Accuracy of labeling
Vision model, No Lang model    67%
Vision model + Lang model      78%

Face Dictionary

Results - Depiction
Classifier                 % correct
Baseline (all pictured)    67%
Learned Lang Model         86%
IN - pictured, OUT - not pictured