
1 Learning Scalable Discriminative Dictionaries with Sample Relatedness a.k.a. “Infinite Attributes” Jiashi Feng, Stefanie Jegelka, Shuicheng Yan, Trevor Darrell

2 Attribute Learning: furry, striped, wheels, white, water, bright. Generalizable vs. discriminative: which attributes to use? (Lampert, Nickisch & Harmeling, 2009; Farhadi & Forsyth, 2009; Parikh & Grauman, 2011 …)

3-11 Attribute Generative Model (animated cartoon diagram, built up over several slides: low-level features such as edges combine into attributes such as eye, nose, and mouth; attributes combine into objects such as face, car, and cup)

12 Goals I. Flexibility: automatically determine the attributes, as expressive as needed, as compact as possible → non-parametric Bayesian. (Figure: animals vs. humans, each described by attributes such as furry, striped, white, water.)

13 Goals II. Efficiently learnable: few positive training samples, i.e. reduced sample complexity. Related samples (Pug dog, Samoyed dog, Corgi dog) → knowledge transfer via attributes.

14 Goals III. Discriminative: object classification task → max margin. (Figure: max-margin boundary separating positive and negative samples.)

15 Outline: non-parametric Bayesian methods for flexible attribute learning; sample relatedness for knowledge transfer; a discriminative generative model.

16 Preliminaries: Non-parametric Bayesian. Bayes' rule applied in machine learning: $P(\theta \mid D, m) = \frac{P(D \mid \theta, m)\,P(\theta \mid m)}{P(D \mid m)}$, i.e. the posterior of $\theta$ given data $D$ under model $m$ equals the likelihood of $\theta$ times the prior probability of $\theta$, normalized by the evidence. Model comparison for model selection: $P(m \mid D) = \frac{P(D \mid m)\,P(m)}{P(D)}$. Prediction: $P(x \mid D, m) = \int P(x \mid \theta, D, m)\,P(\theta \mid D, m)\,d\theta$.
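A minimal runnable illustration of these uses of Bayes' rule, on a coin-flip example that is not from the talk: a Beta-Bernoulli model with its closed-form marginal likelihood $P(D \mid m)$, and a Bayes-factor comparison between two hypothetical priors.

```python
# Minimal sketch (not from the talk): Bayes' rule and model comparison on a
# coin-flip example, using the closed-form Beta-Bernoulli marginal likelihood.
from math import lgamma, exp

def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_likelihood(heads, tails, a, b):
    """log P(D | m) for a Bernoulli likelihood with a Beta(a, b) prior,
    integrating theta out analytically: B(a + h, b + t) / B(a, b)."""
    return log_beta(a + heads, b + tails) - log_beta(a, b)

heads, tails = 9, 1                                      # observed data D
m1 = log_marginal_likelihood(heads, tails, 1.0, 1.0)     # model 1: uniform prior
m2 = log_marginal_likelihood(heads, tails, 50.0, 50.0)   # model 2: peaked at a fair coin

# Model comparison: P(m | D) is proportional to P(D | m) P(m), equal model priors.
print(f"Bayes factor m1/m2 = {exp(m1 - m2):.2f}")        # > 1 favors model 1 here
```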

17 Non-parametric Bayesian Models. Inflexible models yield unreasonable inferences. Non-parametric models can automatically infer an adequate model size/complexity from the data, without needing to explicitly do Bayesian model comparison. Many can be derived by starting with a finite parametric model and taking the limit as the number of parameters goes to infinity.

18 Finite Mixture Model. Set of observations $X = \{x_1, \dots, x_N\}$. A fixed number $K$ of clusters, with mixing weights $\pi = (\pi_1, \dots, \pi_K)$. The cluster assignment for $x_i$ is $c_i \in \{1, \dots, K\}$. The probability of each sample: $p(x_i \mid \pi, \theta) = \sum_{k=1}^{K} \pi_k\, p(x_i \mid \theta_k)$. The likelihood of the samples: $p(X \mid \pi, \theta) = \prod_{i=1}^{N} p(x_i \mid \pi, \theta)$.
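A minimal sketch of the mixture likelihood above; the Gaussian form of the component densities $p(x_i \mid \theta_k)$ is an assumption, since the slide leaves them unspecified.

```python
# Minimal sketch (assumed 1-D Gaussian components):
# log p(X) = sum_i logsumexp_k [ log pi_k + log N(x_i | mu_k, sigma_k^2) ].
import numpy as np

def log_likelihood(X, pi, mu, sigma):
    """Log-likelihood of samples X under a K-component 1-D Gaussian mixture."""
    X = X[:, None]                                        # shape (N, 1)
    log_comp = (-0.5 * ((X - mu) / sigma) ** 2
                - np.log(sigma * np.sqrt(2 * np.pi)))     # (N, K) component densities
    log_px = np.logaddexp.reduce(np.log(pi) + log_comp, axis=1)
    return log_px.sum()                                   # log prod_i p(x_i)

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])
print(log_likelihood(X, pi=np.array([0.5, 0.5]),
                     mu=np.array([-2.0, 3.0]), sigma=np.array([1.0, 1.0])))
```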

19 Infinite Mixture Model. Take the number of clusters $K \to \infty$ in the finite-mixture likelihood. Since we always have a limited number $N$ of samples in reality, only a limited number of clusters is actually used; so we define two sets of clusters: the $K_+$ represented clusters, for which the number of assigned samples $n_k > 0$, and the unrepresented ones with $n_k = 0$. Assume a reordering such that the represented clusters come first, $k = 1, \dots, K_+$.
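As a sketch of why only finitely many clusters get used, here is a sampler for the Chinese restaurant process, the standard representation of the infinite mixture's assignment distribution; the parameter values are illustrative, not from the slides.

```python
# Minimal sketch: with N finite samples, only K_+ clusters are actually
# used (roughly alpha * log N on average), out of infinitely many.
import numpy as np

def crp(n_samples, alpha, seed=0):
    """Sample cluster assignments from a Chinese restaurant process."""
    rng = np.random.default_rng(seed)
    counts = []                              # counts[k] = samples in cluster k
    assignments = []
    for i in range(n_samples):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= i + alpha                   # existing: n_k/(i+alpha); new: alpha/(i+alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(0)                 # open a new cluster
        counts[k] += 1
        assignments.append(k)
    return assignments, counts

_, counts = crp(n_samples=1000, alpha=2.0)
print(f"used clusters K_+ = {len(counts)} out of infinitely many")
```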

20 Finite Feature Model. Generating $Z$: an $N \times K$ binary matrix (rows: customers, columns: features). For each column $k$, draw $\pi_k \sim \mathrm{Beta}(\alpha/K, 1)$. For each customer $i$, flip a coin: $z_{ik} \sim \mathrm{Bernoulli}(\pi_k)$. The distribution of $Z$ given $\pi$ is $P(Z \mid \pi) = \prod_{k=1}^{K} \pi_k^{m_k} (1 - \pi_k)^{N - m_k}$, where $m_k = \sum_i z_{ik}$. Integrating $\pi$ out leaves: $P(Z) = \prod_{k=1}^{K} \frac{\frac{\alpha}{K}\,\Gamma(m_k + \frac{\alpha}{K})\,\Gamma(N - m_k + 1)}{\Gamma(N + 1 + \frac{\alpha}{K})}$.

21 Finite Feature Model (continued). Under the same generative process, $Z$ is sparse: the expected number of non-zero entries is $N\alpha / (1 + \alpha/K) \le N\alpha$. So even as $K \to \infty$, the matrix is expected to have a finite number of non-zero elements.
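A small sketch of the finite feature model above that checks the sparsity claim empirically; the values of $N$, $K$, and $\alpha$ are arbitrary.

```python
# Minimal sketch: pi_k ~ Beta(alpha/K, 1), z_ik ~ Bernoulli(pi_k); the
# expected number of ones stays near N * alpha even as K grows.
import numpy as np

def sample_finite_feature_matrix(N, K, alpha, seed=0):
    rng = np.random.default_rng(seed)
    pi = rng.beta(alpha / K, 1.0, size=K)    # one coin weight per feature column
    Z = rng.random((N, K)) < pi              # flip each customer/feature coin
    return Z.astype(int)

N, alpha = 20, 5.0
for K in (10, 100, 1000):
    Z = sample_finite_feature_matrix(N, K, alpha)
    print(f"K={K:5d}: non-zeros = {Z.sum():4d} "
          f"(expected ~ {N * alpha / (1 + alpha / K):.1f})")
```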

22 From Finite to Infinite Binary Matrices. A technical difficulty: the probability of any particular matrix goes to zero as $K \to \infty$. However, if we consider equivalence classes $[Z]$ of matrices in left-ordered form, obtained by reordering the columns, we get $P([Z]) = \frac{\alpha^{K_+}}{\prod_{h=1}^{2^N - 1} K_h!}\,\exp(-\alpha H_N)\,\prod_{k=1}^{K_+} \frac{(N - m_k)!\,(m_k - 1)!}{N!}$, where $K_+$ is the number of features assigned (i.e. with $m_k > 0$), $H_N = \sum_{j=1}^{N} 1/j$ is the $N$-th harmonic number, and $K_h$ is the number of columns with binary history $h$. This distribution is exchangeable: independent of the ordering of the customers.
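A possible implementation of the left-ordering function lof() referenced on the next slide, assuming the usual convention: columns are sorted in decreasing order of their binary history, with the first row as the most significant bit.

```python
# Minimal sketch of lof(): sort the columns of a binary matrix by the
# column read as a binary number (row 1 = most significant bit).
# Integer histories overflow past N ~ 62 rows; fine for a small sketch.
import numpy as np

def lof(Z):
    """Return the left-ordered form of binary matrix Z (rows = customers)."""
    N = Z.shape[0]
    weights = 2 ** np.arange(N - 1, -1, -1)
    histories = weights @ Z                  # one integer "history" per column
    order = np.argsort(-histories, kind="stable")
    return Z[:, order]

Z = np.array([[0, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])
print(lof(Z))
```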

23 From Finite to Infinite Binary Matrices. a) The binary matrix on the left is transformed into the binary matrix on the right by the function lof(). b) A left-ordered binary matrix generated by the Indian Buffet Process.

24 Indian Buffet Process. "Many Indian restaurants offer lunchtime buffets with an apparently infinite number of dishes." The first customer starts at the left of the buffet and takes a serving from each dish, stopping after a Poisson($\alpha$) number of dishes as her plate becomes overburdened. The $i$-th customer moves along the buffet, sampling each previously tried dish $k$ in proportion to its popularity, with probability $m_k / i$, and then trying a Poisson($\alpha / i$) number of new dishes. (Figure: customers × dishes binary matrix.)
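A minimal sampler for this generative story (Griffiths & Ghahramani, 2006); the parameter values below are arbitrary.

```python
# Minimal IBP sketch: customer 1 takes Poisson(alpha) dishes; customer i
# takes existing dish k with probability m_k / i, then Poisson(alpha / i)
# new dishes. Returns the customers x dishes binary matrix Z.
import numpy as np

def ibp(n_customers, alpha, seed=0):
    rng = np.random.default_rng(seed)
    counts = []                              # counts[k] = customers who took dish k
    rows = []                                # one binary row per customer
    for i in range(1, n_customers + 1):
        row = [int(rng.random() < m / i) for m in counts]
        for k, took in enumerate(row):
            counts[k] += took
        new = rng.poisson(alpha / i)         # this customer's new dishes
        row.extend([1] * new)
        counts.extend([1] * new)
        rows.append(row)
    Z = np.zeros((n_customers, len(counts)), dtype=int)
    for i, row in enumerate(rows):           # pad earlier rows with zeros
        Z[i, :len(row)] = row
    return Z

Z = ibp(n_customers=10, alpha=3.0)
print(Z)
print("dishes tried:", Z.shape[1])
```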

25 Non-parametric Learning. Infinite attributes via an Indian Buffet Process prior: the probability that image $n$ samples attribute $k$ is proportional to the attribute's popularity $m_k$ among the other images, and each image samples a Poisson number of new attributes. Likelihood: a linear-Gaussian model linking the binary attribute assignments to the features (Griffiths & Ghahramani, 2006). (Figure: images × attributes binary matrix; attributes include furry, striped, wheels, white, bright.)

26 Asymptotic Model. Small-variance asymptotics of the IBP model (Broderick, Kulis & Jordan, ICML 2013): MAP inference reduces to minimizing squared reconstruction error plus a penalty on the dictionary size, $\min_{Z, D, K} \|X - Z D\|_F^2 + \lambda^2 K$, over the binary assignments $Z$, the dictionary $D$, and the dictionary size $K$; $K$ is determined automatically. (Figure: images assigned to attributes such as furry, striped, wheels, white, bright.)
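A rough sketch of minimizing this squared-loss-plus-penalty objective. The specific update order (greedy bit flips, one proposed atom per pass, least-squares dictionary refit) is an assumption in the spirit of BP-means, not the exact algorithm of Broderick, Kulis & Jordan (2013).

```python
# Rough sketch (assumed updates): minimize ||X - Z @ D||_F^2 + lam2 * K
# over binary Z, dictionary D, and dictionary size K.
import numpy as np

def bp_means_step(X, Z, D, lam2):
    N, K = Z.shape
    for n in range(N):                       # 1) greedy binary assignment updates
        for k in range(K):
            for v in (0, 1):
                Zn = Z[n].copy(); Zn[k] = v
                if np.sum((X[n] - Zn @ D) ** 2) < np.sum((X[n] - Z[n] @ D) ** 2):
                    Z[n, k] = v
    resid = X - Z @ D                        # 2) propose one new atom if it pays
    n_worst = np.argmax(np.sum(resid ** 2, axis=1))
    if np.sum(resid[n_worst] ** 2) > lam2:   # error saved exceeds the +lam2 penalty
        D = np.vstack([D, resid[n_worst]])
        Z = np.hstack([Z, np.zeros((N, 1), dtype=int)])
        Z[n_worst, -1] = 1
    D = np.linalg.pinv(Z.astype(float)) @ X  # 3) least-squares dictionary refit
    return Z, D

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Z = np.ones((50, 1), dtype=int)              # start with one shared atom
D = X.mean(axis=0, keepdims=True)
for _ in range(10):
    Z, D = bp_means_step(X, Z, D, lam2=4.0)
print("dictionary size K =", D.shape[0], "(determined automatically)")
```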

27 Asymptotics. DP mixture → k-means: a mixture of Gaussians is Bayesian, non-parametric, flexible, and principled; as the covariance → zero it becomes k-means: simple, efficient, "practical". Principled discrete criteria from BNP: Dirichlet Process → k-means + penalty (Kulis & Jordan, ICML 2012); Beta Process → squared loss + penalty (Broderick, Kulis & Jordan, ICML 2013); Dependent Dirichlet Process (Campbell, Liu, Kulis, How, Carin, NIPS 2013).
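For the first reduction, here is a compact DP-means sketch (Kulis & Jordan, ICML 2012): k-means where a point farther than $\lambda$ from every mean opens a new cluster, so the number of clusters is not fixed in advance. The data and $\lambda$ below are illustrative.

```python
# Minimal DP-means sketch: k-means + lambda penalty on the cluster count.
import numpy as np

def dp_means(X, lam, n_iters=20):
    means = [X.mean(axis=0)]                 # start from one global cluster
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        for n, x in enumerate(X):            # assignment step
            d2 = [np.sum((x - m) ** 2) for m in means]
            k = int(np.argmin(d2))
            if d2[k] > lam:                  # opening a cluster beats paying d2
                means.append(x.copy())
                k = len(means) - 1
            labels[n] = k
        remap, new_means = {}, []            # update step: drop empty clusters
        for k in range(len(means)):
            if np.any(labels == k):
                remap[k] = len(new_means)
                new_means.append(X[labels == k].mean(axis=0))
        labels = np.array([remap[k] for k in labels])
        means = new_means
    return labels, means

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (-3.0, 0.0, 3.0)])
labels, means = dp_means(X, lam=2.0)
print("clusters found:", len(means))
```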

28 Sample Relatedness. Which samples are related: polar bear, clown fish, motorbike? Relatedness between samples is measured by the path length between their class labels in WordNet (Christiane Fellbaum, WordNet, 1998).
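A small sketch of WordNet path-based relatedness using NLTK's WordNet interface; it assumes nltk is installed with the WordNet corpus downloaded, and it picks the first noun sense of each label, which is a simplification.

```python
# Minimal sketch (assumes: pip install nltk, then nltk.download("wordnet")).
# Relatedness is taken as path similarity, 1 / (1 + shortest is-a path length).
from nltk.corpus import wordnet as wn

def relatedness(label_a, label_b):
    syns_a = wn.synsets(label_a, pos=wn.NOUN)
    syns_b = wn.synsets(label_b, pos=wn.NOUN)
    if not syns_a or not syns_b:
        return None                          # label missing from WordNet
    # Simplification: use the first sense of each label.
    return syns_a[0].path_similarity(syns_b[0])

for a, b in [("polar_bear", "clownfish"), ("polar_bear", "motorbike")]:
    print(f"{a} ~ {b}: {relatedness(a, b)}")
```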

29 Full Model. (Diagram: input features → attributes → classifiers. Related samples (Pug dog, Samoyed dog) serve as positive samples, with Cat as negative samples; attributes are learned with sample relatedness, and the classifiers are discriminative.)

30 Joint Learning of Dictionary & Classifiers. BCD (block coordinate descent): alternately update the classifiers and the dictionary.
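The slide names block coordinate descent but gives no objective, so the sketch below is a generic placeholder, not the paper's formulation: it alternates closed-form least-squares updates of a dictionary, ridge "classifiers" on the codes, and the codes themselves.

```python
# Generic BCD placeholder (assumed objective): alternate least-squares
# updates of dictionary D, classifiers W, and codes Z for data X, labels Y.
import numpy as np

def bcd(X, Y, K, n_iters=20, reg=1e-2, seed=0):
    """X: (N, d) features; Y: (N, C) one-hot labels; K: dictionary size."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(len(X), K))                      # codes
    for _ in range(n_iters):
        D = np.linalg.lstsq(Z, X, rcond=None)[0]          # dictionary block
        W = np.linalg.lstsq(Z.T @ Z + reg * np.eye(K),
                            Z.T @ Y, rcond=None)[0]       # classifier block
        A = np.hstack([D, W])                             # code block: fit both
        T = np.hstack([X, Y])                             # targets [X | Y]
        Z = np.linalg.lstsq(A.T, T.T, rcond=None)[0].T
    return D, W, Z

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
Y = np.eye(2)[rng.integers(0, 2, size=100)]
D, W, Z = bcd(X, Y, K=5)
print("train accuracy:", np.mean((Z @ W).argmax(1) == Y.argmax(1)))
```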

31 Does It Work? (Plots: classification accuracy on ImageNet.) Sample-efficient: higher accuracy with fewer training samples. Generalization: a better representation of new classes (AwA data).

32 Why Does It Work? (Plot: accuracy vs. number of training samples, 15 to 50, for varying amounts of related information.) More data and more related information both help: using related samples increases sample efficiency.

33 Why Does It Work? Non-parametric: adapts to the complexity of the data → representation-efficient.

34 Conclusions. Flexible attribute learning method: generalizes to new categories; adapts to dataset complexity. Efficiently learnable: sample-efficient; reduces the user annotation effort. Performs well: recognizes both existing and new categories.

35 Thanks! Q&A

