Download presentation

Presentation is loading. Please wait.

Published byDrew Pottle Modified over 3 years ago

1
Convex Point Estimation using Undirected Bayesian Transfer Hierarchies Gal Elidan, Ben Packer, Geremy Heitz, Daphne Koller Computer Science Dept. Stanford University UAI 2008 Presented by Haojun Chen August 1 st, 2008

2
Outline Background and motivation Undirected transfer hierarchies Experiments Degree of transfer coefficients Experiments Summary

3
Background (1/2) Transfer learning Data from “similar” tasks/distributions are used to compensate for the sparsity of training data in primary class or task Example: Use rhinos to help learn elephants’ shape Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

4
Hierarchical Bayes (HB) framework Principled approach for transfer learning Background (2/2) Example of a hierarchical Bayes parameterization where : a set of related learning tasks/classes : observed data : task/class parameters Joint distribution over the observed data and all class parameters as follows: Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

5
Motivation In practice, point estimation of the MAP is desirable, for full Bayesian computations can be difficult and computationally demanding Efficient point estimation may not be achieved in many standard hierarchical Bayes models, because many common conjugate priors such as the Dirichlet or normal-inverse-Wishart are not convex with respect to the parameters In this paper, an undirected hierarchical Bayes(HB) reformulation is proposed to allow efficient point estimation

6
Undirected HB Reformulation : data-dependent objective : divergence function over child and parent parameters → 0 : encourages parameters to explain data →∞ : encourages parameters to be similar to parents Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

7
Purpose of Reformulation Easy to specify –F data can be likelihood, classification, or other objective –Divergence can be L1-norm, L2-norm, -insensitive loss, KL divergence, etc. –No conjugacy or proper prior restrictions Easy to optimize –Convex over if F data is concave and Divergence is convex Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

8
Bag-of-words model F data : Multinomial log likelihood (regularized) : frequency of word i Divergence: L2 norm Experiment: Text categorization Newsgroup20 Dataset Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

9
Text categorization Result 75150225300375 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 Classification Rate Total Number of Training Instances Newsgroup Topic Classification Max Likelihood (No regularization) Regularized Max Likelihood Shrinkage Undirected HB Baseline: Maximum likelihood at each node (no hierarchy) Cross-validate regularization (no hierarchy) Shrinkage (McCallum et al. ’98, with hierarchy) Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

10
(Density estimation – test likelihood) Instances represented by 60 x-y coordinates of landmarks on outline Divergence: L2 norm over mean and variance Experiment: Shape Modeling Mean landmark location Covariance over landmarks Regularization Mammals Dataset (Fink, ’05) Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

11
Undirect HB Shape Modeling Result 6102030 -350 -300 -250 -200 -150 -100 -50 0 50 Total Number of Training Instances Delta log-loss / instance Mammal Pairs Regularized Max Likelihood Elephant-Rhino Bison-Rhino Elephant-Bison Elephant-Rhino Giraffe-Bison Giraffe-Elephant Giraffe-Rhino Llama-Bison Llama-Elephant Llama-Giraffe Llama-Rhino Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

12
Problem in Transfer Not all parameters deserve equal sharing Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

13
is split into subcomponents with weights and hence different strengths are allowed for different subcomponents, child-parent pairs Degrees of Transfer (DOT) → 0 : forces parameters to agree →∞ : allows parameters to be flexible Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

14
Estimation of DOT Parameters Hyper-prior approach Bayesian idea: Put prior on and add as parameter to optimization along with Concretely: inverse-Gamma prior (forced to be positive) Prior on Degree of Transfer Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

15
DOT Shape Modeling Result 6102030 -15 -10 -5 0 5 10 15 Total Number of Training Instances Delta log-loss / instance Mammal Pairs Bison-Rhino Elephant-Bison Elephant-Rhino Giraffe-Bison Giraffe-Elephant Giraffe-Rhino Llama-Bison Llama-Elephant Llama-Giraffe Llama-Rhino Regularized Max Likelihood Elephant-Rhino Hyperprior Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

16
Distribution of DOT coefficients 1/ Stronger transfer Weaker transfer root 05101520253035404550 0 2 4 6 8 10 12 14 16 18 20 Distribution of DOT coefficients using Hyperprior approach Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

17
Summary Undirected reformulation of the hierarchical Bayes framework is proposed for efficient convex point estimation Different degrees of transfer for different parameters are introduced so that some parts of the distribution can be transferred to a greater extent than others

Similar presentations

OK

Towards Total Scene Understanding: Classiﬁcation, Annotation and Segmentation in an Automatic Framework N96994134 工科所 錢雅馨 2011/01/16 Li-Jia Li, Richard.

Towards Total Scene Understanding: Classiﬁcation, Annotation and Segmentation in an Automatic Framework N96994134 工科所 錢雅馨 2011/01/16 Li-Jia Li, Richard.

© 2018 SlidePlayer.com Inc.

All rights reserved.

To ensure the functioning of the site, we use **cookies**. We share information about your activities on the site with our partners and Google partners: social networks and companies engaged in advertising and web analytics. For more information, see the Privacy Policy and Google Privacy & Terms.
Your consent to our cookies if you continue to use this website.

Ads by Google

Ppt on different solid figures first grade Ppt on fibonacci numbers definition Ppt on zfs file system Ppt on different types of forests in the united Ppt on artificial intelligence in machines Ppt on computer languages list Ppt on water scarcity quotes Ppt on adivasis of india Ppt on regional trade agreements signed Ppt on medical tourism in dubai