
1 Margin Learning, Online Learning, and The Voted Perceptron SPLODD ~= AE* – 3, 2011 (* Autumnal Equinox)

2 Review
Computer science is full of equivalences:
– SQL ≈ relational algebra
– YFCL ≈ optimizing … on the training data
– gcc –O4 foo.c ≈ gcc foo.c
It is also full of relationships between sets:
– finding the smallest error-free decision tree >> 3-SAT (at least as hard)
– Datalog >> relational algebra (strictly more expressive)
– CFLs >> deterministic FSMs = regular expressions

3 Review
Bayes nets describe a (family of) joint distribution(s) over random variables:
– They are an operational description (a program) for how data can be generated (see the sketch below).
– They are a declarative description (a definition) of the joint distribution, from which we can derive algorithms for doing things other than generation.
There is a close connection between Naïve Bayes and loglinear models.
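The "operational description" view can be made literal: a Bayes net is a sampling program. A minimal sketch for the Naïve Bayes net (all names and numbers here are made up for illustration, not from the slides):

```python
import random

# Hypothetical Naive Bayes parameters (made-up numbers, for illustration):
# a class prior and a per-class distribution over words.
prior = {"sports": 0.6, "politics": 0.4}
word_probs = {
    "sports":   {"game": 0.5, "vote": 0.1, "ball": 0.4},
    "politics": {"game": 0.1, "vote": 0.7, "ball": 0.2},
}

def sample_categorical(dist):
    """Draw one outcome from a {value: probability} dict."""
    r, total = random.random(), 0.0
    for value, p in dist.items():
        total += p
        if r <= total:
            return value
    return value  # guard against floating-point round-off

def generate_document(length=5):
    """Ancestral sampling: first sample the class from the prior,
    then sample each word independently given the class."""
    y = sample_categorical(prior)
    words = [sample_categorical(word_probs[y]) for _ in range(length)]
    return y, words

print(generate_document())
```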

4 NB vs loglinear models
[Diagram: the space of classifiers, with overlapping regions for loglinear classifiers, NB classifiers, and multinomial(?) classifiers; marked points include SymDir(100), AbsDisc(0.01), and Max CL(y|x) + G(0,1.0); the NB variants NB-JL, NB-CL, and NB-CL* are labeled.]

5 NB vs loglinear models
[Diagram: the same classifier space, with the point Max CL(y|x) + G(0,1.0) and a Y → W_j network fragment annotated "Optimal if".]

6 Similarly for sequences…
An HMM is a Bayes net:
– It implies a set of independence assumptions.
– ML parameter setting and Viterbi decoding are optimal if these hold.
A CRF is a Markov random field:
– It implies a set of independence assumptions.
– These, plus the goal of maximizing Pr(y|x), give us a learning algorithm.
You can construct features so that any HMM can be emulated by a CRF with those features (see the sketch below).
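One way to see the last point concretely: give the CRF one indicator feature per HMM transition and one per HMM emission, each weighted by that event's log-probability. The CRF's unnormalized score of (x, y) then equals the HMM's log joint probability, so Viterbi decoding agrees. A minimal sketch (the data layout is my own assumption):

```python
import math

def hmm_as_crf_weights(transition, emission):
    """Build CRF feature weights that reproduce an HMM's scores.

    transition: dict (prev_tag, tag) -> probability
    emission:   dict (tag, word)    -> probability
    Each indicator feature gets the log-probability of its HMM event.
    """
    weights = {}
    for (prev_tag, tag), p in transition.items():
        weights[("trans", prev_tag, tag)] = math.log(p)
    for (tag, word), p in emission.items():
        weights[("emit", tag, word)] = math.log(p)
    return weights

def crf_score(weights, words, tags, start="<s>"):
    """Sum of active feature weights; with the weights above this
    equals the HMM's log P(words, tags), assuming the transition
    table includes entries from the start symbol."""
    score, prev = 0.0, start
    for word, tag in zip(words, tags):
        score += weights[("trans", prev, tag)]
        score += weights[("emit", tag, word)]
        prev = tag
    return score
```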

7 In sequence space…
[Diagram: the analogous picture for sequence models, with overlapping regions for CRF/loglinear models, HMMs, and multinomial(?) models; marked points include SymDir(100), AbsDisc(0.01), and Max CL(y|x) + G(0,1.0); the variants JL, CL, and CL* are labeled.]

8 Review: CRFs/Markov Random Fields
Example: When will prof Cohen post the notes
Semantics of a (chain-structured) Markov random field with one node per word, Y_1 … Y_7:
– What's independent: Pr(Y_i | other Y's) = Pr(Y_i | Y_{i-1}, Y_{i+1})
– Probability distribution: Pr(y) = (1/Z) ∏_i φ_i(y_i, y_{i+1})
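A brute-force sketch of these semantics (the potential table is made up for illustration): it computes Pr(y) = (1/Z) ∏_i φ(y_i, y_{i+1}) by enumerating every label sequence, which is exponential in general but fine at length 7; forward-backward does the same sum in linear time.

```python
from itertools import product

LABELS = ["B", "I", "O"]

def phi(y_prev, y_curr):
    """Toy pairwise potential: favor staying in the same label.
    (Made-up numbers, just to have a concrete table.)"""
    return 2.0 if y_prev == y_curr else 1.0

def unnormalized(ys):
    """Product of pairwise potentials along the chain."""
    score = 1.0
    for a, b in zip(ys, ys[1:]):
        score *= phi(a, b)
    return score

def chain_mrf_prob(ys, length=7):
    """Pr(y) = unnormalized(y) / Z, with Z summed over all 3^length
    label sequences by brute force."""
    Z = sum(unnormalized(seq) for seq in product(LABELS, repeat=length))
    return unnormalized(ys) / Z

print(chain_mrf_prob(("B", "I", "I", "I", "O", "O", "O")))
```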

9 Review: CRFs/Markov Random Fields
[Trellis diagram: a column of candidate labels {B, I, O} for each word of "When will prof Cohen post the notes …".]

10 Review: CRFs/Markov Random Fields
Example: When will prof Cohen post the notes
Semantics of a general Markov random field over Y_1 … Y_7, with factors Y_f:
– What's independent: Pr(Y_i | other Y's) = Pr(Y_i | neighbors of Y_i)
– Probability distribution: Pr(y) = (1/Z) ∏_f φ_f(y_f)
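The independence claim has a computational payoff: the local conditional for Y_i depends only on the factors touching Y_i, because every other factor cancels between the numerator and the normalizer. A sketch under an assumed factor representation (my own choice, not from the slides):

```python
def local_conditional(i, y, labels, factors):
    """Pr(Y_i | all the other Y's) in an MRF.

    Only factors whose scope contains i matter; this is exactly the
    'neighbors of Y_i' independence claim above.

    factors: list of (scope, phi) pairs, where scope is a tuple of
    variable indices and phi is a function from a tuple of labels
    (in scope order) to a positive potential value.
    y: dict mapping each variable index to its current label.
    """
    touching = [(scope, phi) for scope, phi in factors if i in scope]
    scores = {}
    for label in labels:
        y_try = dict(y)
        y_try[i] = label
        score = 1.0
        for scope, phi in touching:
            score *= phi(tuple(y_try[j] for j in scope))
        scores[label] = score
    z_i = sum(scores.values())  # local normalizer over Y_i's labels
    return {label: s / z_i for label, s in scores.items()}
```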

11 Pseudo-likelihood and dependency networks
Any Markov field defines a (family of) probability distributions D:
– But it is not a simple program for generation/sampling; we can use MCMC in the general case.
If you have, for each node i, P_D(X_i | Pa_i), that's a dependency net:
– Still no simple program for generation/sampling (but you can use Gibbs sampling; see the sketch below).
– You can learn these from data using YFCL.
– Equivalently: learning this maximizes pseudo-likelihood, just as HMM learning maximizes (real) likelihood on a sequence.
A weirdness: every MRF has an equivalent dependency net, but not every dependency net (a set of local conditionals) has an equivalent MRF.
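A sketch of both halves of this slide (the interfaces are my own assumptions): Gibbs sampling from a dependency network's local conditionals, and the pseudo-likelihood objective that training those conditionals on fully observed data maximizes.

```python
import math
import random

def gibbs_sweeps(local_conditionals, init, n_sweeps=100):
    """Sample from a dependency network by Gibbs sampling: repeatedly
    resample each variable from its local conditional given the rest.

    local_conditionals: dict i -> function(state dict) -> {label: prob}
    (A sketch; assumes the local conditionals are mutually consistent
    enough for the chain to converge to something meaningful.)
    """
    state = dict(init)
    for _ in range(n_sweeps):
        for i, cond in local_conditionals.items():
            dist, r, total = cond(state), random.random(), 0.0
            for label, p in dist.items():
                total += p
                if r <= total:
                    state[i] = label
                    break
    return state

def pseudo_log_likelihood(local_conditionals, data):
    """Pseudo-(log-)likelihood of fully observed examples: the sum,
    over examples x and nodes i, of log Pr(x_i | all other x's).
    Fitting each local conditional separately maximizes this objective."""
    return sum(
        math.log(cond(x)[x[i]])
        for x in data
        for i, cond in local_conditionals.items()
    )
```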

12 And now for …

