Margin Learning, Online Learning, and The Voted Perceptron
SPLODD ~= AE* – 3, 2011 (* Autumnal Equinox)

Review

Computer science is full of equivalences:
– SQL ≈ relational algebra
– YFCL ≈ optimizing … on the training data
– gcc -O4 foo.c ≈ gcc foo.c

It is also full of relationships between sets:
– Finding the smallest error-free decision tree >> 3-SAT
– Datalog >> relational algebra
– CFL >> Det FSMs = RegEx

Review

Bayes nets describe a (family of) joint distribution(s) over random variables:
– They are an operational description (a program) for how data can be generated
– They are a declarative description (a definition) of the joint distribution, and from it we can derive algorithms for doing things other than generation

There is a close connection between Naïve Bayes and loglinear models (sketched below).
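To make that connection concrete: the NB class score is log Pr(y) + Σ_j n_j log Pr(w_j | y), which is linear in the count features n_j, so a softmax over those linear scores recovers exactly the NB posterior. A minimal sketch, with made-up illustrative probabilities:

```python
import numpy as np

# Naive Bayes scores class y as  log Pr(y) + sum_j n_j * log Pr(w_j | y),
# which is linear in the count features n_j: NB *is* a loglinear classifier
# whose weights happen to be log-probabilities.

log_prior = np.log([0.6, 0.4])                # Pr(y); illustrative numbers
log_pwy   = np.log([[0.7, 0.2, 0.1],          # Pr(w_j | y=0), 3-word vocabulary
                    [0.2, 0.3, 0.5]])         # Pr(w_j | y=1)
n = np.array([3.0, 1.0, 0.0])                 # word counts for one document

scores = log_prior + log_pwy @ n              # loglinear form: bias + weights . features
posterior = np.exp(scores - np.logaddexp.reduce(scores))  # softmax of linear scores = NB posterior
print(posterior)                              # Pr(y=0 | doc), Pr(y=1 | doc)
```

The difference between the two model families is how the weights are set: NB estimates them from counts (maximizing joint likelihood), while a loglinear classifier optimizes conditional likelihood directly.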

NB vs loglinear models

[Figure: the space of classifiers, drawn as nested sets – loglinear classifiers, NB classifiers, multinomial(?) classifiers – with labeled points SymDir(100), AbsDisc(0.01), Max CL(y|x) + G(0,1.0), NB-JL, NB-CL, and NB-CL*]

NB vs loglinear models

[Figure: the same picture, plus the Naïve Bayes graphical model (class Y with children W_j) and an "Optimal if" annotation; labeled points include SymDir(100) and Max CL(y|x) + G(0,1.0)]

Similarly for sequences…

An HMM is a Bayes net:
– It implies a set of independence assumptions
– ML parameter setting and Viterbi are optimal if these hold

A CRF is a Markov field:
– It implies a set of independence assumptions
– These, plus the goal of maximizing Pr(y|x), give us a learning algorithm

You can construct features so that any HMM can be emulated by a CRF with those features (see the sketch below).
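A sketch of that construction, under the usual assumptions (one indicator feature per tag-pair transition and per tag/word emission, with random illustrative HMM parameters): setting the CRF weights to the HMM's log-probabilities makes the CRF's unnormalized score of (x, y) equal the HMM's log joint, so Viterbi decoding agrees.

```python
import numpy as np

# HMM-to-CRF emulation: CRF weights for transition/emission indicator
# features are set to the HMM's log-parameters, so the CRF score of a
# tag sequence equals the HMM log joint.  Parameters are illustrative.

n_tags, n_words = 2, 3
rng = np.random.default_rng(0)
start = rng.dirichlet(np.ones(n_tags))                 # HMM Pr(y_1)
trans = rng.dirichlet(np.ones(n_tags), size=n_tags)    # HMM Pr(y_t | y_{t-1})
emit  = rng.dirichlet(np.ones(n_words), size=n_tags)   # HMM Pr(x_t | y_t)

# CRF weights for the constructed features = HMM log-parameters
w_start, w_trans, w_emit = np.log(start), np.log(trans), np.log(emit)

def hmm_log_joint(x, y):
    lp = np.log(start[y[0]]) + np.log(emit[y[0], x[0]])
    for t in range(1, len(x)):
        lp += np.log(trans[y[t - 1], y[t]]) + np.log(emit[y[t], x[t]])
    return lp

def crf_score(x, y):
    # sum of the weights of the active indicator features
    s = w_start[y[0]] + w_emit[y[0], x[0]]
    for t in range(1, len(x)):
        s += w_trans[y[t - 1], y[t]] + w_emit[y[t], x[t]]
    return s

x, y = [0, 2, 1], [0, 1, 1]
assert np.isclose(hmm_log_joint(x, y), crf_score(x, y))
```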

In sequence space…

[Figure: the analogous picture for sequence models – CRF/loglinear models, HMMs, multinomial(?) models – with labeled points SymDir(100), AbsDisc(0.01), Max CL(y|x) + G(0,1.0), JL, CL, and CL*]

Review: CRFs/Markov Random Fields

Semantics of a Markov random field: a chain of variables Y_1 … Y_7, one per word of "When will prof Cohen post the notes".

What's independent: Pr(Y_i | other Y's) = Pr(Y_i | Y_{i-1}, Y_{i+1})

Probability distribution: Pr(Y_1, …, Y_7) = (1/Z) ∏_t φ_t(Y_t, Y_{t+1})
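A brute-force rendering of these semantics, assuming random illustrative edge potentials: enumerate all 3^7 tag sequences for the 7-word sentence to compute Z exactly (fine at this size; real implementations use forward-backward instead).

```python
import itertools
import numpy as np

# Chain MRF: one potential phi per edge,
# Pr(Y_1..Y_7) = (1/Z) * prod_t phi(Y_t, Y_{t+1}).

tags, n = (0, 1, 2), 7                    # e.g. B, I, O over a 7-word sentence
rng = np.random.default_rng(0)
phi = rng.uniform(0.1, 1.0, size=(3, 3))  # illustrative edge potentials

def unnorm(y):
    p = 1.0
    for t in range(n - 1):
        p *= phi[y[t], y[t + 1]]
    return p

# Z sums the unnormalized score over every possible tag sequence
Z = sum(unnorm(y) for y in itertools.product(tags, repeat=n))
y = (0, 1, 1, 1, 2, 2, 2)                 # "B I I I O O O"
print("Pr(y) =", unnorm(y) / Z)
```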

Review: CRFs/Markov Random Fields

[Figure: the tag lattice for "When will prof Cohen post the notes" – a column of candidate tags B, I, O for each word, with edges between adjacent columns]

Review: CRFs/Markov Random Fields

The same sentence, but now the graph over Y_1 … Y_7 is not just a chain (the figure adds an extra node Y_f).

What's independent: Pr(Y_i | other Y's) = Pr(Y_i | neighbors of Y_i)

Probability distribution: Pr(Y_1, …, Y_7) = (1/Z) ∏_c φ_c(Y_c), a product over the cliques c of the graph
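The independence claim can be checked by brute force on the chain from the previous sketch (same illustrative potentials): the conditional for Y_4 given all other variables matches the conditional given only its neighbors Y_3 and Y_5, because every potential not touching Y_4 cancels when we renormalize.

```python
import numpy as np

# Markov blanket check: Pr(Y_4 | everything else) depends only on Y_3, Y_5.

tags, n = (0, 1, 2), 7
rng = np.random.default_rng(0)
phi = rng.uniform(0.1, 1.0, size=(3, 3))

def unnorm(y):
    p = 1.0
    for t in range(n - 1):
        p *= phi[y[t], y[t + 1]]
    return p

def cond_given_rest(y, i):
    # Pr(Y_i = y[i] | all other Y's): renormalize over the values Y_i can take
    weights = [unnorm(y[:i] + (v,) + y[i + 1:]) for v in tags]
    return weights[y[i]] / sum(weights)

y1 = (0, 1, 1, 1, 2, 2, 2)
y2 = (2, 0, 1, 1, 2, 0, 1)   # same Y_3, Y_4, Y_5; different everywhere else
assert np.isclose(cond_given_rest(y1, 3), cond_given_rest(y2, 3))
```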

Pseudo-likelihood and dependency networks

Any Markov field defines a (family of) probability distributions D:
– But there is no simple program for generation/sampling
– We can use MCMC in the general case

If you have, for each node i, P_D(X_i | Pa_i), that's a dependency net:
– Still no simple program for generation/sampling (but you can use Gibbs)
– You can learn these from data using YFCL
– Equivalently: learning this maximizes pseudo-likelihood, just as HMM learning maximizes (real) likelihood on a sequence

A weirdness: every MRF has an equivalent dependency net, but not every dependency net (set of local conditionals) has an equivalent MRF.
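For reference, the pseudo-likelihood of an assignment is PL(y) = ∏_i P(y_i | y_-i), so fitting the local conditionals directly maximizes it. Below is a minimal Gibbs-sampling sketch for the dependency-net point: armed only with the local conditionals P(Y_i | neighbors), resample one variable at a time. Here the conditionals are derived from the chain MRF's edge potentials (random, illustrative); in a learned dependency net they would come from per-node classifiers (the slide's YFCL).

```python
import numpy as np

# Gibbs sampling from local conditionals only -- the dependency-net recipe.

tags, n = [0, 1, 2], 7
rng = np.random.default_rng(0)
phi = rng.uniform(0.1, 1.0, size=(3, 3))

def local_conditional(y, i):
    # P(Y_i | its neighbors): left edge potential (if any) times right (if any)
    w = np.ones(len(tags))
    if i > 0:
        w *= phi[y[i - 1], :]
    if i < n - 1:
        w *= phi[:, y[i + 1]]
    return w / w.sum()

y = list(rng.integers(0, 3, size=n))       # arbitrary starting assignment
for sweep in range(1000):                  # Gibbs sweeps (burn-in)
    for i in range(n):
        y[i] = rng.choice(tags, p=local_conditional(y, i))
print(y)                                   # one approximate sample from the MRF
```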

And now for …