Margin Learning, Online Learning, and The Voted Perceptron
SPLODD ~= AE* – 3, 2011 (* Autumnal Equinox)

Review

Computer science is full of equivalences:
– SQL ≈ relational algebra
– YFCL ≈ optimizing … on the training data
– gcc -O4 foo.c ≈ gcc foo.c

It is also full of relationships between sets:
– Finding the smallest error-free decision tree >> 3-SAT
– Datalog >> relational algebra
– CFL >> Det FSMs = RegEx

Review

Bayes nets describe a (family of) joint distribution(s) over random variables:
– They are an operational description (a program) for how data can be generated
– They are a declarative description (a definition) of the joint distribution, and from it we can derive algorithms for doing things other than generation

There is a close connection between Naïve Bayes and loglinear models (sketched below).
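To make that connection concrete: the NB class score is log Pr(y) + Σ_j n_j log Pr(w_j | y), which is linear in the count features n_j, so a softmax over those linear scores recovers exactly the NB posterior. A minimal sketch, with made-up illustrative probabilities:

```python
import numpy as np

# Naive Bayes scores class y as  log Pr(y) + sum_j n_j * log Pr(w_j | y),
# which is linear in the count features n_j: NB *is* a loglinear classifier
# whose weights happen to be log-probabilities.

log_prior = np.log([0.6, 0.4])                # Pr(y); illustrative numbers
log_pwy   = np.log([[0.7, 0.2, 0.1],          # Pr(w_j | y=0), 3-word vocabulary
                    [0.2, 0.3, 0.5]])         # Pr(w_j | y=1)
n = np.array([3.0, 1.0, 0.0])                 # word counts for one document

scores = log_prior + log_pwy @ n              # loglinear form: bias + weights . features
posterior = np.exp(scores - np.logaddexp.reduce(scores))  # softmax of linear scores = NB posterior
print(posterior)                              # Pr(y=0 | doc), Pr(y=1 | doc)
```

The difference between the two model families is how the weights are set: NB estimates them from counts (maximizing joint likelihood), while a loglinear classifier optimizes conditional likelihood directly.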

NB vs loglinear models

[Figure: the space of classifiers, drawn as nested sets – loglinear classifiers, NB classifiers, multinomial(?) classifiers – with labeled points SymDir(100), AbsDisc(0.01), Max CL(y|x) + G(0,1.0), NB-JL, NB-CL, and NB-CL*]

NB vs loglinear models

[Figure: the same picture, plus the Naïve Bayes graphical model (class Y with children W_j) and an "Optimal if" annotation; labeled points include SymDir(100) and Max CL(y|x) + G(0,1.0)]

Similarly for sequences…

An HMM is a Bayes net:
– It implies a set of independence assumptions
– ML parameter setting and Viterbi are optimal if these hold

A CRF is a Markov field:
– It implies a set of independence assumptions
– These, plus the goal of maximizing Pr(y|x), give us a learning algorithm

You can construct features so that any HMM can be emulated by a CRF with those features (see the sketch below).
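A sketch of that construction, under the usual assumptions (one indicator feature per tag-pair transition and per tag/word emission, with random illustrative HMM parameters): setting the CRF weights to the HMM's log-probabilities makes the CRF's unnormalized score of (x, y) equal the HMM's log joint, so Viterbi decoding agrees.

```python
import numpy as np

# HMM-to-CRF emulation: CRF weights for transition/emission indicator
# features are set to the HMM's log-parameters, so the CRF score of a
# tag sequence equals the HMM log joint.  Parameters are illustrative.

n_tags, n_words = 2, 3
rng = np.random.default_rng(0)
start = rng.dirichlet(np.ones(n_tags))                 # HMM Pr(y_1)
trans = rng.dirichlet(np.ones(n_tags), size=n_tags)    # HMM Pr(y_t | y_{t-1})
emit  = rng.dirichlet(np.ones(n_words), size=n_tags)   # HMM Pr(x_t | y_t)

# CRF weights for the constructed features = HMM log-parameters
w_start, w_trans, w_emit = np.log(start), np.log(trans), np.log(emit)

def hmm_log_joint(x, y):
    lp = np.log(start[y[0]]) + np.log(emit[y[0], x[0]])
    for t in range(1, len(x)):
        lp += np.log(trans[y[t - 1], y[t]]) + np.log(emit[y[t], x[t]])
    return lp

def crf_score(x, y):
    # sum of the weights of the active indicator features
    s = w_start[y[0]] + w_emit[y[0], x[0]]
    for t in range(1, len(x)):
        s += w_trans[y[t - 1], y[t]] + w_emit[y[t], x[t]]
    return s

x, y = [0, 2, 1], [0, 1, 1]
assert np.isclose(hmm_log_joint(x, y), crf_score(x, y))
```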

In sequence space…

[Figure: the analogous picture for sequence models – CRF/loglinear models, HMMs, multinomial(?) models – with labeled points SymDir(100), AbsDisc(0.01), Max CL(y|x) + G(0,1.0), JL, CL, and CL*]

Review: CRFs/Markov Random Fields

Semantics of a Markov random field: a chain of variables Y_1 … Y_7, one per word of "When will prof Cohen post the notes".

What's independent: Pr(Y_i | other Y's) = Pr(Y_i | Y_{i-1}, Y_{i+1})

Probability distribution: Pr(Y_1, …, Y_7) = (1/Z) ∏_t φ_t(Y_t, Y_{t+1})
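A brute-force rendering of these semantics, assuming random illustrative edge potentials: enumerate all 3^7 tag sequences for the 7-word sentence to compute Z exactly (fine at this size; real implementations use forward-backward instead).

```python
import itertools
import numpy as np

# Chain MRF: one potential phi per edge,
# Pr(Y_1..Y_7) = (1/Z) * prod_t phi(Y_t, Y_{t+1}).

tags, n = (0, 1, 2), 7                    # e.g. B, I, O over a 7-word sentence
rng = np.random.default_rng(0)
phi = rng.uniform(0.1, 1.0, size=(3, 3))  # illustrative edge potentials

def unnorm(y):
    p = 1.0
    for t in range(n - 1):
        p *= phi[y[t], y[t + 1]]
    return p

# Z sums the unnormalized score over every possible tag sequence
Z = sum(unnorm(y) for y in itertools.product(tags, repeat=n))
y = (0, 1, 1, 1, 2, 2, 2)                 # "B I I I O O O"
print("Pr(y) =", unnorm(y) / Z)
```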

Review: CRFs/Markov Random Fields

[Figure: the tag lattice for "When will prof Cohen post the notes" – a column of candidate tags B, I, O for each word, with edges between adjacent columns]

Review: CRFs/Markov Random Fields

The same sentence, but now the graph over Y_1 … Y_7 is not just a chain (the figure adds an extra node Y_f).

What's independent: Pr(Y_i | other Y's) = Pr(Y_i | neighbors of Y_i)

Probability distribution: Pr(Y_1, …, Y_7) = (1/Z) ∏_c φ_c(Y_c), a product over the cliques c of the graph
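The independence claim can be checked by brute force on the chain from the previous sketch (same illustrative potentials): the conditional for Y_4 given all other variables matches the conditional given only its neighbors Y_3 and Y_5, because every potential not touching Y_4 cancels when we renormalize.

```python
import numpy as np

# Markov blanket check: Pr(Y_4 | everything else) depends only on Y_3, Y_5.

tags, n = (0, 1, 2), 7
rng = np.random.default_rng(0)
phi = rng.uniform(0.1, 1.0, size=(3, 3))

def unnorm(y):
    p = 1.0
    for t in range(n - 1):
        p *= phi[y[t], y[t + 1]]
    return p

def cond_given_rest(y, i):
    # Pr(Y_i = y[i] | all other Y's): renormalize over the values Y_i can take
    weights = [unnorm(y[:i] + (v,) + y[i + 1:]) for v in tags]
    return weights[y[i]] / sum(weights)

y1 = (0, 1, 1, 1, 2, 2, 2)
y2 = (2, 0, 1, 1, 2, 0, 1)   # same Y_3, Y_4, Y_5; different everywhere else
assert np.isclose(cond_given_rest(y1, 3), cond_given_rest(y2, 3))
```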

Pseudo-likelihood and dependency networks

Any Markov field defines a (family of) probability distributions D:
– But there is no simple program for generation/sampling
– We can use MCMC in the general case

If you have, for each node i, P_D(X_i | Pa_i), that's a dependency net:
– Still no simple program for generation/sampling (but you can use Gibbs)
– You can learn these from data using YFCL
– Equivalently: learning this maximizes pseudo-likelihood, just as HMM learning maximizes (real) likelihood on a sequence

A weirdness: every MRF has an equivalent dependency net, but not every dependency net (set of local conditionals) has an equivalent MRF.
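For reference, the pseudo-likelihood of an assignment is PL(y) = ∏_i P(y_i | y_-i), so fitting the local conditionals directly maximizes it. Below is a minimal Gibbs-sampling sketch for the dependency-net point: armed only with the local conditionals P(Y_i | neighbors), resample one variable at a time. Here the conditionals are derived from the chain MRF's edge potentials (random, illustrative); in a learned dependency net they would come from per-node classifiers (the slide's YFCL).

```python
import numpy as np

# Gibbs sampling from local conditionals only -- the dependency-net recipe.

tags, n = [0, 1, 2], 7
rng = np.random.default_rng(0)
phi = rng.uniform(0.1, 1.0, size=(3, 3))

def local_conditional(y, i):
    # P(Y_i | its neighbors): left edge potential (if any) times right (if any)
    w = np.ones(len(tags))
    if i > 0:
        w *= phi[y[i - 1], :]
    if i < n - 1:
        w *= phi[:, y[i + 1]]
    return w / w.sum()

y = list(rng.integers(0, 3, size=n))       # arbitrary starting assignment
for sweep in range(1000):                  # Gibbs sweeps (burn-in)
    for i in range(n):
        y[i] = rng.choice(tags, p=local_conditional(y, i))
print(y)                                   # one approximate sample from the MRF
```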

And now for …