A Generalization of Forward-backward Algorithm
Ai Azuma and Yuji Matsumoto
Nara Institute of Science and Technology


Forward-backward algorithm
- Allows efficient calculation of sums (e.g. expectations) over all paths in a trellis.
- Plays an important role in sequence modeling:
  - HMMs (Hidden Markov Models)
  - CRFs (Conditional Random Fields) [Lafferty et al., 2001]
  - ...

A sequential labeling example: part-of-speech tagging.
[Figure] A trellis for the sentence "Time flies like an arrow": each word is expanded into one node per candidate tag (noun, verb, prep., indef. art., ...), with a SOURCE node before the first word and a SINK node after the last; every SOURCE-to-SINK path corresponds to one tag sequence.
In CRFs and HMMs, we need to compute the "sum" of the probabilities (or scores) of all paths.

The forward-backward algorithm efficiently computes sums over all paths in the trellis with dynamic programming:
- Enumerating all paths in the trellis is intractable, because the number of paths is enormous (exponential in the sequence length).
- Instead, the algorithm recursively computes the sum from SOURCE toward SINK (forward) and from SINK toward SOURCE (backward), keeping intermediate results on each node and arc.
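As a concrete, purely illustrative sketch of the forward half of this recursion, the Python snippet below computes the sum of the scores of all SOURCE-to-SINK paths in a toy trellis with a single sweep in topological order. The node names, arc potentials, and the arcs-only representation are invented for this example and are not taken from the talk.

```python
# Minimal forward recursion on a toy trellis (illustrative values only).
# The trellis is a DAG given as arc potentials; a path's score is the
# product of the potentials of the arcs it traverses.

arcs = {
    ("SOURCE", "x1"): 0.6, ("SOURCE", "y1"): 0.4,
    ("x1", "x2"): 0.7, ("x1", "y2"): 0.3,
    ("y1", "x2"): 0.5, ("y1", "y2"): 0.5,
    ("x2", "SINK"): 1.0, ("y2", "SINK"): 1.0,
}
topo_order = ["SOURCE", "x1", "y1", "x2", "y2", "SINK"]

# alpha[u] = sum over all SOURCE-to-u paths of the product of arc potentials
alpha = {"SOURCE": 1.0}
for v in topo_order[1:]:
    alpha[v] = sum(alpha[u] * psi for (u, w), psi in arcs.items() if w == v)

Z = alpha["SINK"]  # the sum over all paths (here 1.0, since the toy potentials are probabilities)
print(Z)
```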

The forward-backward algorithm is applicable to, for example:
- the normalization constant of CRFs,
- the E-step for HMMs,
- feature expectations on CRFs, e.g. $\sum_{y \in Y} p(y) \sum_{c \in C(y)} f_k(c)$,
where c denotes a node or node pair (clique), $f_k$ the k-th feature, $C(y)$ the set of nodes and arcs (cliques) in path y, and Y the set of paths.

Types of sums computable with the forward-backward algorithm:
- 0th-order moment (normalization constant): $Z = \sum_{y \in Y} \prod_{c \in C(y)} \psi(c)$
- 1st-order moment: $\sum_{y \in Y} \bigl( \prod_{c \in C(y)} \psi(c) \bigr) \sum_{c \in C(y)} f(c)$
where $\psi(c)$ is the potential (score) of clique c, $C(y)$ is the set of nodes and arcs (cliques) in path y, and Y is the set of paths.
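A minimal sketch of how the 0th- and 1st-order moments can be obtained in one forward sweep, assuming for simplicity that the potentials and a single feature f live on arcs only; the toy trellis and values are invented for illustration.

```python
# Forward recursion carrying two variables per node:
#   alpha0[u] = sum over SOURCE-to-u paths of prod(psi)
#   alpha1[u] = sum over SOURCE-to-u paths of prod(psi) * (sum of f along the path)

arcs = {  # (u, v): (psi, f) -- toy values, invented for illustration
    ("SOURCE", "x1"): (0.6, 1.0), ("SOURCE", "y1"): (0.4, 0.0),
    ("x1", "x2"): (0.7, 1.0), ("x1", "y2"): (0.3, 0.0),
    ("y1", "x2"): (0.5, 1.0), ("y1", "y2"): (0.5, 0.0),
    ("x2", "SINK"): (1.0, 0.0), ("y2", "SINK"): (1.0, 0.0),
}
topo_order = ["SOURCE", "x1", "y1", "x2", "y2", "SINK"]

alpha0 = {"SOURCE": 1.0}
alpha1 = {"SOURCE": 0.0}
for v in topo_order[1:]:
    alpha0[v] = sum(alpha0[u] * psi for (u, w), (psi, f) in arcs.items() if w == v)
    # extending a path by the arc (u, v) adds f(u, v) to its feature sum
    alpha1[v] = sum((alpha1[u] + alpha0[u] * f) * psi
                    for (u, w), (psi, f) in arcs.items() if w == v)

Z = alpha0["SINK"]             # 0th-order moment (normalization constant)
first_moment = alpha1["SINK"]  # 1st-order moment
print(Z, first_moment, first_moment / Z)  # the last value is the feature expectation
```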

But sometimes we need higher-order multivariate moments. To name a few examples:
- correlations between features,
- objectives more complex than the log-likelihood,
- parameter derivatives of these,
- ...

Our goal: to generalize the forward-backward algorithm to higher-order multivariate moments!

Can we derive a dynamic program for higher-order moments of this kind, e.g. $\sum_{y \in Y} \bigl( \prod_{c \in C(y)} \psi(c) \bigr) \bigl( \sum_{c \in C(y)} f(c) \bigr)^n$?
Answer:
- Record multiple forward/backward variables for each clique, and
- Combine all the previously calculated values by the binomial theorem.
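A minimal sketch of this idea for a single feature: each node keeps a vector of forward variables indexed by the moment order, and extending a path by one arc mixes the orders exactly as the binomial theorem prescribes. The toy trellis, potentials, and feature values are invented, and potentials/features are assumed to live on arcs only for brevity.

```python
from math import comb

N = 3  # highest moment order we want

arcs = {  # (u, v): (psi, f) -- illustrative values only
    ("SOURCE", "x1"): (0.6, 1.0), ("SOURCE", "y1"): (0.4, 0.0),
    ("x1", "x2"): (0.7, 1.0), ("x1", "y2"): (0.3, 0.0),
    ("y1", "x2"): (0.5, 1.0), ("y1", "y2"): (0.5, 0.0),
    ("x2", "SINK"): (1.0, 0.0), ("y2", "SINK"): (1.0, 0.0),
}
topo_order = ["SOURCE", "x1", "y1", "x2", "y2", "SINK"]

# alpha[u][n] = sum over SOURCE-to-u paths of prod(psi) * (sum of f along the path)**n
alpha = {"SOURCE": [1.0] + [0.0] * N}
for v in topo_order[1:]:
    alpha[v] = [0.0] * (N + 1)
    for (u, w), (psi, f) in arcs.items():
        if w != v:
            continue
        for n in range(N + 1):
            # (F + f)^n = sum_k C(n, k) * f^k * F^(n - k)  -- the binomial theorem
            alpha[v][n] += psi * sum(comb(n, k) * (f ** k) * alpha[u][n - k]
                                     for k in range(n + 1))

# alpha["SINK"][n] is the n-th order moment over all paths
print(alpha["SINK"])
```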

[Figure] A node u in the trellis and the set of paths from SOURCE to u; the ordinary forward-backward algorithm records only a single variable at u, namely the sum over this set of paths.

[Figure] A node v and its direct ancestors u; the forward variables at v are obtained by combining the values already calculated at its ancestors, and the combination rule is derived from the binomial theorem.

[Figure] The direct ancestors of SINK; the variables accumulated there yield the desired values.

Summary of our ideas:
- keep multiple forward/backward variables for each clique;
- the dependency between variables within a step is derived from the binomial theorem.

For multivariate cases, the forward/backward variables have multiple indices (one index per feature).
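For instance, with two features f and g the forward table at each node can be indexed by a pair of orders (n1, n2), and the update applies the binomial theorem coordinate-wise. The sketch below uses an invented toy trellis and assumes arc-only potentials and features.

```python
from math import comb
from itertools import product

N1, N2 = 2, 2  # highest orders for the two features

arcs = {  # (u, v): (psi, f, g) -- illustrative values only
    ("SOURCE", "x1"): (0.6, 1.0, 0.0), ("SOURCE", "y1"): (0.4, 0.0, 1.0),
    ("x1", "x2"): (0.7, 1.0, 0.0), ("x1", "y2"): (0.3, 0.0, 1.0),
    ("y1", "x2"): (0.5, 1.0, 0.0), ("y1", "y2"): (0.5, 0.0, 1.0),
    ("x2", "SINK"): (1.0, 0.0, 0.0), ("y2", "SINK"): (1.0, 0.0, 0.0),
}
topo_order = ["SOURCE", "x1", "y1", "x2", "y2", "SINK"]

indices = list(product(range(N1 + 1), range(N2 + 1)))

# alpha[u][(n1, n2)] = sum over SOURCE-to-u paths of prod(psi) * F**n1 * G**n2,
# where F and G are the sums of f and g along the path.
alpha = {"SOURCE": {idx: (1.0 if idx == (0, 0) else 0.0) for idx in indices}}
for v in topo_order[1:]:
    alpha[v] = {idx: 0.0 for idx in indices}
    for (u, w), (psi, f, g) in arcs.items():
        if w != v:
            continue
        for n1, n2 in indices:
            # binomial theorem applied coordinate-wise to (F + f)^n1 * (G + g)^n2
            alpha[v][(n1, n2)] += psi * sum(
                comb(n1, k1) * comb(n2, k2) * (f ** k1) * (g ** k2)
                * alpha[u][(n1 - k1, n2 - k2)]
                for k1 in range(n1 + 1) for k2 in range(n2 + 1))

# e.g. alpha["SINK"][(1, 1)] is the mixed moment  sum_paths prod(psi) * F * G
print(alpha["SINK"][(1, 1)])
```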

To calculate sums of the form $\sum_{y \in Y} \bigl( \prod_{c \in C(y)} \psi(c) \bigr) \prod_k \bigl( \sum_{c \in C(y)} f_k(c) \bigr)^{n_k}$, the computational cost of the generalized forward-backward algorithm is proportional to the number of nodes and arcs in the trellis (for fixed moment orders $n_k$): it is only linear in |V| and |E|.

Merits of the generalized forward-backward algorithm:
1. The generalized forward-backward subsumes many existing task-specific algorithms.
2. For some tasks, it leads to a solution more efficient than the existing ones.

Merit 1. The generalized forward-backward subsumes many existing task-specific algorithms. Tasks and the sums to compute:
- parameter derivatives of the Hamming loss for CRFs [Kakade et al., 2002],
- parameter derivatives of the entropy for CRFs [Mann et al., 2007],
- Hessian-vector products for CRFs [Vishwanathan et al., 2006].
All these formulas have a form computable with our proposed method.

The previously proposed algorithms for these tasks are task-specific. The generalized forward-backward algorithm is task-independent and applicable to any formula of the above form; if a problem involves this form, it immediately offers an efficient solution.

Merits of the generalized forward-backward algorithm:
1. The generalized forward-backward subsumes many existing task-specific algorithms.
2. For some tasks, it leads to a solution more efficient than the existing ones.

Merit 2. An efficient optimization procedure with respect to the Generalized Expectation criteria for CRFs [Mann et al., 2008].
Computational cost comparison (L = # of nodes labeled as answers):
- the algorithm proposed in [Mann et al., 2008]
- a specialization of our generalization

Future tasks:
- Explore other tasks to which our generalized forward-backward algorithm is applicable.
- Extend the generalized forward-backward algorithm to trees and to general graphs containing cycles.

Summary
- We have generalized the forward-backward algorithm to allow for higher-order multivariate moments.
- The generalization offers an efficient way to compute, in sequence models, complex quantities that involve higher-order multivariate moments.
- Many existing task-specific algorithms are instances of this generalization.
- It leads to a faster algorithm for computing the Generalized Expectation criteria for CRFs.

Thank you for your attention!