
1 Graphical Models for Segmenting and Labeling Sequence Data
Manoj Kumar Chinnakotla
NLP-AI Seminar

2 Outline
Introduction
Directed Graphical Models
–Hidden Markov Models (HMMs)
–Maximum Entropy Markov Models (MEMMs)
Label Bias Problem
Undirected Graphical Models
–Conditional Random Fields (CRFs)
Summary

3 The Task
Labeling
–Given sequence data, mark appropriate tags for each data item
Segmentation
–Given sequence data, segment it into non-overlapping groups such that related entities are in the same group

4 Applications
Computational Linguistics
–POS Tagging
–Information Extraction
–Syntactic Disambiguation
Computational Biology
–DNA and Protein Sequence Alignment
–Sequence homologue searching
–Protein Secondary Structure Prediction

5 Example: POS Tagging

6 Directed Graphical Models
Hidden Markov Models (HMMs)
–Assign a joint probability to paired observation and label sequences
–The parameters are trained to maximize the joint likelihood of the training examples

7 Hidden Markov Models (HMMs)
Generative Model – models the joint distribution P(w, t)
Generation Process
–Probabilistic Finite State Machine
–Set of states – correspond to tags
–Alphabet – set of words
–Transition Probability – P(t_i | t_{i-1})
–State (Emission) Probability – P(w_i | t_i)

8 HMMs (Contd..)
For a given word/tag sequence pair: P(w, t) = ∏_i P(t_i | t_{i-1}) · P(w_i | t_i) (see the sketch below)
Why Hidden?
–The sequence of tags which generated the word sequence is not visible
Why Markov?
–Based on the Markov assumption: the current tag depends only on the previous 'n' tags
–Solves the "sparsity problem"
Training – learning the transition and emission probabilities from data
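As a concrete illustration, here is a minimal Python sketch of the joint probability computation. The tag set, vocabulary, and probability tables are made-up toy values, not from the original slides.

```python
# Toy HMM tables for P(w, t) = prod_i P(t_i | t_{i-1}) * P(w_i | t_i).
# All numbers below are illustrative placeholders.
transition = {  # P(tag_i | tag_{i-1}); "<s>" marks the sentence start
    ("<s>", "DT"): 0.6, ("DT", "NN"): 0.7, ("NN", "VB"): 0.4,
}
emission = {    # P(word_i | tag_i)
    ("DT", "the"): 0.5, ("NN", "dog"): 0.01, ("VB", "runs"): 0.02,
}

def joint_probability(words, tags):
    """P(w, t) under the first-order HMM factorization."""
    prob, prev = 1.0, "<s>"
    for word, tag in zip(words, tags):
        prob *= transition.get((prev, tag), 0.0) * emission.get((tag, word), 0.0)
        prev = tag
    return prob

print(joint_probability(["the", "dog", "runs"], ["DT", "NN", "VB"]))
# 0.6 * 0.5 * 0.7 * 0.01 * 0.4 * 0.02 = 1.68e-05
```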

9 HMMs Tagging Process
Given a string of words w, choose the tag sequence t* such that t* = argmax_t P(t | w) = argmax_t P(w | t) · P(t)
Computationally expensive – need to evaluate all possible tag sequences!
–For 'n' possible tags and m positions, there are n^m candidate sequences
Viterbi Algorithm (sketched below)
–Used to find the optimal tag sequence t*
–Efficient dynamic-programming algorithm; runs in O(m · n^2) time
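A minimal Python sketch of Viterbi decoding, reusing the toy `transition` and `emission` tables from the previous sketch (those tables and the tag names are illustrative assumptions):

```python
import math

def viterbi(words, tags, transition, emission, start="<s>"):
    """Find argmax_t P(w, t) in O(m * n^2) time by dynamic programming.
    transition[(prev_tag, tag)] and emission[(tag, word)] are probability tables."""
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: log probability of the best tag sequence ending in t at position i
    best = [{t: logp(transition.get((start, t), 0.0)) +
                logp(emission.get((t, words[0]), 0.0)) for t in tags}]
    back = []  # back[i][t]: best predecessor of tag t at position i + 1
    for i in range(1, len(words)):
        back.append({})
        best.append({})
        for t in tags:
            score, prev = max((best[i - 1][p] + logp(transition.get((p, t), 0.0)), p)
                              for p in tags)
            best[i][t] = score + logp(emission.get((t, words[i]), 0.0))
            back[i - 1][t] = prev
    tag = max(best[-1], key=best[-1].get)  # best final tag
    path = [tag]
    for bp in reversed(back):  # follow backpointers to recover t*
        tag = bp[tag]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["the", "dog", "runs"], ["DT", "NN", "VB"], transition, emission))
```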

10 Disadvantages of HMMs
Need to enumerate all possible observation sequences
Not possible to represent multiple interacting features
Difficult to model long-range dependencies of the observations
Very strict independence assumptions on the observations

11 Maximum Entropy Markov Models (MEMMs)
Conditional Exponential Models
–Assume the observation sequence is given (it need not be modeled)
–Train the model to maximize the conditional likelihood P(Y|X)

12 MEMMs (Contd..)
For a new data sequence x, the label sequence y which maximizes P(y|x, Θ) is assigned (Θ – the parameter set)
Arbitrary non-independent features on the observation sequence are possible
Conditional models are known to perform better than generative ones
Performs per-state normalization (sketched below)
–The total mass which arrives at a state must be distributed among all possible successor states
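A minimal sketch of per-state normalization: at each step, the distribution over next states is a softmax over features of the (current state, observation) pair. The feature-function interface and weight names are illustrative placeholders, not from the original slides.

```python
import math

def memm_next_state_probs(weights, features, current_state, observation, states):
    """P(s' | s, o) via a per-state softmax: all the probability mass leaving
    `current_state` is normalized over its candidate successor states."""
    scores = {
        s: sum(weights.get(f, 0.0) for f in features(current_state, s, observation))
        for s in states
    }
    z = sum(math.exp(v) for v in scores.values())  # per-state normalizer
    return {s: math.exp(v) / z for s, v in scores.items()}
```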

13 Label Bias Problem
Bias towards states with fewer outgoing transitions
Due to per-state normalization
An example MEMM (original figure not reproduced; a numeric sketch follows)
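A tiny numeric illustration of the bias, with made-up scores: a state with a single outgoing transition passes all of its mass forward regardless of the observation, so the observation is effectively ignored there.

```python
# With per-state normalization, a state with one successor must assign it
# probability 1 no matter how poorly it matches the observation; states with
# many successors let the observation redistribute the mass.
import math

def softmax(scores):
    z = sum(math.exp(v) for v in scores.values())
    return {k: math.exp(v) / z for k, v in scores.items()}

print(softmax({"B": -5.0}))                       # {'B': 1.0} - low score ignored
print(softmax({"B": -5.0, "D": 1.0, "E": 0.5}))   # same low score now loses mass
```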

14 Undirected Graphical Models
Random Fields

15 Conditional Random Fields (CRFs)
Conditional Exponential Model, like the MEMM
Has all the advantages of MEMMs without the label bias problem
–MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state
–CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence
Allows some transitions to "vote" more strongly than others, depending on the corresponding observations

16 Definition of CRFs
Let G = (V, E) be a graph such that Y = (Y_v), v ∈ V
Then (X, Y) is a Conditional Random Field if, when conditioned on X, the variables Y_v obey the Markov property with respect to the graph:
P(Y_v | X, Y_w, w ≠ v) = P(Y_v | X, Y_w, w ~ v), where w ~ v means w and v are neighbours in G

17 CRF Distribution Function
p_Θ(y | x) ∝ exp( Σ_{e ∈ E, k} λ_k f_k(e, y|_e, x) + Σ_{v ∈ V, k} μ_k g_k(v, y|_v, x) )
Where:
V = set of label random variables
f_k and g_k = features (g_k = state feature, f_k = edge feature)
λ_k and μ_k = parameters to be estimated
y|_e = set of components of y defined by edge e
y|_v = set of components of y defined by vertex v
(a toy implementation follows)
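A minimal sketch of this score for a linear chain, where edges join adjacent labels. The feature-function interface, weights, and the brute-force normalizer are illustrative assumptions; real implementations compute Z(x) with forward-backward. Note the single global normalizer, in contrast to the MEMM's per-state ones.

```python
import itertools
import math

def crf_score(x, y, edge_feats, state_feats, lam, mu):
    """Unnormalized log score: sum_k mu_k g_k over vertices + sum_k lam_k f_k over edges."""
    s = sum(mu[k] * g(i, y[i], x)
            for i in range(len(y)) for k, g in state_feats.items())
    s += sum(lam[k] * f(i, y[i - 1], y[i], x)
             for i in range(1, len(y)) for k, f in edge_feats.items())
    return s

def crf_prob(x, y, labels, edge_feats, state_feats, lam, mu):
    """P(y | x) with one global normalizer Z(x) over whole label sequences
    (brute force, so only feasible on toy problems)."""
    z = sum(math.exp(crf_score(x, list(yy), edge_feats, state_feats, lam, mu))
            for yy in itertools.product(labels, repeat=len(x)))
    return math.exp(crf_score(x, y, edge_feats, state_feats, lam, mu)) / z
```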

18 CRF Training
Parameters Θ = (λ_1, λ_2, …; μ_1, μ_2, …) are estimated by maximum likelihood
Maximize the conditional log likelihood of the training data: L(Θ) = Σ_i log p_Θ(y^(i) | x^(i))

19 CRF Training (Contd..)
Condition for maximum likelihood: the expected feature count computed using the model equals the empirical feature count from the training data
A closed-form solution for the parameters is not possible
Iterative algorithms are employed – they improve the log likelihood in successive iterations (a gradient sketch follows)
Examples
–Generalized Iterative Scaling (GIS)
–Improved Iterative Scaling (IIS)
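The condition above is exactly the zero-gradient condition of the log likelihood: for each feature, empirical count minus model-expected count. A sketch with a brute-force expectation for clarity (the `feat(x, y)` interface is an illustrative assumption; GIS/IIS or gradient methods are what one would run in practice):

```python
import itertools
import math

def loglik_gradient(data, labels, feat, weights):
    """dL/dw_k = empirical count of feature k - expected count under the model.
    `feat(x, y)` returns a dict of feature counts for a (sequence, labeling)
    pair; the expectation is brute force, so this is a toy-scale sketch."""
    grad = {k: 0.0 for k in weights}
    for x, y in data:
        for k, c in feat(x, y).items():  # empirical counts
            grad[k] += c
        all_y = list(itertools.product(labels, repeat=len(x)))
        scores = [sum(weights[k] * c for k, c in feat(x, yy).items()) for yy in all_y]
        z = sum(math.exp(s) for s in scores)  # global normalizer Z(x)
        for yy, s in zip(all_y, scores):      # subtract model-expected counts
            p = math.exp(s) / z
            for k, c in feat(x, yy).items():
                grad[k] -= p * c
    return grad  # zero exactly when expected counts match empirical counts
```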

20 Graphical Comparison: HMMs, MEMMs, CRFs

21 POS Tagging Results

22 Summary
HMMs
–Directed, generative graphical models
–Cannot be used to model overlapping features on observations
MEMMs
–Directed, conditional models
–Can model overlapping features on observations
–Suffer from the label bias problem due to per-state normalization
CRFs
–Undirected, conditional models
–Avoid the label bias problem
–Efficient training is possible

23 Thanks!
Acknowledgements
Some slides in this presentation are from Rongkun Shen's (Oregon State Univ) presentation on CRFs

