1
88-680 Natural Language Processing, Lesson 5
POS Tagging Algorithms
Ido Dagan
Department of Computer Science, Bar-Ilan University
2
Supervised Learning Scheme
[Diagram labels: “Labeled” Examples, Training Algorithm, Classification Model, New Examples, Classification Algorithm, Classifications]
3
Transformation-Based Learning (TBL) for Tagging
Introduced by Brill (1995)
Can exploit a wider range of lexical and syntactic regularities via transformation rules, each made of a triggering environment and a rewrite rule
Tagger:
– Construct an initial tag sequence for the input: the most frequent tag for each word
– Iteratively refine the tag sequence by applying “transformation rules” in rank order
Learner:
– Construct an initial tag sequence for the training corpus
– Loop until done: try all possible rules and compare the results to the known tags, apply the best rule r* to the sequence, and add it to the rule ranking
4
Some examples
1. Change NN to VB if the previous tag is TO
– to/TO conflict/NN with → VB
2. Change VBP to VB if one of the previous three tags is MD
– might/MD vanish/VBP → VB
3. Change NN to VB if one of the previous two tags is MD
– might/MD reply/NN → VB
4. Change VB to NN if one of the previous two tags is DT
– the/DT reply/VB → NN
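The four example rules can be sketched as data plus a small applier. This is a minimal illustration, not Brill's implementation: the names `Rule`, `previous_tag_is`, `one_of_previous_is`, and `apply_rules` are hypothetical, and the sketch assumes each rule's triggers are evaluated on a snapshot of the tags taken before that rule fires.

```python
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical container: rewrite from_tag as to_tag when trigger holds."""
    from_tag: str
    to_tag: str
    trigger: Callable[[List[str], int], bool]


def previous_tag_is(z: str):
    # Trigger: the tag immediately before position i is z.
    return lambda tags, i: i >= 1 and tags[i - 1] == z


def one_of_previous_is(z: str, n: int):
    # Trigger: z occurs among the n tags before position i.
    return lambda tags, i: z in tags[max(0, i - n):i]


def apply_rules(tags, rules):
    """Apply the rules in rank order to an initial tag sequence."""
    tags = list(tags)
    for rule in rules:
        # Assumption: triggers see the tags as they were before this rule fired.
        snapshot = list(tags)
        for i, tag in enumerate(snapshot):
            if tag == rule.from_tag and rule.trigger(snapshot, i):
                tags[i] = rule.to_tag
    return tags


# The four rules from the slide, in the order listed.
RULES = [
    Rule("NN", "VB", previous_tag_is("TO")),
    Rule("VBP", "VB", one_of_previous_is("MD", 3)),
    Rule("NN", "VB", one_of_previous_is("MD", 2)),
    Rule("VB", "NN", one_of_previous_is("DT", 2)),
]

# "to conflict with", initially tagged TO NN IN (IN for "with" is assumed).
print(apply_rules(["TO", "NN", "IN"], RULES))
```

Because the rules are applied in rank order, a later rule can undo or build on the effect of an earlier one, which is why the learner records the order in which rules were chosen.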
5
Transformation Templates
Specify which transformations are possible
For example: change tag A to tag B when:
1. The preceding (following) tag is Z
2. The tag two before (after) is Z
3. One of the two previous (following) tags is Z
4. One of the three previous (following) tags is Z
5. The preceding tag is Z and the following is W
6. The preceding (following) tag is Z and the tag two before (after) is W
6
Lexicalization
New templates to include dependency on surrounding words (not just tags):
Change tag A to tag B when:
1. The preceding (following) word is w
2. The word two before (after) is w
3. One of the two preceding (following) words is w
4. The current word is w
5. The current word is w and the preceding (following) word is v
6. The current word is w and the preceding (following) tag is X (notice: word-tag combination)
7. etc.
7
Initializing Unseen Words
How to choose the most likely tag for unseen words?
Transformation-based approach:
– Start with NP for capitalized words, NN for others
– Learn “morphological” transformations from templates of the form: change tag from X to Y if:
1. Deleting prefix (suffix) x results in a known word
2. The first (last) characters of the word are x
3. Adding x as a prefix (suffix) results in a known word
4. Word W ever appears immediately before (after) the word
5. Character Z appears in the word
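A minimal sketch of the two steps, assuming a set of known words is available. The function names and the specific rule (NN to VBD when deleting the suffix "ed" leaves a known word) are hypothetical illustrations of template 1, not rules taken from the lecture.

```python
def initial_tag(word: str) -> str:
    """Initial guess for an unseen word: NP if capitalized, NN otherwise."""
    return "NP" if word[:1].isupper() else "NN"


def suffix_deletion_rule(from_tag: str, to_tag: str, suffix: str, known_words: set):
    """Template 1: change from_tag to to_tag if deleting `suffix` gives a known word.

    `suffix` is assumed to be non-empty.
    """
    def apply(word: str, tag: str) -> str:
        if (tag == from_tag
                and word.endswith(suffix)
                and word[:-len(suffix)] in known_words):
            return to_tag
        return tag
    return apply


known_words = {"walk", "table"}  # hypothetical lexicon
rule = suffix_deletion_rule("NN", "VBD", "ed", known_words)  # hypothetical rule

word = "walked"
print(word, rule(word, initial_tag(word)))
```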
8
TBL Learning Scheme
[Diagram labels: Unannotated Input Text, Setting Initial State, Annotated Text, Ground Truth for Input Text, Learning Algorithm, Rules]
9
Greedy Learning Algorithm
Initial tagging of the training corpus: most frequent tag per word
At each iteration:
– Identify rules that fix errors and compute the “error reduction” for each transformation rule: #errors fixed - #errors introduced
– Find the best rule; if its error reduction is greater than a threshold (to avoid overfitting):
  – Apply the best rule to the training corpus
  – Append the best rule to the ordered list of transformations
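The greedy loop can be sketched for a single template, "change tag A to B when the preceding tag is Z". This is an illustration under simplifying assumptions: the corpus is one flat tag sequence, a rule is a hypothetical `(A, B, Z)` tuple, candidates are generated only from current error positions, and `threshold` is assumed to be non-negative so the loop terminates.

```python
def apply_rule(tags, rule):
    """Apply (A, B, Z): rewrite A as B wherever the preceding tag is Z."""
    a, b, z = rule
    return [b if t == a and i > 0 and tags[i - 1] == z else t
            for i, t in enumerate(tags)]


def count_errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(initial, gold, threshold=0):
    """Greedy TBL learner; returns the ordered list of learned rules."""
    current = list(initial)
    learned = []
    while True:
        # Candidate rules: those that would fix at least one current error.
        candidates = {(current[i], gold[i], current[i - 1])
                      for i in range(1, len(gold)) if current[i] != gold[i]}
        best, best_gain = None, threshold
        for rule in sorted(candidates):
            # Error reduction = #errors fixed - #errors introduced.
            gain = count_errors(current, gold) - count_errors(apply_rule(current, rule), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule beats the threshold: stop
            return learned
        current = apply_rule(current, best)
        learned.append(best)


initial = ["TO", "NN", "DT", "NN", "MD", "NN"]  # hypothetical initial tagging
gold = ["TO", "VB", "DT", "NN", "MD", "VB"]     # hypothetical ground truth
print(learn_rules(initial, gold))
```

Recomputing the error count for every candidate is quadratic and only suitable for toy data; it is written this way to mirror the slide's definition of error reduction directly.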
10
Stochastic POS Tagging
POS tagging: for a given sentence $W = w_1 \ldots w_n$, find the matching POS tags $T = t_1 \ldots t_n$
In a statistical framework:
$T' = \arg\max_T P(T \mid W)$
11
Bayes’ Rule
– Words are independent of each other
– A word’s identity depends only on its own tag
– Markovian assumptions
– Denominator doesn’t depend on tags
– Chain rule
Notation: $P(t_1) = P(t_1 \mid t_0)$
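One way the annotations on this slide fit together is the standard bigram-tagger derivation, sketched here as an assumption about what the slide showed. It starts from $T' = \arg\max_T P(T \mid W)$ and uses the notation $P(t_1) = P(t_1 \mid t_0)$ for the first tag.

```latex
\begin{align*}
T' &= \arg\max_T P(T \mid W) \\
   &= \arg\max_T \frac{P(W \mid T)\, P(T)}{P(W)}
      && \text{(Bayes' rule)} \\
   &= \arg\max_T P(W \mid T)\, P(T)
      && \text{(denominator doesn't depend on tags)} \\
   &= \arg\max_T \prod_{i=1}^{n} P(w_i \mid w_1 \ldots w_{i-1}, T)\,
      P(t_i \mid t_1 \ldots t_{i-1})
      && \text{(chain rule)} \\
   &\approx \arg\max_T \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
      && \text{(independence and Markov assumptions)}
\end{align*}
```

The last step uses both simplifications at once: each word depends only on its own tag, and each tag depends only on the previous tag.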
12
The Markovian assumptions
Limited Horizon
– $P(X_{i+1} = t^k \mid X_1, \ldots, X_i) = P(X_{i+1} = t^k \mid X_i)$
Time invariant
– $P(X_{i+1} = t^k \mid X_i) = P(X_{j+1} = t^k \mid X_j)$
13
Maximum Likelihood Estimations
To estimate $P(w_i \mid t_i)$ and $P(t_i \mid t_{i-1})$ we can use maximum likelihood estimation:
– $P(w_i \mid t_i) = c(w_i, t_i) / c(t_i)$
– $P(t_i \mid t_{i-1}) = c(t_{i-1} t_i) / c(t_{i-1})$
Note the estimation for $i = 1$
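The two estimates can be computed by counting over a tagged corpus. A minimal sketch follows; the markers `<s>` and `</s>` are hypothetical names for a start tag $t_0$ and an end state, so that the case $i = 1$ is estimated from $c(t_0 t_1) / c(t_0)$, where $c(t_0)$ is the number of sentences.

```python
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical start (t_0) and end markers


def estimate_mle(tagged_sentences):
    """Return (p_emit, p_trans) as dicts keyed by (word, tag) and (prev_tag, tag)."""
    emit, trans = Counter(), Counter()
    tag_count, context_count = Counter(), Counter()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            emit[(word, tag)] += 1      # c(w_i, t_i)
            tag_count[tag] += 1         # c(t_i)
            trans[(prev, tag)] += 1     # c(t_{i-1} t_i)
            context_count[prev] += 1    # c(t_{i-1})
            prev = tag
        # Closing transition into the end state.
        trans[(prev, END)] += 1
        context_count[prev] += 1
    p_emit = {(w, t): c / tag_count[t] for (w, t), c in emit.items()}
    p_trans = {(p, t): c / context_count[p] for (p, t), c in trans.items()}
    return p_emit, p_trans


corpus = [  # hypothetical two-sentence corpus
    [("the", "DT"), ("dog", "NN")],
    [("the", "DT"), ("cat", "NN")],
]
p_emit, p_trans = estimate_mle(corpus)
print(p_emit[("dog", "NN")], p_trans[(START, "DT")])
```

Any pair that never occurs in training is simply absent from the dictionaries, that is, it has probability zero, which is the motivation for smoothing.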
14
Unknown Words
Many words will not appear in the training corpus.
Unknown words are a major problem for taggers (!)
Solutions:
– Incorporate morphological analysis
– Treat words that appear only once in the training data as UNKNOWNs
15
“Add-1/Add-Constant” Smoothing
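A minimal sketch of add-constant smoothing, assuming the standard form $(c + k) / (N + kV)$, where $c$ is the count of the event, $N$ the count of its context, $V$ the number of possible outcomes, and $k$ the added constant ($k = 1$ gives add-1). The function and parameter names are hypothetical.

```python
def add_constant(count: float, context_count: float, num_outcomes: int, k: float = 1.0) -> float:
    """Smoothed estimate (count + k) / (context_count + k * num_outcomes)."""
    return (count + k) / (context_count + k * num_outcomes)


# Hypothetical example: a tag seen twice, always with the same word,
# over a vocabulary of three words.
seen = add_constant(2, 2, 3)    # the word observed with this tag
unseen = add_constant(0, 2, 3)  # a word never observed with this tag
print(seen, unseen)
```

Unseen events now receive a small non-zero probability, at the cost of discounting the observed ones; the estimates over all outcomes still sum to one.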
16
Smoothing for Tagging
For $P(t_i \mid t_{i-1})$
Optionally, for $P(t_i \mid t_{i-1})$
17
Viterbi
The most probable tag sequence can be found with the Viterbi algorithm.
There is no need to score every possible tag sequence (!)
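A minimal Viterbi sketch for a bigram tagger. It assumes transition probabilities stored as `p_trans[(prev_tag, tag)]` and emission probabilities as `p_emit[(word, tag)]`, with missing entries treated as zero; these names, the `<s>` start marker, and the toy numbers are all hypothetical. Raw probabilities are multiplied for brevity, whereas log probabilities are usually preferred on longer inputs to avoid underflow.

```python
def viterbi(words, tags, p_trans, p_emit, start="<s>"):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    # best[i][t] = (probability of the best path ending in tag t at word i, backpointer)
    best = [{t: (p_trans.get((start, t), 0.0) * p_emit.get((words[0], t), 0.0), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Only the best predecessor matters for each tag.
            prob, prev = max(((best[i - 1][p][0] * p_trans.get((p, t), 0.0), p) for p in tags),
                             key=lambda pair: pair[0])
            column[t] = (prob * p_emit.get((words[i], t), 0.0), prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


TAGS = ["DT", "NN", "VB"]
P_TRANS = {("<s>", "DT"): 0.8, ("<s>", "NN"): 0.2,
           ("DT", "NN"): 0.9, ("DT", "VB"): 0.1,
           ("NN", "NN"): 0.5, ("NN", "VB"): 0.5}
P_EMIT = {("the", "DT"): 0.5, ("flies", "NN"): 0.1, ("flies", "VB"): 0.05}

print(viterbi(["the", "flies"], TAGS, P_TRANS, P_EMIT))
```

For n words and k tags the table has n·k cells, each filled by scanning k predecessors, so the cost is on the order of n·k² instead of the kⁿ needed to enumerate every sequence.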
18
HMMs
Assume a state machine with:
– Nodes that correspond to tags
– A start and an end state
– Arcs corresponding to transition probabilities, $P(t_i \mid t_{i-1})$
– A set of observation likelihoods for each state, $P(w_i \mid t_i)$
19
[State diagram. States: NN, VBZ, NNS, AT, VB, RB. Transition probabilities shown: 0.6, 0.4. Emission probabilities shown: P(like)=0.2, P(fly)=0.3, …, P(eat)=0.36; P(likes)=0.3, P(flies)=0.1, …, P(eats)=0.5; P(the)=0.4, P(a)=0.3, P(an)=0.2, …]
20
HMMs
An HMM is similar to an automaton augmented with probabilities.
Note that the states in an HMM do not correspond to the input symbols.
The input symbols don’t uniquely determine the next state.
21
HMM definition
HMM = (S, K, A, B)
– Set of states $S = \{s_1, \ldots, s_n\}$
– Output alphabet $K = \{k_1, \ldots, k_m\}$
– State transition probabilities $A = \{a_{ij}\}$, $i, j \in S$
– Symbol emission probabilities $B = \{b(i, k)\}$, $i \in S$, $k \in K$
– Start and end states (non-emitting); alternatively, initial state probabilities
Note: for a given $i$, $\sum_j a_{ij} = 1$ and $\sum_k b(i, k) = 1$
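The normalization note can be checked mechanically. In this minimal sketch, A and B are assumed to be nested dictionaries (state to next state, and state to symbol); the function name and the numbers are hypothetical and are not the values of the lecture's diagram.

```python
import math


def rows_sum_to_one(table) -> bool:
    """True if every row of a nested dict {i: {j: prob}} sums to 1."""
    return all(math.isclose(sum(row.values()), 1.0) for row in table.values())


# Hypothetical two-state fragment.
A = {"AT": {"NN": 0.6, "NNS": 0.4},
     "NN": {"NN": 0.5, "NNS": 0.5}}
B = {"AT": {"the": 0.5, "a": 0.3, "an": 0.2},
     "NN": {"dog": 0.7, "cat": 0.3}}

print(rows_sum_to_one(A), rows_sum_to_one(B))
```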
22
Why Hidden?
Because we only observe the input; the underlying states are hidden.
Decoding: part-of-speech tagging can be viewed as a decoding problem. Given an observation sequence $W = w_1, \ldots, w_n$, find a state sequence $T = t_1, \ldots, t_n$ that best explains the observation.
23
Homework