Hidden Markov Models for Information Extraction Recent Results and Current Projects Joseph Smarr & Huy Nguyen Advisor: Chris Manning.

1 Hidden Markov Models for Information Extraction
Recent Results and Current Projects
Joseph Smarr & Huy Nguyen
Advisor: Chris Manning

2 HMM Approach to IE
- HMM states are associated with a semantic type: background-text, person-name, etc.
- Constrained EM learns transitions and emissions
- Viterbi alignment of a document marks tagged ranges of text with the same semantic type
- Extract the range with the highest probability
- Example text: "Speaker is Huy Nguyen this week"
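The tagging step on this slide can be sketched in a few lines of Python. This is only an illustration: the two states, all probabilities, and the small floor used for unseen words are invented here and do not come from the talk. It returns every range tagged with the target type and leaves out the final step of picking the highest-probability range.

```python
import math

def viterbi(words, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable state sequence for `words` (log-space Viterbi).

    `unk` is an assumed floor probability for words a state never emitted.
    """
    # best[t][s] = (log-prob of the best path ending in s at time t, backpointer)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s].get(words[0], unk)), None)
             for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            score, prev = max(
                (best[-1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s].get(word, unk)), prev)
        best.append(column)
    # Follow backpointers from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

def tagged_ranges(path, target):
    """Return (start, end) index pairs of maximal runs labeled `target`."""
    ranges, start = [], None
    for i, s in enumerate(path + [None]):
        if s == target and start is None:
            start = i
        elif s != target and start is not None:
            ranges.append((start, i))
            start = None
    return ranges

# Hypothetical toy model: one background state and one person-name state.
STATES = ["background", "person"]
START_P = {"background": 0.9, "person": 0.1}
TRANS_P = {"background": {"background": 0.8, "person": 0.2},
           "person": {"background": 0.4, "person": 0.6}}
EMIT_P = {"background": {"Speaker": 0.3, "is": 0.3, "this": 0.2, "week": 0.2},
          "person": {"Huy": 0.5, "Nguyen": 0.5}}

words = "Speaker is Huy Nguyen this week".split()
path = viterbi(words, STATES, START_P, TRANS_P, EMIT_P)
extracted = [" ".join(words[a:b]) for a, b in tagged_ranges(path, "person")]
```

With these made-up parameters, `extracted` holds the single range tagged as a person name.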

3 Existing Work
- Leek (97, UCSD MS thesis): early results, fixed structures
- Freitag & McCallum (99, 00): grow complex structures

4 Limitations of Existing Work
- Only one field extracted at a time
  - Relative position of fields is ignored, e.g. authors usually come before titles in citations
  - Similar-looking fields aren't competed for, e.g. acquired company vs. purchasing company
- Simple model of unknown words
  - Use an UNK token for all words seen fewer than N times
- No separation of content and context
  - e.g. can't plug in generic date extractors, etc.
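For reference, the simple unknown-word model criticized here can be sketched as below. It assumes the placeholder is a single UNK token (the token name, the function name, and the example documents are illustrative, not from the talk).

```python
from collections import Counter

def replace_rare(docs, min_count, unk="UNK"):
    """Replace every word seen fewer than `min_count` times with one UNK token."""
    counts = Counter(w for doc in docs for w in doc)
    return [[w if counts[w] >= min_count else unk for w in doc] for doc in docs]

# Hypothetical two-document corpus.
docs = [["the", "company", "acquired", "Acme"],
        ["the", "company", "bought", "Initech"]]
result = replace_rare(docs, min_count=2)
```

All rare words collapse to the same symbol, which is exactly the loss of information the richer unknown-word models are meant to avoid.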

5 Current Research Goals
- Flexibly train and combine extractors for multiple fields of information
- Learn structures suited for individual fields
  - Can be recombined and reused with many HMMs
- Learn intelligent context structures to link targets
  - Canonical ordering of fields
  - Common prefixes and suffixes
- Construct merged HMM for actual extraction
  - Context/target split makes search problem tractable
  - Transitions between models are compiled out in merge

6 Current Research Goals
- Richer models for handling unknown words
  - Estimate likelihood of novel words in each state
  - Featural decomposition for finer-grained probabilities, e.g. Nguyen -> UNK[Capitalized, No-numbers]
  - Character-level models for higher precision, e.g. phone numbers, room numbers, dates, etc.
- Conditional training to focus on extraction task
  - Classical joint estimation often wastes states modeling patterns in English background text
  - Conditional training is slower, but only rewards structure that increases labeling accuracy
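A minimal sketch of the featural decomposition, assuming just the two features visible in the slide's example. The complementary feature names ("Lowercase", "Has-numbers") and the function name are made up for illustration; the actual feature set used in the project is not given.

```python
def unk_class(word):
    """Map an unknown word to a feature-based UNK class."""
    features = [
        "Capitalized" if word[:1].isupper() else "Lowercase",
        "Has-numbers" if any(c.isdigit() for c in word) else "No-numbers",
    ]
    return "UNK[" + ", ".join(features) + "]"

example = unk_class("Nguyen")
```

Each state can then estimate a separate emission probability per UNK class instead of one probability for all unseen words.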

7 Learning Target Structures
- Goal: learn flexible structure tailored to composition of particular fields
- Representation: disjunction of multi-state chains
- Learning method:
  - Collect and isolate all examples of the target field
  - Initialization: single state
  - Search operators (greedy search): extend current chain(s); start a new chain
  - Stopping criteria: MDL score

8 Example Target HMM: dlramt
[Figure: state diagram running from START to END; emission words shown include mln, billion, U.S., Canadian, dlrs, dollars, yen, pesos, undisclosed, withheld, amount]

9 Learning Context Structures
- Goal: learn structure to connect multiple target HMMs
  - Captures canonical ordering of fields
  - Identifies prefix and suffix patterns around targets
- Initialization:
  - Background state connected to each target
  - Find minimum number of words between each target type in corpus
  - Connect targets directly if distance is 0
  - Add context state between targets if they're close
- Search operators (greedy search):
  - Add prefix/suffix between background and target
  - Lengthen an existing chain
  - Start a new chain (by splitting an existing one)
- Stopping criteria: MDL score
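The initialization step can be sketched as follows, assuming each training document is available as a list of per-word labels. The label names, the function names, and the `close` threshold are assumptions for illustration; the talk does not say how "close" is defined.

```python
def min_gaps(docs, background="bg"):
    """Map (type_a, type_b) -> fewest background words seen between a and a following b."""
    gaps = {}
    for labels in docs:
        last_type, gap = None, 0
        for label in labels:
            if label == background:
                gap += 1
                continue
            if last_type is not None and label != last_type:
                key = (last_type, label)
                gaps[key] = min(gaps.get(key, gap), gap)
            # Any target word resets the running gap.
            last_type, gap = label, 0
    return gaps

def initial_links(gaps, close=3):
    """Choose how each ordered pair of targets is connected in the initial structure."""
    links = {}
    for pair, gap in gaps.items():
        if gap == 0:
            links[pair] = "direct"        # connect targets directly
        elif gap <= close:
            links[pair] = "context"       # add a context state between them
        else:
            links[pair] = "background"    # reachable only through the background state
    return links

# Hypothetical labeled documents (one label per word).
docs = [["bg", "purchaser", "purchaser", "bg", "acquired", "bg"],
        ["purchaser", "acquired", "bg", "bg", "dlramt"]]
gaps = min_gaps(docs)
links = initial_links(gaps)
```

The greedy search operators would then refine this skeleton. They are not sketched here.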

10 Example of Context HMM
[Figure: context HMM with states START, Background, Context, Purchaser, Acquired, END; words shown include purchased, acquired, bought, The, yesterday, Reuters]

11 Merging Context and Targets
- In context HMM, targets are collapsed into a single state that always emits "purchaser" etc.
- Target HMMs have single START and END state
- Glue target HMMs into place by "compiling out" start/end transitions and creating one big HMM
- Challenge: create supportive structure without being overly restrictive
  - Too little structure: hard to find regularities
  - Too much structure: can't generate all docs
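One plausible reading of "compiling out" the start/end transitions is sketched below, with transition tables stored as nested dicts. The representation, the state names, and the probabilities are assumptions for illustration. The sketch also assumes the placeholder state has no self-loop and that the target HMM has no direct START-to-END transition.

```python
def merge(context, target, placeholder):
    """Glue `target` into `context` in place of `placeholder`, removing the target's START/END."""
    prefix = placeholder + "."
    entries = target["START"]        # START -> first target states
    exits = context[placeholder]     # placeholder -> following context states
    merged = {}
    for src, outs in context.items():
        if src == placeholder:
            continue
        row = {}
        for dst, p in outs.items():
            if dst == placeholder:
                # Redirect the arc into the target's entry states.
                for s, q in entries.items():
                    row[prefix + s] = p * q
            else:
                row[dst] = p
        merged[src] = row
    for src, outs in target.items():
        if src == "START":
            continue
        row = {}
        for dst, p in outs.items():
            if dst == "END":
                # Redirect the arc to whatever followed the placeholder.
                for y, q in exits.items():
                    row[y] = row.get(y, 0.0) + p * q
            else:
                row[prefix + dst] = p
        merged[prefix + src] = row
    return merged

# Hypothetical context HMM with one collapsed target state, "Purchaser".
context = {"START": {"Background": 1.0},
           "Background": {"Background": 0.5, "Purchaser": 0.3, "END": 0.2},
           "Purchaser": {"Background": 1.0}}
# Hypothetical two-state target HMM.
target = {"START": {"first": 0.5, "last": 0.5},
          "first": {"last": 1.0},
          "last": {"END": 1.0}}
merged = merge(context, target, "Purchaser")
```

Multiplying the probabilities along each removed arc keeps every row of the merged transition table summing to one.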

12 Example of Merging HMMs
[Figure: diagrams of the context HMM and target HMMs being merged; state labels shown include Background, Context, Purchaser, Acquired, START, END]

13 Tricks and Optimizations
- Mandatory end state
  - Allows explicit modeling of document end
- Structural enhancements
  - Add transitions from start directly to targets
  - Add transitions from target/suffix directly to end
  - Allow skip-ahead transitions
- Separation of core structure learning
  - Structure learning is performed on skeleton structure
  - Enhancements are added during parameter estimation
  - Keeps search tractable while exploiting rich transitions

14 Sample of Recent F1 Results

15 Unknown Word Results

16 Conditional Training
- Observation: joint HMMs waste states modeling patterns in background text
  - Improves document likelihood (like n-grams)
  - Doesn't improve labeling accuracy (can hurt it!)
  - Ideally focus on prefixes, suffixes, etc. only
- Idea: maximize conditional probability of labels, P(labels|words), instead of P(labels, words)
  - Should only reward modeling helpful patterns
  - Can't use standard Baum-Welch training
  - Solution: use numerical optimization (conjugate gradient, CG)
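The two quantities being compared can be computed with the forward algorithm, run once with the states restricted to those matching the labels and once unrestricted. The sketch below assumes a small invented HMM in which two background states share one label. All names and numbers are illustrative.

```python
from itertools import product

# Hypothetical model: two background states and one target state.
STATES = ["bg1", "bg2", "tgt"]
LABEL_STATES = {"bg": {"bg1", "bg2"}, "target": {"tgt"}}
START_P = {"bg1": 0.6, "bg2": 0.3, "tgt": 0.1}
TRANS_P = {"bg1": {"bg1": 0.5, "bg2": 0.3, "tgt": 0.2},
           "bg2": {"bg1": 0.4, "bg2": 0.4, "tgt": 0.2},
           "tgt": {"bg1": 0.5, "bg2": 0.3, "tgt": 0.2}}
EMIT_P = {"bg1": {"a": 0.7, "b": 0.2, "T": 0.1},
          "bg2": {"a": 0.2, "b": 0.7, "T": 0.1},
          "tgt": {"a": 0.05, "b": 0.05, "T": 0.9}}

def forward(words, allowed=None):
    """P(words), or P(labels, words) when allowed[t] restricts the states at time t."""
    def ok(s, t):
        return allowed is None or s in allowed[t]
    alpha = {s: START_P[s] * EMIT_P[s].get(words[0], 0.0) if ok(s, 0) else 0.0
             for s in STATES}
    for t in range(1, len(words)):
        alpha = {s: (EMIT_P[s].get(words[t], 0.0)
                     * sum(alpha[r] * TRANS_P[r][s] for r in STATES)
                     if ok(s, t) else 0.0)
                 for s in STATES}
    return sum(alpha.values())

def conditional(words, labels):
    """P(labels | words) = P(labels, words) / P(words)."""
    allowed = [LABEL_STATES[label] for label in labels]
    return forward(words, allowed) / forward(words)

words = ["a", "b", "T", "a"]
joint = forward(words, [LABEL_STATES[l] for l in ["bg", "bg", "target", "bg"]])
cond = conditional(words, ["bg", "bg", "target", "bg"])
```

Joint training pushes up `joint`, which also rewards modeling the words themselves, while conditional training pushes up `cond`, which only changes when the labeling improves relative to the alternatives.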

17 Potential of Conditional Training
- Don't waste states modeling background patterns
- Toy data model: ((abc)*(eTo))* [T is target], e.g. abcabcabcabceToabcabceToabcabcabc
- Modeling abc improves joint likelihood but provides no help for labeling targets
[Figure: two state diagrams, "Optimal Joint Model" and "Optimal Labeling Model"; state emissions shown: a|o, b, c|e, T, o, a|b|c, e, T]

18 Running Conditional Training
- Gradient descent requires differentiable function
  - Value: [equation not preserved in transcript]
  - Deriv: [equation not preserved in transcript]
- Likelihood and expectations are easily computed with existing HMM algorithms
  - Compute values with and without type constraints
  - Forward algorithm
  - Param expectations
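The slide's own formulas did not survive, so what follows is the standard form of the conditional objective and its gradient for an HMM whose path scores are products of parameters, offered as a likely reading rather than a recovery of the original. The symbols (labels l, words w, parameters theta_i, and c_i for the number of times parameter i is used on a path) are notation chosen here.

```latex
% Value: constrained minus unconstrained log-likelihood
\log P(\ell \mid w; \theta) = \log P(\ell, w; \theta) - \log P(w; \theta)

% Derivative: difference of expected parameter counts
\frac{\partial}{\partial \theta_i} \log P(\ell \mid w; \theta)
  = \frac{1}{\theta_i}
    \left( \mathbb{E}\left[ c_i \mid \ell, w \right] - \mathbb{E}\left[ c_i \mid w \right] \right)
```

Both likelihoods come from the forward algorithm (with and without the type constraints), and both expectations come from the usual forward-backward parameter expectations.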

19 Challenges for Cond. Training
- Need additional constraint to keep numbers small
  - Can't guarantee you'll get a probability distribution
  - But it's ok if you're just summing and multiplying!
  - Solution: sum of all params must equal a constant
- Need to fix parameter space ahead of time
  - Can't add states, new words, etc.
  - Solution: start with large ergodic model in which all states emit entire vocabulary (use UNK tokens)
- Need sensible initialization
  - Uniform structure has high variance
  - Fixed structure usually dictates training

20 Results on Toy Data Set
- Results on (([ae][bt][co])*(eto))*
  - Contains spurious prefix/target/suffix-like symbols
- Joint training always labels every t
- Conditional training eventually gets it perfectly
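To make the failure mode concrete, the sketch below samples strings from the toy grammar and scores a "label every t" rule against the gold targets. The sampler, its parameters, and the scoring are illustrative assumptions. No results from the talk are reproduced.

```python
import random
import re

def sample(n_targets, max_repeats=3, seed=0):
    """Sample a string from (([ae][bt][co])*(eto))* with gold flags for the target t."""
    rng = random.Random(seed)
    text, gold = [], []
    for _ in range(n_targets):
        for _ in range(rng.randint(0, max_repeats)):
            # Distractor triple: may contain a spurious e, t, or o.
            text += [rng.choice("ae"), rng.choice("bt"), rng.choice("co")]
            gold += [False, False, False]
        # Real target: the t inside an (eto) block.
        text += ["e", "t", "o"]
        gold += [False, True, False]
    return "".join(text), gold

text, gold = sample(n_targets=20)
predicted = [ch == "t" for ch in text]           # the "label every t" rule
true_pos = sum(p and g for p, g in zip(predicted, gold))
recall = true_pos / sum(gold)
precision = true_pos / sum(predicted)
```

Labeling every t finds all real targets but also marks the spurious ones, so its precision drops whenever a distractor triple contains a t.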

21 Current and Future Work
- Richer search operators for structure learning
- Richer models of unknown words (char-level)
- Reduce variance of conditional training
- Build reusable repository of target HMMs
- Integrate with larger IE framework(s)
  - Semantic Web / KAON
  - LTG
- Applications
  - Semi-automatic ontology markup for web pages
  - Smart processing
