Factored Language Models EE517 Presentation April 19, 2005 Kevin Duh


1 Factored Language Models EE517 Presentation April 19, 2005 Kevin Duh (duh@ee.washington.edu)

2 Outline: 1. Motivation 2. Factored Word Representation 3. Generalized Parallel Backoff 4. Model Selection Problem 5. Applications 6. Tools

3 Word-based Language Models. Standard word-based language models estimate P(w_t | w_{t-1}, ..., w_{t-n+1}). How do we get robust n-gram estimates? Smoothing, e.g. Kneser-Ney or Good-Turing, and class-based language models.
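
To make the sparse-counts problem concrete, here is a minimal sketch (not from the slides; the toy corpus and function names are mine) of maximum-likelihood trigram estimation, which returns zero for any history unseen in training and therefore needs smoothing or backoff:

    from collections import Counter

    tokens = "you have to read my books quickly".split()
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigrams = Counter(zip(tokens, tokens[1:]))

    def p_ml(w, h1, h2):
        # maximum-likelihood estimate of P(w | h1 h2); zero for unseen histories
        denom = bigrams[(h1, h2)]
        return trigrams[(h1, h2, w)] / denom if denom else 0.0

    print(p_ml("my", "to", "read"))   # 1.0: the trigram "to read my" was seen
    print(p_ml("my", "to", "write"))  # 0.0: unseen history, motivates smoothing/backoff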

4 Limitation of Word-based Language Models. Words are treated as inseparable whole units, e.g. "book" and "books" are distinct vocabulary items. This is especially problematic in morphologically rich languages (e.g. Arabic, Finnish, Russian, Turkish): many unseen word contexts, a high out-of-vocabulary rate, and high perplexity. Example from the Arabic root k-t-b: Kitaab = a book; Kitaab-iy = my book; Kitaabu-hum = their book; Kutub = books.

5 Arabic Morphology. A word is built from a root and a pattern plus affixes and particles, e.g. root sakan (LIVE) + past pattern + 1st-sg-past affix -tu + particle fa-: "so I lived". There are roughly 5000 roots, several hundred patterns, and dozens of affixes.

6 Vocabulary Growth - full word forms. Source: K. Kirchhoff, et al., "Novel Approaches to Arabic Speech Recognition - Final Report from the JHU Summer Workshop 2002", JHU Tech Report 2002.

7 Vocabulary Growth - stemmed words. Source: K. Kirchhoff, et al., "Novel Approaches to Arabic Speech Recognition - Final Report from the JHU Summer Workshop 2002", JHU Tech Report 2002.

8 Solution: Word as Factors. Decompose words into "factors" (e.g. stems) and build the language model over factors: P(w | factors). Two approaches to decomposition: linear [e.g. Geutner, 1995], where a word is split into a sequence of prefix/stem/suffix units, and parallel [Kirchhoff et al., JHU Workshop 2002; Bilmes & Kirchhoff, NAACL/HLT 2003], where each position W_t carries parallel factor streams such as stems S_t and morph tags M_t.

9 Factored Word Representations. Factors may be any word feature; here we use morphological features, e.g. POS, stem, root, pattern, etc. [Figure: parallel factor streams W_t, S_t, M_t over the current and two preceding positions.]
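
As a toy illustration (the factor names and tag values here are assumed, not taken from the slides), a factored representation simply attaches several parallel features to each word token:

    # one word token as a bundle of parallel factors
    token = {
        "W": "kutubiy",    # surface word form ("my books")
        "S": "ktb",        # stem/root factor
        "M": "Noun+poss",  # hypothetical morphological tag, for illustration only
    }
    # A factored model conditions W_t on any subset of the factors of the
    # preceding words, e.g. P(W_t | W_{t-1}, S_{t-1}, S_{t-2}).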

10 Advantage of Factored Word Representations. Main advantage: allows robust estimation of the conditional probabilities using backoff. Word combinations in the context may not be observed in the training data, but factor combinations are. Factors also provide a simultaneous class assignment: e.g. Kutub (books), Kitaab-iy (my book), and Kitaabu-hum (their book) all share the root k-t-b.

11 Example. Training sentence: "lAzim tiqra kutubiy bi sorca" (You have to read my books quickly). Test sentence: "lAzim tiqra kitAbiy bi sorca" (You have to read my book quickly). Count(tiqra, kitAbiy, bi) = 0, but Count(tiqra, kutubiy, bi) > 0 and Count(tiqra, ktb, bi) > 0. So P(bi | kitAbiy, tiqra) can back off to P(bi | ktb, tiqra) to obtain a more robust estimate, which is better than backing off all the way to P(bi | tiqra).
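
A small sketch of this example (the count values are stand-ins for whatever the training data yields): when the full word-level context is unseen, the root-level context is still observed, so the model can back off to it:

    counts = {
        ("tiqra", "kitAbiy", "bi"): 0,  # test-sentence word context: unseen in training
        ("tiqra", "kutubiy", "bi"): 1,  # training sentence used a different surface form
        ("tiqra", "ktb", "bi"): 1,      # root-level context covers both surface forms
    }

    if counts[("tiqra", "kitAbiy", "bi")] > 0:
        print("use P(bi | kitAbiy, tiqra)")
    else:
        print("back off to P(bi | ktb, tiqra)")  # more informative than P(bi | tiqra)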

12 Language Model Backoff. When an n-gram count is low, use the (n-1)-gram estimate; this ensures more robust parameter estimation on sparse data. Word-based LM, backoff path (drop the most distant word at each step): P(W_t | W_{t-1} W_{t-2} W_{t-3}) -> P(W_t | W_{t-1} W_{t-2}) -> P(W_t | W_{t-1}) -> P(W_t). Factored LM, backoff graph (multiple backoff paths possible): from F | F1 F2 F3 through F | F1 F2, F | F1 F3, F | F2 F3, then F | F1, F | F2, F | F3, down to F.
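
A short sketch (mine, not from the slides) that enumerates the backoff paths for three conditioning factors, to contrast with the single path of a word-based model:

    from itertools import permutations

    parents = ("F1", "F2", "F3")
    for order in permutations(parents):          # the order in which factors are dropped
        remaining, path = list(parents), ["F | F1 F2 F3"]
        for f in order:
            remaining.remove(f)
            path.append("F | " + " ".join(remaining) if remaining else "F")
        print(" -> ".join(path))
    # prints 3! = 6 distinct paths; a word trigram model has exactly one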

13 Choosing Backoff Paths. Four methods for choosing the backoff path: 1. Fixed path (a priori). 2. Choose the path dynamically during training. 3. Choose multiple paths dynamically during training and combine the results (Generalized Parallel Backoff). 4. Constrained versions of (2) or (3). [Figure: the backoff graph over F | F1 F2 F3 from the previous slide.]

14 Generalized Backoff. Katz backoff uses the discounted maximum-likelihood estimate when the n-gram count exceeds a threshold, and otherwise backs off to the lower-order distribution scaled by a backoff weight. Generalized backoff replaces that lower-order distribution with g(), which can be any positive function, but some choices of g() make the backoff weight computation difficult.
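
The following is only a schematic sketch of the idea (not SRILM's implementation; the discounting, threshold, and weight functions are placeholders). Katz backoff is the special case where g() is the next-lower-order backoff distribution itself:

    def backoff_prob(w, parents, counts, hist_counts, discount, tau, alpha, g):
        """Discounted ML estimate for frequent events; otherwise alpha * g()."""
        c = counts.get((w,) + parents, 0)
        if c > tau:
            return discount(c) * c / hist_counts[parents]
        # generalized backoff: g() may be any positive function of w and the
        # lower-order statistics; alpha(parents) renormalizes the distribution
        return alpha(parents) * g(w, parents)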

15 g() functions. A priori fixed path: g() is the backoff estimate at a single, pre-chosen lower-order node. Dynamic path, max counts: pick the backoff node with the largest raw counts; being based on raw counts, this favors robust estimation. Dynamic path, max normalized counts: pick the node with the largest normalized counts; being based on maximum likelihood, this favors statistical predictability.
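
One way to realize the two dynamic choices in code (a sketch under my own data-structure assumptions; each function returns the lower-order node whose smoothed estimate g() would then use):

    def pick_node_max_counts(w, candidate_nodes, counts):
        # raw counts: favors the node with the most evidence (robust estimation)
        return max(candidate_nodes, key=lambda node: counts.get((w,) + node, 0))

    def pick_node_max_norm_counts(w, candidate_nodes, counts, hist_counts):
        # counts normalized by history counts, i.e. the ML probability:
        # favors statistical predictability
        return max(candidate_nodes,
                   key=lambda node: counts.get((w,) + node, 0) / max(hist_counts.get(node, 1), 1))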

16 Dynamically Choosing Backoff Paths During Training. Choose the backoff path based on g() and statistics of the data. [Figure: a single path selected through the backoff graph, e.g. W_t | W_{t-1} S_{t-1} T_{t-1} -> W_t | W_{t-1} S_{t-1} -> W_t | S_{t-1} -> W_t.]

17 Multiple Backoff Paths: Generalized Parallel Backoff. Choose multiple paths during training and combine the probability estimates, e.g. back off from W_t | W_{t-1} S_{t-1} T_{t-1} to both W_t | W_{t-1} S_{t-1} and W_t | W_{t-1} T_{t-1} in parallel. Options for combination: average, sum, product, geometric mean, weighted mean.
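
A minimal sketch of the combination step (the two path estimates and the weights are made-up numbers): once several parallel backoff paths each yield an estimate, they are merged with one of the rules listed above:

    from statistics import fmean, geometric_mean

    path_estimates = [0.012, 0.020]  # e.g. from W_t | W_{t-1} S_{t-1} and W_t | W_{t-1} T_{t-1}
    combined = {
        "mean": fmean(path_estimates),
        "sum": sum(path_estimates),       # sum/product need renormalizing via the backoff weight
        "product": path_estimates[0] * path_estimates[1],
        "geometric mean": geometric_mean(path_estimates),
        "weighted mean": 0.7 * path_estimates[0] + 0.3 * path_estimates[1],  # assumed weights
    }
    print(combined)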

18 Summary: Factored Language Models. Factored Language Model = Factored Word Representation + Generalized Backoff. The factored word representation allows a rich feature-set representation of words; generalized (parallel) backoff enables robust estimation of models with many conditioning variables.

19 Model Selection Problem. For n-grams we choose, e.g., bigram vs. trigram vs. 4-gram; this is a relatively easy search: just try each and note the perplexity on a development set. For a Factored LM we must choose the initial conditioning factors, the backoff graph, and the smoothing options. There are too many options, so an automatic search is needed. Tradeoff: the Factored LM is more general, but it is harder to select a good model that fits the data well. (A rough sense of the search-space size is sketched below.)
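
As a back-of-the-envelope sketch (my own counting, ignoring smoothing options and parallel backoff), even the number of possible start-node/single-path choices grows combinatorially with the number of factors:

    from math import comb, factorial

    n_factors = 6   # e.g. Word, Stem, Root, Pattern, Morph, POS
    # choose which k parent factors to condition on, then an order to drop them in
    configs = sum(comb(n_factors, k) * factorial(k) for k in range(n_factors + 1))
    print(configs)  # 1957 for 6 factors, before counting smoothing choices per node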

20 Example: a Factored LM. The initial conditioning factors, the backoff graph, and the smoothing parameters completely specify a Factored Language Model. E.g., with 3 factors total: 0. Begin with the full backoff graph structure over W_t | W_{t-1} S_{t-1} T_{t-1}. 1. The initial conditioning factors specify the start node, e.g. W_t | W_{t-1} S_{t-1}.

21 Example: a Factored LM (continued). With the same 3 factors: 3. Begin with the subgraph rooted at the new start node W_t | W_{t-1} S_{t-1}. 4. Specify the backoff graph, i.e. which backoff to use at each node (e.g. W_t | W_{t-1} S_{t-1} -> W_t | W_{t-1} -> W_t). 5. Specify the smoothing option for each edge.

22 Applications of Factored LMs. Modeling of Arabic, Turkish, Finnish, German, and other morphologically rich languages [Kirchhoff et al., JHU Summer Workshop 2002], [Duh & Kirchhoff, Coling 2004], [Vergyri et al., ICSLP 2004]. Modeling of conversational speech [Ji & Bilmes, HLT 2004]. Applied in speech recognition and machine translation. The general Factored LM tools can also be used to obtain smoothed conditional probability tables for applications outside language modeling (e.g. tagging). More possibilities: factors can be anything!

23 To explore further... The Factored Language Model is now part of the standard SRI Language Modeling Toolkit distribution (v1.4.1), thanks to Jeff Bilmes (UW) and Andreas Stolcke (SRI). Downloadable at: http://www.speech.sri.com/projects/srilm/

24 fngram Tools

    fngram-count -factor-file my.flmspec -text train.txt
    fngram -factor-file my.flmspec -ppl test.txt

train.txt (factored text format), e.g. "Factored LM is fun" becomes:

    W-Factored:P-adj W-LM:P-noun W-is:P-verb W-fun:P-adj

my.flmspec:

    W: 2 W(-1) P(-1) my.count my.lm 3
    W1,P1 W1 kndiscount gtmin 1 interpolate
    P1 P1 kndiscount gtmin 1
    0 0 kndiscount gtmin 1
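
Read loosely (my gloss, hedged against the SRILM FLM documentation rather than stated on the slide): the first line of my.flmspec declares the child factor W, its two parents W(-1) and P(-1) (the previous word and POS tag), the count and model files to write, and the number of backoff-graph nodes (3). Each remaining line describes one node: the parents still active at that node, the parent to drop next, and the smoothing options (Kneser-Ney discounting with a minimum count of 1, interpolated at the top node); the final "0 0" line is the unigram node.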

26 Turkish Language Model. Newspaper text from the web [Hakkani-Tür, 2000]. Train: 400K tokens / Dev: 100K / Test: 90K. Factors from a morphological analyzer. [Figure: factor decomposition of an example word, "yararmanlak".]

27 Turkish: Dev Set Perplexity

    N-gram | Word-based LM | Hand FLM | Random FLM | Genetic FLM | Δppl (%)
    2      | 593.8         | 555.0    | 556.4      | 539.2       | -2.9
    3      | 534.9         | 533.5    | 497.1      | 444.5       | -10.6
    4      | 534.8         | 549.7    | 566.5      | 522.2       | -5.0

Factored Language Models found by genetic algorithms perform best. The poor performance of the high-order hand-specified FLM reflects the difficulty of manual search.

28 Turkish: Eval Set Perplexity

    N-gram | Word-based LM | Hand FLM | Random FLM | Genetic FLM | Δppl (%)
    2      | 609.8         | 558.7    | 525.5      | 487.8       | -7.2
    3      | 545.4         | 583.5    | 509.8      | 452.7       | -11.2
    4      | 543.9         | 559.8    | 574.6      | 527.6       | -5.8

The Dev Set results generalize to the Eval Set, so the genetic algorithm did not overfit. The best models used Word, POS, Case, and Root factors and parallel backoff.

29 Arabic Language Model. LDC CallHome conversational Egyptian Arabic speech transcripts. Train: 170K words / Dev: 23K / Test: 18K. Factors from morphological analyzers [LDC, 1996], [Darwish, 2002]. [Figure: factor decomposition of an example word, "Il+dOr".]

30 Arabic: Dev Set and Eval Set Perplexity

Dev Set perplexities:

    N-gram | Word-based LM | Hand FLM | Random FLM | Genetic FLM | Δppl (%)
    2      | 229.9         | 229.6    | 229.9      | 222.9       | -2.9
    3      | 229.3         | 226.1    | 230.3      | 212.6       | -6.0

Eval Set perplexities:

    N-gram | Word-based LM | Hand FLM | Random FLM | Genetic FLM | Δppl (%)
    2      | 249.9         | 230.1    | 239.2      | 223.6       | -2.8
    3      | 285.4         | 217.1    | 224.3      | 206.2       | -5.0

The best models used all available factors (Word, Stem, Root, Pattern, Morph) and various parallel backoffs.

31 Word Error Rate (WER) Results

Dev Set:

    Stage | Word LM Baseline | Factored LM
    1     | 57.3             | 56.2
    2a    | 54.8             | 52.7
    2b    | 54.3             | 52.5
    3     | 53.9             | 52.1

Eval Set (eval97):

    Stage | Word LM Baseline | Factored LM
    1     | 61.7             | 61.0
    2a    | 58.2             | 56.5
    2b    | 58.8             | 57.4
    3     | 57.6             | 56.1

Factored language models gave a 1.5% improvement in WER.
