
1 Using Contextual Speller Techniques and Language Modeling for ESL Error Correction
Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev, William B. Dolan, Dmitriy Belenko, Lucy Vanderwende
Microsoft Research & University of Illinois, IJCNLP 2008
Reporter: Chia-Ying Lee. Advisor: Hsin-Hsi Chen

2 Introduction
About 750M people (74%) use English as a second language (Crystal 1997).
Non-native writers encounter particular problems, for example with prepositions.
Challenge: writing errors often have a semantic dimension (e.g., "at school" refers to a place, "in school" refers to a time).

3 Target Error Types
1. Preposition presence and choice: In the other hand, ... (On the other hand, ...)
2. Definite and indefinite determiner presence and choice: I am teacher ... (I am a teacher)
3. Gerund/infinitive confusion: I am interesting in this book. (interested in)
4. Auxiliary verb presence and choice: My teacher does is a good teacher (My teacher is ...)
5. Over-regularized verb inflection: I writed a letter (wrote)
6. Adjective/noun confusion: This is a China book (Chinese book)
7. Word order (adjective sequences and nominal compounds): I am a student of university (university student)
8. Noun pluralization: They have many knowledges (much knowledge)

4 Problem Definition
Present a modular system for detecting and correcting errors made by non-native writers.
Focus on preposition and determiner errors.

5 Related Work
Turner and Charniak (2007) use a language model based on a statistical parser for determiner and preposition selection.
De Felice and Pulman (2007) use a set of sophisticated syntactic and semantic analysis features to predict 5 common English prepositions.
Han et al. (2004, 2006) use a maximum entropy classifier to propose article corrections.
Izumi et al. (2003) and Chodorow et al. (2007) present techniques for automatic preposition choice modeling.

6 System Description
0. Preprocessing: the input is tokenized and POS tagged.
1. Suggestion Provider (SP): detects errors and proposes corrections.
2. Language Model (LM): discards suggestions whose score is lower than that of the original.
3. Example Provider (EP): queries the web for exemplary sentences.

7 Suggestion Provider (1/3)
Classifiers:
Presence/absence ("pa") classifier, e.g. p(article + teacher) = 0.54
Choice ("ch") classifier, e.g. p(the) = 0.04, p(a/an) = 0.96
Potential insertion sites are determined heuristically from the sequence of POS tags.
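One way to read the two classifiers is as a cascade: the pa classifier decides whether an article belongs at a site, and the ch classifier picks which one. The sketch below is only an illustration of that reading. The function name `suggest_article`, the 0.5 threshold, and the return convention are assumptions, not details given on the slides.

```python
def suggest_article(p_presence, choice_probs, original, threshold=0.5):
    """Return a suggested article class, "" to suggest deletion, or None for no change.

    p_presence   -- pa classifier probability that an article belongs at this site
    choice_probs -- ch classifier distribution, e.g. {"the": 0.04, "a/an": 0.96}
    original     -- what the writer put at the site ("" if nothing)
    threshold    -- hypothetical decision cutoff (not from the slides)
    """
    if p_presence < threshold:
        # No article expected here: suggest deleting one if the writer used it.
        return "" if original else None
    best = max(choice_probs, key=choice_probs.get)
    # "a" and "an" both belong to the single "a/an" class.
    original_class = "a/an" if original in ("a", "an") else original
    return None if best == original_class else best


# Slide example: "I am teacher", site before "teacher", nothing written there.
print(suggest_article(0.54, {"the": 0.04, "a/an": 0.96}, ""))  # a/an
```

Separating presence from choice keeps each decision small: the choice classifier only has to be consulted when an article is expected at all.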

8 Suggestion Provider (2/3)
Features (within ±6 tokens of the site): relative position, token string, POS tag.
Example: 0/I/PRP 1/am/VBP 2/teacher/NN 3/from/IN 4/Korea/NNP 5/./.
Classifiers: decision trees (WinMine toolkit, Chickering 2002), which performed better than a linear SVM.
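The feature description above can be sketched as a small extraction function. This is a minimal illustration, assuming the site is identified by the index of the token that follows it; the function name `context_features` and the feature key format are made up for the example and do not come from the slides.

```python
def context_features(tagged, site, window=6):
    """Collect token and POS features at relative positions around a site.

    tagged -- list of (token, pos) pairs for one sentence
    site   -- index of the token right after the potential insertion site (assumed convention)
    window -- number of tokens to look at on each side
    """
    feats = {}
    for offset in range(-window, window + 1):
        i = site + offset
        if 0 <= i < len(tagged):  # skip positions outside the sentence
            token, pos = tagged[i]
            feats[f"tok[{offset:+d}]"] = token.lower()
            feats[f"pos[{offset:+d}]"] = pos
    return feats


sentence = [("I", "PRP"), ("am", "VBP"), ("teacher", "NN"),
            ("from", "IN"), ("Korea", "NNP"), (".", ".")]
feats = context_features(sentence, site=2)  # site before "teacher"
print(feats["tok[-1]"], feats["pos[+0]"])   # am NN
```

A dictionary of this kind can be fed to any classifier that accepts sparse categorical features; the slides report decision trees for this step.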

9 Suggestion Provider (3/3)
Data sets: the English Encarta encyclopedia (560k sentences) and a random set of 1M sentences from a Reuters news data set.
Prepositions (from the NICT Japanese Learners of English corpus): about, as, at, by, for, from, in, like, of, on, since, to, with, than, "other"

10 Language Model
5-gram model trained on the English Gigaword corpus (LDC2005T12), with a 120K-word vocabulary.
54 million bigrams, 338 million trigrams, 801 million 4-grams and 12 billion 5-grams.
Uses interpolated Kneser-Ney smoothing (Kneser and Ney 1995) without count cutoffs.
Scores: "I am teacher from Korea." score = 0.19; "I am a teacher from Korea." score = 0.60
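The filtering role of the language model can be sketched as follows. This is an illustration only: the function name `filter_suggestions` is hypothetical, a lookup table stands in for the real 5-gram model, and the third score in the table is invented for the example (the first two are the ones quoted on the slide).

```python
def filter_suggestions(original, candidates, lm_score):
    """Drop candidates that the LM scores lower than the original sentence."""
    baseline = lm_score(original)
    kept = [c for c in candidates if lm_score(c) >= baseline]
    # Present the surviving suggestions best-first.
    return sorted(kept, key=lm_score, reverse=True)


TOY_SCORES = {
    "I am teacher from Korea.": 0.19,      # score from the slide
    "I am a teacher from Korea.": 0.60,    # score from the slide
    "I am the teacher from Korea.": 0.10,  # made-up score, for illustration only
}

kept = filter_suggestions(
    "I am teacher from Korea.",
    ["I am a teacher from Korea.", "I am the teacher from Korea."],
    TOY_SCORES.get,  # stand-in for a real language model scoring function
)
print(kept)  # ['I am a teacher from Korea.']
```

Passing the scoring function as a parameter keeps the filter independent of how the model is implemented.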

11 Example Provider (1/2)
Web search: a string query built from a small window around the suggestion.
Ranking rules: query terms in the same sentence, sentence length, context overlap.
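The three ranking criteria could be combined roughly as below. The slide names the criteria but not how they are weighted, so this sketch makes several assumptions: sentences must contain the query string, more context overlap is better, and shorter sentences win ties. The function names are hypothetical, and the input sentences are passed in directly rather than fetched from a search engine.

```python
import re


def _words(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def rank_examples(query, context, sentences, top_k=3):
    """Rank candidate example sentences for a suggested correction.

    query     -- short string around the correction, e.g. "travel to Disneyland"
    context   -- the user's corrected sentence, used to measure overlap
    sentences -- candidate sentences (in a real system, taken from web search results)
    """
    ctx = set(_words(context))
    # Criterion 1: the whole query must occur within the sentence.
    hits = [s for s in sentences if query.lower() in s.lower()]

    def key(sentence):
        words = _words(sentence)
        overlap = len(ctx & set(words))
        # Criteria 2 and 3 (assumed order): more overlap first, then shorter sentences.
        return (-overlap, len(words))

    return sorted(hits, key=key)[:top_k]


candidates = [
    "Should you travel to Disneyland in California or to Disney World in Florida?",
    "Disneyland opened in 1955.",
    "Timothy's wish was to travel to Disneyland in California.",
]
top = rank_examples("travel to Disneyland",
                    "I want to travel to Disneyland in March.", candidates)
print(top[0])  # Timothy's wish was to travel to Disneyland in California.
```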

12 Example Provider (2/2)
Original: I want to travel Disneyland in March.
Suggestion: I want to travel to Disneyland in March.
Top 3 examples:
1. Timothy's wish was to travel to Disneyland in California.
2. Should you travel to Disneyland in California or to Disney World in Florida?
3. The tourists who travel to Disneyland in California can either choose to stay in Disney resorts or in the hotel for Disneyland vacations.

13 Evaluation (1/5)
Components evaluated: suggestion provider (determiner choice, preposition choice), language model, human evaluation.
Data split: 70% for training, 30% for testing.
Combined accuracy:

14 Evaluation (2/5)
Suggestion provider, determiner choice.
Baseline: 69.9% (always choosing the most frequent class label, "none").
State of the art: Turner and Charniak (Penn Treebank): 86.74%
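A most-frequent-class baseline of this kind is simple to compute. The sketch below uses a tiny invented label set purely to show the calculation; it does not reproduce the 69.9% figure, and the function name is hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == majority)
    return correct / len(test_labels)


# Made-up labels for illustration: "none" means no determiner at the site.
train = ["none"] * 7 + ["the"] * 2 + ["a/an"]
test = ["none"] * 7 + ["the"] * 2 + ["a/an"]
print(majority_baseline_accuracy(train, test))  # 0.7
```

Any classifier worth using has to beat this number, which is why the slide reports it alongside the system's accuracy.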

15 Evaluation (3/5)
Suggestion provider, preposition choice.
Baseline: 28.94% (always predicting no preposition).

16 Evaluation (4/5)
Language model.
The LM filter reduced the number of preposition corrections by 66.8% and the number of determiner corrections by 50.7%, which increases precision dramatically.
Accuracy of preposition suggestions: LM score + classifier probability: 62.32%; LM score alone: 58.36%

17 Evaluation (5/5)
Human evaluation.
CLEC: Chinese Learners of English Corpus (Gui and Yang 2003)

18 Conclusion and Future Work
The system successfully combines contextual-speller-based methods with language model scoring and web-based examples.
It works with reasonable accuracy even on extremely noisy text.
Future work: using web counts to build a learned ranker that combines information from the language model and the classifiers.

19 Thank you!
"Buy Minshun, look for Minshun! Minshun lets you breathe easily, relaxed and refreshed."

