Na-Rae Han (University of Pittsburgh), Joel Tetreault (ETS), Soo-Hwa Lee (Chungdahm Learning, Inc.), Jin-Young Ha (Kangwon University) May 19 2010, LREC.

1 Using an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System
Na-Rae Han (University of Pittsburgh), Joel Tetreault (ETS), Soo-Hwa Lee (Chungdahm Learning, Inc.), Jin-Young Ha (Kangwon University)
May 19, 2010, LREC 2010

2 Objective
- A feedback tool for detecting and correcting preposition errors
  - I wait ∅/for you. (∅: omitted preposition)
  - So I go to/∅ home quickly. (∅: extraneous preposition)
  - Adult give money at/on birthday. (selection error)
- Why preposition errors?
  - Preposition usage is one of the most difficult aspects of English for non-native speakers
  - 18% of sentences from ESL essays contain a preposition error (Dalgish, 1985)
  - 8-10% of all prepositions in TOEFL essays are used incorrectly (Tetreault and Chodorow, 2008)

3 Diagnosing L2 Errors
- Statistical modeling on large corpora. But what kind?
  1. General corpora composed of well-edited texts by native speakers ("native speaker corpora"): the currently dominant approach
  2. Error-annotated learner corpora, consisting of texts written by ESL learners: our approach

4 Our Learner Corpus
- Chungdahm English Learner Corpus
  - A collection of English essays written by Korean-speaking students of the Chungdahm Institute, operated in South Korea
  - 130,754,000 words in 861,481 essays, written on 1,545 prompts
  - Over 6.6 million error annotations in 4 categories: grammar, strategy, style, substance
  - Non-exhaustive error marking (more on this later)

5 The Preposition Data Set
- The 11 "preposition" types: NULL, about, at, by, for, from, in, of, on, to, with
  - These represent 99% of student preposition error tokens in the data
- The data set consists of 20.5 million words:
  - 117,665 preposition errors
  - 1,104,752 preposition non-errors
- Preposition error rate as marked in the data: 9.6%

6 Method
- Cast error correction as a classification problem
- Train an 11-way Maximum Entropy classifier on preposition events extracted from the Chungdahm corpus
- A preposition annotation is represented as (s: student's preposition choice, c: correct preposition), where s and c range over {NULL, about, at, by, for, from, in, of, on, to, with}
  - s ≠ c for preposition errors; s = c for non-errors
- A preposition event consists of:
  - Outcome (prediction target): c
  - Contextual features extracted from the immediate context surrounding the preposition token, including the student's original preposition choice (i.e., s)
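The 11-way setup can be sketched with a minimal pure-Python log-linear (softmax) trainer, which is what a Maximum Entropy classifier amounts to. The toy events and feature strings below are illustrative stand-ins, not the paper's data or code.

```python
import math
from collections import defaultdict

PREPS = ["NULL", "about", "at", "by", "for", "from",
         "in", "of", "on", "to", "with"]

def probs_for(weights, feats):
    """Softmax distribution over the 11 outcomes for one event."""
    scores = {c: sum(weights[c][f] for f in feats) for c in PREPS}
    m = max(scores.values())
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train_maxent(events, epochs=50, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = {c: defaultdict(float) for c in PREPS}
    for _ in range(epochs):
        for feats, gold in events:
            probs = probs_for(weights, feats)
            for c in PREPS:
                grad = (1.0 if c == gold else 0.0) - probs[c]
                for f in feats:
                    weights[c][f] += lr * grad
    return weights

def predict(weights, feats):
    p = probs_for(weights, feats)
    return max(p, key=p.get)

# Toy events: feature strings plus the correct preposition c as outcome.
# Note the student's own choice is itself a feature ("s=...").
events = [
    ({"s=at", "wd-1=there", "wd+1=the", "MOD=falling", "ARG=winter"}, "in"),
    ({"s=for", "wd-1=wait", "wd+1=you", "MOD=wait", "ARG=you"}, "for"),
]
weights = train_maxent(events)

suggestion = predict(weights, {"s=at", "wd-1=there", "wd+1=the",
                               "MOD=falling", "ARG=winter"})
print(suggestion)  # -> in
```

Because the student's choice s is a feature, the model can learn both "keep s" patterns (non-errors) and systematic s-to-c confusions (errors) in a single classifier.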

7 Preposition Context
- Student preposition choice s, plus 3 words to its left and right (positions -3..+3)
- MOD: head of the phrase modified by the prepositional phrase
- ARG: noun argument of the preposition
- MOD and ARG identified using the Stanford Parser
- Example text and annotation: "Snow is falling there at the winter."
  - s = at; wd-1 = there; wd+1 = the; MOD = falling; ARG = winter

8 Event Representation
- "Snow is falling there at the winter." represented as an event:
- Outcome: in
- Features (24 total), for example:

      name       value
      s          at
      wd-1       there
      wd+1       the
      MOD        falling
      ARG        winter
      MOD_ARG    falling_winter
      MOD_s_ARG  falling_at_winter
      3GRAM      there_at_the
      5GRAM      falling_there_at_the_winter
      ...
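The feature names on this slide can be produced by a small extraction helper. This is a hypothetical sketch, not the authors' code; in the real system MOD and ARG come from the Stanford Parser, so they are passed in here as precomputed inputs.

```python
def extract_features(tokens, i, s, mod, arg):
    """Build an event's feature dict for the preposition slot at index i.
    tokens: sentence tokens; s: student's preposition choice;
    mod/arg: parser-derived modified head and noun argument."""
    def get(j):
        return tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    feats = {"s": s, "MOD": mod, "ARG": arg,
             "MOD_ARG": f"{mod}_{arg}",
             "MOD_s_ARG": f"{mod}_{s}_{arg}"}
    # Window of 3 words on each side of the preposition slot
    for off in (-3, -2, -1, 1, 2, 3):
        feats[f"wd{off:+d}"] = get(i + off)
    feats["3GRAM"] = "_".join([get(i - 1), s, get(i + 1)])
    feats["5GRAM"] = "_".join([get(i - 2), get(i - 1), s,
                               get(i + 1), get(i + 2)])
    return feats

toks = ["Snow", "is", "falling", "there", "at", "the", "winter", "."]
ev = extract_features(toks, 4, "at", "falling", "winter")
print(ev["5GRAM"])  # -> falling_there_at_the_winter
```

For this example the windowed words happen to coincide with MOD and ARG; in general the parser-derived features capture longer-distance structure the n-gram window misses.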

9 Training and Testing
- Training set: 978,000 events
  - The rest is set aside for evaluation and development
- Creating an evaluation set for testing
  - Error annotation in the Chungdahm corpus is not exhaustive: many student errors are left unmarked by tutors
  - This necessitates creating a re-annotated evaluation set
  - 1,000 preposition contexts annotated by 3 trained annotators
  - Inter-annotator agreement 0.860-0.910; kappa 0.662-0.804
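The kappa figures above are chance-corrected pairwise agreement (Cohen's kappa). A minimal sketch of the statistic, with made-up annotator labels rather than the paper's annotations:

```python
def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for the agreement
    two annotators would reach by chance given their label frequencies."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Two annotators judging five preposition contexts (illustrative labels)
ann1 = ["error", "ok", "ok", "error", "ok"]
ann2 = ["error", "ok", "error", "error", "ok"]
kappa = round(cohens_kappa(ann1, ann2), 3)
print(kappa)  # -> 0.615
```

Raw agreement of 0.8 here shrinks to kappa 0.615 once chance agreement is discounted, which is why the slide reports both numbers.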

10 Evaluation Results
- 11-way classification
  - works as an error correction (multi-outcome decision) model
  - can be backed off to an error detection (binary decision) model
- Omission errors (I wait ∅/for you.); error detection is trivial for this type:

                        accuracy
      error correction  0.833

- Extraneous preposition errors (So I go to/∅ home quickly.):

                        precision  recall
      error correction  0.87       0.043
      detection only    1.00       0.049

- Selection errors (Adult give money at/on birthday.):

                        precision  recall
      error correction  0.817      0.132
      detection only    0.933      0.148
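The correction-to-detection back-off works because the 11-way classifier implicitly flags an error whenever its predicted preposition differs from the student's choice. A sketch of both scorings, using made-up predictions rather than the paper's data:

```python
def detection_metrics(student, gold, predicted, require_correct=False):
    """Precision/recall of error flagging. A token is flagged when the
    prediction differs from the student's choice. With
    require_correct=True, a flag only counts as a hit if the suggested
    preposition is also the gold one (error correction); with False,
    any flag on a true error counts (detection only)."""
    flagged = hits = errors = 0
    for s, c, p in zip(student, gold, predicted):
        is_error = (s != c)
        errors += is_error
        if p != s:                      # model flags this token
            flagged += 1
            if is_error and (p == c or not require_correct):
                hits += 1
    precision = hits / flagged if flagged else 0.0
    recall = hits / errors if errors else 0.0
    return precision, recall

student   = ["at", "in", "to", "NULL", "on"]
gold      = ["in", "in", "to", "for",  "on"]
predicted = ["on", "in", "to", "for",  "on"]
det = detection_metrics(student, gold, predicted, require_correct=False)
cor = detection_metrics(student, gold, predicted, require_correct=True)
print(det)  # -> (1.0, 1.0)
print(cor)  # -> (0.5, 0.5)
```

As on the slide, detection-only scores are never worse than correction scores: every correction hit is also a detection hit, but not vice versa.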

11 Related Work
- Chodorow et al. (2007)
  - Error detection model targeting 34 prepositions
  - Trained on San Jose Mercury News + Lexile data
  - 0.88 precision, 0.16 recall for detecting selection errors
- Gamon et al. (2008)
  - Error detection and correction model of 13 prepositions
  - One classifier to determine whether a preposition/article should be present, another for the correct choice, plus an additional filter
  - Trained on MS Encarta data, tested on Chinese learner writing
  - 80% precision; recall not reported
- Izumi et al. (2003, 2004)
  - Trained on the Standard Speaking Test Corpus (Japanese learners)
  - 56 speakers, 6,216 sentences
  - 25% precision and 7% recall on 13 grammatical error types

12 Comparison: Native-Corpus-Trained Models
- Question: will models trained on native-speaker-produced texts outperform our model?
- The advantage of native corpora: they are plentiful. We therefore allowed these models a larger training size.
- Experimental setup:
  - Build models on native corpora, using varying training set sizes (1 mil - 5 mil)
  - Data: the Lexile Corpus, 7th and 8th grade reading levels
  - A comparable feature set was employed

13 Learner Model vs. Native Models
- Testing results on learner data (replacement errors only):
  - The learner model outperforms all native models
  - Native models: the performance gain from larger training size is insignificant beyond the 2-3 mil point

                              error correction     error detection only
                              precision  recall    precision  recall
      Learner (about 1 mil)   0.817      0.132     0.933      0.148
      N-1mil                  0.416      0.106     0.536      0.132
      N-2mil                  0.416      0.116     0.586      0.142
      N-3mil                  0.453      0.099     0.594      0.126
      N-4mil                  0.462      0.125     0.583      0.153
      N-5mil                  0.484      0.121     0.605      0.147

14 What Does This Prove?
- Are the native models flawed? A bad feature set?
  - No. In-set testing (against held-out native text) shows performance levels comparable to those in published studies
- Could some of the performance gap be due to genre differences?
  - Highly likely. However, 7th-8th grade reading materials were the closest match we could find to student essays.
- In sum: the native models' advantage of larger training size does not outweigh the learner model's advantages: genre/text similarity and error annotation

15 Discussion: Learner Language vs. Native Corpora
- Modeling on native corpora:
  - Produces a one-size-fits-all model of "native" English
  - More generic and universally applicable?
- Modeling on a learner corpus:
  - Produces a model specific to the particular learner language
  - Can it be applied to the language of other learner groups, e.g. French-speaking or Japanese-speaking English learners?
- Combining the two approaches:
  - A system with specific models for different L1 backgrounds
  - Plus a back-off "generic" model, built on native corpora

16 Discussion: The Problem of Partial Error Annotation
- Partial error annotation problem:
  - 57% of replacement errors and 85% of extraneous prepositions go unmarked by Chungdahm tutors
  - The training data therefore includes conflicting evidence
- Our model's low recall and high precision follow from this:
  - The model assumes a lower-than-true error rate
  - The model has to reconcile conflicting sets of evidence
  - When the model does flag an error, it does so with high confidence and accuracy
- Solution? Bootstrapping: relabeling of unannotated errors
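One way the proposed bootstrapping could work is a self-training pass: treat unannotated tokens where a confident model disagrees with the student's choice as latent errors and relabel them before retraining. This is a speculative sketch of that idea, not the paper's implementation; `predict_proba` stands in for any trained model's probability output.

```python
def relabel(events, predict_proba, threshold=0.9):
    """events: (features, s, c) triples, where c == s for tokens the
    tutors left unmarked. predict_proba(features) returns a dict
    mapping each candidate preposition to a probability."""
    out = []
    for feats, s, c in events:
        probs = predict_proba(feats)
        best = max(probs, key=probs.get)
        if c == s and best != s and probs[best] >= threshold:
            c = best   # confident disagreement: treat as an unmarked error
        out.append((feats, s, c))
    return out

# Stub model (illustrative): very confident the context wants "in"
stub = lambda feats: {"in": 0.95, "at": 0.05}
evs = [({"s=at", "ARG=winter"}, "at", "at")]
relabeled = relabel(evs, stub)
print(relabeled[0][2])  # -> in
```

Annotated tokens (c ≠ s) are left untouched, so tutor judgments always take precedence over the model; only the gap left by non-exhaustive marking is filled in.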

17 Conclusions
- As language instruction turns digital, more and more (partially) error-annotated learner corpora like the Chungdahm corpus will become available
- Building a direct model of L2 errors, whenever such data is available, offers an advantage over models based on native corpora, despite any partial-annotation problem
- Exhaustive annotation is not necessary for learner-corpus-trained models to outperform standard native-text-trained models with much larger training data sets

