Na-Rae Han (University of Pittsburgh), Joel Tetreault (ETS), Soo-Hwa Lee (Chungdahm Learning, Inc.), Jin-Young Ha (Kangwon University). May 19, 2010, LREC.

Presentation transcript:

Using an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System
Na-Rae Han (University of Pittsburgh), Joel Tetreault (ETS), Soo-Hwa Lee (Chungdahm Learning, Inc.), Jin-Young Ha (Kangwon University). May 19, 2010, LREC.

Objective
 A feedback tool for detecting and correcting preposition errors
 I wait ∅/for you. (∅: omitted preposition)
 So I go to/∅ home quickly. (∅: extraneous preposition)
 Adult give money at/on birthday. (selection error)
 Why preposition errors?
 Preposition usage is one of the most difficult aspects of English for non-native speakers
 18% of sentences in ESL essays contain a preposition error (Dalgish, 1985)
 8-10% of all prepositions in TOEFL essays are used incorrectly (Tetreault and Chodorow, 2008)

Diagnosing L2 Errors
 Statistical modeling on large corpora. But what kind?
 1. General corpora composed of well-edited texts by native speakers ("native speaker corpora"): currently the dominant approach
 2. Error-annotated learner corpora, consisting of texts written by ESL learners: our approach

Our Learner Corpus
 Chungdahm English Learner Corpus
 A collection of English essays written by Korean-speaking students of the Chungdahm Institute, operated in South Korea
 130,754,000 words in 861,481 essays, written on 1,545 prompts
 Over 6.6 million error annotations in 4 categories:
 grammar, strategy, style, substance
 Non-exhaustive error marking (more on this later)

The Preposition Data Set
 Our preposition data set
 The 11 "preposition" types: NULL, about, at, by, for, from, in, of, on, to, with
 represents 99% of student error tokens in the data
 Text set consists of 20.5 million words
 117,665 preposition errors
 1,104,752 preposition non-errors
 Preposition error rate as marked in the data: 9.6%

Method
 Cast error correction as a classification problem
 Train an 11-way Maximum Entropy classifier on preposition events extracted from the Chungdahm corpus
 A preposition annotation is represented as (s: student's preposition choice, c: correct preposition), where s and c range over: {NULL, about, at, by, for, from, in, of, on, to, with}
 s ≠ c for preposition errors; s = c for non-errors
 A preposition event consists of:
 Outcome (prediction target): c
 Contextual features extracted from the immediate context surrounding the preposition token, including the student's original preposition choice (i.e., s)
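The classification setup above can be sketched with scikit-learn, whose multinomial LogisticRegression is a maximum-entropy classifier. This is a minimal illustration, not the authors' code; the toy events cover only a few of the 11 classes.

```python
# Sketch: preposition correction as multi-way MaxEnt classification.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

PREPS = ["NULL", "about", "at", "by", "for", "from",
         "in", "of", "on", "to", "with"]

# Toy training events: contextual features (including the student's
# original choice s) paired with the correct preposition c.
events = [
    ({"s": "at", "MOD": "falling", "ARG": "winter"}, "in"),
    ({"s": "in", "MOD": "falling", "ARG": "winter"}, "in"),
    ({"s": "on", "MOD": "give", "ARG": "birthday"}, "on"),
    ({"s": "at", "MOD": "give", "ARG": "birthday"}, "on"),
]
X_dicts, y = zip(*events)

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)
clf = LogisticRegression(max_iter=1000)  # multinomial MaxEnt
clf.fit(X, y)

# Predict the correct preposition for a new context.
pred = clf.predict(vec.transform([{"s": "at", "MOD": "falling",
                                   "ARG": "winter"}]))[0]
```

The predicted label is the model's suggested correction; comparing it against s yields the error flag (see the detection back-off later in the deck's own terms).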

Preposition Context
 Student preposition choice + 3 words to left and right
 MOD: head of the phrase modified by the prepositional phrase
 ARG: noun argument of the preposition
 Identified using the Stanford Parser
 Example text and annotation: "Snow is falling there at the winter" (s: at, MOD: falling, ARG: winter)

Event Representation
 Represented as an event:
 Outcome: in
 Features: (24 total)

  name       value
  s          at
  wd-1       there
  wd+1       the
  MOD        falling
  ARG        winter
  MOD_ARG    falling_winter
  MOD_s_ARG  falling_at_winter
  3GRAM      there_at_the
  5GRAM      falling_there_at_the_winter
  ...
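A hypothetical helper illustrating how such a feature dict could be built from a tokenized sentence. MOD and ARG are passed in directly, since in the paper they come from the Stanford Parser; boundary handling is simplified.

```python
# Sketch: extracting contextual features for one preposition event.
def extract_features(tokens, i, mod, arg):
    """Features for the preposition at position i in tokens.

    Assumes 2 <= i <= len(tokens) - 3 so the n-gram windows fit.
    """
    s = tokens[i]  # student's original preposition choice
    return {
        "s": s,
        "wd-1": tokens[i - 1],
        "wd+1": tokens[i + 1],
        "MOD": mod,                       # head modified by the PP
        "ARG": arg,                       # noun argument of the prep
        "MOD_ARG": f"{mod}_{arg}",
        "MOD_s_ARG": f"{mod}_{s}_{arg}",
        "3GRAM": "_".join(tokens[i - 1:i + 2]),
        "5GRAM": "_".join(tokens[i - 2:i + 3]),
    }

tokens = "Snow is falling there at the winter".split()
feats = extract_features(tokens, 4, mod="falling", arg="winter")
```

Running this on the slide's example reproduces the feature values shown in the table above.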

Training and Testing
 Training set: 978,000 events
 The rest is set aside for evaluation and development
 Creating an evaluation set for testing
 Error annotation in the Chungdahm corpus is not exhaustive:
 Many student errors are left unmarked by tutors
 This necessitates creating a re-annotated evaluation set
 1,000 preposition contexts annotated by 3 trained annotators
 Inter-annotator agreement (0.860~0.910), kappa (0.662~0.804)
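The two agreement figures quoted above (raw agreement and Cohen's kappa) can be computed as follows; the annotator labels here are invented for illustration, not the paper's data.

```python
# Sketch: raw agreement and Cohen's kappa for one annotator pair.
from sklearn.metrics import cohen_kappa_score

ann1 = ["in", "in", "at", "on", "NULL", "to"]
ann2 = ["in", "in", "at", "in", "NULL", "to"]

# Raw agreement: fraction of items the two annotators label identically.
agreement = sum(a == b for a, b in zip(ann1, ann2)) / len(ann1)

# Kappa corrects raw agreement for agreement expected by chance.
kappa = cohen_kappa_score(ann1, ann2)
```

Kappa is lower than raw agreement whenever chance agreement is non-trivial, which is why the slide reports both ranges.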

Evaluation Results
 11-way classification
 - works as an error correction (multi-outcome decision) model
 - can be backed off to an error detection (binary decision) model
 Error correction accuracy: 0.833
 Precision and recall reported separately for error correction and for detection only, on three error types:
 Omission errors (I wait ∅/for you.) * Error detection is trivial for this type
 Extraneous preposition errors (So I go to/∅ home quickly.)
 Selection errors (Adult give money at/on birthday.)
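The back-off from 11-way correction to binary detection amounts to flagging an error exactly when the predicted preposition differs from the student's original choice. A minimal sketch (assumed, not the authors' code):

```python
# Sketch: backing off a correction model to a detection decision.
def backoff(s, predicted):
    """Return (error_flagged, suggestion) for one preposition context.

    s:         the student's original preposition choice
    predicted: the classifier's predicted correct preposition
    """
    if predicted == s:
        return False, s       # detection: no error flagged
    return True, predicted    # detection: error; correction: suggestion

# Selection error from the slide: "Adult give money at/on birthday."
flag, suggestion = backoff("at", "on")
```

This is why omission errors are trivial to detect under this scheme: if the student wrote NULL and the model predicts any real preposition, the mismatch itself is the detection.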

Related Work
 Chodorow et al. (2007)
 Error detection model targeting 34 prepositions
 Trained on San Jose Mercury News + Lexile data
 0.88 precision, 0.16 recall for detecting selection errors
 Gamon et al. (2008)
 Error detection and correction model of 13 prepositions
 One classifier to determine whether a preposition/article should be present; another for the correct choice; an additional filter
 Trained on MS Encarta data, tested on Chinese learner writing
 80% precision; recall not reported
 Izumi et al. (2003, 2004)
 Trained on the Standard Speaking Test corpus (Japanese learners)
 56 speakers, 6,216 sentences
 25% precision and 7% recall on 13 grammatical error types

Comparison: Native-Corpus-Trained Models
 Question: Will models trained on native-speaker-produced texts outperform our model?
 The advantage of native corpora: they are plentiful.
 We allowed these models a larger training size.
 Experimental setup:
 Build models on native corpora, using varying training set sizes (1 million to 5 million words)
 Data: the Lexile Corpus, 7th and 8th grade reading levels
 A comparable feature set was employed

Learner Model vs. Native Models
 Testing results on learner data (replacement errors only), with precision and recall for both error correction and detection only, comparing the learner model (about 1 million words) against native models N-1mil through N-5mil:
 The learner model outperforms all native models
 Native models: the performance gain from larger training sets becomes insignificant beyond the 2-3 million word point

What Does This Prove?
 Are the native models flawed? A bad feature set?
 No. In-set testing (against held-out native text) shows performance levels comparable to those in published studies
 Could some of the performance gap be due to genre differences?
 Highly likely. However, 7th-8th grade reading materials were the closest match to student essays we could find.
 In sum: the native models' advantage in training size does not outweigh the learner model's advantages: genre/text similarity and error annotation

Discussion: Learner Language vs. Native Corpora
 Modeling on native corpora:
 Produces a one-size-fits-all model of "native" English
 More generic and universally applicable?
 Modeling on a learner corpus:
 Produces a model specific to that particular learner language
 Can it be applied to the language of other learner groups?
 e.g., French citizens? Japanese-speaking English learners?
 Combining the two approaches:
 A system with specific models for different L1 backgrounds
 Plus a back-off "generic" model built on native corpora

Discussion: The Problem of Partial Error Annotation
 Partial error annotation:
 57% of replacement errors and 85% of extraneous prepositions go unmarked by Chungdahm tutors
 The training data therefore includes conflicting evidence.
 Our model's low recall and high precision follow from this:
 The model assumes a lower-than-true error rate
 The model has to reconcile conflicting sets of evidence
 When the model does flag an error, it does so with high confidence and accuracy
 Solution? Bootstrapping; relabeling of unannotated errors
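One way the bootstrapping/relabeling idea could look in code: unmarked contexts stay labeled as non-errors unless a trained model contradicts the student's choice with high confidence. The predict callable and the 0.9 threshold are illustrative assumptions, not details from the paper.

```python
# Sketch: one relabeling pass over partially annotated training data.
def relabel(unmarked_events, predict, threshold=0.9):
    """predict(feats) -> (best_label, confidence).

    Returns (feats, label) pairs; an unmarked event is relabeled as an
    error only when the model disagrees with the student's choice s
    and is confident enough.
    """
    relabeled = []
    for feats in unmarked_events:
        best, conf = predict(feats)
        if best != feats["s"] and conf >= threshold:
            relabeled.append((feats, best))        # relabeled as an error
        else:
            relabeled.append((feats, feats["s"]))  # kept as a non-error
    return relabeled

# Stub predictor standing in for a trained classifier.
def stub_predict(feats):
    return "in", 0.95

events = [{"s": "at"}, {"s": "in"}]
out = relabel(events, stub_predict)
```

Iterating this (retrain, relabel, retrain) is the standard self-training recipe; the open question the slide raises is whether the high-precision model is reliable enough to seed it.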

Conclusions
 As language instruction turns digital, more and more (partially) error-annotated learner corpora like the Chungdahm corpus will become available
 Building a direct model of L2 errors, where such data is available, offers an advantage over models based on native corpora, despite the partial annotation problem
 Exhaustive annotation is not necessary for learner-corpus-trained models to outperform standard native-text-trained models with much larger training data sets