The MSR ESL Assistant: Detecting and correcting non-native errors in English
Michael Gamon, Chris Brockett, William B. Dolan, Jianfeng Gao, Dmitriy Belenko (Microsoft Research), Alexandre Klementiev (University of Illinois at Urbana-Champaign), Claudia Leacock (Butler Hill Group)

Making NLP useful

Overview
- Motivation
- Part I: The system
  - Error statistics
  - Different solutions for different errors
  - Machine learned classifiers for preposition and determiner errors
  - Adding a language model and web-based examples
- Part II: Evaluation on native and non-native data
- Part III: Usage and interactions

Motivation: The Story of the Disappearing and Reappearing Slide
- 750M people use English as a second or foreign language (vs. 375M as a first language)
- 74% of English use is between non-native speakers
- As many as 300M people study English in China

Error statistics
Previous studies:
- Articles and prepositions account for 20%-50% of ESL errors
- Prepositions are difficult for learners with various L1 backgrounds

Error statistics
NICT Japanese Learners of English corpus:
- 26.6% of errors are determiner related
- 10% of errors are preposition related
CLEC Chinese Learners' Corpus:
- 10% of errors are determiner and number related
- 2% are preposition related; 5% are collocation errors (which often involve prepositional collocations)

Most frequent errors made by East Asian non-native speakers
- Preposition presence and choice: Finally, the pollution on the world is serious.
- Definite and indefinite determiner presence and choice: We should think whether we have ability to do it well.
- Noun pluralization: So other works couldn't be done in adequate times.
- Gerund/infinitive confusion: So, money is also important in improve people's spirit.
- Auxiliary verb presence and choice: The fire will break out, it can do harmful to people.
- Over-regularized verb inflection: It was builded in 1995.
- Adjective/noun confusion: There was a wonderful women volleyball match between Chinese team and Cuba team.
- Word order (adjective sequences and nominal compounds): A pop British band called "Spice Girl" has sung a song.

Different errors – different solutions
1. Prepositions and articles: much contextual information needed
2. Over-regularized verb morphology: local information is enough
3. Noun number: local information (mass noun, quantifier, etc.) is enough
Machine-learned approaches handle (1); simple heuristics handle (2) and (3) (a sketch of one such heuristic follows below).
Total number of error modules: 4 machine-learned, 19 heuristic
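For the simplest error classes, a heuristic module can be little more than a lookup. The following is a minimal sketch assuming a small, hypothetical lexicon of over-regularized past forms; it only illustrates the idea and is not the ESL Assistant's actual rule set.

```python
import re

# Hypothetical mini-lexicon: over-regularized past form -> correct irregular form.
# The real system's lexicon and rules are not public; this only illustrates the idea.
IRREGULAR_PAST = {
    "builded": "built",
    "teached": "taught",
    "catched": "caught",
    "goed": "went",
}

def flag_overregularized_verbs(sentence):
    """Return (incorrect form, suggested correction) pairs found in the sentence."""
    flags = []
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in IRREGULAR_PAST:
            flags.append((token, IRREGULAR_PAST[token]))
    return flags

print(flag_overregularized_verbs("It was builded in 1995."))   # [('builded', 'built')]
```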

Modeling preposition and determiner errors
What data?

Domain                      Sentences
Encarta encyclopedia          487,281
Reuters newswire              567,394
UN proceedings (Hansard)      500,000
Europarl
Web scraped*
Total                       2,554,675

* Scraped using an algorithm similar to STRAND (Resnik and Smith 2003).

Modeling preposition and determiner errors
- Preprocessing: tokenization, POS tagging
- Heuristic algorithm (based on POS tags): find left edges of NPs (potential sites for prepositions and articles)
- For each potential site of a preposition or article:
  - Target feature 1: preposition/article present or absent
  - Target feature 2: choice of preposition/article (if present)
  - Contextual features: POS tags and tokens to the left/right
- Maximum Entropy classifier (a rough sketch follows below)
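The transcript gives only this high-level recipe. The sketch below shows, under loose assumptions, how such contextual features could feed a maximum entropy classifier; it uses NLTK's off-the-shelf POS tagger and scikit-learn's multinomial logistic regression as stand-ins for MSR's internal tools, and the feature names, toy training cases, and `site_features` helper are all hypothetical.

```python
import nltk                                    # may need: nltk.download("averaged_perceptron_tagger")
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def site_features(tokens, tags, i, window=2):
    """Contextual features for a candidate preposition/article slot just before
    token i: tokens and POS tags in a small window to the left and right."""
    feats = {}
    for k in range(1, window + 1):
        feats[f"tok_-{k}"] = tokens[i - k] if i - k >= 0 else "<S>"
        feats[f"pos_-{k}"] = tags[i - k] if i - k >= 0 else "<S>"
        feats[f"tok_+{k}"] = tokens[i + k - 1] if i + k - 1 < len(tokens) else "</S>"
        feats[f"pos_+{k}"] = tags[i + k - 1] if i + k - 1 < len(tags) else "</S>"
    return feats

# Toy training cases for target feature 1 (preposition present or absent before the
# NP starting at index i).  Any preposition at the site is stripped first, so the
# features describe only the surrounding context.
cases = [
    ("I live Seattle .", 2, "present"),        # original text: "I live in Seattle ."
    ("She visited Seattle .", 2, "absent"),
]

X, y = [], []
for sent, i, label in cases:
    toks = sent.split()
    tags = [t for _, t in nltk.pos_tag(toks)]
    X.append(site_features(toks, tags, i))
    y.append(label)

vec = DictVectorizer()
maxent = LogisticRegression(max_iter=1000)     # multinomial logistic regression ~ MaxEnt
maxent.fit(vec.fit_transform(X), y)
```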

Modeling preposition and determiner errors
Training data: 2.5M sentences (Encarta, Reuters, UN, EU, web scraped)

Classifier                      Training cases
Article presence/absence                 11.9M
Article choice                            4.3M
Preposition presence/absence             16.1M
Preposition choice                        6.5M

Adding a language model
LM accuracy alone is not sufficient: 58.36%
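The transcript does not say exactly how the LM is combined with the classifiers. One plausible arrangement, shown below purely as an assumption, is to use the LM as a filter: a classifier suggestion is surfaced only when the rewritten sentence scores better than the original by some margin. The `score(word, context)` interface and the toy uniform LM are stand-ins, not MSR's actual model.

```python
import math

class UniformTrigramLM:
    """Toy stand-in that gives every trigram the same probability; a real system
    would plug in an n-gram LM trained on a large corpus."""
    def score(self, word, context):
        return 1e-4

def lm_log_prob(sentence, lm):
    """Sentence log-probability under a trigram LM exposing score(word, context)."""
    tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(lm.score(tokens[i], tokens[i - 2:i]))
               for i in range(2, len(tokens)))

def keep_suggestion(original, suggestion, lm, margin=1.0):
    """Surface a classifier suggestion only if the LM prefers it by a log-prob margin."""
    return lm_log_prob(suggestion, lm) - lm_log_prob(original, lm) > margin

lm = UniformTrigramLM()
# With the toy uniform LM both sentences score identically, so this prints False.
print(keep_suggestion("the pollution on the world is serious",
                      "the pollution in the world is serious", lm))
```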

Adding web search
Observation: non-native speakers often use the web to validate word choice

Show suggestions and originals in context

Evaluation (1): native text (correct usage of prepositions and determiners)
- Split the original training data into 70% training, 30% test
- Note: classification is split into two questions (a sketch of this cascade follows below):
  1. Should there be a determiner/preposition?
  2. If yes, which one should it be?
- Prepositions: the choice set is limited to prepositions that are common in errors: about, as, at, by, for, from, in, like, of, on, since, to, with, plus "other"
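Continuing the earlier sketch (same hypothetical vectorizer and classifier objects), the two-question split can be realized as a small cascade. This illustrates the split described on the slide, not the production code.

```python
def predict_site(feats, vec, presence_clf, choice_clf):
    """Two-stage decision for one candidate site: first decide whether a
    preposition/determiner belongs there at all, then (only if so) pick which one.
    `vec`, `presence_clf`, and `choice_clf` follow the scikit-learn interfaces
    used in the earlier sketch; the label names are assumptions."""
    x = vec.transform([feats])
    if presence_clf.predict(x)[0] == "absent":
        return None                       # nothing should be inserted here
    return choice_clf.predict(x)[0]       # e.g. one of the listed prepositions or "other"
```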

Articles: results on native text

           Presence/absence      Choice model        Combined
Accuracy   89.94%                89.66%              86.76%
Baseline   64.04% (no article)   77.73% (definite)   58.91%

Presence/absence model   Precision   Recall
Presence                 87.89%      83.54%
Absence                  91.01%      93.54%

Choice model   Precision   Recall
the            91.48%      95.60%
a/an           81.77%      68.94%

Prepositions: results on native text

           Presence/absence          Choice model   Combined
Accuracy   88.57%                    66.23%         76.77%
Baseline   59.57% (no preposition)   27.07% (of)    42.00%

Presence/absence   Precision   Recall
Presence           86.76%      84.66%
Absence            89.75%      91.23%

Results on individual prepositions

Choice model   Precision   Recall
as             77.28%      62.77%
on             68.17%      56.69%
of             71.91%      87.54%
about          60.17%      35.12%
to             67.92%      64.48%
by             63.37%      52.62%
at             64.92%      52.85%
in             61.81%      69.87%
since          62.62%      20.67%
with           63.45%      47.94%
from           59.58%      38.36%
other          56.97%      55.14%
for            58.46%      47.91%

Evaluation (2): Human evaluation
- Spellchecked Chinese Learners' Corpus (CLEC)
- Test set scraped from the web
- User data

Spellchecked Chinese Learners' Corpus (CLEC)
1 million words of English compositions collected from Chinese learners of English in China with differing levels of proficiency:
- senior secondary school students
- English-major university students
- non-English-major university students

Web scraped data
- Collected by a vendor for MSR
- Scraped from 489 personal web pages and blogs of non-native speakers/students of English with Korean, Chinese, or Japanese L1 backgrounds
- 6,746 sentences; 1k selected randomly for our evaluation
- Education level ranges from high school to graduate school; professionals are also included
- Gender balanced

Intermission: Pie charts

Prepositions

Articles

Broader categories: pie charts comparing the distribution of error types (adjective-, verb-, noun-, and preposition-related) in the CLEC and web-scraped data.

Usage of the prototype and evaluation of user data

Page views per day: chart of daily traffic, with spikes annotated for the Live Translator snafu and the Beijing Olympics.

User location

Country              Visits   Percentage
China                51,285   26.80%
United States        28,916   15.10%
Taiwan               25,753   13.40%
Korea - South        12,934    6.80%
Hong Kong             8,826    4.60%
Brazil                4,648    2.40%
Canada                3,917    2.00%
Germany               3,077    1.60%
United Kingdom        2,928    1.50%
Japan                 2,581    1.30%
Italy                 2,579
Spain                 2,557
Russian Federation    2,448
Saudi Arabia          2,021    1.10%

Users and Sessions

Repeat users (2): chart highlighting the "4 or more" visits category.

Return visits

Collected data

User interactions

84% of squiggles are examined by the user. This figure covers all frequent users being evaluated (e.g., it does not include random sentences) and does not include the thesaurus.

Are users accepting the right suggestions? (chart comparing suggested vs. accepted corrections)

In summary
- Large market for ESL proofing tools
- Detecting and correcting non-native errors is a non-trivial and interesting research problem
- We may already be at a point where the technology starts to be useful

Some open questions
- How does the accuracy of POS tagging influence the accuracy of the overall system?
- How can we best leverage user behavior as a supervision signal?

Some ideas
- Use web result counts directly as an LM approximation (see the sketch below)
- Use web result counts as (part of) a supervision signal for ML
- Combine more sources of evidence: LMs trained on different data sets, etc.
- Build one single model, including LM scores
- Use active learning to optimize thresholds
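For the first idea, a minimal sketch of treating relative web hit counts as a crude LM approximation. The `hit_count` callable is a placeholder for whatever search API is available, and the canned counts below are invented for the demonstration; nothing here reflects the actual ESL Assistant back end.

```python
def web_prefers_suggestion(original, suggestion, hit_count, ratio=10.0):
    """`hit_count` maps an exact-phrase query to an estimated web result count
    (supplied by the caller, e.g. wrapping a search API).  Back the suggested
    rewrite only if it is attested much more often on the web than the original."""
    orig = hit_count(f'"{original}"') + 1       # add-one smoothing avoids division by zero
    sugg = hit_count(f'"{suggestion}"') + 1
    return sugg / orig >= ratio

# Canned counts standing in for a real search API:
fake_counts = {'"pollution on the world"': 3, '"pollution in the world"': 20000}
print(web_prefers_suggestion("pollution on the world", "pollution in the world",
                             lambda q: fake_counts.get(q, 0)))    # True
```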