Presentation on theme: "1 Developing Statistic-based and Rule-based Grammar Checkers for Chinese ESL Learners Howard Chen Department of English National Taiwan Normal University."— Presentation transcript:
1 Developing Statistic-based and Rule-based Grammar Checkers for Chinese ESL Learners Howard Chen Department of English National Taiwan Normal University email@example.com
2 The Need to Provide Feedback on Second Language Writing More and more tests ask ESL/EFL students to demonstrate their writing abilities. SLA researchers suggest that learners need more practice and corrective feedback. However, who can provide them with useful feedback on meaning and form?
3 Use the Existing Grammar Checkers? Teachers are the best feedback providers. However, there are so many essays to correct. Microsoft grammar checker: the general impression among ESL/EFL learners is that it is NOT very useful. Two new commercial packages: Vantage MyAccess and ETS Criterion. Their feedback for ESL learners is not very accurate or comprehensive (perhaps because they do not target any L1 group and are aimed mainly at native speakers).
4 A More Thorough Review of E-rater / ETS Criterion Junko Otoshi (2005), a researcher at Ritsumeikan University in Japan, used 28 Japanese adult students’ TOEFL writing essays to explore what Criterion can and cannot do in providing feedback on essays. Criterion’s critique function was compared with a human instructor’s error feedback, focusing on five error categories: verbs, word choice, nouns, articles, and sentence structure.
5 Errors Marked by Criterion and Human Instructors (Means)
Error Type           Criterion   Human Instructors
Verbs                0.47        0.84
Nouns                0.00        0.94
Articles             0.07        2.00
Word Choice          0.11        2.32
Sentence Structure   0.32        6.31
6 Rather Disappointing Results and Possible Reasons The results revealed that Criterion had difficulty detecting errors in all five categories. Does it aim for higher accuracy at the cost of lower recall (a more conservative approach)? Is the size of the reference corpus a factor? Another program, MyAccess, has similar problems, though the general impression from review reports was that it can detect more errors.
7 Trying to Combine Different Approaches: Plan A and Plan B for Grammar Checkers With funding from the NSC in Taiwan, we planned to develop two grammar checkers. Different approaches: parser, rules, statistics. Plan A: use ngrams to help identify errors. Plan B: use a rule-based grammar checker to identify errors. If possible, Plans A and B will be merged so that the combined system can capture more errors. In this paper, we discuss only Plan A.
8 What’s the Ngram (Statistical) Checker? We do not write specific grammar rules. The computer extracts all the word strings (2-word and 3-word) that occur in a very large native corpus; this is the language-model building step. All of these are saved to a large database. Then, when students write and submit an essay to the ngram checker, the system can quickly detect the word strings that do not exist in the native corpus.
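The idea on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not the NTNU system: the whitespace tokenizer, the function names, and the three-sentence "native corpus" are assumptions made for the example, and a real checker would store counts from a very large corpus in a database.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, split on whitespace,
    # strip surrounding punctuation, drop empty tokens.
    stripped = (w.strip(PUNCT) for w in text.lower().split())
    return [w for w in stripped if w]

def extract_ngrams(tokens, n):
    # All contiguous word strings of length n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_ngram_db(sentences, orders=(2, 3)):
    # Count every 2-word and 3-word string seen in the native corpus.
    db = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in orders:
            db.update(extract_ngrams(tokens, n))
    return db

def unseen_ngrams(sentence, db, n=2):
    # Word strings in the learner sentence that never occur in the corpus.
    return [g for g in extract_ngrams(tokenize(sentence), n) if db[g] == 0]

# Toy stand-in for a large native corpus (hypothetical data).
native_corpus = [
    "I have some books.",
    "He cut his finger.",
    "She has three answers.",
]
db = build_ngram_db(native_corpus)
flagged = unseen_ngrams("I have some book.", db)
```

With this toy corpus, `flagged` contains only the bigram `("some", "book")`. With such a small corpus almost any new sentence would be flagged, which is why the size and suitability of the reference corpus matter so much.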
9 Ngram-based Checker: Advantages The key idea is simple but powerful. No need to write rules. More robust in detecting errors. A large and suitable corpus might make this very useful. (ETS used 30 million words of news text.)
10 The Procedure for Developing an Ngram Checker (Corpora and Tools) 1. Find a suitable, large corpus (e.g., BNC, Wikipedia, and Google). 2. Extract the ngrams (NLP tools: SRI tool). 3. Build a large ngram database. 4. Develop and test different highlighting methods. 5. Highlight the possibly problematic ngrams in learners’ writing.
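Steps 4 and 5 could look something like the sketch below. It shows one possible highlighting method, chosen here for illustration only: every word covered by an unseen bigram is marked, and adjacent marked words are merged into one bracketed span. The `[[ ]]` markers, the function names, and the tiny set of known bigrams are all assumptions, not the checker's actual design.

```python
PUNCT = ".,;:!?\"'()"

def normalize(token):
    # Lookup key for a word: lowercase, surrounding punctuation removed.
    return token.lower().strip(PUNCT)

def highlight(sentence, known_ngrams, n=2, open_mark="[[", close_mark="]]"):
    words = sentence.split()
    keys = [normalize(w) for w in words]

    # Flag every word that takes part in an ngram missing from the database.
    flagged = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(keys[i:i + n]) not in known_ngrams:
            for j in range(i, i + n):
                flagged[j] = True

    # Merge runs of flagged words into a single highlighted span.
    out = []
    i = 0
    while i < len(words):
        if flagged[i]:
            j = i
            while j < len(words) and flagged[j]:
                j += 1
            out.append(open_mark + " ".join(words[i:j]) + close_mark)
            i = j
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)

# Hypothetical database contents for the example.
known = {("he", "cut"), ("cut", "his"), ("his", "finger")}
marked = highlight("He cutted his finger.", known)
```

Here `marked` is `"[[He cutted his]] finger."`. Note that the span is wider than the actual error, because both bigrams containing "cutted" are unseen; this is one reason to compare several highlighting methods, for example flagging only words shared by two unseen ngrams.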
16 Evaluating Checker Performance: Is There a Standard Way to Evaluate Checkers? What kinds of errors should be used to test a grammar checker? Fair assessment requires the same set of sentences for every checker. How many sentences? There are many different error categories, and lexical factors also matter. NLP researchers use precision, recall, and F-measure.
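For reference, precision, recall, and F-measure can be computed as in the following sketch. It assumes that errors are identified by some comparable unit (here, hypothetical word positions) and that a gold standard of human-marked errors exists; the numbers are invented purely to show the arithmetic.

```python
def precision_recall_f1(flagged, gold):
    # flagged: items the checker marked as errors
    # gold:    items a human annotator marked as errors
    flagged, gold = set(flagged), set(gold)
    true_positives = len(flagged & gold)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented example: the annotator marked 4 positions, the checker marked 3,
# and 2 of the checker's marks agree with the annotator.
gold_errors = {1, 4, 7, 9}
checker_flags = {1, 4, 5}
p, r, f = precision_recall_f1(checker_flags, gold_errors)
```

In this invented case precision is 2/3, recall is 1/2, and the F-measure is 4/7. A conservative checker that flags little tends to have high precision but low recall.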
17 Testing with the CLEC Corpus from China The Chinese Learner English Corpus (CLEC) is a 1-million-word error-tagged learner corpus with about 60 error types. We decided to single out some sentences (10 sentences) from the learner corpus and run them through our ngram checkers.
28 The Strengths of NTNU Ngram Checkers Ngram is good at detecting errors in the “local” or adjacent domains. It can indeed find many errors in CLEC: spellings, word forms, verb phrases, noun phrases, adjective phrases, and collocations.
29 The Weaknesses of Ngram Checkers It failed to catch the following effectively: tense errors, conjunct errors, fragments, pronoun errors, preposition errors, run-on sentences, and missing words.
30 The Poor Performance of Ngram Checkers for Tense and Conjuncts
31 Rule-based Checker can Perform Better for Some Nonlocal Errors
33 BUT Ngram Performed Better for the Local Errors I have some book. The informations are so rich. These researches are excellent. He is new friend. He cutted his finger. He enjoys to eat. He wants jumping into the river. I cannot decided about this. These reason are too simple. I has three answers.
34 What Can We Do to Improve Feedback from Ngram Checkers? Only highlighting and no detailed feedback? We are facing a bigger challenge. How do we recommend correct usage? How can we find correct examples for students? If students only see the errors highlighted, they might still fail to correct them. For agreement errors, tense errors, and confusing words, students might be able to self-correct. However, for some tense errors, collocation errors, or preposition errors, learners might need more specific suggestions.
35 Find the Proper Collocates: increase and improve life
36 Confusion between accept and receive your apology
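One way such suggestions might be generated from the same ngram database is sketched below: given a set of confusable words, rank each candidate by how often it occurs in front of the learner's following words. This is a sketch under stated assumptions, not the method used in the NTNU checker; the function name, the candidate lists, and all counts are hypothetical.

```python
def rank_candidates(candidates, right_context, ngram_counts):
    # Score each candidate by the corpus count of "candidate + following words"
    # and keep only candidates that were actually seen in that context.
    scored = []
    for word in candidates:
        count = ngram_counts.get((word,) + tuple(right_context), 0)
        if count > 0:
            scored.append((word, count))
    # Most frequent first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored

# Invented counts standing in for a large native-corpus database.
toy_counts = {
    ("accept", "your", "apology"): 12,
    ("improve", "life"): 30,
}

apology_suggestions = rank_candidates(
    ["receive", "accept"], ["your", "apology"], toy_counts
)
life_suggestions = rank_candidates(["increase", "improve"], ["life"], toy_counts)
```

With these invented counts, the learner string "receive your apology" would yield the suggestion "accept", and "increase life" would yield "improve". The hard part in practice is deciding which words belong in the candidate set, which is where an error-tagged learner corpus could help.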
37 Future Directions for Improvement 1. Test with many different errors and find the strengths and limitations of ngram-based checkers and rule-based checkers. 2. Use a tagged learner corpus to find the error patterns in learner language. 3. Add feedback to ngram-based checkers for the major error patterns. 4. Better integrate the rule-based system and the ngram checkers.
38 Thanks for your attention Questions and Discussions firstname.lastname@example.org National Taiwan Normal University