Automated Writing Evaluation (AWE): Past, Present and Prospect Dr. Li Zhang (张荔) Shanghai Jiao Tong University, Shanghai, China

Outline An introduction to the major kinds of AWE systems. An introduction to JUKU, an AWE system developed in China. A prediction of the future development of AWE.

Some of the most widely used AWE systems: PEG (Project Essay Grader), IEA (Intelligent Essay Assessor), E-rater, IntelliMetric, and BETSY (Bayesian Essay Test Scoring sYstem).

PEG Developed by Ellis Page at Duke University in 1966, PEG uses correlation to predict the intrinsic quality of essays (Chung & O’Neil, 1997). Trins: intrinsic variables such as fluency, grammar, and punctuation. Proxes: surface features related to the intrinsic variables, such as word length, part of speech, or word meaning (Page & Peterson, 1995).

Essay evaluation process PEG is trained on a sample of more than 300 essays to obtain text features, which are analyzed to establish their correlation with human ratings. Proxes are determined for each essay and entered into the prediction equation, and beta weights (coefficients) are obtained through regression analysis. A score is assigned to the essay by applying the beta weights (Chung & O’Neil, 1997).
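As a rough illustration of this process, the sketch below fits a linear regression from a few surface proxes to human scores and applies the resulting beta weights to a new essay; the features, training essays, and scores are invented stand-ins, not PEG's actual proxes or data.

```python
# Hedged sketch of PEG-style scoring: regress human scores on surface proxes,
# then apply the fitted beta weights to a new essay. The proxes below are
# illustrative stand-ins, not PEG's actual feature set.
import re
import numpy as np
from sklearn.linear_model import LinearRegression

def proxes(essay: str) -> list[float]:
    words = re.findall(r"[A-Za-z']+", essay)
    return [
        len(words),                                           # fluency proxy: essay length
        np.mean([len(w) for w in words]) if words else 0.0,   # average word length
        essay.count(",") + essay.count(";"),                  # punctuation proxy
        len([s for s in re.split(r"[.!?]+", essay) if s.strip()]),  # sentence count
    ]

# Training sample: essays already scored by human raters (toy data).
train_essays = [
    "Short essay text.",
    "A longer, more developed essay; it uses varied punctuation and sentences.",
    "An essay of middling length with some development of its ideas.",
]
human_scores = [2.0, 4.0, 3.0]

model = LinearRegression()        # beta weights come from least-squares regression
model.fit([proxes(e) for e in train_essays], human_scores)

new_essay = "An unseen essay to be scored, with a few sentences of argument."
predicted = model.predict([proxes(new_essay)])[0]
print(f"Predicted score: {predicted:.2f}")
```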

Figure 1: PEG system scoring process. Shadowed blocks refer to major sources of variation. Barred blocks indicate results of computations (cited from Chung & O’Neil, 1997, p. 7).

IEA Developed by Thomas Landauer and his colleagues at the University of Colorado in the late 1990s, IEA is based on Latent Semantic Analysis (LSA; Lemaire & Dessus, 2001).

Essay evaluation process The IEA system is trained on domain-representative texts. These texts and the new essay are represented as vectors. The conceptual relevance of the essay to the texts is compared using LSA. The texts most similar to the essay are selected, and their scores are weighted by cosine similarity and averaged to obtain a score, which is interpreted as the final score of the essay (Landauer, Laham, & Foltz, 2003).
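A minimal sketch of this idea, using scikit-learn's TfidfVectorizer and TruncatedSVD as stand-ins for IEA's proprietary LSA pipeline; the reference texts, their scores, and the tiny latent dimensionality are illustrative assumptions only.

```python
# Hedged sketch of IEA-style scoring with LSA: project pre-scored reference
# texts and a new essay into a latent semantic space, then weight the scores
# of the most similar references by cosine similarity.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

reference_texts = [
    "a thorough domain text that covers the topic in detail",
    "a partial domain text that covers some of the topic",
    "a weak domain text with little relevant content",
]
reference_scores = np.array([5.0, 3.0, 1.0])   # human scores of the reference texts
new_essay = "a student essay that covers the topic in some detail"

# Term-document matrix followed by SVD gives the latent semantic space.
tfidf = TfidfVectorizer().fit(reference_texts + [new_essay])
X = tfidf.transform(reference_texts + [new_essay])
vectors = TruncatedSVD(n_components=2).fit_transform(X)   # tiny dimensionality for toy data

sims = cosine_similarity(vectors[-1:], vectors[:-1])[0]   # essay vs. each reference
top = np.argsort(sims)[-2:]                               # most similar references
score = np.average(reference_scores[top], weights=sims[top])
print(f"LSA-based score: {score:.2f}")
```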

E-rater Developed by the Educational Testing Service (ETS) in the 1990s, e-rater uses Natural Language Processing (NLP) techniques, a vector-space model, and a linear regression model. Its features include a syntactic module, a discourse module, and a topical-analysis module.

Essay evaluation process Uses linear regression analysis on texts scored by human raters to decide the optimal weighting model that best predicts the human ratings. Uses NLP to identify features in an essay and combine them into feature scores. Generates a score by applying the weighting model to the feature scores (Enright & Quinlan, 2010).
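The sketch below illustrates the general idea with crude proxies for the three modules and hand-set weights standing in for those a regression on human-scored essays would produce; none of the feature definitions or weights are e-rater's actual ones.

```python
# Hedged sketch of e-rater-style scoring: extract module-level features with
# simple NLP proxies, then combine them with weights assumed to have been
# learned beforehand from human-scored essays.
import re

DISCOURSE_CUES = {"however", "therefore", "moreover", "first", "finally", "because"}

def feature_scores(essay: str, prompt_words: set[str]) -> dict[str, float]:
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", essay)]
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        # syntactic-module proxy: average sentence length
        "syntactic": sum(len(s.split()) for s in sentences) / max(len(sentences), 1),
        # discourse-module proxy: density of discourse connectives
        "discourse": sum(w in DISCOURSE_CUES for w in words) / max(len(words), 1),
        # topical-analysis-module proxy: overlap with prompt vocabulary
        "topical": len(set(words) & prompt_words) / max(len(prompt_words), 1),
    }

# Weights of the kind a regression on human-scored essays would produce (made up here).
weights = {"syntactic": 0.05, "discourse": 8.0, "topical": 3.0}
intercept = 1.0

essay = "Technology changes education. However, teachers remain essential because they guide learning."
prompt_words = {"technology", "education", "teachers"}
features = feature_scores(essay, prompt_words)
score = intercept + sum(weights[k] * features[k] for k in weights)
print(features, f"score = {score:.2f}")
```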

Criterion Criterion is an online essay scoring and evaluation system that relies on 1) e-rater to score an essay, and 2) Critique writing analysis tools to provide detailed evaluation and feedback on language, discourse, content, etc. (Dikli, 2006).

IntelliMetric Developed by Vantage Learning in 1998, IntelliMetric is based on a cognitive model of information processing and understanding. Core technologies: artificial intelligence, NLP, computational linguistics, statistics, machine learning, CogniSearch, and Quantum Reasoning (Elliot, 2003).

IntelliMetric evaluates an essay by features of content and structure in five categories: focus and unity; development and elaboration; organization and structure; sentence structure; and mechanics and conventions (Vantage Learning, 2005).

Essay evaluation process Preprocesses the electronic form of the essay to make sure it is readable by IntelliMetric. Extracts information from the essay using NLP. Transforms the information into numerical form to support computation of the mathematical models. Applies the mathematical understanding to the new essay and integrates the information to yield the final score (Vantage Learning, 2005).
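A rough sketch of such a pipeline, under the assumption that the "mathematical models" behave like an ensemble of regression models whose predictions are blended; the vectorization, the choice of models, and the blending rule are illustrative guesses, not Vantage Learning's actual architecture.

```python
# Hedged sketch of an IntelliMetric-style pipeline: turn essay text into
# numerical features, apply several models trained on scored essays, and
# integrate their outputs into one final score.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor

train_essays = [
    "first scored training essay with little development",
    "second scored training essay with some development",
    "third scored training essay with clear organization",
    "fourth scored training essay with strong organization and elaboration",
]
train_scores = [2.0, 3.0, 4.0, 5.0]

vectorizer = TfidfVectorizer().fit(train_essays)    # NLP extraction -> numerical form
X_train = vectorizer.transform(train_essays)

models = [Ridge(alpha=1.0), KNeighborsRegressor(n_neighbors=2)]
for m in models:
    m.fit(X_train, train_scores)

new_essay = "an unseen essay with some organization and elaboration"
X_new = vectorizer.transform([new_essay])
final_score = np.mean([m.predict(X_new)[0] for m in models])   # integrate the models
print(f"final score: {final_score:.2f}")
```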

Figure 4: Architecture of IntelliMetric (cited from Vantage Learning, 2005, p. 12)

MY Access! MY Access! is a web-based writing assessment tool that relies on IntelliMetric to provide students with a writing environment that offers immediate scoring and diagnostic feedback (Vantage Learning, 2005).

BETSY Developed by Lawrence M. Rudner at the University of Maryland in 2002, BETSY applies two models of Bayes’ theorem: the Multivariate Bernoulli Model and the Multinomial Model. Core idea: classification of essays on the basis of about 1,000 training texts (Valenti et al., 2003). This classification is based on essay features, including content-related and form-related features.

BETSY uses the models to analyze features such as specific words and phrases, the frequency of certain content words, the number of words, sentence length, the number of verbs, the order in which concepts are presented, and the occurrence of specific noun-verb pairs (Rudner & Liang, 2002), and categorizes new texts into four groups: Advanced, Proficient, Basic, and Below Basic.
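A minimal sketch of Bayesian essay classification in this spirit, using scikit-learn's MultinomialNB over word counts; the training essays, labels, and feature set are toy stand-ins rather than BETSY's.

```python
# Hedged sketch of BETSY-style scoring as Bayesian text classification:
# train a multinomial naive Bayes model on essays labelled with score
# categories, then assign a new essay to the most probable category.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_essays = [
    "a well developed argument with clear evidence and varied sentences",
    "some relevant ideas but limited support and simple sentences",
    "very short response with little relevant content",
]
labels = ["Advanced", "Proficient", "Basic"]

# Word counts feed the multinomial model; a Bernoulli variant would use
# binary word-occurrence features instead.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_essays, labels)

new_essay = "the writer gives relevant ideas with some evidence"
print(classifier.predict([new_essay])[0])
print(dict(zip(classifier.classes_, classifier.predict_proba([new_essay])[0].round(3))))
```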

Reliability
System | Correlation | Agreement | Citation
PEG | | | (Page & Peterson, 1995)
IEA | 0.85 | | (Landauer et al., 2000)
E-rater | | % - 97% | (Burstein et al., 2004)
IntelliMetric | | % - 98% | (Elliot, 2002)
BETSY | | 80% | (Rudner & Liang, 2002)

SEAR (Christie, 1999), APEX (Lemaire, 2001), PS-ME (Mason & Grove-Stephenson, 2002), ATM (Callear et al., 2001), C-rater (Leacock, 2003), eGrader (Byrne et al., 2010), MaxEnt (Sukkarieh & Bolge, 2010), Writing Roadmap (Rich et al., 2013), LightSIDE (Mayfield & Rose, 2013), Crase (Lottridge et al., 2013) ...

JUKU Developed by Chinese researchers in 2010, JUKU is used by more than 200 universities in China.

Based on corpus and cloud computing technology. Measures the comparative distance between students’ essays and the contents of standard corpora. Each essay is measured on 192 dimensions within the categories of vocabulary, sentence, discourse, and content. Provides reports involving scores, overall comments, and line-by-line feedback.

Reliability Reliability analysis was based on 1,456 essays written by students from Nanjing University. Agreement: 92%. Complete + adjacent agreement (15-point scale, 5 levels): 93.37% (Zhang, unpublished). Correlation: less than 0.7.
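For concreteness, the sketch below shows how exact agreement, complete-plus-adjacent agreement, and correlation between machine and human scores can be computed; the score lists are invented, and "adjacent" is simplified here to mean within one score point.

```python
# Hedged sketch of the reliability statistics: exact agreement, complete +
# adjacent agreement, and Pearson correlation between machine and human
# ratings. The score lists are invented examples on a 15-point scale.
import numpy as np

human = np.array([12, 10, 14, 9, 11, 13, 8, 12])     # human ratings
machine = np.array([12, 11, 14, 9, 10, 12, 8, 13])   # machine ratings

exact = np.mean(machine == human)
adjacent = np.mean(np.abs(machine - human) <= 1)      # complete + adjacent agreement
correlation = np.corrcoef(machine, human)[0, 1]

print(f"exact agreement: {exact:.2%}")
print(f"complete + adjacent agreement: {adjacent:.2%}")
print(f"Pearson correlation: {correlation:.2f}")
```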

Combination of machine and human evaluation: teacher feedback, peer feedback, recommendation, praise, comments.

Prospect Writing evaluation, whether by human raters or automated scoring, should satisfy two conditions: 1) the rubric should reflect the essential aspects of writing competence, and 2) the ratings should be consistent with the rubric (Weigle, 2013). Cope et al. (2011) propose “an alternative potential for NLP based on an understanding of the writing process as a fluid, iterative struggle to make meaning” (p. 87).

Figure 5: CBAL writing competency model (Deane, et al., 2011, p. 3)

University of California Give formative feedback on each draft. Provide feedback on a wide range of student writing. Use LightSIDE to encourage new feature extraction. Use machine learning technology to provide an open source for adjusting to new problems. Install “machine-student dialogue” and “intelligent tutoring system”.

Summary 1) the design of AWE systems that help improve learners’ cognitive and critical thinking abilities; 2) a shift of emphasis from the language and structure of an essay to its ideas, thinking, and rhetorical effectiveness; 3) the evaluation of different genres of writing, including both arts and scientific articles; 4) the development of new software engines that can provide formative feedback on student writing;

5) the use of machine learning technology to design AWE systems that provide open source for adjusting to new problems; 6) the use of machine-human dialogue and intelligent tutoring systems to enhance the effect of feedback; 7) the cooperation of different disciplines in the development of AWE: writing teachers, test developers, cognitive psychologists, psychometricians, and computer scientists.

References
Burstein, J., Chodorow, M., & Leacock, C. (2004). Automated essay evaluation: The Criterion online writing service. AI Magazine 25.
Chung, K. W. K., & O’Neil, H. F. (1997). Methodological approaches to online scoring of essays. Retrieved from
Cope, B., Kalantzis, M., McCarthey, S., Vojak, C., & Kline, S. (2011). Technology-mediated writing assessments: Principles and processes. Computers and Composition 28: 79–96.
Deane, P., Quinlan, T., & Kostin, I. (2011). Automated scoring within a developmental, cognitive model of writing proficiency. Princeton, NJ: Educational Testing Service.
Elliot, S. (2002). A study of expert scoring, standard human scoring and IntelliMetric scoring accuracy for statewide eighth grade writing responses. Newtown, PA: Vantage Learning.
Elliot, S. (2003). IntelliMetric™: From here to validity. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective. Mahwah, NJ: Lawrence Erlbaum.
Enright, M., & Quinlan, M. (2010). Complementing human judgment of essays written by English language learners with e-rater® scoring. Language Testing 27.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to latent semantic analysis. Discourse Processes 25.
Landauer, T. K., Laham, D., & Foltz, P. W. (2000). The Intelligent Essay Assessor. IEEE Intelligent Systems: The debate on automated essay grading 15.
Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automatic essay assessment. Assessment in Education 10.
Lemaire, B., & Dessus, P. (2001). A system to assess the semantic content of student essays. Educational Computing Research 24.
Page, E. B. (1966). The imminence of grading essays by computer. Phi Delta Kappan 47.
Page, E., & Peterson, N. S. (1995). The computer moves into essay grading: Updating the ancient test. Phi Delta Kappan 76.
Rudner, L. M., & Liang, T. (2002). Automated essay scoring using Bayes’ theorem. The Journal of Technology, Learning, and Assessment 1(2).
Shermis, M., & Barrera, F. (2002). Exit assessments: Evaluating writing ability through Automated Essay Scoring (ERIC Document Reproduction Service).
Shermis, M. D., & Burstein, J. (2003). Introduction. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective. Mahwah, NJ: Lawrence Erlbaum, xiii–xvi.
Valenti, S., Neri, F., & Cucchiarelli, A. (2003). An overview of current research on automated essay grading. Journal of Information Technology Education 2.
Vantage Learning. (2005). How IntelliMetric™ works. Retrieved from
Warschauer, M. (2014). DIP: Next-generation automated feedback in support of iterative writing and scientific argumentation. Unpublished research proposal.
Weigle, S. C. (2013). English language learners and automated scoring of essays: Critical considerations. Assessing Writing 18: 85–99.

Thanks