
1 Automated Writing Evaluation (AWE): Past, Present and Prospect. Dr. Li Zhang (张荔), Shanghai Jiao Tong University, Shanghai, China

2 Outline: Introduction to the major kinds of AWE systems. Introduction to JUKU, an AWE system developed in China. Predictions about the future development of AWE.

3 Some of the most widely used AWE systems: PEG (Project Essay Grader), IEA (Intelligent Essay Assessor), E-rater, IntelliMetric, and BETSY (Bayesian Essay Test Scoring sYstem).

4 PEG. Developed by Ellis Page at Duke University in 1966; uses correlation to predict the intrinsic quality of essays (Chung & O’Neil, 1997). Trins: intrinsic variables such as fluency, grammar, and punctuation. Proxes: surface features related to the intrinsic variables, such as word length, part of speech, or word meaning (Page & Peterson, 1995).

5 Essay evaluation process. PEG is trained on a sample of more than 300 essays to obtain text features, which are analyzed to establish their correlation with human ratings. Proxes are determined for each essay and entered into the prediction equation, whose beta weights are obtained through regression analysis. A score is assigned to the essay by applying the beta weights (coefficients) (Chung & O’Neil, 1997).
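
A minimal sketch of this regression-based approach (an illustration, not the original PEG code): a few assumed proxes are extracted from each essay, beta weights are fit against human scores by least squares, and new essays are scored with the fitted weights. The specific proxes and the training data are assumptions.

```python
# Illustrative PEG-style scoring: regress human scores on surface "proxes".
import numpy as np

def proxes(essay: str) -> np.ndarray:
    """Extract a few assumed surface features (proxes) from an essay."""
    words = essay.split()
    return np.array([
        len(words),                                                 # essay length (fluency proxy)
        float(np.mean([len(w) for w in words])) if words else 0.0,  # average word length
        essay.count(",") + essay.count(";"),                        # punctuation use
    ])

def fit_beta(training_essays, human_scores):
    """Fit beta weights by least squares on a human-scored training sample."""
    X = np.array([proxes(e) for e in training_essays])
    X = np.hstack([np.ones((len(X), 1)), X])                        # intercept column
    beta, *_ = np.linalg.lstsq(X, np.array(human_scores), rcond=None)
    return beta

def predict_score(essay, beta) -> float:
    """Apply the fitted beta weights to a new essay's proxes."""
    return float(np.dot(np.hstack([1.0, proxes(essay)]), beta))
```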

6 Figure 1: PEG system scoring process. Shadowed blocks refer to major sources of variation. Barred blocks indicate results of computations (cited from Chung & O’Neil, 1997, p. 7).

7 IEA. Developed by Thomas Landauer and his colleagues at the University of Colorado in the late 1990s; based on Latent Semantic Analysis (LSA; Lemaire & Dessus, 2001).

8 Essay evaluation process. The IEA system is trained on domain-representative texts. These texts and the new essay are represented as vectors, and the conceptual relevance of the essay to the texts is compared using LSA. The texts most similar to the essay are selected and weighted by cosine similarity to obtain an average, which is interpreted as the final score of the essay (Landauer, Laham, & Foltz, 2003).
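
A minimal sketch of this LSA-based process (illustrative only; the real IEA models and training corpora are proprietary): training texts and the new essay are projected into a latent semantic space via truncated SVD, and the essay receives a cosine-weighted average of the scores of its most similar texts. The corpus, scores, and parameter values are assumed.

```python
# Illustrative IEA-style scoring with Latent Semantic Analysis.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsa_score(training_texts, training_scores, new_essay, dims=2, k=3):
    vec = TfidfVectorizer()
    X = vec.fit_transform(training_texts + [new_essay])
    Z = TruncatedSVD(n_components=dims).fit_transform(X)   # latent semantic space
    train, essay = Z[:-1], Z[-1]
    # Cosine similarity between the new essay and each training text.
    sims = train @ essay / (np.linalg.norm(train, axis=1) * np.linalg.norm(essay) + 1e-12)
    top = np.argsort(sims)[-k:]                             # k most similar training texts
    w = np.maximum(sims[top], 0.0)
    w = w / (w.sum() + 1e-12)                               # cosine-based weights
    return float(np.dot(w, np.array(training_scores)[top]))
```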

10 E-rater. Developed by the Educational Testing Service (ETS) in the 1990s. E-rater uses Natural Language Processing (NLP) techniques, a vector-space model, and a linear regression model. Its features include a syntactic module, a discourse module, and a topical-analysis module.

11 Essay evaluation process. Uses linear regression analysis to process texts scored by human raters and determines the optimal weighting model that can predict the human ratings. Uses NLP to identify features in an essay and combine them into feature scores. Generates scores by applying the weighting model to the feature scores (Enright & Quinlan, 2010).
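
A minimal sketch of such a weighting model (illustrative, not ETS code): each essay is reduced to module-level feature scores, weights are fit by linear regression on human-scored essays, and the weighted combination scores new essays. The module names and training data are assumptions.

```python
# Illustrative e-rater-style weighting of module feature scores.
import numpy as np

FEATURES = ["syntax", "discourse", "topical_analysis"]       # assumed module names

def fit_weights(feature_scores, human_scores):
    """feature_scores: list of dicts mapping module name -> feature score."""
    X = np.array([[fs[f] for f in FEATURES] for fs in feature_scores])
    X = np.hstack([np.ones((len(X), 1)), X])                  # intercept term
    w, *_ = np.linalg.lstsq(X, np.array(human_scores), rcond=None)
    return w

def weighted_score(feature_score, weights) -> float:
    """Apply the fitted weighting model to one essay's feature scores."""
    x = np.hstack([1.0, [feature_score[f] for f in FEATURES]])
    return float(x @ weights)
```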

13 Criterion. Criterion is an online essay scoring and evaluation system that relies on 1) e-rater to score an essay, and 2) the Critique writing analysis tools to provide detailed evaluation and feedback on language, discourse, content, etc. (Dikli, 2006).

14 IntelliMetric. Developed by Vantage Learning in 1998; based on a cognitive model of information processing and understanding. Core technology: artificial intelligence, NLP, computational linguistics, statistics, machine learning, CogniSearch, and Quantum Reasoning (Elliot, 2003).

15 Evaluates an essay on features of content and structure across five categories: focus and unity; development and elaboration; organization and structure; sentence structure; mechanics and conventions (Vantage Learning, 2005).

16 Essay evaluation process. Preprocesses the electronic form of the essay to make sure it is readable by IntelliMetric. Extracts information from the essay using NLP. Transforms the information into numerical form to support computation of the mathematical models. Applies the resulting mathematical understanding to a new essay and integrates the information to yield the final score (Vantage Learning, 2005).
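
Since the Vantage Learning models are proprietary, the multi-model integration can only be illustrated generically. The sketch below fits several simple mathematical models on assumed numerical features extracted from scored essays and integrates (averages) their judgments; every detail here is an assumption, not the actual IntelliMetric design.

```python
# Illustrative multi-model integration in the spirit of IntelliMetric.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor

def integrated_score(train_features, train_scores, new_features) -> float:
    """Fit several models on numerical essay features and integrate their judgments."""
    X, y = np.array(train_features), np.array(train_scores)
    models = [LinearRegression(), Ridge(alpha=1.0), KNeighborsRegressor(n_neighbors=3)]
    judgments = []
    for m in models:
        m.fit(X, y)                                    # each model learns from scored essays
        judgments.append(m.predict(np.array([new_features]))[0])
    return float(np.mean(judgments))                   # integrate into a final score
```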

17 Figure 4: Architecture of IntelliMetric (cited from Vantage Learning, 2005, p. 12)

18 MY Access! MY Access! is a web-based writing assessment tool that relies on IntelliMetric to provide students with a writing environment that offers immediate scoring and diagnostic feedback (Vantage Learning, 2005).

19 BETSY. Developed by Lawrence M. Rudner at the University of Maryland in 2002. Two models based on Bayes’ theorem: the Multivariate Bernoulli Model and the Multinomial Model. Core idea: classification of essays on the basis of about 1,000 training texts (Valenti et al., 2003). This classification is based on essay features, including content-related and form-related features.

20 BETSY uses these models to analyze features such as specific words and phrases, the frequency of certain content words, the number of words, sentence length, the number of verbs, the order in which concepts are presented, and the occurrence of specific noun-verb pairs (Rudner & Liang, 2002), and categorizes new texts into groups: Advanced / Proficient / Basic / Below.
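
A minimal sketch of Bayesian essay classification in this spirit (illustrative only): a multinomial naive Bayes classifier is trained on pre-scored essays and assigns a new essay to a score category. Only word and short-phrase counts are used here; BETSY's richer feature set and its roughly 1,000 training texts are assumed away.

```python
# Illustrative BETSY-style classification with a multinomial naive Bayes model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def train_classifier(training_essays, categories):
    """Train on essays labelled with score categories (e.g. Advanced, Proficient, ...)."""
    vec = CountVectorizer(ngram_range=(1, 2))               # specific words and short phrases
    clf = MultinomialNB().fit(vec.fit_transform(training_essays), categories)
    return vec, clf

def classify(vec, clf, new_essay):
    """Assign the most probable category to a new essay via Bayes' theorem."""
    return clf.predict(vec.transform([new_essay]))[0]
```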

21 Reliability

System         Correlation   Agreement   Citation
PEG            0.72-0.78     -           (Page & Peterson, 1995)
IEA            0.85          -           (Landauer et al., 2000)
E-rater        0.73-0.93     87%-97%     (Burstein et al., 2004)
IntelliMetric  0.83          94%-98%     (Elliot, 2002)
BETSY          -             80%         (Rudner & Liang, 2002)

22 Other AWE systems: SEAR (Christie, 1999), APEX (Lemaire, 2001), PS-ME (Mason & Grove-Stephenson, 2002), ATM (Callear et al., 2001), C-rater (Leacock, 2003), eGrader (Byrne et al., 2010), MaxEnt (Sukkarieh & Bolge, 2010), Writing Roadmap (Rich et al., 2013), LightSIDE (Mayfield & Rose, 2013), Crase (Lottridge et al., 2013), …

23 JUKU. Developed by Chinese researchers in 2010, JUKU (http://www.pigai.org/) is used by more than 200 universities in China.

24 Based on corpus and cloud computing technology. Measures the comparative distance between students’ essays and the content of standard corpora. Each essay is measured on 192 dimensions within the categories of vocabulary, sentence, discourse, and content. Provides reports containing scores, overall comments, and line-by-line feedback.
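
Because JUKU's 192 dimensions and its reference corpora are proprietary, the sketch below only illustrates the general idea of corpus-distance scoring, with three assumed dimensions and invented reference statistics.

```python
# Illustrative corpus-distance scoring in the spirit of JUKU's description.
import numpy as np

# Assumed reference statistics (mean, standard deviation) per dimension.
REFERENCE = {
    "avg_word_length":     (4.8, 0.5),
    "avg_sentence_length": (18.0, 4.0),
    "type_token_ratio":    (0.55, 0.08),
}

def measure(essay: str) -> dict:
    """Measure an essay on a few assumed vocabulary and sentence dimensions."""
    words = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return {
        "avg_word_length": float(np.mean([len(w) for w in words])) if words else 0.0,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(w.lower() for w in words)) / max(len(words), 1),
    }

def corpus_distance_score(essay: str, max_score: float = 100.0) -> float:
    """Convert the essay's distance from the reference statistics into a score."""
    dims = measure(essay)
    z = [abs(dims[k] - mu) / sd for k, (mu, sd) in REFERENCE.items()]
    return max(0.0, max_score - 10.0 * float(np.mean(z)))    # assumed penalty scale
```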

25 Reliability. Reliability analysis based on 1,456 essays written by students at Nanjing University: agreement 92%; complete + adjacent agreement (15 points, 5 levels) 93.37% (Zhang, unpublished); correlation less than 0.7.
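
For readers who want to reproduce such statistics on their own machine-human rating pairs, a minimal sketch of exact agreement, complete-plus-adjacent agreement, and correlation; the sample ratings below are invented, not the Nanjing data.

```python
# Illustrative computation of machine-human agreement statistics.
import numpy as np

def agreement_stats(human, machine, adjacent=1):
    """Return exact agreement, exact+adjacent agreement, and Pearson correlation."""
    human, machine = np.array(human), np.array(machine)
    exact = float(np.mean(human == machine))
    exact_plus_adjacent = float(np.mean(np.abs(human - machine) <= adjacent))
    correlation = float(np.corrcoef(human, machine)[0, 1])
    return exact, exact_plus_adjacent, correlation

# Invented ratings on a 5-level scale, for demonstration only.
human_levels   = [3, 4, 2, 5, 3, 4, 1, 3]
machine_levels = [3, 4, 3, 5, 2, 4, 1, 4]
print(agreement_stats(human_levels, machine_levels))
```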

30 Combination of machine and human evaluation: teacher feedback, peer feedback, recommendation, praise, comments.

31 Prospect. Writing evaluation, whether by human raters or automated scoring, should satisfy two conditions: 1) the rubric should reflect the essential aspects of writing competence, and 2) the ratings should be consistent with the rubric (Weigle, 2013). Cope et al. (2011) propose “an alternative potential for NLP based on an understanding of the writing process as a fluid, iterative struggle to make meaning” (p. 87).

32 Figure 5: CBAL writing competency model (Deane et al., 2011, p. 3)

33 University of California: give formative feedback on each draft; provide feedback on a wide range of student writing; use LightSIDE to support new feature extraction; use machine learning technology to provide an open-source basis for adjusting to new problems; install “machine-student dialogue” and an “intelligent tutoring system”.

34 Summary: 1) the design of AWE systems that help improve learners’ cognitive and critical-thinking abilities; 2) a shift of emphasis from the language and structure of an essay to its ideas, thinking, and rhetorical effectiveness; 3) the evaluation of different genres of writing, including both arts and scientific articles; 4) the development of new software engines that can provide formative feedback on student writing;

35 5) the use of machine learning technology to design AWE systems that are open source and can adjust to new problems; 6) the use of machine-human dialogue and intelligent tutoring systems to enhance the effect of feedback; 7) cooperation across disciplines in the development of AWE: writing teachers, test developers, cognitive psychologists, psychometricians, and computer scientists.

36 References
Burstein, J., Chodorow, M., & Leacock, C. (2004). Automated essay evaluation: The Criterion online writing service. AI Magazine 25: 27-35.
Chung, K. W. K., & O’Neil, H. F. (1997). Methodological approaches to online scoring of essays. Retrieved from http://www.cse.ucla.edu/products/reports/tech461.pdf
Cope, B., Kalantzis, M., McCarthey, S., Vojak, C., & Kline, S. (2011). Technology-mediated writing assessments: Principles and processes. Computers and Composition 28: 79-96.
Deane, P., Quinlan, T., & Kostin, I. (2011). Automated scoring within a developmental, cognitive model of writing proficiency. Princeton, NJ: Educational Testing Service.
Elliot, S. (2002). A study of expert scoring, standard human scoring and IntelliMetric scoring accuracy for statewide eighth grade writing responses. Newtown, PA: Vantage Learning.
Elliot, S. (2003). IntelliMetric™: From here to validity. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective. Mahwah, NJ: Lawrence Erlbaum, 71-86.
Enright, M., & Quinlan, M. (2010). Complementing human judgment of essays written by English language learners with e-rater® scoring. Language Testing 27: 317-334.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to latent semantic analysis. Discourse Processes 25: 259-284.
Landauer, T. K., Laham, D., & Foltz, P. W. (2000). The Intelligent Essay Assessor. IEEE Intelligent Systems: The debate on automated essay grading 15: 27-31.
Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automatic essay assessment. Assessment in Education 10: 295-308.
Lemaire, B., & Dessus, P. (2001). A system to assess the semantic content of student essays. Educational Computing Research 24: 305-306.
Page, E. B. (1966). The imminence of grading essays by computer. Phi Delta Kappan 47: 238-243.
Page, E., & Peterson, N. S. (1995). The computer moves into essay grading: Updating the ancient test. Phi Delta Kappan 76: 561-565.
Rudner, L. M., & Liang, T. (2002). Automated essay scoring using Bayes’ theorem. The Journal of Technology, Learning, and Assessment 1(2): 3-21.
Shermis, M., & Barrera, F. (2002). Exit assessments: Evaluating writing ability through automated essay scoring (ERIC Document Reproduction Service No. ED 464 950).
Shermis, M. D., & Burstein, J. (2003). Introduction. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective. Mahwah, NJ: Lawrence Erlbaum, xiii-xvi.
Valenti, S., Neri, F., & Cucchiarelli, A. (2003). An overview of current research on automated essay grading. Journal of Information Technology Education 2: 319-330.
Vantage Learning. (2005). How IntelliMetric™ works. Retrieved from http://www.cengagesites.com/academic/assets/sites/4994/WE_2_IM_How_IntelliMetric_Works.pdf
Weigle, S. C. (2013). English language learners and automated scoring of essays: Critical considerations. Assessing Writing 18: 85-99.
Warschauer, M. (2014). DIP: Next-generation automated feedback in support of iterative writing and scientific argumentation. Unpublished research proposal.

37 Thanks zhangli@sjtu.edu.cn

