1 Introduction to: Automated Essay Scoring (AES)
Anat Ben-Simon, National Institute for Testing & Evaluation
Tbilisi, Georgia, September 2007

2 Merits of AES
- Psychometric
  - Objectivity & standardization
- Logistic
  - Saves time & money
  - Allows for immediate reporting of scores
- Didactic
  - Immediate diagnostic feedback

3 AES - How does it work?
- Humans rate a sample of essays
- Computer extracts relevant text features
- Computer generates a model to predict the human scores
- Computer applies the prediction model to score new essays
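To make the four steps concrete, here is a minimal Python sketch, assuming a toy feature extractor and a plain linear regression; the essays, scores, and features are invented for illustration and do not describe any particular AES system.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def extract_features(essay):
    """Toy surface features: word count, average word length, sentence count."""
    words = essay.split()
    sentences = [s for s in essay.split(".") if s.strip()]
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return [len(words), avg_word_len, len(sentences)]

# Step 1: humans rate a sample of essays (invented data).
training_essays = [
    "A short essay.",
    "A somewhat longer essay with several more words. It develops an idea.",
]
human_scores = [2.0, 4.0]

# Steps 2-3: extract features and fit a model that predicts the human scores.
X = np.array([extract_features(e) for e in training_essays])
model = LinearRegression().fit(X, human_scores)

# Step 4: apply the prediction model to score a new essay.
new_essay = "An unseen essay that needs a score. It also has two sentences."
print(round(model.predict(np.array([extract_features(new_essay)]))[0], 2))
```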

4 AES – Model Determination
Feature determination
- Text driven – empirically based quantitative (computational) variables
- Theoretically driven
Weight determination
- Empirically based
- Theoretically based

5 Scoring Dimensions
- Content: Relevance, Richness of ideas, Originality, Focus
- Rhetorical Structure: Organization, Coherence, Cohesion, Paragraphing
- Style: Clarity, Fluency, Accuracy
- Syntax & Grammar: Complexity, Syntactical accuracy, Grammatical accuracy
- Vocabulary: Richness, Register, Spelling, Accuracy

6 AES - Examples of Text Features
Surface variables
- Essay length
- Average word / sentence length
- Variability of sentence length
- Average word frequency
- Word similarity to prototype essays
- Style errors (e.g., repetitious words, very long sentences)
NLP-based variables
- The number of "discourse" elements
- Word complexity (e.g., ratio of different content words to total number of words)
- Style errors (e.g., passive sentences)
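A hedged sketch of how a few of these surface variables could be computed with the Python standard library; the tokenization rules and feature definitions here are illustrative assumptions, not those of any specific system.

```python
import re
import statistics

def surface_features(essay):
    """Compute a handful of simple surface variables for one essay."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    sentence_lengths = [len(s.split()) for s in sentences]
    return {
        "essay_length": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sentence_length": sum(sentence_lengths) / max(len(sentences), 1),
        "sentence_length_sd": statistics.pstdev(sentence_lengths) if sentence_lengths else 0.0,
        # Ratio of different content words to total words (word-repetition proxy).
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),
    }

print(surface_features("Short sentences vary. Longer sentences, with more words, vary even more."))
```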

7 AES: Commercially Available Systems
- Project Essay Grade (PEG)
- Intelligent Essay Assessor (IEA)
- IntelliMetric
- e-rater

8 PEG (Project Essay Grade)
Scoring Method
- Uses NLP tools (grammar checkers, part-of-speech taggers) as well as surface variables
- Typical scoring model uses 30-40 features
- Features are combined to produce a scoring model through multiple regression
Score Dimensions
- Content, Organization, Style, Mechanics, Creativity

9 Intelligent Essay Assessor (IEA)
Scoring Method
- Focuses primarily on the evaluation of content
- Based on Latent Semantic Analysis (LSA)
- Based on a well-articulated theory of knowledge acquisition and representation
- Features combined through hierarchical multiple regression
Score Dimensions
- Content, Style, Mechanics
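The following is a rough Python sketch of the LSA idea described above: project essays into a low-dimensional latent semantic space and compare a new essay with previously scored ones. The corpus, the number of dimensions, and the nearest-essay scoring rule are toy assumptions, not IEA's actual implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Previously scored essays (invented) and their human scores.
scored_essays = [
    "photosynthesis converts light energy into chemical energy",
    "plants use sunlight to make sugar from carbon dioxide and water",
    "my favourite holiday was a trip to the beach last summer",
]
scores = [5, 4, 1]

# Build a term-document matrix and reduce it to a latent semantic space.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(scored_essays)
lsa = TruncatedSVD(n_components=2).fit(tfidf)
latent = lsa.transform(tfidf)

# Project a new essay into the same space and score it like its most
# semantically similar scored essay.
new_essay = "sunlight drives the chemical reactions that let plants store energy"
new_vec = lsa.transform(vectorizer.transform([new_essay]))
sims = cosine_similarity(new_vec, latent)[0]
print(scores[sims.argmax()])
```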

10 IntelliMetric
Scoring Method
- "Brain-based" or "mind-based" model of information processing and understanding
- Appears to draw more on artificial intelligence, neural net, and computational linguistic traditions than on theoretical models of writing
- Uses close to 500 features
Score Dimensions
- Content, Creativity, Style, Mechanics, Organization

11 e-rater v2
Scoring Method
- Based on natural language processing and statistical methods
- Uses a fixed set of 12 features that reflect good writing
- Features are combined using hierarchical multiple regression
Score Dimensions
- Grammar, usage, mechanics, and style
- Organization and development
- Topical analysis (content)
- Word complexity
- Essay length

12 Writing Dimensions and Features in e-rater v2 (2004)
Grammar, usage, mechanics, & style
  1. Ratio of grammar errors
  2. Ratio of mechanics errors
  3. Ratio of usage errors
  4. Ratio of style errors
Organization & development
  5. The number of "discourse" units detected in the essay (i.e., background, thesis, main ideas, supporting ideas)
  6. The average length of each discourse element in words
Topical analysis
  7. Similarity of the essay's content to previously scored essays in the top score category
  8. The score category containing essays whose words are most similar to the target essay
Word complexity
  9. Word repetition (ratio of different content words)
  10. Vocabulary difficulty (based on word frequency)
  11. Average word length
Essay length
  12. Total number of words
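A hedged sketch of the two topical-analysis features (7 and 8 above): content-vector similarity of a new essay to previously scored essays, and the score category whose essays it most resembles. The vectorization, the score categories, and the example texts are toy assumptions, not e-rater's actual implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Previously scored essays grouped by score category (invented).
essays_by_score = {
    6: ["clear thesis with well developed supporting ideas and examples"],
    3: ["some ideas but little development or organization"],
    1: ["very short response with almost no relevant content"],
}

vectorizer = CountVectorizer()
vectorizer.fit([t for texts in essays_by_score.values() for t in texts])

def topical_features(essay):
    """Return (similarity to top-score essays, most similar score category)."""
    vec = vectorizer.transform([essay])
    sims = {score: cosine_similarity(vec, vectorizer.transform(texts)).max()
            for score, texts in essays_by_score.items()}
    return sims[max(essays_by_score)], max(sims, key=sims.get)

print(topical_features("a developed thesis supported with relevant examples"))
```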

13 Reliability Studies
Studies comparing inter-rater (human-human) agreement with computer-rater (human-computer) agreement:
- PEG (Petersen & Page, 1997): GRE, 36 prompts, n = 497; human-computer r = .74-.75 (1 rater); human-human r = .75
- PEG (Shermis et al., 2002): English placement test, 1 prompt, n = 386; human-computer r = .83 (6 raters); human-human r = .71
- IntelliMetric (Elliot, 2001): K-12 norm-referenced test, n = 102; human-computer r = .82 (1 rater), .85 (2 raters); human-human r = .84
- IEA (Landauer et al., 1997): GMAT, n = 188; human-computer r = .80; human-human r = .83
- IEA (Foltz et al., 1999): GMAT, n = 1,363; human-computer r = .86; human-human r = .86-.87
- e-rater (Burstein et al., 1998): GMAT, 13 prompts, n = 500-1,000; human-computer r = .79-.87 (1 rater); human-human r = .82-.89
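For readers unfamiliar with the figures above, this is a small sketch of how such agreement correlations are computed: Pearson correlations between two human raters and between a human rater and the machine score (the scores are invented, not the studies' data).

```python
import numpy as np

# Invented essay scores from two human raters and one automated system.
human_1 = np.array([4, 5, 3, 6, 2, 4, 5, 3])
human_2 = np.array([4, 4, 3, 6, 3, 4, 5, 2])
machine = np.array([4, 5, 3, 5, 2, 4, 4, 3])

# Human-human vs. human-computer agreement.
human_human_r = np.corrcoef(human_1, human_2)[0, 1]
human_computer_r = np.corrcoef(human_1, machine)[0, 1]
print(round(human_human_r, 2), round(human_computer_r, 2))
```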

14 AES: Validity Issues
- To what extent are the text features used by AES programs valid measures of writing skills?
- To what extent is AES inappropriately sensitive to irrelevant features and insensitive to relevant ones?
- Are human grades an optimal criterion?
- Which external criteria should be used for validation?
- What are the wash-back effects (consequential validity)?

15 Weighting Human & Computer Scores
- Automated scoring used only as a quality control (QC) check on human scoring
- Automated scoring and human scoring combined
- Human scoring used only as a QC check on automated scoring

16 AES: To use or not to use?
- Are the essays written by hand or composed on computer?
- Is there enough volume to make AES cost-effective?
- Will students, teachers, and other key constituencies accept automated scoring?

17 Criticism and Reservations
- Insensitive to some important features relevant to good writing
- Fail to identify and appreciate unique writing styles and creativity
- Susceptible to construct-irrelevant variance
- May encourage writing for the computer as opposed to writing for people

18 How to choose a program?
1. Does the system work in a way you can defend?
2. Is there a credible research base supporting the use of the system for your particular purpose?
3. What are the practical implications of using the system?
4. How will the use of the system affect students, teachers, and other key constituencies?

19 Thank You

