1 Annotating Students’ Understanding of Science Concepts
Rodney D. Nielsen, Wayne Ward, James Martin, and Martha Palmer
Center for Computational Language and Education Research, University of Colorado, Boulder

2 Annotating Fine-Grained Entailments
LREC May 28, 2008, Rodney D. Nielsen
Question: Kate said: “An object has to move to produce sound.” Do you agree with her? Why or why not?
Reference answer: Agree. Vibrations are movements and vibrations produce sound.
Learner answer: I do not agree because a radio does not move to make sound.
  The student agrees            Contradicted
  Vibrations are movement       Unaddressed
  Vibrations produce something  Different Argument
  Something produces sound      Expressed

3 Recognizing Textual Entailment
Hypothesis: Agree. Vibrations are movements and vibrations produce sound.
Text: I do not agree because a radio does not move to make sound.
  The student agrees            False
  Vibrations are movement       Unknown
  Vibrations produce something  Unknown
  Something produces sound      True
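Slides 2 and 3 label the same four facets under two schemes. A minimal sketch of that correspondence follows; the dictionary covers only the four label pairs visible on these two slides, and the names `FACET_TO_RTE`, `SLIDE_EXAMPLE`, and `to_rte` are illustrative, not taken from the talk.

```python
# Label pairs read off slides 2 and 3 for the same four facets.
# Only these four pairs appear on the slides; this is not a complete mapping.
FACET_TO_RTE = {
    "Contradicted": "False",
    "Unaddressed": "Unknown",
    "Different Argument": "Unknown",
    "Expressed": "True",
}

# The fine-grained annotation from slide 2.
SLIDE_EXAMPLE = [
    ("The student agrees", "Contradicted"),
    ("Vibrations are movement", "Unaddressed"),
    ("Vibrations produce something", "Different Argument"),
    ("Something produces sound", "Expressed"),
]


def to_rte(annotations):
    """Collapse fine-grained facet labels to the three RTE-style values."""
    return [(facet, FACET_TO_RTE[label]) for facet, label in annotations]


for facet, value in to_rte(SLIDE_EXAMPLE):
    print(f"{facet}: {value}")
```

Two distinct fine-grained labels (Unaddressed and Different Argument) collapse to the same RTE value, which is the information loss the finer-grained scheme is meant to avoid.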

4 Prior Work
Automated Tutors: Aleven et al., 2001; Graesser et al., 2001; Jordan et al., 2004; Koedinger et al., 1997; Makatchev et al., 2004; Peters et al., 2004; Pon-Barry et al., 2004; Roll et al., 2005; Rose et al., 2003; VanLehn et al., 2005
Constructed Response Scoring: Callear et al., 2001; Leacock and Chodorow, 2003; Mitchell et al., 2002 & 2003; Pulman, 2005; Sukkarieh, 2003 & 2005
PASCAL RTE (Dagan, Glickman and Magnini, 2005)
Differences / Weaknesses
  Coarse-grained entailment: yes/no or a grade of 0-2 points
  Question-specific systems
  Hand-crafted dialog control, parsers, knowledge-based ontologies, logic representations, and/or rules
  Require 100-500 responses per question

5 Necessity of Finer-Grained Analysis
Imagine a tutor knowing only that there is some unspecified part of the reference answer that we are not sure the student understands.
Reference Answer: A long string produces a low pitch.
Break the reference answer down into low-level facets derived from a dependency parse and thematic roles:
  NMod(string, long)        The string is long.
  Agent(produces, string)   A string is producing something.
  Product(produces, pitch)  A pitch is being produced.
  NMod(pitch, low)          The pitch is low.
Assess whether an understanding of each facet is implied by the student’s response.
[Figure: dependency parse of “A long string produces a low pitch.” with arcs labeled det, nmod, subject, and object]
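The facet breakdown on this slide can be held in a small data structure. The sketch below is illustrative only: the class name `Facet` and the field names `relation`, `governor`, `modifier`, and `gloss` are assumptions, while the four facets and their glosses are copied from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facet:
    """One low-level facet of a reference answer (field names are hypothetical)."""
    relation: str   # dependency or thematic-role label, e.g. "NMod"
    governor: str   # first argument as written on the slide
    modifier: str   # second argument as written on the slide
    gloss: str      # plain-English reading of the facet

    def __str__(self):
        return f"{self.relation}({self.governor}, {self.modifier})"


REFERENCE_ANSWER = "A long string produces a low pitch."

FACETS = [
    Facet("NMod", "string", "long", "The string is long."),
    Facet("Agent", "produces", "string", "A string is producing something."),
    Facet("Product", "produces", "pitch", "A pitch is being produced."),
    Facet("NMod", "pitch", "low", "The pitch is low."),
]


def blank_annotation(facets):
    """One empty label slot per facet, to be filled by an annotator or a classifier."""
    return {str(facet): None for facet in facets}


for facet in FACETS:
    print(f"{str(facet):26} {facet.gloss}")
```

Keeping one label slot per facet is what lets a tutor point at the specific part of the reference answer a student missed, instead of scoring the whole answer at once.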

6 Representing Fine-Grained Semantics
Assess the relationship between the student’s answer and the reference answer facets at a finer grain.
Reference Ans: A long string produces a low pitch.
  NMod(string, long)
  Agent(produces, string)
  Product(produces, pitch)
  NMod(pitch, low)
Learner answer: A long string produces a pitch.
  Labels shown: Expressed, Unaddressed
  Yes No
Learner answer: It produces a loud pitch.
  Labels shown: Assumed, Expressed, Different Argument
Learner answer: It produces a high pitch.
  Labels shown: Assumed, Expressed, Contradiction, Expressed

7 The Focus of This Effort
Low-level facets of the reference answer
Finer-grained relationship to the facets

8 The Corpus
Assessing Science Knowledge (ASK): Full Option Science System
Berkeley, Lawrence Hall of Science national assessment project (NSF)
  16 science teaching and learning modules, Grades 3-6
  287 constructed response questions
  15,400 total student responses
  146,000 facet entailment annotations
Modules by grade and strand:
  Grade  Life Science       Physical Science and Technology  Earth and Space Science  Scientific Reasoning and Technology
  3-4    Human Body         Magnetism & Electricity          Water                    Ideas & Inventions
         Structure of Life  Physics of Sound                 Earth Materials          Measurement
  5-6    Food & Nutrition   Levers & Pulleys                 Solar Energy             Models & Designs
         Environments       Mixtures & Solutions             Landforms                Variables

9 Annotation Process
Step 1: FOSS/ASK reference answers were manually decomposed into constituent facets.
Ref Answer: The string is tighter, so the pitch is higher.
  Be(string, tighter)  The string is tighter.
  Be(pitch, higher)    The pitch is higher.
  Cause(X, Y)          X is caused by Y
Step 2: Learner answers were annotated to indicate whether and how each facet was addressed.
Learner Answer: The string is tighter, so there is less tension so the pitch gets higher.
  Be(string, tighter)  The string is tighter.  Self-Contra
  Be(pitch, higher)    The pitch is higher.    Expressed
  Cause(X, Y)          X is caused by Y        Expressed

10 Reference Answer Decomposition
Begin with a manual dependency parse of the reference answer.
Then raise main verbs, remove unimportant dependencies, incorporate copulas, prepositions, and negation into dependency labels, and use thematic role labels.
[Figure: the sentence “The brass ring would not stick to the nail because the ring is not iron.” shown twice, once with syntactic dependency arcs (vmod, nmod, vc, sub, pmod, sbar, prd) and once with the resulting facet arcs (nmod, theme_not, destination_to_not, be_not, cause_because)]

11 Reference Answer Markup
Final facets for Ref Answer: The brass ring would not stick to the nail because the ring is not iron.
  NMod(ring, brass)               The ring is brass.
  Theme_not(stick, ring)          The ring does not stick.
  Destination_to_not(stick, nail) Something does not stick to the nail.
  Be_not(ring, iron)              The ring is not iron.
  Cause_because(stick, is)        X is caused by Y
[Figure: the reference answer with facet arcs labeled nmod, theme_not, destination_to_not, be_not, and cause_because]

12 Answer Annotation Labels
Assumed: Facets that are assumed to be understood a priori based on the question
Expressed: Any facet directly expressed or inferred by simple reasoning
Inferred: Facets inferred by pragmatics or nontrivial logical reasoning
Contra-Expr: Facets directly contradicted by negation, antonymous expressions and their paraphrases
Contra-Infr: Facets contradicted by pragmatics or complex reasoning
Self-Contra: Facets that are both contradicted and implied (self contradictions)
Diff-Arg: The core relation is expressed, but it has a different modifier or argument
Unaddressed: Facets that are not addressed at all by the student’s answer

13 Annotation – Expressed & Inferred
Question: Kate said: “An object has to move to produce sound.” Do you agree with her? Why or why not?
Reference Answer: Agree. Vibrations are movements and vibrations produce sound.
  Root(root, agree)           student agrees                Expressed
  Be(vibration, movement)     vibration is movement         Inferred
  Agent(produce, vibrations)  vibrations produce something  Expressed
  Patient(produce, sound)     something produces sound      Expressed
Student Answer: Yes because it has to vibrate to make sounds.

14 Annotation – Contradictions
Question: Darla tied one end of a string around a doorknob and held the other end in her hand. When she plucked the string (pulled and let go quickly) she heard a sound. How would the pitch change if Darla pulled the string tighter?
Reference Answer: When the string is tighter, the pitch will be higher.
  Be(string, tighter)  The string is tighter.  Assumed
  Be(pitch, higher)    The pitch is higher.    Contra-Expr
  Cause(X, Y)          X is caused by Y        Assumed
Student Answer: it will be low the pitch change

15 Annotation – Unaddressed
Question: … Write a note to David to tell him why the pitch gets higher rather than lower
Ref Ans: The string is tighter, so the pitch is higher. The string between the cup and table is not longer.
…
  Be_not(string, longer)  The string is not longer  Unaddressed
Student Answer: David pitch is not happening tension is happening okay so calm down.

16 Labels
Assumed: Facets that are assumed to be understood a priori based on the question
Expressed: Any facet directly expressed or inferred by simple reasoning
Inferred: Facets inferred by pragmatics or nontrivial logical reasoning
Contra-Expr: Facets directly contradicted by negation, antonymous expressions and their paraphrases
Contra-Infr: Facets contradicted by pragmatics or complex reasoning
Self-Contra: Facets that are both contradicted and implied (self contradictions)
Diff-Arg: The core relation is expressed, but it has a different modifier or argument
Unaddressed: Facets that are not addressed at all by the student’s answer

17 Inter-annotator Agreement
         Fine-Grn  Tutor   Y/N
  ITA    78.4%     86.2%   88.0%
  Kappa  0.704     0.728   0.752
Fine-Grn: all labels kept separate
Tutor: combine {Expressed, Inferred & Assumed} and {Contra-Expr & Contra-Infr}, others separate
Y/N: combine {Expressed, Inferred & Assumed} v. {everything else}
In most disagreements (57%) one annotator chose Unaddressed
  49% were between Unaddressed and Understood
35% of disagreements were between the labels implying understanding
Only 2.3% of disagreements are between Understood and Contradicted
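The three granularities on this slide differ only in which labels are merged before agreement is measured. The sketch below shows that merging and an agreement computation. It assumes the reported Kappa is Cohen's kappa for two annotators, which the slide does not state, and the two annotation lists are invented toy data, not drawn from the corpus. The function names are illustrative.

```python
from collections import Counter

UNDERSTOOD = {"Expressed", "Inferred", "Assumed"}
CONTRADICTED = {"Contra-Expr", "Contra-Infr"}


def to_tutor(label):
    """Tutor granularity: merge the understood labels and the two contradiction labels."""
    if label in UNDERSTOOD:
        return "Understood"
    if label in CONTRADICTED:
        return "Contradicted"
    return label  # Self-Contra, Diff-Arg, Unaddressed stay separate


def to_yes_no(label):
    """Y/N granularity: understood versus everything else."""
    return "Yes" if label in UNDERSTOOD else "No"


def agreement(a, b):
    """Fraction of items on which the two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_chance = sum(counts_a[l] * counts_b[l] for l in counts_a) / (n * n)
    if p_chance == 1.0:  # both annotators used a single identical label throughout
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Invented toy annotations for six facets (illustration only).
ann1 = ["Expressed", "Inferred", "Contra-Expr", "Unaddressed", "Diff-Arg", "Assumed"]
ann2 = ["Inferred", "Expressed", "Contra-Infr", "Unaddressed", "Unaddressed", "Assumed"]

for name, merge in [("Fine-Grn", lambda l: l), ("Tutor", to_tutor), ("Y/N", to_yes_no)]:
    a = [merge(l) for l in ann1]
    b = [merge(l) for l in ann2]
    print(f"{name:9} agreement={agreement(a, b):.3f} kappa={cohen_kappa(a, b):.3f}")
```

On the toy data, agreement rises as labels are merged, which mirrors the direction of the figures in the table: disagreements between Expressed and Inferred, for example, vanish once both count as Understood.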

18 Assessment Technology Overview
Start with hand-generated reference answer facets
Automatically parse reference & learner answer and automatically extract representation
Generate machine learning feature vectors indicative of the student’s understanding of each facet
  From answers, their parses, the relations between these, and corpus co-occurrence statistics
Train a machine learning classifier on the training set feature vectors
Use classifier to assess the test set answers, assigning one of five Tutor-Labels for each reference answer (RA) facet
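To make the "one feature vector per facet" step concrete, here is a deliberately simplified sketch. It uses only exact word matching between a facet's two arguments and the learner answer; the features in the talk also draw on parses and corpus co-occurrence statistics, which are not specified on the slide and are not reproduced here. The function names and feature names are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphabetic word set of a learner answer."""
    return set(re.findall(r"[a-z]+", text.lower()))


def facet_features(governor, modifier, learner_answer):
    """Toy lexical features for one (facet, learner answer) pair.

    Exact string matching only; it will miss inflected forms and synonyms.
    """
    words = tokens(learner_answer)
    governor_present = governor.lower() in words
    modifier_present = modifier.lower() in words
    return {
        "governor_present": governor_present,
        "modifier_present": modifier_present,
        "both_present": governor_present and modifier_present,
    }


# Facet NMod(pitch, low) against a learner answer from slide 6.
print(facet_features("pitch", "low", "It produces a high pitch."))
```

Each such dictionary, paired with the annotated label for that facet, would form one training example for the classifier described on the slide.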

19 Results (C4.5 decision tree)
                      # nonAsmd Facets  Majority Class  Lexical Baseline  All Features  Reduced Training
  Training Set 10xCV  54,967            54.6            59.7              77.1
  Unseen Answers      30,514            51.1            56.1              75.5
  Unseen Questions     6,699            58.4            63.4              61.7          66.5
  Unseen Modules       3,159            53.4            62.9              61.4          68.8
Results on Tutor-Labels for Unseen Answers, Unseen Questions, and Unseen Modules are:
  24.4, 8.1 and 15.4 percentage points over the most frequent class baseline
  19.4, 3.1 and 5.9 percentage points over the lexical baseline
(All Unseen Modules facets adjudicated, about half of other modules adjudicated)
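The headline improvements on this slide follow directly from the accuracy figures. The sketch below recomputes them; the `best` value for each test set is the column that reproduces the quoted numbers (All Features for Unseen Answers, Reduced Training for the other two), and the dictionary keys are illustrative names.

```python
# Accuracy (%) per test set, copied from the results on this slide.
RESULTS = {
    "Unseen Answers":   {"majority": 51.1, "lexical": 56.1, "best": 75.5},
    "Unseen Questions": {"majority": 58.4, "lexical": 63.4, "best": 66.5},
    "Unseen Modules":   {"majority": 53.4, "lexical": 62.9, "best": 68.8},
}


def gains(row):
    """Percentage-point gain of the best system over each baseline."""
    over_majority = round(row["best"] - row["majority"], 1)
    over_lexical = round(row["best"] - row["lexical"], 1)
    return over_majority, over_lexical


for name, row in RESULTS.items():
    over_majority, over_lexical = gains(row)
    print(f"{name:17} +{over_majority} over majority class, +{over_lexical} over lexical")
```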

20 Conclusions
New assessment paradigm to enable more effective tutoring dialog management
  Facet breakdown: enables the tutor to provide feedback relevant specifically to the appropriate part of the reference answer
  Additional labels: facilitate understanding the type of mismatch between the reference answer/hypothesis and the student’s answer/text

21 Conclusions
Corpus of annotated answers
  Substantial agreement: 86.2% on Tutor-Labels, 0.728 Kappa
  About 146K facet annotations
  Only corpus of fine-grained inference information
  Freely available
  Will support alternative approaches to the Recognizing Textual Entailment task

22 Conclusions
Answer Assessment System
  Evaluation according to new paradigm
  Within-domain performance: 24% over majority class baseline
  Out-of-domain performance: 15% over majority class baseline
  First system to address out-of-domain assessment
  First successful assessment of Grade 3-6 constructed responses

23 Thanks!
This work was partially funded by Award Numbers: NSF 0551723, IES R305B070434, and NSF DRL-0733323.

