© 2012 The MITRE Corporation. All rights reserved. For Internal MITRE Use

Evaluation and STS
Workshop on a Pipeline for Semantic Text Similarity (STS)
March 12, 2012
Sherri Condon, The MITRE Corporation
Evaluation and STS Wish-List
■ Valid: measures what we think we're measuring (definition)
■ Replicable: same results for same inputs (annotator agreement)
■ Objective: no confounding biases (from language or annotator)
■ Diagnostic
  – Not all evaluations achieve this
  – Understanding factors and relations
■ Generalizable
  – Makes true predictions about new cases
  – Functional: if not perfect, good enough
■ Understandable
  – Meaningful to stakeholders
  – Interpretable components
■ Cost-effective
Quick Foray into Philosophy
■ Meaning as extension: same/similar denotation
  – Anaphora/coreference and time/date resolution
  – The evening star happens to be the morning star
  – "Real world" knowledge = true in this world
■ Meaning as intension
  – Truth (extension) in the same/similar possible worlds
  – Compositionality: inference and entailment
■ Meaning as use
  – Equivalence for all the same purposes in all the same contexts
  – "Committee on Foreign Affairs, Human Rights, Common Security and Defence Policy" vs. "Committee on Foreign Affairs"
  – Salience, application specificity, implicature, register, metaphor
■ Yet remarkable agreement in intuitions about meaning
DARPA Mind's Eye Evaluation
■ Computer vision through a human lens
  – Recognize events in video as verbs
  – Produce text descriptions of events in video
■ Comparing human descriptions to system descriptions raises all the STS issues
  – Salience/importance/focus (A gave B a package. They were standing.)
  – Granularity of description (car vs. red car, woman vs. person)
  – Knowledge and inference (standing with motorcycle vs. sitting on motorcycle: motorcycle is stopped)
  – Unequal text lengths
■ Demonstrates the value of, and need for, understanding these factors
Mind's Eye Text Similarity
■ Basic similarity scores based on dependency parses
  – Scores increase for matching predicates and arguments
  – Scores decrease for non-matching predicates and arguments
  – Accessible syn-sets and ontological relations expand matches
■ Salience/importance
  – Obtain many "reference" descriptions
  – Weight predicates and arguments based on frequency
■ Granularity of description
  – Demonstrates influence of application context
  – Program focus is on verbs, so nouns match loosely
■ Regularities in evaluation efforts that depend on semantic similarity promise common solutions
(This is work with Evelyne Tzoukermann and Dan Parvaz)
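The scoring scheme sketched on this slide can be illustrated in a few lines of code. This is a minimal sketch under stated assumptions, not the actual MITRE implementation: the (relation, predicate, argument) triple representation, the `triple_weights` and `similarity` functions, and the flat mismatch penalty are all constructs invented for the example, and the real system additionally expands matches through syn-sets and ontological relations.

```python
# Illustrative sketch of a dependency-based similarity score:
# triples shared by both descriptions add their frequency-derived
# weight; triples in only one description subtract a penalty.
from collections import Counter

def triple_weights(reference_parses):
    """Weight each (relation, predicate, argument) triple by how often
    it occurs across many human "reference" descriptions (salience)."""
    return dict(Counter(t for parse in reference_parses for t in parse))

def similarity(system_parse, reference_parse, weights, penalty=1.0):
    """Score rises for matching triples, falls for non-matching ones."""
    sys_set, ref_set = set(system_parse), set(reference_parse)
    matched = sys_set & ref_set          # shared predicate-argument triples
    mismatched = sys_set ^ ref_set       # present in only one description
    score = sum(weights.get(t, 1.0) for t in matched)
    return score - penalty * len(mismatched)

# Hypothetical reference descriptions of the same video event
refs = [
    [("dobj", "put", "lid"), ("nsubj", "close", "woman")],
    [("dobj", "put", "lid"), ("dobj", "hold", "lid")],
]
w = triple_weights(refs)  # ("dobj", "put", "lid") gets weight 2

print(similarity([("dobj", "put", "lid"), ("dobj", "take", "lid")],
                 [("dobj", "put", "lid"), ("nsubj", "close", "woman")],
                 w))  # 2 (matched weight) - 2 (mismatches) = 0.0
```

Weighting by frequency across many reference descriptions is what lets the score capture salience: a triple that nearly every human mentions contributes more than an incidental detail.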
Test Sentences

Relation  Predicate  Argument  Weight  Matches (sentences 1–6)
dobj      close      can       1       x
dobj      close      it        3
dobj      hold       lid       6       x
dobj      place      it        1
dobj      place      lid       2
dobj      put        it        1       x x x
dobj      put        lid       5       x
dobj      take       lid       1
nsubj     close      woman     1
nsubj     hold       woman     2       x

Nominal errors:    0 3 3 5
Verbal errors:     0 1 1 1 6
Predicate errors:  1.5 0 5 3 2 3 9
Total score (sentences 1–6): 15, -6.5, 2.5, 1, -4, -14