What is the Entrance Exams Task

Outline
- Task description
- Data
- Evaluation
- Some systems
- Conclusion
- References

Task description A new task for evaluating machine reading systems by having them solve problems from Japanese university entrance exams. The challenge aims at evaluating systems under the same conditions under which humans are evaluated to enter university. [Note: reading comprehension tests are used to assess how well people understand what they read, so the underlying hypothesis is that it is reasonable to use the same tests to assess how well a machine "comprehends" what it is reading.]

Task description Participants are asked to read a given document and answer questions about it. Questions are given in multiple-choice format, with several options from which a single answer must be selected. Note that: [1] No background text collections are provided; systems have to answer questions by relying on the "common sense knowledge" that high school students aiming to enter university are expected to have. [2] Question types are not restricted: any type of reading comprehension question found in real entrance exams may appear in the test data.
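The slides do not show the file format of the test data; purely as an illustration, a test item can be thought of as a reading passage with multiple-choice questions attached. The structure and names below are hypothetical, not the official CLEF schema:

```python
# Hypothetical illustration of one Entrance Exams test item (not the official
# CLEF data format): a reading passage, its multiple-choice questions, and one
# correct option per question.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Question:
    text: str               # e.g. "According to the passage, why did ...?"
    options: List[str]      # typically four candidate answers
    correct_index: int      # gold answer index, available only with the gold standard

@dataclass
class ReadingTest:
    document: str           # the reading passage taken from the exam
    questions: List[Question] = field(default_factory=list)
```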

Data The data set is extracted from standardized English examinations for university admission in Japan, created by the Japanese National Center for University Admissions Tests. The original examinations include various styles of questions, such as word filling, grammatical error recognition and sentence filling; however, this task focuses on the reading comprehension questions. Languages: English (original), Russian, French, Spanish, Italian and German.

Evaluation The task evaluates participating systems by giving them a score between 0 and 1 using the c@1 measure. Systems can obtain higher scores by leaving questions unanswered when the answer might be wrong, rather than guessing. Systems receive evaluation scores from two different perspectives: [1] At the question-answering level: correct answers are counted individually, without grouping them. [2] At the reading-test level: a score is also given for each reading test as a whole.
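For reference, c@1 rewards unanswered questions in proportion to the accuracy achieved on the answered ones, so abstaining is better than guessing wrongly: c@1 = (n_R + n_U * n_R / n) / n, where n_R is the number of correct answers, n_U the number of unanswered questions, and n the total number of questions. A minimal sketch of the computation (function and variable names are ours, not from the overview papers):

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """c@1: unanswered questions are rewarded in proportion to the accuracy
    achieved on the answered ones."""
    if n_total == 0:
        return 0.0
    return (n_correct + n_unanswered * (n_correct / n_total)) / n_total

# Example: 40 questions, 16 correct, 10 left unanswered.
# Plain accuracy would be 16/40 = 0.40; c@1 = (16 + 10 * 0.40) / 40 = 0.50.
print(c_at_1(16, 10, 40))
```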

Some Systems Participation:
- EE2013: 10 registered groups (27 expressed interest), 5 participant groups, 10 submitted runs.
- EE2014: 20 registered groups, 29 submitted runs.
Results against the random baseline:
- EE2013 (10 runs): 3 runs above random, 7 runs at or below random.
- EE2014 (29 runs): 14 runs above random, 15 runs at or below random; only 1 run scored above 0.5.

Some Systems Key points, procedures and results:
- JUCS [2013] (result 0.42): Text-Hypothesis (T-H) pairs, textual entailment, answer ranking. Procedure: retrieve relevant sentences as T; assign a ranking score to each T-H pair; rank the answers.
- NIIJ [2013] (result 0.35): Text similarity, textual entailment. Procedure: character resolver; retrieve related sentences to form T-H pairs; compute T-H confidence scores with textual entailment.
- DIPF [2014] (result 0.375): Coreference resolution, text similarity, textual entailment. Procedure: retrieve sentences; textual entailment on the T-H pairs; answer similarity; answer selection.
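These three systems share the same overall shape: retrieve candidate sentences (T), pair each with a hypothesis (H) built from the question and a candidate answer, score each T-H pair, and pick the best-scored candidate. A minimal sketch of that generic pipeline, with plain word overlap standing in for the real textual-entailment component (everything here is illustrative, not taken from the participants' papers):

```python
# Hedged sketch of the retrieve -> T-H pairing -> score -> rank pipeline.
# Word overlap is only a toy stand-in for a textual entailment engine.
from typing import List

def tokens(text: str) -> set:
    return set(text.lower().split())

def entailment_score(text: str, hypothesis: str) -> float:
    """Toy entailment proxy: fraction of hypothesis words covered by the text."""
    h = tokens(hypothesis)
    return len(h & tokens(text)) / len(h) if h else 0.0

def answer_question(document: str, question: str, candidates: List[str]) -> int:
    """Return the index of the best-supported candidate answer."""
    sentences = [s.strip() for s in document.split('.') if s.strip()]   # trivial "retrieval"
    scores = []
    for answer in candidates:
        hypothesis = f"{question} {answer}"                             # build H from question + candidate
        best = max((entailment_score(s, hypothesis) for s in sentences), default=0.0)
        scores.append(best)                                             # best-supporting sentence wins
    return max(range(len(candidates)), key=scores.__getitem__)
```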

Some Systems Key points, procedures and results (continued):
- Synapse [2014] (result 0.59 French, 0.45 English): Deep syntactic and semantic analysis, Clause Description Structures (CDS). Procedure: filter out some candidate answers; compute the proximity between documents and candidate answers using CDSs; rank and select.
- CICNLP [2014] (result 0.375): Graph generation, cosine similarity. Procedure: build hypotheses (question combined with each candidate answer); build graph representations with linguistic features; rank by cosine similarity.
- LIMSI-CNRS [2014] (result 0.25): Semantic relatedness, alignment, rule-based validation. Procedure: retrieve passages; align answer PAS (predicate-argument structures) with passage PAS; validate or invalidate candidates.
- CSGS [2014] (result 0.362): Semantic similarity, alignment model. Procedure: sentence selection; answer selection.
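The purely similarity-based idea behind entries such as CICNLP and CSGS can be illustrated with a bag-of-words cosine similarity between each hypothesis and the passage; the graph representations, linguistic features and alignment models of the actual systems are omitted in this hypothetical sketch:

```python
# Minimal, illustrative similarity-based hypothesis ranking: compare each
# hypothesis (question + candidate answer) with the passage using a plain
# bag-of-words cosine similarity.
import math
from collections import Counter
from typing import List

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_by_similarity(document: str, question: str, candidates: List[str]) -> int:
    """Pick the candidate whose hypothesis is most similar to the passage."""
    doc_vec = Counter(document.lower().split())
    sims = [cosine(doc_vec, Counter(f"{question} {c}".lower().split())) for c in candidates]
    return max(range(len(candidates)), key=sims.__getitem__)
```

As the conclusion below notes, this kind of surface similarity alone is not enough to solve the majority of the questions.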

Conclusion The level of textual inference that current systems perform is not enough to solve the majority of the questions; systems based only on textual similarity cannot address the challenge (sometimes the more similar a passage looks, the more likely the candidate answer is wrong) [1]. In order to answer more than 2/3 of the questions correctly, pragmatic knowledge and inference are essential [3]. The Entrance Exams task shows that question answering is a task far from being solved [2].

References
[1] Anselmo Peñas, Yusuke Miyao, Eduard Hovy, Pamela Forner and Noriko Kando. Overview of QA4MRE 2013 Entrance Exams Task. CLEF 2013 Working Notes (Online Working Notes/Labs/Workshop), 2013.
[2] Anselmo Peñas, Yusuke Miyao, Álvaro Rodrigo, Eduard Hovy and Noriko Kando. Overview of CLEF QA Entrance Exams Task 2014. CLEF 2014 Working Notes, 2014.
[3] Dominique Laurent, Baptiste Chardon, Sophie Negre and Patrick Seguela. English run of Synapse Développement at Entrance Exams 2014. CLEF 2014 Working Notes, 2014.

Thank you