Third Recognizing Textual Entailment Challenge – Potential SNeRG Submission

RTE3 Quick Notes
– RTE Web Site:
– Textual Entailment resource pool:
– New development set released last week to correct errors
– Test set released on March 5th
– !!! New !!! Submission date: March 12th
– Report deadline: March 26th

Development set examples
Example of a YES result:
– Text: A bus collision with a truck in Uganda has resulted in at least 30 fatalities and has left a further 21 injured.
– Hypothesis: 30 die in a bus collision in Uganda.
Example of a NO result:
– Text: Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational corporation, Ernslaw One.
– Hypothesis: Blue Mountain Lumber owns Ernslaw One.

Development set examples – cont.
4 different types of entailment tasks:
– Information Retrieval (IR)
– Question Answering (QA)
– Information Extraction (IE)
– Multi-document summarization (SUM)
The development set consists of 200 samples of each task type; 400 evaluate to "YES" and 400 to "NO".
Another attribute, "length", has only 134 long samples and 666 short. [Note to self: gather a group of demon hunters to hunt down the short samples; will need volunteers and holy water.]

Evaluation
Two submissions per team can be made.
Program output is a file that contains the following information:
– Line 1 must contain: "ranked: yes/no"
– Lines 2..end contain: "pair_id judgment"
For example:
ranked: yes
4 YES
3 YES
6 YES
1 NO
5 NO
2 NO
Accuracy is calculated from the proportion of answers returned correctly.
Precision is determined by the order and the correctness of the answers returned, by the formula:
  (1/R) * sum for i = 1 to n of E(i) * (#correct-up-to-pair-i / i)
where n is the number of pairs in the test set, R is the total number of positive pairs in the test set, E(i) is 1 if the i-th pair is positive and 0 otherwise, and i ranges over the pairs ordered by their ranking.
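A minimal sketch of both measures in Python (an assumption – the official scoring is run by the organizers, and the helper names here are hypothetical). It reads "#correct-up-to-pair-i" as the number of positive pairs among the top i of the ranked list, i.e., the usual average-precision reading.

```python
# Sketch of the two scores above. gold maps pair_id -> "YES"/"NO" gold labels;
# judgments maps pair_id -> the system's answer; ranked_pair_ids is the system's
# ranking, most confident "YES" first.
def accuracy(judgments, gold):
    correct = sum(1 for pid, answer in judgments.items() if gold[pid] == answer)
    return correct / len(gold)

def ranked_precision(ranked_pair_ids, gold):
    R = sum(1 for label in gold.values() if label == "YES")   # total positive pairs
    positives_so_far, total = 0, 0.0
    for i, pid in enumerate(ranked_pair_ids, start=1):
        if gold[pid] == "YES":                 # E(i) = 1 for positive pairs
            positives_so_far += 1
            total += positives_so_far / i      # #correct-up-to-pair-i / i
    return total / R if R else 0.0
```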

Possible Implementation
Discover features that can be measured as continuous variables. For example:
– Wordbag match ratio = # of words matched between text and hypothesis / # of words in the hypothesis
Arrange the feature values in a feature vector x.
Apply the general multivariate normal density to the assembled feature vector x.
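A minimal sketch of the density model, assuming Python with NumPy/SciPy (a later slide mentions Matlab, so this is purely illustrative); X_yes and X_no are assumed to be feature matrices already extracted from the development pairs.

```python
# Sketch of the density model: fit one multivariate normal per class (YES / NO)
# over the assembled feature vectors x, then judge a new pair by comparing the
# class-conditional densities weighted by the class priors.
import numpy as np
from scipy.stats import multivariate_normal

def fit_density(X):
    """X: (n_samples, n_features) array of feature vectors for one class."""
    mu = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False)) + 1e-6 * np.eye(X.shape[1])  # regularized
    return multivariate_normal(mean=mu, cov=cov)

def judge(x, yes_density, no_density, p_yes=0.5):
    """Return "YES" when the YES class is more probable for feature vector x."""
    score_yes = p_yes * yes_density.pdf(x)
    score_no = (1.0 - p_yes) * no_density.pdf(x)
    return "YES" if score_yes >= score_no else "NO"

# Usage (illustrative): X_yes and X_no hold feature vectors from the 400 YES and
# 400 NO development pairs.
# yes_d, no_d = fit_density(X_yes), fit_density(X_no)
# answer = judge(np.array([0.75]), yes_d, no_d)   # e.g. a single wordbag-ratio feature
```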

Implementation to Determine Baseline
I have done an implementation to determine the baseline of what we can expect from a full implementation of all syntactic features.
First baseline result:
– Used 1 feature: Wordbag count > n, where n is decided after the development set is processed
– Success: 509, Fail: 290; Final rate: 63.9%
Second baseline result:
– Used simple preprocessing and Wordbag count: removing punctuation, case insensitivity, ignoring simple words
– Success: 534, Fail: 265; Final rate: 66.8%
Attempted a little semantic processing, like increasing the weight of "negative" words for returning negative results, but results did not improve.
In the RTE2 competition the highest accuracy was only 70%!
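A rough sketch of that baseline, assuming Python; the stopword ("simple words") list, tokenization, and threshold range here are guesses, not the actual Matlab implementation.

```python
# Sketch of the wordbag baseline: preprocess, count hypothesis words found in the
# text, and answer YES when the count exceeds a threshold tuned on the development set.
import re

STOPWORDS = {"a", "an", "the", "of", "to", "and", "is", "in", "on", "for"}  # assumed list

def preprocess(sentence):
    # Remove punctuation, lowercase, and drop simple (stop) words.
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]

def wordbag_count(text, hypothesis):
    text_words = set(preprocess(text))
    return sum(1 for w in preprocess(hypothesis) if w in text_words)

def baseline_judgment(text, hypothesis, threshold):
    return "YES" if wordbag_count(text, hypothesis) > threshold else "NO"

def tune_threshold(dev_pairs):
    # dev_pairs: list of (text, hypothesis, gold) tuples with gold in {"YES", "NO"}.
    best_t, best_acc = 0, 0.0
    for t in range(0, 15):
        correct = sum(1 for txt, hyp, gold in dev_pairs
                      if baseline_judgment(txt, hyp, t) == gold)
        acc = correct / len(dev_pairs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```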

Potential Features
Wordbag ratio = # of matches between text and hypothesis / # of words in hypothesis
Works for:
– Text: A bus collision with a truck in Uganda has resulted in at least 30 fatalities and has left a further 21 injured.
– Hypothesis: 30 die in a bus collision in Uganda.
– Wordbag ratio = 6 / 8
Fails for:
– Text: Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational corporation, Ernslaw One.
– Hypothesis: Blue Mountain Lumber owns Ernslaw One.
– Wordbag ratio = 5 / 6
A potential solution needs to include semantic knowledge about the relationship between the key words highlighted in red on the slide (e.g. "is a subsidiary of" vs. "owns").
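A minimal sketch of the ratio itself. The tokenization is a guess (lowercase, punctuation stripped, each distinct hypothesis word counted once against the full hypothesis length); this particular counting happens to yield the 6/8 and 5/6 above, but the author's exact counting isn't specified.

```python
# Sketch of the wordbag ratio feature. Tokenization details (case, punctuation,
# repeated words) are assumptions and may not match the slide's exact counts.
import re

def tokens(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())

def wordbag_ratio(text, hypothesis):
    text_words = set(tokens(text))
    hyp_tokens = tokens(hypothesis)
    matched = sum(1 for w in set(hyp_tokens) if w in text_words)
    return matched / len(hyp_tokens)

# Both example pairs score high (most hypothesis words appear in the text),
# which is why the second, non-entailing pair fools this feature.
```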

Potential Features – cont.
Word proximity = average distance between matched words in the text
For example:
– Text: A bus collision with a truck in Uganda has resulted in at least 30 fatalities and has left a further 21 injured.
– Hypothesis: 30 die in a bus collision in Uganda.
– Matched words: 30, in, bus, collision, in, Uganda
  – 30: 3, 12, 11, 3, 6
  – in: 3, 5, 4, 1
  – bus: 12, 5, 1, 5, 6
  – collision: etc.
This may not help much or at all, but by adding additional independent features (modeled with a Gaussian distribution), we can potentially increase P(w_n | x).
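A minimal sketch of one possible proximity measure. The slide's per-word distance lists are ambiguous, so this version simply averages the pairwise distances (in token positions) between matched words in the text; that averaging scheme is an assumption.

```python
# Sketch of a word-proximity feature: average distance (in token positions) between
# occurrences in the text of words that also appear in the hypothesis.
from itertools import combinations

def word_proximity(text, hypothesis):
    text_tokens = text.lower().split()
    hyp_words = set(hypothesis.lower().split())
    # Positions in the text of every token that also occurs in the hypothesis.
    positions = [i for i, tok in enumerate(text_tokens) if tok in hyp_words]
    if len(positions) < 2:
        return 0.0
    dists = [abs(i - j) for i, j in combinations(positions, 2)]
    return sum(dists) / len(dists)
```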

Potential Features – cont.
Word grouping = count of matched word groups of length 2 / number of possible combinations
For example:
– Text: A bus collision with a truck in Uganda has resulted in at least 30 fatalities and has left a further 21 injured.
– Hypothesis: 30 die in a bus collision in Uganda.
– Matched groups: "bus collision", "in Uganda"; 7 possible combinations = 2/7
– Text: Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational corporation, Ernslaw One.
– Hypothesis: Blue Mountain Lumber owns Ernslaw One.
– Matched groups: "Blue Mountain", "Mountain Lumber", "Ernslaw One"; 5 possible combinations = 3/5
Once again, this may not help much or at all, but it may help us brainstorm a bit.
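A minimal sketch of the word-grouping feature as a bigram overlap. Punctuation handling and case sensitivity are assumptions; stripping punctuation while keeping case is one counting scheme consistent with the 2/7 and 3/5 fractions above.

```python
# Sketch of the word-grouping feature: fraction of hypothesis word pairs (bigrams)
# that also appear as adjacent word pairs in the text.
import re

def word_tokens(sentence):
    return re.findall(r"[A-Za-z0-9]+", sentence)   # strip punctuation, keep case

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

def word_grouping(text, hypothesis):
    text_bigrams = set(bigrams(word_tokens(text)))
    hyp_bigrams = bigrams(word_tokens(hypothesis))
    if not hyp_bigrams:
        return 0.0
    matched = sum(1 for bg in hyp_bigrams if bg in text_bigrams)
    return matched / len(hyp_bigrams)
```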

Potential Features – cont.
Quick and easy stats we can generate may include using:
– Stemmers – count matching verbs?
– Synonyms/Antonyms – count any matches for both types
– Parts of speech – brainstorm, anyone?
– Removal or weighting of names and place-names – turn a multi-word "match" into a single symbol so as not to give extra weight to names or place-names
– Matching phrases that appear similar in both the text and the hypothesis
Any "count" that can be created from any processing of semantic or syntactic information could be used. I am now using Matlab to implement, so any Unix program can be used to process a feature – maybe there is an existing feature-extraction Unix command-line program that someone knows about.
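As one concrete illustration of such a count, here is a sketch of a synonym-match feature using NLTK's WordNet interface. This is only an example of the idea, not the author's Matlab/Unix pipeline, and the matching rules are assumptions.

```python
# Sketch of a synonym-match count, one of the "quick and easy stats" above.
# Requires NLTK with the WordNet data installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def synonyms(word):
    """Set of WordNet lemma names that share a synset with the word."""
    return {lemma.lower().replace("_", " ")
            for syn in wn.synsets(word)
            for lemma in syn.lemma_names()}

def synonym_match_count(text, hypothesis):
    text_words = set(text.lower().split())
    count = 0
    for hyp_word in set(hypothesis.lower().split()):
        if hyp_word in text_words:
            continue  # exact matches are already covered by the wordbag features
        if synonyms(hyp_word) & text_words:
            count += 1
    return count
```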

RTE3 Important Dates
Test set released on March 5th
– Gives us 10 days before we can submit
Last day to submit is March 12th
– Submission consists of running the data yourself and then submitting the result file
– A cheater says whaaaa?
Technical report deadline: March 26th
I will be working on this on and off until March 6th, then I can devote full time to our submission.