Textual Entailment Using Univariate Density Model and Maximizing Discriminant Function “Third Recognizing Textual Entailment Challenge 2007 Submission”

Presentation transcript:

Textual Entailment Using Univariate Density Model and Maximizing Discriminant Function
“Third Recognizing Textual Entailment Challenge 2007 Submission”
Scott Settembre, University at Buffalo, SNePS Research Group

Third Recognizing Textual Entailment Challenge (RTE3)
The task is to develop a system that determines whether the first sentence of a given pair “entails” the second sentence.
The pair of sentences is called the Text-Hypothesis pair (or T-H pair).
Participants are provided with 800 sample T-H pairs annotated with the correct entailment answers.
The final testing set consists of 800 non-annotated samples.

Development set examples
Example of a YES result
–Text: “As much as 200 mm of rain have been recorded in portions of British Columbia, on the west coast of Canada since Monday.”
–Hypothesis: “British Columbia is located in Canada.”
Example of a NO result
–Text: “Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational corporation, Ernslaw One.”
–Hypothesis: “Blue Mountain Lumber owns Ernslaw One.”

Entailment Task Types
There are 4 different entailment tasks:
–“IE” or Information Extraction
  Text: “An Afghan interpreter, employed by the United States, was also wounded.”
  Hypothesis: “An interpreter worked for Afghanistan.”
–“IR” or Information Retrieval
  Text: “Catastrophic floods in Europe endanger lives and cause human tragedy as well as heavy economic losses”
  Hypothesis: “Flooding in Europe causes major economic losses.”

Entailment Task Types - continued
The two remaining entailment tasks are:
–“SUM” or Multi-document summarization
  Text: “Sheriff's officials said a robot could be put to use in Ventura County, where the bomb squad has responded to more than 40 calls this year.”
  Hypothesis: “Police use robots for bomb-handling.”
–“QA” or Question Answering
  Text: “Israel's prime Minister, Ariel Sharon, visited Prague.”
  Hypothesis: “Ariel Sharon is the Israeli Prime Minister.”

Submission Results
The two runs submitted this year (2007) scored:
–62.62% (501 correct out of 800)
–61.00% (488 correct out of 800)
In the 2nd RTE Challenge (2006), a score of 62.62% would tie for 4th out of 23 teams.
–Top scores were 75%, 73%, 64%, and [value missing]%
–Median: 58.3%
–Range: 50.88% to 75.38%

Main Focuses
Create a process to pool the expertise of our research group in addressing entailment
–Development of a specification for metrics
–Import of metric vectors generated by other programs
Design a visual environment to manage this process and the development data set
–Ability to select the metric vectors and classifier to use
–Randomization of off-training sets to prevent overfitting
Provide a baseline to evaluate and compare different metrics and classification strategies

Development Environment
RTE Development Environment
–Display and examine the development data set

Development Environment - continued
–Select off-training set from development data
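The random selection of an off-training set can be sketched roughly as follows. This is only an illustration: it assumes that "off-training set" means a randomly chosen held-out subset of the 800 development pairs, and the function name `split_off_training`, the `holdout_fraction` parameter, and the seed are hypothetical rather than taken from the tool shown in the slides.

```python
import random

def split_off_training(pairs, holdout_fraction=0.2, seed=None):
    """Randomly split pairs into (training, off_training) lists.

    Assumption: the off-training set is a random held-out subset
    that is never used to fit a classifier.
    """
    rng = random.Random(seed)          # local generator, reproducible when seeded
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Stand-in for the 800 annotated development pairs (IDs only).
train, off_train = split_off_training(range(800), holdout_fraction=0.25, seed=0)
print(len(train), len(off_train))
```

Drawing a fresh random split for each experiment means a metric or classifier is not repeatedly tuned against the same held-out pairs.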

Development Environment - continued
–Select metric to use for classification

Metrics
Metric specification
–Continuous value, normalized between 0 and 1 (inclusive)
  Allows future use of nearest-neighbor classification techniques
  Prevents scaling issues
–Preferably following a Gaussian distribution (bell curve)
Metrics developed for our submission
–Lexical similarity ratio (word bag)
–Average matched word displacement
–Lexical similarity with synonym and antonym replacement

Metric - example
Lexical similarity ratio (word bag ratio)
–# of matches between text and hypothesis / # of words in hypothesis
Works for:
–Text: “A bus collision with a truck in Uganda has resulted in at least 30 fatalities and has left a further 21 injured.”
–Hypothesis: “30 die in a bus collision in Uganda.”
–Wordbag ratio = 7 / 8
Fails for:
–Text: “Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational corporation, Ernslaw One.”
–Hypothesis: “Blue Mountain Lumber owns Ernslaw One.”
–Wordbag ratio = 5 / 6 (high, although the correct answer is NO)
Weakness: does not consider semantic information
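A minimal sketch of the word bag ratio, reproducing the two ratios on this slide. The tokenizer (lowercasing and splitting on non-alphanumeric characters) and the names `tokenize` and `wordbag_ratio` are assumptions made for illustration; the slides do not say how the submission tokenized sentences.

```python
import re

def tokenize(sentence):
    """Lowercase and split into alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

def wordbag_ratio(text, hypothesis):
    """Fraction of hypothesis tokens that also occur somewhere in the text."""
    text_words = set(tokenize(text))
    hyp_words = tokenize(hypothesis)
    if not hyp_words:
        return 0.0
    matches = sum(1 for word in hyp_words if word in text_words)
    return matches / len(hyp_words)

works = wordbag_ratio(
    "A bus collision with a truck in Uganda has resulted in at least 30 "
    "fatalities and has left a further 21 injured.",
    "30 die in a bus collision in Uganda.",
)
fails = wordbag_ratio(
    "Blue Mountain Lumber is a subsidiary of Malaysian forestry "
    "transnational corporation, Ernslaw One.",
    "Blue Mountain Lumber owns Ernslaw One.",
)
print(works, fails)  # 7/8 and 5/6
```

The second pair scores almost as high as the first even though "is a subsidiary of" and "owns" point in opposite directions, which is the weakness named on the slide.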

Development Environment - continued
–Classify testing data using the univariate normal model

Classifiers
Two classification techniques were used
–Univariate normal model (Gaussian density)
–Linear discriminant function
Univariate normal model
–One classifier for each entailment type and value
–8 classifiers are developed
–Results from the “YES” and “NO” classifiers are compared
Linear discriminant function
–One classifier for each entailment type
–4 classifiers are developed
–Result based on which side of the boundary the metric is on
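The count of 8 univariate classifiers follows from 4 entailment types times 2 answer values. A rough sketch of fitting one (mean, variance) pair per combination is shown below. The sample values are made-up toy numbers, `fit_models` is a hypothetical name, and the use of population variance is an assumption, since the slides do not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pvariance

def fit_models(samples):
    """Map each (task, label) pair to the (mean, variance) of its metric values.

    samples: iterable of (task, label, metric_value) tuples.
    """
    groups = defaultdict(list)
    for task, label, value in samples:
        groups[(task, label)].append(value)
    return {key: (mean(vals), pvariance(vals)) for key, vals in groups.items()}

# Toy training data: two metric values per task and answer (not real results).
samples = []
for task in ("IE", "IR", "QA", "SUM"):
    samples += [
        (task, "YES", 0.75), (task, "YES", 1.0),
        (task, "NO", 0.25), (task, "NO", 0.5),
    ]

models = fit_models(samples)
print(len(models))            # 4 tasks x 2 answers
print(models[("IE", "YES")])
```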

Classifiers - Univariate
Each curve represents a probability density function
–Calculated from the mean and variance of the “YES” and “NO” metrics from the training set
To evaluate, calculate a metric’s position on each curve
–Use the Gaussian density function
–Classify to the category with the largest p(x)
[Figure: “No” and “Yes” density curves, p(x) plotted against x]
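The evaluation step can be sketched as follows: compute the Gaussian density of the metric value under the "YES" and the "NO" parameters and pick the larger. The means and variances in the example are invented for illustration, and the names `gaussian_pdf` and `classify` are hypothetical. Ties are resolved toward "YES" here, which is an arbitrary choice.

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var (var > 0)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def classify(x, yes_params, no_params):
    """Return the label whose density p(x) is larger; ties go to YES."""
    p_yes = gaussian_pdf(x, *yes_params)
    p_no = gaussian_pdf(x, *no_params)
    return "YES" if p_yes >= p_no else "NO"

# Hypothetical (mean, variance) pairs for one entailment type.
yes_params = (0.8, 0.01)
no_params = (0.5, 0.02)

high = classify(0.85, yes_params, no_params)
low = classify(0.40, yes_params, no_params)
print(high, low)
```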

Classifiers - Simple Linear Discriminant
Find the boundary that maximizes the classification result
–Very simple for a single metric
–Brute force search can be used for a good approximation
[Figure: boundary on the x axis]
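A brute-force boundary search for a single metric might look like the sketch below. It assumes the "result" being maximized is training accuracy and that values at or above the boundary are classified as entailment; both are assumptions, as are the name `best_threshold`, the step count, and the toy data.

```python
def best_threshold(values, labels, steps=100):
    """Scan evenly spaced boundaries in [0, 1]; return (threshold, accuracy).

    values: metric values in [0, 1]; labels: True for entailment, else False.
    A value is predicted as entailment when it is >= the threshold.
    """
    best_t, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        t = i / steps
        correct = sum((v >= t) == y for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:             # keep the first boundary with the best score
            best_t, best_acc = t, acc
    return best_t, best_acc

# Toy data (not from the challenge): low values are NO, high values are YES.
values = [0.2, 0.4, 0.6, 0.9]
labels = [False, False, True, True]
threshold, accuracy = best_threshold(values, labels)
print(threshold, accuracy)
```

Because the metric is constrained to lie between 0 and 1, a fixed grid over that interval is enough for a good approximation of the best boundary.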

Classifiers - Weaknesses
Univariate normal weaknesses
–Useless when there is a high overlap of metric values for each category (when the means are very close)
–Or when metrics are not distributed on a Gaussian “bell” curve
Simple linear discriminant weaknesses
–Processes only 1 metric in the training vector
–Places a constraint on metric values (0 for no entailment, 1 for max entailment)
[Figures: Overlap; Non-Gaussian distribution]

Development Environment - continued
–Examine results and compare various metrics

Results
Combined each classification technique with each metric
–Based on training results, the classifier/metric combination was selected for use in the challenge submission
[Table: Training Results. Columns: Wordbag + Univariate, Syn/Anto + Univariate, Wordbag + Linear Dis, Syn/Anto + Linear Dis. Rows: Overall, IE, IR, QA, SUM. Values not preserved.]
[Table: Final results from competition set. Same columns and rows. Values not preserved.]

Future Enhancements
Use of a multivariate model to process the metric vector
–Ability to use more than one metric at a time to classify
Add more metrics that consider semantics
–Examination of incorrect answers shows that a modest effort to process semantic information would yield better results
–Current metrics only use lexical similarity
Increase the ability of the tool to interface in other ways
–Currently we can process metrics from Matlab, COM and .NET objects, and pre-processed metric vector files

RTE Challenge - Final Notes
See our progress at:
RTE Web Site:
Textual Entailment resource pool:
Actual ranking released in June 2007 at:
April 13, 2007, CSEGSA Conference, Scott Settembre