Use of Patterns for Detection of Answer Strings
Soubbotin and Soubbotin

Essentials of Approach
- A certain shift from deep text analysis and NLP methods to surface techniques
- Use of formulas describing the structure of strings likely to bear certain semantic information

Example: "FBI Director Louis Freeh"
- A person represented by his/her first/last names
- A person occupies a post in an organization

The Formula
- A word composed of capital letters ("FBI")
- An item from a list of posts in an organization ("Director")
- An item from a list of first names ("Louis")
- A capitalized word ("Freeh")
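
A rough rendering of this formula as a regular expression. The sketch below is illustrative only: the word lists are tiny stand-ins for the full predefined lists the approach assumes, and the exact pattern syntax is not the authors' implementation.

```python
import re

# Hypothetical stand-in word lists (the real lists would be much larger).
POSTS = ["Director", "President", "Chairman", "CEO"]
FIRST_NAMES = ["Louis", "George", "Bill", "John"]

PERSON_POST_PATTERN = re.compile(
    r"\b([A-Z]{2,})\s+"                      # a word composed of capital letters, e.g. "FBI"
    r"(" + "|".join(POSTS) + r")\s+"         # an item from the list of posts
    r"(" + "|".join(FIRST_NAMES) + r")\s+"   # an item from the list of first names
    r"([A-Z][a-z]+)\b"                       # a capitalized word (the surname)
)

match = PERSON_POST_PATTERN.search("FBI Director Louis Freeh testified on Tuesday.")
if match:
    org, post, first, last = match.groups()
    print(org, post, first, last)            # FBI Director Louis Freeh
```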

Patterns
- Formulas of this kind were called “patterns”
- First used in the TREC-10 QA track
- Each pattern is characterized by a certain generalized semantics

Steps (Overview)
- Identify strings corresponding to a formula
- Identify the question terms (types)
- Check for expressions negating the semantics of the found strings
- Apply the set of formulas (for a particular question type) to match the strings in question-relevant passages

A Surface Approach
- No need to distinguish linguistic entities
- Formulas for strings look like regular expressions
- But patterns include elements referring to lists of predefined words/phrases

Patterns and Question Types
- "Who is person X?" / "Who occupies post Y in organization Z?"
  - A relationship is established between 2 or more entities: person, post, organization, etc.
- Where-questions:
  - Suggest geographical items as answers
  - Construct formulas like: an item from a list of cities/towns/counties, countries/states
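
As a hypothetical illustration of how list-based elements combine with ordinary regular-expression fragments, the sketch below compiles a small "city, country" pattern from named word lists. The list contents, the "@" naming convention, and the compile_pattern helper are all invented for this example.

```python
import re

# Tiny illustrative word lists; real geographic lists would be far larger.
WORD_LISTS = {
    "CITY": ["Paris", "London", "Moscow"],
    "COUNTRY": ["France", "England", "Russia"],
}

def compile_pattern(elements):
    """Turn a list of elements into one regular expression.
    Elements starting with '@' refer to a predefined word list;
    everything else is treated as a raw regex fragment."""
    parts = []
    for el in elements:
        if el.startswith("@"):
            words = WORD_LISTS[el[1:]]
            parts.append("(?:" + "|".join(map(re.escape, words)) + ")")
        else:
            parts.append(el)
    return re.compile(r"\s*".join(parts))

# A "city, country" style location pattern, e.g. "Paris, France".
location = compile_pattern(["@CITY", ",", "@COUNTRY"])
print(bool(location.search("The summit was held in Paris, France last year.")))  # True
```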

Examples
- "In what year" questions
  - Find strings with a sequence of 4 digits
- Questions regarding length, area, weight, speed, etc.
  - Digits plus units of measurement
  - "What is the area of Venezuela?" – 340,569 square miles (a simple pattern match)
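
For concreteness, these two pattern families could look roughly like the regular expressions below. The unit list is a small stand-in, not the authors' actual resource.

```python
import re

# "In what year ..." -> a 4-digit sequence.
YEAR = re.compile(r"\b\d{4}\b")

# Length/area/weight/speed questions -> digits plus a unit of measurement.
UNITS = ["square miles", "square kilometers", "miles", "kilometers", "pounds", "mph"]
MEASUREMENT = re.compile(
    r"\b\d{1,3}(?:,\d{3})*(?:\.\d+)?\s+(?:" + "|".join(UNITS) + r")\b"
)

print(YEAR.findall("The treaty was signed in 1994."))                 # ['1994']
print(MEASUREMENT.findall("Venezuela covers 340,569 square miles."))  # ['340,569 square miles']
```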

Complex Patterns
- Strings expressing relationship between several semantic entities
- The more complex a pattern is, the higher its reliability

Names and Dates
- People names
  - Items from a first-name list
  - Capitalized words
  - Specific name elements (bin, van, etc.)
  - Abbreviations like Sr. and Jr.
- Dates
  - Prepositions, articles, digits, month names, commas, dashes, brackets, phrases like “early,” “in the period of,” “years ago,” “B.C.”
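
A hedged sketch of how such building blocks might be assembled into name and date patterns; the word lists, the particle list, and the exact alternations are stand-ins rather than the authors' actual resources.

```python
import re

FIRST_NAMES = ["John", "Mary", "Osama"]          # illustrative first-name list
NAME_PARTICLES = ["bin", "van", "von", "de"]     # specific name elements
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# First name, then one or more particles or capitalized words, then optional Jr./Sr.
PERSON_NAME = re.compile(
    r"\b(?:" + "|".join(FIRST_NAMES) + r")"
    r"(?:\s+(?:" + "|".join(NAME_PARTICLES) + r")|\s+[A-Z][a-z]+)+"
    r"(?:,?\s+(?:Jr\.|Sr\.))?"
)

# Optional preposition/"early" phrase, month name, optional day, year.
DATE = re.compile(
    r"\b(?:in\s+(?:early\s+)?)?(?:" + "|".join(MONTHS) + r")(?:\s+\d{1,2},)?\s+\d{4}\b"
)

print(PERSON_NAME.findall("Witnesses included John van Damme, Jr. and Mary Smith."))
# ['John van Damme, Jr.', 'Mary Smith']
print(DATE.findall("The attack came in early December 1941."))
# ['in early December 1941']
```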

Pattern-Matching Strings and Question Semantics
- How question words are located relative to the pattern-matching string (distance, left/right, position relative to other matching strings, etc.)
- The simplicity of a pattern’s structure is compensated by the complexity of the rules
- Without applying heuristic rules, sufficiently reliable results cannot be ensured
- Ranks are assigned to question words/phrases and scores to candidate answers
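
The slides do not give the heuristic rules themselves, so the following is only a guessed-at illustration of the general idea: score a pattern-matched candidate by how close the question terms sit to it in the passage. The weighting function and the token-level interface are invented for this sketch.

```python
def score_candidate(passage_tokens, match_start, match_end, question_terms):
    """Score a candidate answer spanning tokens [match_start, match_end) by the
    proximity of question terms in the same passage (closer terms count more)."""
    score = 0.0
    for i, tok in enumerate(passage_tokens):
        if tok.lower() not in question_terms:
            continue
        if i < match_start:                  # question term to the left of the match
            distance = match_start - i
        elif i >= match_end:                 # question term to the right of the match
            distance = i - match_end + 1
        else:                                # question term inside the matched string
            distance = 0
        score += 1.0 / (1.0 + distance)
    return score

tokens = "FBI Director Louis Freeh spoke about the investigation".split()
print(score_candidate(tokens, 2, 4, {"fbi", "director"}))  # 0.833...
```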

QA Process
1. Define question types for all questions
2. Process first the questions whose types have more reliable patterns
3. Form and rank queries from question terms
4. Modify queries if the score is below a threshold
5. Identify pattern-matching strings (apply complex patterns first, then simple ones)
6. Check the correlation between patterns and question semantics
7. Identify exact answers and calculate their scores
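
A much-simplified, runnable sketch of the tail end of this pipeline for a single question type. The pattern table, the crude question-term-overlap "correlation" step, and the scoring are illustrative; query formation/modification and the negation check are omitted.

```python
import re

# More complex (more reliable) patterns listed first for each question type.
PATTERNS_BY_TYPE = {
    "year": [re.compile(r"\bin\s+(\d{4})\b"), re.compile(r"\b(\d{4})\b")],
}

def classify(question):
    return "year" if "what year" in question.lower() else None

def answer(question, passages):
    qtype = classify(question)
    if qtype is None:
        return "NIL", 0.0
    best, best_score = "NIL", 0.0
    for passage in passages:
        for rank, pattern in enumerate(PATTERNS_BY_TYPE[qtype]):
            for m in pattern.finditer(passage):
                # Crude stand-in for "correlation with question semantics":
                # count question terms present in the passage, and give a
                # bonus to matches from more complex (earlier) patterns.
                overlap = sum(w in passage.lower() for w in question.lower().split())
                score = overlap + (len(PATTERNS_BY_TYPE[qtype]) - rank)
                if score > best_score:
                    best, best_score = m.group(1), score
    return best, best_score

passages = ["The Berlin Wall fell in 1989, ending decades of division."]
print(answer("In what year did the Berlin Wall fall?", passages))  # ('1989', 6)
```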

Analysis of Results
TREC 2002:
- Confidence-weighted score =
- 271 right answers, 209 wrong answers, 148 “no answer” responses
- The first 29 correct answers belonged to question types with highly reliable patterns
- Incorrectly identified answer strings: 13.6% (excluding NIL answers)