Information Extraction Lecture 3 – Rule-based Named Entity Recognition Dr. Alexander Fraser, U. Munich September 3rd, 2014 ISSALE: University of Colombo School of Computing

Outline
– Basic evaluation: Precision/Recall
– Rule-based Named Entity Recognition
– Learning Rules
– Issues in Evaluation
– Annotation for NER

Information Extraction
Information Extraction (IE) is the process of extracting structured information from unstructured machine-readable documents.
Pipeline: Source Selection -> Tokenization & Normalization -> Named Entity Recognition -> Instance Extraction -> Fact Extraction -> Ontological Information Extraction
Slide from Suchanek

Relation Extraction: Disease Outbreaks
"May 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…"
The Information Extraction System turns such text into a table:
Date      | Disease Name    | Location
Jan. 1995 | Malaria         | Ethiopia
July 1995 | Mad Cow Disease | U.K.
Feb. 1995 | Pneumonia       | U.S.
May 1995  | Ebola           | Zaire
Slide from Manning

Named Entity Recognition Named Entity Recognition (NER) is the process of finding entities (people, cities, organizations, dates,...) in a text. Elvis Presley was born in 1935 in East Tupelo, Mississippi. Slide from Suchanek
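
As a tiny, hedged illustration of what a rule-based recognizer does (the patterns, tag names, and code below are my own simplification, not part of the original slides), the sketch tags runs of capitalized words as candidate names and four-digit years as candidate dates in the example sentence:

```python
import re

sentence = "Elvis Presley was born in 1935 in East Tupelo, Mississippi."

# Toy rules: runs of capitalized words -> candidate names, four-digit years -> candidate dates.
name_rule = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")
year_rule = re.compile(r"\b(?:1[89]|20)\d{2}\b")

print("Name candidates:", name_rule.findall(sentence))  # ['Elvis Presley', 'East Tupelo', 'Mississippi']
print("Year candidates:", year_rule.findall(sentence))  # ['1935']
```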

Evaluation
How can the performance of a system be evaluated?
Standard methodology from Information Retrieval:
– Precision
– Recall
– F-measure (combination of Precision/Recall)
Slide from Butt/Jurafsky/Martin

Recall
Measure of how much relevant information the system has extracted (coverage of the system).
Basic idea:
Recall = (# of correct answers given by system) / (total # of possible correct answers in text)
Slide from Butt/Jurafsky/Martin

Recall
Measure of how much relevant information the system has extracted (coverage of the system).
Exact definition:
Recall = 1 if there are no possible correct answers;
otherwise: Recall = (# of correct answers given by system) / (total # of possible correct answers in text)
Slide modified from Butt/Jurafsky/Martin

Precision
Measure of how much of the information the system returned is correct (accuracy).
Basic idea:
Precision = (# of correct answers given by system) / (# of answers given by system)
Slide from Butt/Jurafsky/Martin

Precision
Measure of how much of the information the system returned is correct (accuracy).
Exact definition:
Precision = 1 if the system gives no answers;
otherwise: Precision = (# of correct answers given by system) / (# of answers given by system)
Slide modified from Butt/Jurafsky/Martin
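
Read literally, the two "exact definitions" above translate into the following helper functions (a minimal sketch; the function and argument names are mine):

```python
def recall(num_correct_answers, num_possible_correct):
    """Recall = 1 if there are no possible correct answers, else correct / possible."""
    if num_possible_correct == 0:
        return 1.0
    return num_correct_answers / num_possible_correct

def precision(num_correct_answers, num_answers_given):
    """Precision = 1 if the system gave no answers, else correct / given."""
    if num_answers_given == 0:
        return 1.0
    return num_correct_answers / num_answers_given
```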

Evaluation
Every system, algorithm or theory should be evaluated, i.e. its output should be compared to the gold standard (i.e. the ideal output).
Suppose we try to find scientists…
Algorithm output: O = {Einstein, Bohr, Planck, Clinton, Obama}
Gold standard: G = {Einstein, Bohr, Planck, Heisenberg}
Precision: What proportion of the output is correct? |O ∩ G| / |O|
Recall: What proportion of the gold standard did we get? |O ∩ G| / |G|
Slide modified from Suchanek
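
The same computation expressed over sets, using the scientists example from the slide (a small sketch; the variable names are mine):

```python
output = {"Einstein", "Bohr", "Planck", "Clinton", "Obama"}  # algorithm output O
gold = {"Einstein", "Bohr", "Planck", "Heisenberg"}          # gold standard G

correct = output & gold                    # O ∩ G = {Einstein, Bohr, Planck}
precision = len(correct) / len(output)     # 3/5 = 0.6
recall = len(correct) / len(gold)          # 3/4 = 0.75
print(precision, recall)
```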

Explorative Algorithms
Explorative algorithms extract everything they find (very low threshold).
Algorithm output: O = {Einstein, Bohr, Planck, Clinton, Obama, Elvis, …}
Gold standard: G = {Einstein, Bohr, Planck, Heisenberg}
Precision: What proportion of the output is correct? BAD
Recall: What proportion of the gold standard did we get? GREAT
Slide from Suchanek

Conservative Algorithms
Conservative algorithms extract only things about which they are very certain (very high threshold).
Algorithm output: O = {Einstein}
Gold standard: G = {Einstein, Bohr, Planck, Heisenberg}
Precision: What proportion of the output is correct? GREAT
Recall: What proportion of the gold standard did we get? BAD
Slide from Suchanek

Precision & Recall Exercise
What is the algorithm output, the gold standard, the precision and the recall in the following cases?
1. Nostradamus predicts a trip to the moon for every century from the 15th to the 20th inclusive.
2. The weather forecast predicts that the next 3 days will be sunny. It does not say anything about the 2 days that follow. In reality, it is sunny during all 5 days.
3. On a sample of Elvis Radio™, 90% of songs are by Elvis. An algorithm learns to detect Elvis songs. Out of the 100 songs on Elvis Radio, the algorithm says that 20 are by Elvis (and says nothing about the other 80). Out of these 20 songs, 15 were by Elvis and 5 were not.
For case 3: output = {e1, …, e15, x1, …, x5}, gold = {e1, …, e90}, precision = 15/20 = 75%, recall = 15/90 ≈ 16%.
Modified from Suchanek
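
The answer given for case 3 can be checked with one line of arithmetic per measure (a quick sketch):

```python
correct, answers_given, possible_correct = 15, 20, 90   # case 3: Elvis Radio
print(f"precision = {correct / answers_given:.1%}")     # 75.0%
print(f"recall    = {correct / possible_correct:.1%}")  # 16.7% (the slide rounds to 16%)
```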

F1-Measure
You can't get it all…
[Figure: plot of Precision against Recall, both axes from 0 to 1, illustrating the trade-off between the two.]
The F1-measure combines precision and recall as the harmonic mean:
F1 = 2 · precision · recall / (precision + recall)
Slide from Suchanek

F-measure
Precision and Recall stand in opposition to one another. As precision goes up, recall usually goes down (and vice versa).
The F-measure combines the two values:
F = (β² + 1) · P · R / (β² · P + R)
When β = 1, precision and recall are weighted equally (this is F1).
When β > 1, recall is favored; when β < 1, precision is favored.
Slide modified from Butt/Jurafsky/Martin
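
A small helper for the general F-measure; with β = 1 it reduces to the F1 harmonic mean of the previous slide (a sketch, with names of my choosing; the example calls use the precision 0.6 and recall 0.75 from the scientists example above):

```python
def f_measure(precision, recall, beta=1.0):
    """General F-measure: (beta^2 + 1) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

print(f_measure(0.6, 0.75))            # F1 ≈ 0.667
print(f_measure(0.6, 0.75, beta=2.0))  # ≈ 0.714: beta > 1 pulls the score toward recall
print(f_measure(0.6, 0.75, beta=0.5))  # ≈ 0.625: beta < 1 pulls the score toward precision
```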

Summary: Precision/Recall
Precision and recall are key concepts
– Definitely know these formulas; they are applicable everywhere (even in real life)!
The F-measure is a nice way to combine them into a single number
– People sometimes don't specify β when they say "F-measure"
– In this case β = 1, i.e., they mean F1, the equal weighting of P and R
We will return to evaluation in more detail later in this lecture.
Now let's look at rules for (open-class) NER.

Discussion
Multi-entity rules are typically used when there is a lot of structure.
Single-entity rules are often used when manually writing rules
– Humans are good at creating general rules from a limited number of examples
Boundary rules are often used in learning approaches
– They generalize well from few examples
– For instance, the rule for the start boundary and the rule for the end boundary of an entity can be learned from different training examples
– But they may overgeneralize!
Slide modified from Ciravegna
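
To make the distinction concrete, here is a hedged toy sketch (my own patterns, not Ciravegna's): a single-entity rule that matches a whole person name through its right context, and a pair of boundary rules that place the start and the end of the entity independently.

```python
import re

text = "Elvis Presley was born in 1935 in East Tupelo, Mississippi."

# Single-entity rule: one pattern matches the whole entity (here via its right context).
single_entity = re.compile(r"([A-Z][a-z]+ [A-Z][a-z]+)(?= was born)")
print(single_entity.findall(text))  # ['Elvis Presley']

# Boundary rules: one rule proposes start positions, a separate rule proposes end positions.
start_rule = re.compile(r"(?:^|\. )(?=[A-Z][a-z]+)")   # entity start: capitalized word at sentence start
end_rule = re.compile(r"[A-Z][a-z]+(?= was )")         # entity end: capitalized word right before " was"
print([m.start() for m in start_rule.finditer(text)])  # candidate start offsets
print([m.end() for m in end_rule.finditer(text)])      # candidate end offsets
```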

Rule-based NER
Through about 2000, handcrafted rule-based NER was better than statistical NER
– For instance in the Message Understanding Conferences, which featured shared tasks
Since 2000, statistical approaches have started to dominate the academic literature
In industry, there is still diversity
– High precision -> rule-based
– High recall -> statistical
– In between, many different solutions (including combining both approaches)
– But it (debatably) takes less effort to tune statistical systems to improve precision than to tune rule-based systems to increase recall

Learning Rules
We will now talk about learning rules
– Still closely following Sarawagi, Chapter 2
The key resource required is a gold standard annotated corpus
– This is referred to as the "training" corpus
– The system "learns" through training
– The goal is to learn rules which may generalize well to new examples which were not seen during training
We will discuss bottom-up and top-down creation of rules
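
A very rough illustration of the bottom-up idea (my own simplification, not the actual algorithms from Sarawagi): start from a rule that memorizes one annotated example token-for-token, then generalize its conditions and keep the generalization only if it still agrees with the training annotations.

```python
def rule_matches(rule, tokens, start):
    """Check whether each rule condition is satisfied by the token at the same offset."""
    if start + len(rule) > len(tokens):
        return False
    for cond, tok in zip(rule, tokens[start:start + len(rule)]):
        if "token" in cond and tok != cond["token"]:
            return False
        if cond.get("orth") == "Capitalized" and not tok[:1].isupper():
            return False
    return True

# Seed rule: memorizes one annotated example token-for-token (entity + right context).
seed = [{"token": "Elvis"}, {"token": "Presley"}, {"token": "was"}, {"token": "born"}]
# One bottom-up generalization step: literal entity tokens -> "any capitalized token".
general = [{"orth": "Capitalized"}, {"orth": "Capitalized"}, {"token": "was"}, {"token": "born"}]

print(rule_matches(seed, "Elvis Presley was born in 1935".split(), 0))       # True
print(rule_matches(general, "Angela Merkel was born in Hamburg".split(), 0)) # True: generalizes
print(rule_matches(seed, "Angela Merkel was born in Hamburg".split(), 0))    # False: memorized
```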

Overfitting and Overgeneralization
One key concept here is "overfitting" the examples
– What is meant here is that we memorize too much from one example
– For instance, if we have the sentence "Elvis Presley was born in 1935 in East Tupelo, Mississippi." and we memorize that in this exact context Elvis Presley is a person, we are failing to generalize to other contexts
We can also "overgeneralize"
– An example would be to learn that the first word of a sentence is a first name
– This is true in this sentence
– But this rule will apply to every sentence, and often be wrong
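
The overgeneralization case can be shown with a one-line toy rule (my own example): treating the first capitalized word of a sentence as a first name works on the Elvis sentence but fires on almost every sentence.

```python
import re

# Overgeneralized rule: "the first word of a sentence is a first name".
first_name_rule = re.compile(r"^([A-Z][a-z]+)")

print(first_name_rule.match("Elvis Presley was born in 1935.").group(1))  # Elvis (correct)
print(first_name_rule.match("Mississippi has a long river.").group(1))    # Mississippi (wrong: not a first name)
```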

Slide from Ciravegna

There are many papers on rule-based NER
– Wikipedia also has a useful survey, which I recommend
Now we will return to evaluation
– First a simple discussion of precision/recall as actually used
– Then further issues
Finally, we will discuss annotation of training sets

Slide sources – Many of the slides today were from Fabio Ciravegna, University of Sheffield

Thank you for your attention!