Outline Super-quick review of previous talk More on NER by token-tagging –Limitations of HMMs –MEMMs for sequential classification Review of relation extraction.


Outline Super-quick review of previous talk More on NER by token-tagging –Limitations of HMMs –MEMMs for sequential classification Review of relation extraction techniques –Decomposition one: NER + segmentation + classifying segments and entities –Decomposition two: NER + segmentation + classifying pairs of entities Some case studies –ACE –Webmaster

Quick review of previous talk

What is “Information Extraction” As a family of techniques: Information Extraction = segmentation + classification + association + clustering
October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…
Extracted mentions: Microsoft Corporation; CEO; Bill Gates; Microsoft; Gates; Microsoft; Bill Veghte; Microsoft; VP; Richard Stallman; founder; Free Software Foundation
NAME | TITLE | ORGANIZATION
Bill Gates | CEO | Microsoft
Bill Veghte | VP | Microsoft
Richard Stallman | founder | Free Soft..

What is “Information Extraction” As a family of techniques: Information Extraction = segmentation + classification + association + clustering [same example text and table as on the previous slide], now annotated: via token tagging

Token tagging and NER

NER by tagging tokens Given a sentence: Yesterday Pedro Domingos flew to New York. 1) Break the sentence into tokens, and classify each token with a label indicating what sort of entity it’s part of (person name, location name, or background): Yesterday | Pedro | Domingos | flew | to | New | York. 2) Identify names based on the entity labels: Person name: Pedro Domingos; Location name: New York. 3) To learn an NER system, use YFCL.
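Step 2 above (turning per-token labels back into names) can be sketched as follows. This is a minimal illustration, assuming plain labels without begin/inside markers; the function name `decode_entities` and the label strings are made up for the example. Under this scheme two adjacent names of the same type would be merged into one.

```python
from itertools import groupby

def decode_entities(tokens, labels, background="background"):
    """Merge runs of identically labeled, non-background tokens into names."""
    entities = []
    position = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != background:
            text = " ".join(tokens[position:position + length])
            entities.append((label, text))
        position += length
    return entities

tokens = ["Yesterday", "Pedro", "Domingos", "flew", "to", "New", "York"]
labels = ["background", "person", "person",
          "background", "background", "location", "location"]
print(decode_entities(tokens, labels))
```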

HMM for Segmentation of Addresses Simplest HMM Architecture: One state per entity type. Example per-state word probabilities: CA 0.15, NY 0.11, PA 0.08, …; Hall 0.15, Wean 0.03, N-S 0.02, … [Pilfered from Sunita Sarawagi, IIT/Bombay]

HMMs for Information Extraction
1. The HMM consists of two probability tables: Pr(currentState=s|previousState=t) for s=background, location, speaker, and Pr(currentWord=w|currentState=s) for s=background, location, …
2. Estimate these tables with a (smoothed) CPT, e.g. Prob(location|location) = #(loc->loc)/#(loc->*) transitions
3. Given a new sentence, find the most likely sequence of hidden states using the Viterbi method: $\mathrm{MaxProb}(\mathrm{curr}=s \mid \mathrm{position}=k) = \max_{t}\ \mathrm{MaxProb}(\mathrm{curr}=t \mid \mathrm{position}=k-1) \cdot \mathrm{Prob}(\mathrm{word}=w_{k-1} \mid t) \cdot \mathrm{Prob}(\mathrm{curr}=s \mid \mathrm{prev}=t)$
Example text: … 00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun … …
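The Viterbi step can be sketched in a few lines. This is a generic textbook version in log space, not code from the talk: it folds the emission of the current word into the score at the current position (the slide's recurrence attaches the previous word's emission instead, which gives the same best path). All probabilities below are invented toy numbers, and the `floor` value for unseen words is a crude stand-in for smoothing.

```python
import math

def viterbi(words, states, start, trans, emit, floor=1e-6):
    """Return the most likely state sequence for a non-empty list of words.

    start[s]    : Pr(first state = s)
    trans[t][s] : Pr(currentState = s | previousState = t)
    emit[s][w]  : Pr(currentWord = w | currentState = s)
    """
    def log_emit(s, w):
        return math.log(emit[s].get(w, floor))

    # best[k][s] = log-probability of the best path ending in state s at position k
    best = [{s: math.log(start[s]) + log_emit(s, words[0]) for s in states}]
    back = []
    for w in words[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda t: best[-1][t] + math.log(trans[t][s]))
            scores[s] = best[-1][prev] + math.log(trans[prev][s]) + log_emit(s, w)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

states = ["background", "location"]
start = {"background": 0.9, "location": 0.1}
trans = {"background": {"background": 0.8, "location": 0.2},
         "location": {"background": 0.3, "location": 0.7}}
emit = {"background": {"Place": 0.3, ":": 0.3, "Rm": 0.05},
        "location": {"Wean": 0.4, "Hall": 0.4, "Rm": 0.1}}
words = ["Place", ":", "Wean", "Hall", "Rm"]
print(list(zip(words, viterbi(words, states, start, trans, emit))))
```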

“Naïve Bayes” Sliding Window vs HMMs Domain: CMU UseNet Seminar Announcements. Example announcement: GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.
Results (the two tables presumably follow the order of the slide title: sliding window, then HMM):
Field | F1 (first table) | F1 (second table)
Speaker | 30% | 77%
Location | 61% | 79%
Start Time | 98% | 98%

Design decisions: What are the output symbols (states) ? What are the input symbols ? Cohen => “Cohen”, “cohen”, “Xxxxx”, “Xx”, … ? 8217 => “8217”, “9999”, “9+”, “number”, … ? Sarawagi et al: choose best abstraction level using holdout set
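The abstraction levels listed for “Cohen” and “8217” can be generated mechanically. The sketch below is one possible mapping, assuming the usual convention (uppercase to X, lowercase to x, digit to 9); the function names are hypothetical, and the collapsed form of a number comes out as “9” here rather than the slide's “9+”. Which level to use would still be chosen on held-out data.

```python
import re

def word_shape(token, collapse=False):
    """Map a token to a coarser symbol: uppercase -> X, lowercase -> x, digit -> 9."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "9", shape)
    if collapse:
        # Collapse each run of a repeated symbol to a single character.
        shape = re.sub(r"(.)\1+", r"\1", shape)
    return shape

def abstraction_levels(token):
    """Candidate input symbols for one token, from most to least specific."""
    return [token, token.lower(), word_shape(token), word_shape(token, collapse=True)]

print(abstraction_levels("Cohen"))
print(abstraction_levels("8217"))
```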

What is a symbol? Ideally we would like to use many, arbitrary, overlapping features of words: identity of word; ends in “-ski”; is capitalized; is part of a noun phrase; is in a list of city names; is under node X in WordNet; is in bold font; is indented; is in hyperlink anchor; … Example for one observation: is “Wisniewski”; ends in “-ski”; part of noun phrase. [Diagram: states S_{t-1}, S_t, S_{t+1} with observations O_{t-1}, O_t, O_{t+1}] Lots of learning systems are not confounded by multiple, non-independent features: decision trees, neural nets, SVMs, …

What is a symbol? [Same diagram and feature list: states S_{t-1}, S_t, S_{t+1}; observations O_{t-1}, O_t, O_{t+1}] Idea: replace the generative model in the HMM with a maxent model, where the state depends on the observations and the previous state history.

Ratnaparkhi’s MXPOST Sequential learning problem: predict POS tags of words. Uses MaxEnt model described above. Rich feature set. To smooth, discard features occurring < 10 times.
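The count cutoff can be illustrated with a small sketch. The feature templates below are placeholders borrowed from the “What is a symbol?” slide (they are not Ratnaparkhi's actual templates), and the function names are hypothetical; the point is only that features are computed from the word and the previous tag, and those seen fewer than `min_count` times are dropped before training.

```python
from collections import Counter

def token_features(words, i, prev_tag):
    """Overlapping features for words[i], including the previous tag."""
    w = words[i]
    feats = [f"word={w.lower()}", f"prev_tag={prev_tag}"]
    if w[:1].isupper():
        feats.append("is_capitalized")
    if w.lower().endswith("ski"):
        feats.append("ends_in_ski")
    return feats

def frequent_features(tagged_sentences, min_count=10):
    """Keep only features occurring at least min_count times in the training data."""
    counts = Counter()
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        prev_tag = "START"
        for i, (_, tag) in enumerate(sentence):
            counts.update(token_features(words, i, prev_tag))
            prev_tag = tag
    return {f for f, c in counts.items() if c >= min_count}

sentence = [("Wisniewski", "NNP"), ("flew", "VBD")]
print(sorted(frequent_features([sentence] * 10)))
```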

Conditional Markov Models (CMMs) aka MEMMs aka Maxent Taggers vs HMMs [Two diagrams over states S_{t-1}, S_t, S_{t+1} and observations O_{t-1}, O_t, O_{t+1}: one for the HMM, one for the CMM]
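For reference, the contrast between the two models is usually written as follows. These are the standard textbook factorizations, given here as an assumption about what the two diagrams depict; the formulas themselves did not survive in the transcript, and the symbols $\lambda_j$, $f_j$ and $Z$ are the conventional maxent notation rather than names from the slides.

```latex
% HMM: generative, models states and observations jointly
\Pr(s_{1:n}, o_{1:n}) = \prod_{t=1}^{n} \Pr(s_t \mid s_{t-1}) \, \Pr(o_t \mid s_t)

% CMM / MEMM: conditional, each step is a maxent classifier
\Pr(s_{1:n} \mid o_{1:n}) = \prod_{t=1}^{n} \Pr(s_t \mid s_{t-1}, o_t),
\qquad
\Pr(s_t \mid s_{t-1}, o_t) =
  \frac{\exp\Big(\sum_j \lambda_j f_j(s_{t-1}, o_t, s_t)\Big)}{Z(s_{t-1}, o_t)}
```

Because the CMM conditions on $o_t$ instead of generating it, the features $f_j$ may overlap freely (capitalization, suffixes, list membership) without any independence assumption among them.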

Extracting Relationships

What is “Information Extraction” As a task: Filling slots in a database from sub-segments of text.
23rd July :51 GMT Microsoft was in violation of the GPL (General Public License) on the Hyper-V code it released to open source this week. After Redmond covered itself in glory by opening up the code, it now looks like it may have acted simply to head off any potentially embarrassing legal dispute over violation of the GPL. The rest was theater. As revealed by Stephen Hemminger - a principal engineer with open-source network vendor Vyatta - a network driver in Microsoft's Hyper-V used open-source components licensed under the GPL and statically linked to binary parts. The GPL does not permit the mixing of closed and open-source elements. … Hemminger said he uncovered the apparent violation and contacted Linux Driver Project lead Greg Kroah-Hartman, a Novell programmer, to resolve the problem quietly with Microsoft. Hemminger apparently hoped to leverage Novell's interoperability relationship with Microsoft.
NAME | TITLE | ORGANIZATION
Stephen Hemminger | principal engineer | Vyatta
Greg Kroah-Hartman | programmer | Novell
Greg Kroah-Hartman | lead | Linux Driver Proj.

What is “Information Extraction” NER + Segment + Classify Segments and Entities Technique 1: 23rd July :51 GMT Microsoft was in violation of the GPL (General Public License) on the Hyper-V code it released to open source this week. After Redmond covered itself in glory by opening up the code, it now looks like it may have acted simply to head off any potentially embarrassing legal dispute over violation of the GPL. The rest was theater. As revealed by Stephen Hemminger - a principal engineer with open-source network vendor Vyatta - a network driver in Microsoft's Hyper-V used open-source components licensed under the GPL and statically linked to binary parts. The GPL does not permit the mixing of closed and open- source elements. … Hemminger said he uncovered the apparent violation and contacted Linux Driver Project lead Greg Kroah-Hartman, a Novell programmer, to resolve the problem quietly with Microsoft. Hemminger apparently hoped to leverage Novell's interoperability relationship with Microsoft.

What is “Information Extraction” NER + Segment + Classify Segments and Entities Technique 1: 23rd July :51 GMT Microsoft was in violation of the GPL (General Public License) on the Hyper-V code it released to open source this week. After Redmond covered itself in glory by opening up the code, it now looks like it may have acted simply to head off any potentially embarrassing legal dispute over violation of the GPL. The rest was theater. As revealed by Stephen Hemminger - a principal engineer with open-source network vendor Vyatta - a network driver in Microsoft's Hyper-V used open-source components licensed under the GPL and statically linked to binary parts. The GPL does not permit the mixing of closed and open- source elements. … Hemminger said he uncovered the apparent violation and contacted Linux Driver Project lead Greg Kroah-Hartman, a Novell programmer, to resolve the problem quietly with Microsoft. Hemminger apparently hoped to leverage Novell's interoperability relationship with Microsoft. Does not contain worksAt fact Does contain worksAt fact Does not contain worksAt fact Does contain worksAt fact

What is “Information Extraction” Technique 1: NER + Segment + Classify Segments and Entities. Segment: As revealed by Stephen Hemminger - a principal engineer with open-source network vendor Vyatta - a network driver in Microsoft's Hyper-V used open-source components licensed under the GPL and statically linked to binary parts. The GPL does not permit the mixing of closed and open-source elements. … Does contain worksAt fact. Entities: Stephen Hemminger, principal engineer, Vyatta: is in the worksAt fact; Microsoft: is not in the worksAt fact.
NAME | TITLE | ORGANIZATION
Stephen Hemminger | principal engineer | Vyatta

What is “Information Extraction” Technique 1: NER + Segment + Classify Segments and Entities. Segment: Because of Stephen Hemminger’s discovery, Vyatta was soon purchased by Microsoft for $1.5 billion… Does contain an acquired fact. Entities: Vyatta: is in the acquired fact, role=acquiree; Microsoft: is in the acquired fact, role=acquirer; Stephen Hemminger: is not in the acquired fact; $1.5 billion: is in the acquired fact, role=price.

What is “Information Extraction” NER + Segment + Classify Segments and Entities Technique 1: 23rd July :51 GMT Hemminger said he uncovered the apparent violation and contacted Linux Driver Project lead Greg Kroah-Hartman, a Novell programmer, to resolve the problem quietly with Microsoft. Hemminger apparently hoped to leverage Novell's interoperability relationship with Microsoft. Does contain worksAt fact (actually two of them) - and that’s a problem

What is “Information Extraction” NER + Segment + Classify EntityPairs from same segment Technique 2: 23rd July :51 GMT Hemminger said he uncovered the apparent violation and contacted Linux Driver Project lead Greg Kroah-Hartman, a Novell programmer, to resolve the problem quietly with Microsoft. Hemminger apparently hoped to leverage Novell's interoperability relationship with Microsoft. Hemminger programmer Microsoft Novell Greg Kroah-Hartman Linux Driver Project lead
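Technique 2 starts by enumerating candidate pairs of entities that co-occur in a segment; each pair is then handed to a classifier. A minimal sketch of the enumeration step for a worksAt relation follows, assuming mentions are given as (text, type) tuples; the type labels `PERSON` and `ORG` and the function name are assumptions made for the illustration.

```python
from itertools import product

def works_at_candidates(mentions):
    """Pair every person mention with every organization mention in one segment."""
    people = [text for text, kind in mentions if kind == "PERSON"]
    orgs = [text for text, kind in mentions if kind == "ORG"]
    return list(product(people, orgs))

segment = [
    ("Hemminger", "PERSON"),
    ("Linux Driver Project", "ORG"),
    ("Greg Kroah-Hartman", "PERSON"),
    ("Novell", "ORG"),
    ("Microsoft", "ORG"),
]
for person, org in works_at_candidates(segment):
    print(person, "worksAt?", org)
```

Because each pair is classified separately, a segment containing two worksAt facts is no longer a problem: (Greg Kroah-Hartman, Novell) and (Greg Kroah-Hartman, Linux Driver Project) can both be labeled positive.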

ACE: Automatic Content Extraction A case study, or: yet another NIST bake-off

About ACE and The five year mission: “develop technology to extract and characterize meaning in human language”…in newswire text, speech, and images –EDT: Develop NER for: people, organizations, geo-political entities (GPE), location, facility, vehicle, weapon, time, value … plus subtypes (e.g., educational organizations) –RDC: identify relation between entities: located, near, part-whole, membership, citizenship, … –EDC: identify events like interaction, movement, transfer, creation, destruction and their arguments

… and their arguments (entities)

Event = sentence plus a trigger word

Events, entities and mentions In ACE there is a distinction between an entity (a thing that exists in the Real World) and an entity mention (something that exists in the text: a substring). Likewise, an event is something that (will, might, or did) happen in the Real World, and an event mention is some text that refers to that event. –An event mention lives inside a sentence (the “extent”) with a “trigger” (or anchor) –An event mention is defined by its type and subtype (e.g., Life:Marry, Transaction:TransferMoney) and its arguments –Every argument is an entity mention that has been assigned a role. –Arguments belong to the same event if they are associated with the same trigger. The entity mention, trigger, extent, and argument are markup, and they also define a possible decomposition of the event-extraction task into subtasks.

How to find events? The approach in Ahn, “Stages of event extraction”: Find all the entities and sentences –Ahn uses ground-truth labels for entities (an ACE thing) –Entities=candidate event arguments; sentences=candidate event extents; these will be classified and paired up Find the event mentions –Classify words as anchors (for event type T:S) or not: 35 classes, mostly None –Classify (event-mention,entity-mention) pairs as arguments (with role R) or not: 36 classes, mostly None Q: Why not just classify entity-mentions by Role? –Classify event mentions by modality, polarity, … –Classify (event-mention i,event-mention j ) pairs as co-referent or not. For details: see the paper!

The Webmaster Project: A Case Study with Einat Minkov (LTI, now Haifa U), Anthony Tomasic (ISRI) See IJCAI-2005 paper

Overview and Motivations What’s new : –Adaptive NLP components –Learn to adapt to changes in domain of discourse –Deep analysis in limited but evolving domain Compared to past NLP systems : –Deep analysis in narrow domain (Chat-80, SHRDLU,...) –Shallow analysis in broad domain (POS taggers, NE recognizers, NP-chunkers,...) –Learning used as tool to develop non-adaptive NLP components Details : –Assume DB-backed website, where schema changes over time No other changes allowed (yet) –Interaction: User requests (via NL ) changes in factual content of website (assume update of one tuple) System analyzes request System presents preview page and editable form version of request Key points : –partial correctness is useful –user can verify correctness (vs case for DB queries, q/a,...) => source of training data...something in between...

[Architecture diagram] msg → Shallow NLP (POS tags, NP chunks, words, ...) → Feature Building (features; entity1, entity2, ....) → classifiers (C): Classification (requestType, targetRelation, targetAttrib) and NER (newEntity1,..., oldEntity1,..., keyEntity1,..., otherEntity1,...) → Update Request Construction (database, web page templates) → preview page, user-editable form version of request → User: confirm? → offline training data → LEARNER

Outline Training data/corpus –look at feasibility of learning the components that need to be adaptive, using a static corpus Analysis steps: –request type –entity recognition –role-based entity classification –target relation finding –target attribute finding –[request building] Conclusions/summary

Training data User1 User2 User3.... Mike Roborts should be Micheal Roberts in the staff listing, pls fix it. Thanks - W On the staff page, change Mike to Michael in the listing for “Mike Roberts”.

Training data User1 User2 User3.... Add this as Greg Johnson’s phone number: Please add “ ” to greg johnson’s listing on the staff page.

Training data – entity names are made distinct User1 User2 User3.... Add this as Greg Johnson’s phone number: Please add “ ” to fred flintstone’s listing on the staff page. Modification: to make entity-extraction reasonable, remove duplicate entities by replacing them with alternatives (preserving case, typos, etc)

Training data User1 User2 User3.... Request1 Request2 Request3.... message(user 1,req 1) message(user 2,req 1).... message(user 1,req 2) message(user 2,req 2).... message(user 1,req 3) message(user 2,req 3)....

Training data – always test on a novel user? User1 User2 User3.... Request1 Request2 Request3.... message(user 1,req 1) message(user 2,req 1).... message(user 1,req 2) message(user 2,req 2).... message(user 1,req 3) message(user 2,req 3).... test train Simulate a distribution of many users (harder to learn)

Training data – always test on a novel request? User1 User2 User3 Request1 Request2 Request3 message(user 1,req 1) message(user 2,req 1).... message(user 1,req 3) message(user 2,req 3).... test train message(user 1,req 2) message(user 2,req 2).... Simulate a distribution of many requests (much harder to learn) 617 s total + 96 similar ones
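Both evaluation protocols amount to holding out a whole group of messages (all messages from one user, or all messages for one request) so that the test group never appears in training. A minimal sketch, assuming each message is a dict with `user` and `request` keys; the field names and the one-group-per-fold scheme are assumptions, not the exact protocol used in the study.

```python
def leave_one_group_out(messages, key):
    """Yield (held_out, train, test), holding out all messages of one group per fold."""
    groups = sorted({m[key] for m in messages})
    for held_out in groups:
        train = [m for m in messages if m[key] != held_out]
        test = [m for m in messages if m[key] == held_out]
        yield held_out, train, test

messages = [
    {"user": u, "request": r, "text": f"message(user {u},req {r})"}
    for u in (1, 2, 3)
    for r in (1, 2, 3)
]

# Always test on a novel user:
for user, train, test in leave_one_group_out(messages, "user"):
    print("held-out user", user, "train:", len(train), "test:", len(test))

# Always test on a novel request:
for request, train, test in leave_one_group_out(messages, "request"):
    print("held-out request", request, "train:", len(train), "test:", len(test))
```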

Training data – limitations One DB schema, one off-line dataset –May differ from data collected on-line –So, no claims made for tasks where data will be substantially different (i.e., entity recognition) –No claims made about incremental learning/transfer All learning problems considered separate One step of request-building is trivial for the schema considered: –Given entity E and relation R, to which attribute of R does E correspond? –So, we assume this mapping is trivial (general case requires another entity classifier)

POS tags NP chunks words,... features entity1, entity2,.... msg Shallow NLP Feature Building C C C C requestType targetRelation targetAttrib newEntity1,... oldEntity1,... keyEntity1,... otherEntity1,... Information Extraction

Entity Extraction Results We assume a fixed set of entity types –no adaptivity needed (unclear if data can be collected) Evaluated: –hand-coded rules ( approx cascaded FST in “Mixup” language) –learned classifiers with standard feature set and also a “tuned” feature set, which Einat tweaked –results are in F1 (harmonic avg of recall and precision) –two learning methods, both based on “token tagging”: Conditional Random Fields (CRF), and Voted-perceptron discriminative training for an HMM (VP-HMM)
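Since all results are reported in F1, here is how the score is computed from predicted and gold entity sets. A minimal sketch, assuming entities are compared by exact match on (start, end, type) tuples; the study's actual matching criterion is not stated in the slides, and the example spans are invented.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 (harmonic mean) over entity sets."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

gold = {(0, 2, "person")}
predicted = {(0, 2, "person"), (5, 6, "phone")}
print(precision_recall_f1(predicted, gold))
```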

Entity Extraction Results – v2 (CV on users)

Entity Classification Results Entity “roles”: –keyEntity: value used to retrieve a tuple that will be updated (“delete greg’s phone number”) –newEntity: value to be added to database (“William’s new office # is 5307 WH”). –oldEntity: value to be overwritten or deleted (“change mike to Michael in the listing for...”) –irrelevantEntity: not needed to build the request (“please add.... – thanks, William”) Features: closest preceding preposition closest preceding “action verb” (add, change, delete, remove,...) closest preceding word which is a preposition, action verb, or determiner (in “determined” NP) is entity followed by ‘s
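The role features listed above are simple lookups to the left and right of the entity. A sketch of how they might be computed, assuming the message is already tokenized with the possessive marker split off as its own token; the word lists and feature names are illustrative guesses, not the ones used in the study.

```python
ACTION_VERBS = {"add", "change", "delete", "remove"}
PREPOSITIONS = {"to", "in", "for", "on", "of", "from", "as"}  # assumed list
DETERMINERS = {"the", "a", "an"}                              # assumed list

def role_features(tokens, start, end):
    """Features for the entity occupying tokens[start:end]."""
    def closest_preceding(vocabulary):
        for token in reversed(tokens[:start]):
            if token.lower() in vocabulary:
                return token.lower()
        return None

    return {
        "prev_preposition": closest_preceding(PREPOSITIONS),
        "prev_action_verb": closest_preceding(ACTION_VERBS),
        "prev_cue_word": closest_preceding(PREPOSITIONS | ACTION_VERBS | DETERMINERS),
        "followed_by_possessive": tokens[end:end + 1] in (["'s"], ["’s"]),
    }

tokens = "Please add this to greg johnson 's listing on the staff page".split()
print(role_features(tokens, 4, 6))  # entity = "greg johnson"
```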

[Architecture diagram repeated] Reasonable results with “bag of words” features.

Request type classification: addTuple, alterValue, deleteTuple, or deleteValue? Can be determined from entity roles, except for deleteTuple and deleteValue. –“Delete the phone # for Scott” vs “Delete the row for Scott” Features : –counts of each entity role –action verbs –nouns in NPs which are (probably) objects of action verb –(optionally) same nouns, tagged with a dictionary Target attributes are similar Comments: Very little data is available Twelve words of schema-specific knowledge: dictionary of terms like phone, extension, room, office,...

Other issues: a large pool of users and/or requests

Webmaster: the punchline

Conclusions? System architecture allows all schema-dependent knowledge to be learned –Potential to adapt to changes in schema –Data needed for learning can be collected from user Learning appears to be possible on reasonable time-scales –10s or 100s of relevant examples, not thousands –Schema-independent linguistic knowledge is useful F1 in the eighties is possible on almost all subtasks. –Counter-examples are rarely changed relations (budget) and distinctions for which little data is available There is substantial redundancy in different subtasks –Opportunity for learning suites of probabilistic classifiers, etc Even an imperfect IE system can be useful… –With the right interface…

POS tags NP chunks words,... features entity1, entity2,.... msg Shallow NLP Feature Building C C C C requestType targetRelation targetAttrib newEntity1,... oldEntity1,... keyEntity1,... otherEntity1,... Information Extraction micro-form query DB

Webmaster: the Epilog (VIO) Faster for request-submitter Zero time for webmaster Zero latency More reliable (!) Tomasic et al, IUI 2006 Entity F1 ~= 84, Micro-form selection accuracy =~ 80 Used UI for experiments on real people (human-human, human-VIO)

Conclusions and comments Two case studies of non-trivial IE pipelines illustrate: –In any pipeline, errors propagate –What’s the right way of training components in a pipeline? Independently? How can (and when should) we make decisions using some flavor of joint inference? Some practical questions for pipeline components: –What’s downstream? What do errors cost? –Often we can’t see the end of the pipeline… How robust is the method? –new users, new newswire sources, new upstream components… –Do different learning methods/feature sets differ in robustness? Some concrete questions for learning relations between entities: –(When) is classifying pairs of things the right approach? How do you represent pairs of objects? How do you represent structure, like dependency parses? Kernels? Special features?