Natural Language Processing & Information Extraction Dan Weld


1 Natural Language Processing & Information Extraction Dan Weld

2 To do: Be more explicit on how rule learning works (before Rapier, BWI); show the search through the rule space. This is too much of a lecture; add more interaction / questions.

3 Confession
Proposal 1: Keep homework and final exam; teach next week so we can cover all the cool stuff.
Proposal 2: Talk quickly; a mini-project replaces homework & final, due 12/11.

4 Mini-Project Options
Write a program to solve the counterfeit coin problem on the midterm.
Build a DPLL and/or a WalkSAT satisfiability solver.
Build a spam filter using naive Bayes, decision trees, or compare learners in the Weka ML package.
Write a program which learns Bayes nets.

5 Communication

6 Is the Problem

7 Of the century… Herman is reprinted with permission from LaughingStock Licensing Inc., Ottawa Canada. All rights reserved.

8 (image-only slide)

9 Why Study NLP?
A hallmark of human intelligence.
Text is the largest repository of human knowledge and is growing quickly: emails, news articles, web pages, IM, scientific articles, insurance claims, customer complaint letters, transcripts of phone calls, technical documents, government documents, patent portfolios, court decisions, contracts, …

10 What is NLP?
Natural Language Processing:
Understand human language (speech, text); successfully communicate (NL generation).
Pragmatics: “Can you pass the salt?”
Semantics: “Paris is the capital of France”. How do we represent meaning?
Syntax: analyze the structure of a sentence.
Morphology: analyze the structure of words (dogs = dog + s).

11 Fundamental NLP Tasks
Speech recognition: speech signal to words.
Parsing: decompose a sentence into units.
Part-of-speech tagging: e.g., noun vs. verb.
Semantic role labeling: for a verb, find the units that fill pre-defined “semantic roles” (e.g., Agent, Patient, or Instrument). Example: “John hit the ball with the bat”.

12 Fundamental NLP Tasks
Semantics: map a sentence to a corresponding “meaning representation” (e.g., logic): give(John, Book, Sally).
Quantification: Everybody loves somebody.
Word sense disambiguation: orange juice vs. orange coat. Where do word senses come from?
Co-reference resolution: The dog found a cookie; he ate it.
Implicit “text”: what is left unsaid? Compare: “Joe entered the restaurant, ordered, ate and left. The owner said ‘Come back soon!’” versus “Joe entered the restaurant, ordered, ate and left. The owner said ‘Hey, I’ll call the police!’”

13 NLP Applications
Question answering: Who is the first Taiwanese president?
Text categorization/routing: e.g., customer emails.
Text mining: find everything that interacts with BRCA1.
Machine translation.
Information extraction.
Spelling correction: is that just dictionary lookup?

14 Main Challenge in NLP: Ambiguity
Lexical: Label (noun or verb)? London (Jack or capital of the UK)?
Syntactic (examples from newspaper headlines):
Prepositional phrase attachment: Ban on Nude Dancing on Governor’s Desk; Killer Sentenced to Die for Second Time in 10 Years.
Word sense disambiguation: 1990: Iraqi Head Seeking Arms.
Syntactic ambiguity (what’s modifying what): Juvenile Court to Try Shooting Defender.
Semantic ambiguity: “snake poison”.
Rampant metaphors: “prices went up”.

15 Why is Natural Language Understanding difficult?
The hidden structure of language is highly ambiguous: many legal parse trees.
Structures for: Fed raises interest rates 0.5% in effort to control inflation (NYT headline 5/17/00).

16 Where are the ambiguities?

17 Pragmatics
How do you map an utterance to the intended meaning?
Do you have a standard set of transformations?
Do you have a model of beliefs & agents? He knows that I can pass the salt, therefore…
Can this be formulated as a learning problem?

18 Speech Act Theory (Searle ’75)
Example speech acts:
Assertives: speech acts that commit a speaker to the truth of the expressed proposition.
Directives: speech acts intended to cause the hearer to take a particular action, e.g., requests, commands, and advice.
Commissives: speech acts that commit a speaker to some future action, e.g., promises and oaths.
“Could you pass the salt?” “Yes.”
Research has focused on specifying semantics and analyzing primitives instead of implementing “speech actors”.

19 Semantics
How do you map a sentence to a semantic representation? What are the semantic primitives?
Schank: 12 (or so) primitives (conceptual dependency theory, 1972):
The basis of natural language is conceptual. The conceptual level is interlingual, while the sentential level is language-specific.
De-emphasize syntax; focus on semantic expectations as the basis for NLU.
Status: didn’t yield practical systems and is out of favor, but…

20 Syntax is about Sentence Structures
Sentences have structure and are made up of constituents. The constituents are phrases.
A phrase consists of a head and modifiers. The category of the head determines the category of the phrase, e.g., a phrase headed by a noun is a noun phrase.

21 Parsing
Analyze the structure of a sentence.
Example: “The student put the book on the table” (parse tree with word categories D, N, V, P and phrases NP, PP, VP under S).
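To make the parse concrete, here is a minimal sketch using NLTK (an assumed dependency; the toy grammar below is ours, not from the slides). Because the grammar also allows NP -> NP PP, the chart parser returns both attachment readings of the sentence, previewing the ambiguity discussed on the surrounding slides.

```python
# A minimal constituency-parsing sketch (assumes `pip install nltk`).
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> D N | NP PP
VP -> V NP PP | V NP
PP -> P NP
D  -> 'the'
N  -> 'student' | 'book' | 'table'
V  -> 'put'
P  -> 'on'
""")

parser = nltk.ChartParser(grammar)
tokens = "the student put the book on the table".split()
for tree in parser.parse(tokens):     # yields one tree per legal parse
    tree.pretty_print()               # draw the tree as ASCII art
```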

22 Alternative Parses
“Teacher strikes idle kids” has two parse trees: one where “strikes” is the verb and “idle” an adjective (N V A N: the teacher hits lazy kids), and one where “Teacher strikes” is a noun phrase and “idle” the verb (N N V N: strikes by teachers leave kids with nothing to do).

23 How do you choose the parse?
Old days: syntactic rules, semantics.
Since the early 1980s: statistical parsing, i.e., parsing as supervised learning.
Input: a tree bank of (sentence, tree) pairs. Output: a PCFG parser + a lexical language model.
Parsing decisions depend on the words! Example: VP -> V + NP + NP? Does the verb take two objects or one? ‘told Sue the story’ versus ‘invented the Internet’.
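A hedged sketch of the statistical idea: a PCFG assigns each parse a probability (the product of its rule probabilities), and a Viterbi parser returns the most probable tree. The rule probabilities below are invented; in practice they are estimated from a tree bank, as the slide says.

```python
# PCFG disambiguation sketch using NLTK's ViterbiParser.
import nltk

pcfg = nltk.PCFG.fromstring("""
S  -> NP VP   [1.0]
NP -> D N     [0.7] | NP PP [0.3]
VP -> V NP PP [0.6] | V NP  [0.4]
PP -> P NP    [1.0]
D  -> 'the'   [1.0]
N  -> 'student' [0.4] | 'book' [0.3] | 'table' [0.3]
V  -> 'put'   [1.0]
P  -> 'on'    [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("the student put the book on the table".split()):
    print(tree.prob(), tree)   # only the highest-probability parse is returned
```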

24 Issues with Statistical Parsing
Statistical parsers still make “plenty” of errors.
Tree banks are language specific.
Tree banks are genre specific: train on the WSJ, fail on the Web (the standard distributional assumption).
Unsupervised, un-lexicalized parsers exist, but their performance is substantially weaker.

25 How Difficult is Morphology?
Examples from Woods et al. 2000 (joke decompositions showing why naive affix-stripping fails):
delegate (de + leg + ate): take the legs from
caress (car + ess): female car
cashier (cashy + er): more wealthy
lacerate (lace + rate): speed of tatting
ratify (rat + ify): infest with rodents
infantry (infant + ry): childish behavior

26 What is “Information Extraction”?
As a task: filling slots in a database from sub-segments of text.
Example text (October 14, 2002, 4:00 a.m. PT): For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…
Target schema (initially empty): NAME, TITLE, ORGANIZATION.
Speaker notes: Starting with some text and an empty database with a defined ontology of fields and records, use the information in the text to fill the database.
Slides from Cohen & McCallum

27 What is “Information Extraction”?
As a task: filling slots in a database from sub-segments of text. (Same passage as slide 26, now with the filled table.)
IE output:
NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Soft..
Slides from Cohen & McCallum

28 What is “Information Extraction”?
As a family of techniques: Information Extraction = segmentation + classification + clustering + association. (Same passage as slide 26.)
Segmented snippets: Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Bill Veghte, VP, Richard Stallman, founder, Free Software Foundation.
Speaker notes: segmentation finds the beginning and ending boundaries of the text snippets that will go in the DB; classification…
Slides from Cohen & McCallum

29 What is “Information Extraction”?
Information Extraction = segmentation + classification + association + clustering. (Animation step over the same passage and snippets as slide 28.)

30 What is “Information Extraction”?
(Further animation step over the same passage and snippets as slide 28.)
Slides from Cohen & McCallum

31 What is “Information Extraction”?
As a family of techniques: Information Extraction = segmentation + classification + association + clustering. (Same passage as slide 26.)
Final table, after clustering co-referent mentions:
NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Soft..
Slides from Cohen & McCallum

32 IE in Context
Pipeline over a document collection: create ontology → spider → filter by relevance → IE (segment, classify, associate, cluster) → load database → query/search, data mine. Extraction models are trained from labeled training data.
Slides from Cohen & McCallum

33 IE History
Pre-Web: mostly news articles.
De Jong’s FRUMP [1982]: hand-built system to fill Schank-style “scripts” from news wire.
Message Understanding Conference (MUC): DARPA [’87-’95], TIPSTER [’92-’96]. Most early work was dominated by hand-built models, e.g., SRI’s FASTUS, hand-built FSMs. But by the 1990s, some machine learning: Lehnert, Cardie, Grishman, and then HMMs: Elkan [Leek ’97], BBN [Bikel et al ’98].
Web: AAAI ’94 Spring Symposium on “Software Agents”; much discussion of ML applied to the Web (Maes, Mitchell, Etzioni). Tom Mitchell’s WebKB (’96): build KBs from the Web. Wrapper induction: first by hand, then ML: [Doorenbos ’96], [Soderland ’96], [Kushmerick ’97], …
Speaker notes: ML has not yet trounced the hand-built systems; in some of the latest evaluations, hand-built systems shared 1st place with ML. Now many companies make a business from IE (from the Web): WhizBang, Inxight, Intelliseek, ClearForest.
Slides from Cohen & McCallum

34 What makes IE from the Web Different?
Less grammar, but more formatting & linking.
Newswire example: “Apple to Open Its First Retail Store in New York City. MACWORLD EXPO, NEW YORK--July 17. Apple's first retail store in New York City will open in Manhattan's SoHo district on Thursday, July 18 at 8:00 a.m. EDT. The SoHo store will be Apple's largest retail store to date and is a stunning example of Apple's commitment to offering customers the world's best computer shopping experience. ‘Fourteen months after opening our first retail store, our 31 stores are attracting over 100,000 visitors each week,’ said Steve Jobs, Apple's CEO. ‘We hope our SoHo store will surprise and delight both Mac and PC users who want to see everything the Mac can do to enhance their digital lifestyles.’” (The Web side of the slide is a screenshot.)
Speaker notes: Traditionally, IE was done mostly on newswire: flowing, grammatical sentences with little or no formatting information. The Web is different: less grammar and more ungrammatical little snippets, but rich formatting and layout information. The directory structure, link structure, formatting & layout of the Web is its own new grammar.
Slides from Cohen & McCallum

35 Landscape of IE Tasks (1/4): Pattern Feature Domain
Text paragraphs without formatting, e.g.: “Astro Teller is the CEO and co-founder of BodyMedia. Astro holds a Ph.D. in Artificial Intelligence from Carnegie Mellon University, where he was inducted as a national Hertz fellow. His M.S. in symbolic and heuristic computation and B.S. in computer science are from Stanford University. His work in science, literature and business has appeared in international media from the New York Times to CNN to NPR.”
Grammatical sentences with some formatting & links; non-grammatical snippets with rich formatting & links; tables.
Speaker notes: different families of features may be more or less useful in different data sets.
Slides from Cohen & McCallum

36 Landscape of IE Tasks (2/4): Pattern Scope
Web-site specific (formatting): e.g., Amazon.com book pages.
Genre specific (layout): e.g., resumes.
Wide, non-specific (language): e.g., university names.
Speaker notes: if you were going to use formatting to do extraction, how often (with what granularity) would you have to re-train your models?
Slides from Cohen & McCallum

37 Landscape of IE Tasks (3/4): Pattern Complexity
E.g., word patterns:
Closed set, e.g., U.S. states: “He was born in Alabama…”; “The big Wyoming sky…”
Regular set, e.g., U.S. phone numbers: “Phone: (413) …”; “The CALD main office can be reached at …”
Complex pattern, e.g., U.S. postal addresses: “University of Arkansas, P.O. Box 140, Hope, AR”; “Headquarters: 1128 Main Street, 4th Floor, Cincinnati, Ohio 45210”
Ambiguous patterns, needing context and many sources of evidence, e.g., person names: “…was among the six houses sold by Hope Feldman that year.”; “Pawel Opalinski, Software Engineer at WhizBang Labs.”
Speaker notes: how complicated a modeling technique will you have to use?
Slides from Cohen & McCallum

38 Landscape of IE Tasks (4/4): Pattern Combinations
Example: “Jack Welch will retire as CEO of General Electric tomorrow. The top role at the Connecticut company will be filled by Jeffrey Immelt.”
Single entity (“named entity” extraction): Person: Jack Welch; Person: Jeffrey Immelt.
Binary relationship: Relation: Person-Title, Person: Jack Welch, Title: CEO. Relation: Company-Location, Company: General Electric, Location: Connecticut.
N-ary record: Relation: Succession, Company: General Electric, Title: CEO, Out: Jack Welch, In: Jeffrey Immelt.
Speaker notes: how many components does the output of your system have? It is more difficult to get high accuracy on the right than on the left, because each constituent has to be right and errors compound: 0.9^5 ≈ 60%.
Slides from Cohen & McCallum

39 Evaluation of Single Entity Extraction
TRUTH: Michael Kearns and Sebastian Seung will start Monday’s tutorial, followed by Richard M. Karpe and Martin Cooke.
PRED: Michael Kearns and Sebastian Seung will start Monday’s tutorial, followed by Richard M. Karpe and Martin Cooke.
Precision = # correctly predicted segments / # predicted segments
Recall = # correctly predicted segments / # true segments
F1 = harmonic mean of precision & recall = 1 / ( ((1/P) + (1/R)) / 2 ) = 2PR / (P + R)
Slides from Cohen & McCallum
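A small sketch of the metrics above, treating segments as (start, end) spans that must match exactly; the example spans are made up.

```python
# Exact-match segment evaluation: precision, recall, F1.
def precision_recall_f1(predicted, truth):
    predicted, truth = set(predicted), set(truth)
    correct = len(predicted & truth)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(truth) if truth else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0   # harmonic mean of P and R
    return p, r, f1

# 2 of 3 predicted spans exactly match one of the 4 true spans:
print(precision_recall_f1({(0, 2), (5, 7), (9, 10)},
                          {(0, 2), (5, 7), (12, 14), (16, 18)}))
# -> (0.666..., 0.5, 0.571...)
```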

40 State of the Art Performance
Named entity recognition (Person, Location, Organization, …): F1 in the high 80’s or low- to mid-90’s.
Binary relation extraction (Contained-in(Location1, Location2), Member-of(Person1, Organization1)): F1 in the 60’s, 70’s, or 80’s.
Wrapper induction: extremely accurate performance obtainable, but human effort (~30 min) is required on each site.
Slides from Cohen & McCallum

41 Landscape of IE Techniques (1/1): Models
Example sentence: “Abraham Lincoln was born in Kentucky.”
Classify pre-segmented candidates: a classifier assigns a class to each candidate.
Sliding window: a classifier assigns a class to each window; try alternate window sizes.
Lexicons: is a candidate a member of a list (Alabama, Alaska, …, Wisconsin, Wyoming)?
Boundary models: classifiers detect BEGIN and END positions.
Finite state machines: find the most likely state sequence.
Context-free grammars: find the most likely parse (NNP, V, P, NP, PP, VP, S).
…and beyond. Any of these models can be used to capture words, formatting, or both.
Slides from Cohen & McCallum

42 Landscape: Our Focus
Pattern complexity: closed set, regular, complex, ambiguous.
Pattern feature domain: words, words + formatting, formatting.
Pattern scope: site-specific, genre-specific, general.
Pattern combinations: entity, binary, n-ary.
Models: lexicon, regex, window, boundary, FSM, CFG.
This lecture is organized according to the last row (models).
Slides from Cohen & McCallum

43 Sliding Windows
Slides from Cohen & McCallum

44 Extraction by Sliding Window
Example (CMU UseNet seminar announcement): “GRAND CHALLENGES FOR MACHINE LEARNING. Jaime Carbonell, School of Computer Science, Carnegie Mellon University. 3:30 pm, 7500 Wean Hall. Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.”
Slides from Cohen & McCallum

45 Extraction by Sliding Window (animation: the classifier window slides over the same announcement as slide 44)

46 Extraction by Sliding Window (animation continues)

47 Extraction by Sliding Window (animation continues)
Slides from Cohen & McCallum

48 A “Naïve Bayes” Sliding Window Model [Freitag 1997]
Example: “… 00 : pm Place : Wean Hall Rm … Speaker : Sebastian Thrun …”, with window w_t … w_{t+n}, prefix w_{t-m} … w_{t-1}, and suffix w_{t+n+1} … w_{t+n+m}.
Estimate Pr(LOCATION | window) using Bayes rule.
Try all “reasonable” windows (vary length, position).
Assume independence for length, prefix words, suffix words, and content words.
Estimate from data quantities like: Pr(“Place” in prefix | LOCATION).
If Pr(“Wean Hall Rm 5409” = LOCATION) is above some threshold, extract it.
Other examples of sliding windows: [Baluja et al 2000] (decision tree over individual words & their context).
Slides from Cohen & McCallum
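A rough sketch of the scoring loop described above. The count tables are invented stand-ins for statistics estimated from labeled seminar announcements, and suffix terms are omitted for brevity; this shows the shape of the computation, not Freitag's exact model.

```python
# Naive Bayes window scoring: log Pr(window is a LOCATION), up to a constant.
from collections import Counter
import math

prefix_counts  = Counter({'place': 40, ':': 35, 'room': 10})
content_counts = Counter({'wean': 30, 'hall': 28, 'rm': 20, '5409': 5})
length_counts  = Counter({2: 10, 3: 25, 4: 15})
VOCAB = 1000          # assumed vocabulary size, for add-one smoothing

def logp(word, counts):
    total = sum(counts.values())
    return math.log((counts[word] + 1) / (total + VOCAB))

def score_window(tokens, start, length, prefix_width=2):
    window = tokens[start:start + length]
    prefix = tokens[max(0, start - prefix_width):start]
    n = sum(length_counts.values())
    s = math.log((length_counts[length] + 1) / (n + 10))   # length term
    s += sum(logp(w, prefix_counts) for w in prefix)       # prefix terms
    s += sum(logp(w, content_counts) for w in window)      # content terms
    return s

tokens = 'place : wean hall rm 5409 speaker : sebastian thrun'.split()
best = max(((i, L) for i in range(len(tokens)) for L in (2, 3, 4)
            if i + L <= len(tokens)),
           key=lambda w: score_window(tokens, *w))
print(best, tokens[best[0]:best[0] + best[1]])
```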

49 “Naïve Bayes” Sliding Window Results
Domain: CMU UseNet Seminar Announcements (same announcement as slide 44).
Field        F1
Person Name  30%
Location     61%
Start Time   98%
Slides from Cohen & McCallum

50 Realistic sliding-window-classifier IE
What windows to consider? All windows containing as many tokens as the shortest example, but no more tokens than the longest example.
How to represent a classifier? It might: restrict the length of the window; restrict the vocabulary or formatting used before/after/inside the window; restrict the relative order of tokens; etc.
Learning methods: SRV, top-down rule learning [Freitag AAAI ’98]; Rapier, bottom-up [Califf & Mooney, AAAI ’99].
Speaker notes: insert propositional version of the basic SRV algorithm here.
Slides from Cohen & McCallum

51 Rapier: results – precision/recall (chart)
Slides from Cohen & McCallum

52 Rapier: results vs. SRV (chart)
Slides from Cohen & McCallum

53 Rule-learning approaches to sliding-window classification: Summary
SRV, Rapier, and WHISK [Soderland KDD ’97].
Representations for classifiers allow restriction of the relationships between tokens, etc.
Representations are carefully chosen subsets of even more powerful representations based on logic programming (ILP and Prolog).
Use of these “heavyweight” representations is complicated, but seems to pay off in results.
Can simpler representations for classifiers work?
Slides from Cohen & McCallum

54 BWI: Learning to detect boundaries [Freitag & Kushmerick, AAAI 2000]
Another formulation: learn three probabilistic classifiers:
START(i) = Prob(position i starts a field)
END(j) = Prob(position j ends a field)
LEN(k) = Prob(an extracted field has length k)
Then score a possible extraction (i, j) by START(i) * END(j) * LEN(j - i).
LEN(k) is estimated from a histogram.
Slides from Cohen & McCallum
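The scoring formula is easy to state in code. In this sketch the START/END probabilities are invented numbers; in BWI they come from the boosted detectors described on the next slide.

```python
# Rank candidate (i, j) extractions by START(i) * END(j) * LEN(j - i).
start_p = {3: 0.8, 7: 0.1}          # Prob(position i starts a field)
end_p   = {5: 0.7, 9: 0.2}          # Prob(position j ends a field)
len_hist = {1: 2, 2: 8, 3: 5, 4: 1} # field-length counts from training data

def len_p(k):
    return len_hist.get(k, 0) / sum(len_hist.values())

def score(i, j):
    return start_p.get(i, 0.0) * end_p.get(j, 0.0) * len_p(j - i)

candidates = [(i, j) for i in start_p for j in end_p if j > i]
print(max(candidates, key=lambda ij: score(*ij)))   # -> (3, 5)
```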

55 BWI: Learning to detect boundaries
BWI uses boosting to find “detectors” for START and END.
Each weak detector has a BEFORE and an AFTER pattern (on tokens before/after position i).
Each “pattern” is a sequence of tokens and/or wildcards like: anyAlphabeticToken, anyNumber, …
The weak learner for “patterns” uses greedy search (+ lookahead) to repeatedly extend a pair of empty BEFORE, AFTER patterns.
Slides from Cohen & McCallum

56 BWI: Learning to detect boundaries
Field        F1 (Naïve Bayes)  F1 (BWI)
Person Name  30%               69.3%
Location     61%               75.5%
Start Time   98%               99.6%
Slides from Cohen & McCallum

57 Problems with Sliding Windows and Boundary Finders
Decisions in neighboring parts of the input are made independently from each other.
A Naïve Bayes sliding window may predict a “seminar end time” before the “seminar start time”.
It is possible for two overlapping windows to both be above threshold.
In a boundary-finding system, left boundaries are laid down independently from right boundaries, and their pairing happens as a separate step.
Slides from Cohen & McCallum

58 Rule Learning: Bootstrapping
Seed pattern: <class name, plural> “such as” <proper NP>+
Use the matched names to label other pages. Now we have training data!
Learn more rules, e.g.: <proper NP>, mayor of <proper NP>
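A minimal sketch of the seed pattern, with a bare regex standing in for a real noun-phrase chunker (the sentence and pattern details are ours):

```python
# "X such as Y" bootstrapping: harvest (instance, class) pairs from text.
import re

SEED = re.compile(r'([a-z]+s)\s+such\s+as\s+((?:[A-Z][a-z]+\s*)+)')

text = ("He visited cities such as Seattle and toured universities "
        "such as Stanford with colleagues.")

instances = [(proper_np.strip(), class_name)
             for class_name, proper_np in SEED.findall(text)]

print(instances)   # [('Seattle', 'cities'), ('Stanford', 'universities')]
# The harvested names can now label new pages, from which further
# extraction rules (e.g., "<NP>, mayor of <NP>") are learned.
```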

59 Hidden Markov Models / Conditional Random Fields
Finite state machines: given “Abraham Lincoln was born in Kentucky.”, find the most likely state sequence.
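“Most likely state sequence” is the Viterbi computation; a compact sketch with an invented two-state model (PER = part of a person name, OTH = other):

```python
# Viterbi decoding over a toy HMM.
import math

states = ['PER', 'OTH']
start  = {'PER': 0.3, 'OTH': 0.7}
trans  = {'PER': {'PER': 0.6, 'OTH': 0.4}, 'OTH': {'PER': 0.1, 'OTH': 0.9}}
emit   = {'PER': {'abraham': 0.3, 'lincoln': 0.3, 'was': 0.01},
          'OTH': {'abraham': 0.01, 'lincoln': 0.01, 'was': 0.2}}

def viterbi(words):
    V = [{s: math.log(start[s]) + math.log(emit[s].get(words[0], 1e-6))
          for s in states}]
    back = []
    for w in words[1:]:
        row, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] + math.log(trans[p][s]))
            row[s] = (V[-1][prev] + math.log(trans[prev][s])
                      + math.log(emit[s].get(w, 1e-6)))
            ptr[s] = prev
        V.append(row); back.append(ptr)
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):      # follow back-pointers to recover the path
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi('abraham lincoln was'.split()))  # -> ['PER', 'PER', 'OTH']
```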

60 Extraction: Named Entity → Binary Relations
How would you extend a sliding window approach?

61 References
[Bikel et al 1997] Bikel, D.; Miller, S.; Schwartz, R.; and Weischedel, R. Nymble: a high-performance learning name-finder. In Proceedings of ANLP ’97.
[Califf & Mooney 1999] Califf, M.E.; Mooney, R.: Relational Learning of Pattern-Match Rules for Information Extraction, in Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99).
[Cohen, Hurst, Jensen 2002] Cohen, W.; Hurst, M.; Jensen, L.: A flexible learning system for wrapping tables and lists in HTML documents. Proceedings of the Eleventh International World Wide Web Conference (WWW-2002).
[Cohen, Kautz, McAllester 2000] Cohen, W.; Kautz, H.; McAllester, D.: Hardening soft information sources. Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000).
[Cohen 1998] Cohen, W.: Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity, in Proceedings of ACM SIGMOD-98.
[Cohen 2000a] Cohen, W.: Data Integration using Similarity Joins and a Word-based Information Representation Language, ACM Transactions on Information Systems, 18(3).
[Cohen 2000b] Cohen, W.: Automatically Extracting Features for Concept Learning from the Web, Machine Learning: Proceedings of the Seventeenth International Conference (ML-2000).
[Collins & Singer 1999] Collins, M.; and Singer, Y. Unsupervised models for named entity classification. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999.
[De Jong 1982] De Jong, G. An Overview of the FRUMP System. In: Lehnert, W. & Ringle, M. H. (eds), Strategies for Natural Language Processing. Lawrence Erlbaum, 1982.
[Freitag 98] Freitag, D.: Information extraction from HTML: application of a general machine learning approach, Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98).
[Freitag 1999] Freitag, D. Machine Learning for Information Extraction in Informal Domains. Ph.D. dissertation, Carnegie Mellon University.
[Freitag 2000] Freitag, D.: Machine Learning for Information Extraction in Informal Domains, Machine Learning 39(2/3) (2000).
[Freitag & Kushmerick 2000] Freitag, D.; Kushmerick, N.: Boosted Wrapper Induction. Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000).
[Freitag & McCallum 1999] Freitag, D. and McCallum, A. Information extraction using HMMs and shrinkage. In Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction.
[Kushmerick 2000] Kushmerick, N.: Wrapper Induction: efficiency and expressiveness, Artificial Intelligence, 118.
[Lafferty, McCallum & Pereira 2001] Lafferty, J.; McCallum, A.; and Pereira, F. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, in Proceedings of ICML-2001.
[Leek 1997] Leek, T. R. Information extraction using hidden Markov models. Master’s thesis, UC San Diego.
[McCallum, Freitag & Pereira 2000] McCallum, A.; Freitag, D.; and Pereira, F. Maximum entropy Markov models for information extraction and segmentation, in Proceedings of ICML-2000.
[Miller et al 2000] Miller, S.; Fox, H.; Ramshaw, L.; Weischedel, R. A Novel Use of Statistical Parsing to Extract Information from Text. Proceedings of the 1st Annual Meeting of the North American Chapter of the ACL (NAACL).
Slides from Cohen & McCallum

62 (image-only slide)

63 Introduction to NLP (Oren Etzioni, CSE 473)

64 Communication (same as slide 5)

65 Is the Problem (same as slide 6)

66 Of the century… (same as slide 7)

67 (image-only slide)

68 Why Study NLP? (same as slide 9)

69 References
Russell & Norvig, ch. 22 & 23.
Daniel Jurafsky and James H. Martin, Speech and Language Processing.
Manning and Schütze, Foundations of Statistical NLP.

70 What is NLP?
Natural Language Processing (largely as in slide 10):
Understand human language (speech, text); successfully communicate (NL generation).
Pragmatics: “Can you pass the salt?”
Semantics: “Paris is the capital of France”. “Snow is white” is true iff what holds in the world? What does ‘water’ refer to? On ‘twin earth’?
Syntax: analyze the structure of a sentence.
Morphology: analyze the structure of words (dogs = dog + s).

71 Why is Natural Language Understanding difficult? (same as slide 15)

72 Where are the ambiguities? (same as slide 16)

73 Fundamental NLP Tasks (same as slide 11)

74 Fundamental NLP Tasks (same as slide 12, without the restaurant example)

75 NLP Applications (same as slide 13)

76 (image-only slide)

77 Challenges in NLP: Ambiguity
Lexical: label (noun or verb)? London (Jack or capital of the UK)?
Syntactic: I smelled a wumpus in (2,2); Teacher Strikes Idle Kids; Killer Sentenced to Die for Second Time in 10 Years.
Semantic ambiguity: “dog lover”.
Rampant metaphors: “prices went up”.

78 Pragmatics (same as slide 17)

79 Speech Act Theory (Searle ’75) (same as slide 18, without the salt exchange)

80 Semantics (same as slide 19)

81 Syntax is about Sentence Structures (same as slide 20)

82 Parsing (same as slide 21)

83 Alternative Parses (same as slide 22)

84 How do you choose the parse? (same as slide 23)

85 Issues with Statistical Parsing
Same as slide 24, adding an attachment example: “Have the report on my desk by Monday.”

86 How Difficult is Morphology? (same as slide 25)

87 NLP Application Case Study: Machine Translation
Focus is on translating single sentences using a probabilistic model. A classical case of statistical NLP (both good & bad).
The source sentence is E (English); the target sentence is F (French). What is P(F|E)?

88 Noisy Channel Model
argmax over F of P(F|E).
P(F|E) = P(E|F) * P(F) / P(E). Ignore P(E). Why? (It is constant across candidate translations F.)
P(F): a language model of French, acquired from a French corpus.
P(E|F): the “translation model”, acquired from a sentence-“aligned” English-French corpus.
What’s the point? Instead of estimating P(F|E) directly, we decompose the problem: find possible translations of E into French, then rank them using a model of French.

89 Language Models
To get a probability distribution over sentences, build an N-gram model.
P(go | let’s) is high, but P(let’s | go) is low.
The slide’s equation shows a bigram model, P(w_1 … w_n) ≈ ∏_i P(w_i | w_{i-1}); trigrams are common.
N is a tradeoff based on corpus size and sparsity.
Used widely in NLP (e.g., speech).
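A minimal bigram model sketch, estimating P(w_i | w_{i-1}) by maximum likelihood from a two-sentence toy corpus (no smoothing, which a real model would need):

```python
# Bigram language model: counts -> conditional probabilities -> sentence score.
from collections import Counter

corpus = ["<s> let's go to the store </s>",
          "<s> let's go home </s>"]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = sent.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

def p(word, prev):
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def sentence_prob(sent):
    toks = sent.split()
    prob = 1.0
    for prev, word in zip(toks, toks[1:]):
        prob *= p(word, prev)
    return prob

print(p('go', "let's"))                         # 1.0: "go" always follows "let's"
print(sentence_prob("<s> let's go home </s>"))  # 0.5
```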

90 Translation Model P(E|F)
Relies on a corpus where French sentences are paired with their English translations (e.g., European parliament proceedings).
The corpus is too sparse to translate whole sentences, so again decompose into N-grams: P(the dog | le chien) = P(the | le) * P(dog | chien).
How does this fail? Word order: brown dog vs. chien brun; one-to-many: house vs. a la maison; idioms (he “kicked the bucket”), metaphors, etc.
How does this scale to 49,000,000 language pairs?
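Putting the two models together, a toy ranking sketch in the spirit of the noisy channel equation: score each candidate F by P(E|F) * P(F). All probabilities are invented, and the word-by-word translation model has exactly the limitations noted above.

```python
# Noisy-channel ranking of candidate French translations of an English E.
t = {('the', 'le'): 0.9, ('dog', 'chien'): 0.8,
     ('the', 'chien'): 0.05, ('dog', 'le'): 0.05}

def p_e_given_f(e_words, f_words):
    # word-by-word model: assumes a one-to-one, in-order alignment
    if len(e_words) != len(f_words):
        return 0.0
    prob = 1.0
    for e, f in zip(e_words, f_words):
        prob *= t.get((e, f), 1e-4)
    return prob

def p_f(f_words):
    # stand-in French language model; a real one would be an N-gram model
    lm = {('le', 'chien'): 0.2, ('chien', 'le'): 0.001}
    prob = 1.0
    for big in zip(f_words, f_words[1:]):
        prob *= lm.get(big, 1e-5)
    return prob

E = 'the dog'.split()
candidates = ['le chien', 'chien le']
best = max(candidates, key=lambda f: p_e_given_f(E, f.split()) * p_f(f.split()))
print(best)   # 'le chien': both models penalize the bad word order here
```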

91 Parallel Text Availability
Most statistical machine translation research has focused on a few high-resource languages (European, Chinese, Japanese, Arabic).
Approximate parallel text available (with English) ranges from ~200M words, to ~30M words (various Western European languages: parliamentary proceedings, government documents), to ~1M words (Bible/Koran/Book of Mormon/Dianetics), to ~1K words (nothing / Universal Declaration of Human Rights), across languages such as Chinese, Arabic, Danish, Serbian, Uzbek, Chechen, Khmer, French, Italian, Finnish, Bengali.

92 Conclusions
NLP is harder than it might naively seem; many subtasks.
Statistical NLP is the dominant paradigm: supervised learning, corpus-based statistics (language models).
Some important limitations in practice!
NL “understanding” has received very little attention.

93 Why is NLP Hard? Ambiguity
Prepositional phrase attachment (newspaper headline examples): Ban on Nude Dancing on Governor’s Desk.
Word sense disambiguation: 1990: Iraqi Head Seeking Arms.
Syntactic ambiguity (what’s modifying what): Juvenile Court to Try Shooting Defender.
Examples from Chris Manning.

94 More
Teachers Strikes Idle Kids; Stolen Painting Found by Tree; Local High School Dropouts Cut in Half; Red Tape Holds Up New Bridges; Clinton Wins on Budget, But More Lies Ahead; Hospitals are Sued by Seven Foot Doctors; Kids Make Nutritious Snacks; Minister Accused of Having Eight Wives in Jail.

95 Speaker notes: The reason that these work for human beings is that natural language understanding essentially depends on making complex and subtle use of context: the context of the rest of the sentence, the context of prior discourse, and the things around you in the world. So for humans it’s obvious what people are saying. These are AI-complete tasks.

96 Chart Parsing (R&N p. 800)

97 Small devices
With a big monitor, humans can scan for the right information. On a small screen, there’s hugely more value in a system that can show you what you want: phone number, business hours, a summary, “Call me at 11 to finalize this”.

98 NLP for IR/web search?
It’s a no-brainer that NLP should be useful for (and used in) web search, and IR in general:
Search for ‘Jaguar’: the computer should know, or ask, whether you’re interested in big cats [scarce on the web], cars, or perhaps a molecule geometry and solvation energy package, or a package for fast network I/O in Java.
Search for ‘Michael Jordan’: the basketballer or the machine learning guy?
Search for laptop: you don’t find notebook.
Google doesn’t even stem: search for probabilistic model, and you don’t even match pages with probabilistic models.

99 NLP for IR/web search?
Word sense disambiguation technology generally works well (like text categorization). Synonyms can be found or listed.
Lots of people have been into fixing this: e-Cyc had a beta version with Hotbot that disambiguated senses, and was going to go live in 2 months… 14 months ago.
Lots of startups: LingoMotors, iPhrase (“Traditional keyword search technology is hopelessly outdated”).

100 NLP for IR/web search?
But in practice it’s an idea that hasn’t gotten much traction:
Correctly finding linguistic base forms is straightforward, but produces little advantage over crude stemming, which just slightly over-collapses words into equivalence classes.
Word sense disambiguation only helps on average in IR if it is over 90% accurate (Sanderson 1994), and that’s about where we are.
Syntactic phrases should help, but people have been able to get most of the mileage with “statistical phrases”, which have recently been aggressively integrated into systems.

101 NLP for IR/web search?
People can easily scan among results (on their 21” monitor)… if you’re above the fold.
Much more progress has been made in link analysis and in the use of anchor text, etc.: anchor text gives human-provided synonyms; link or click-stream analysis gives a form of pragmatics: what do people find correct or important (in a default context)?
Focus on short, popular queries, news, etc.
Using human intelligence always beats artificial intelligence.

102 NLP for IR/web search?
Methods which use rich ontologies, etc., can work very well for intranet search within a customer’s site (where anchor-text, link, and click patterns are much less relevant), but they don’t really scale to the whole web.
Moral: it’s hard to beat keyword search for the task of general ad hoc document retrieval.
Conclusion: one should move up the food chain to tasks where finer-grained understanding of meaning is needed.

103 (image-only slide)

104 Product information

105 Product info
C-net markets this information. How do they get most of it? Phone calls and typing.

106 Inconsistency: digital cameras
Image Capture Device: 1.68 million pixel 1/2-inch CCD sensor
Image Capture Device Total Pixels: approx. … million; Effective Pixels: approx. … million
Image sensor Total Pixels: approx. … million-pixel
Imaging sensor Total Pixels: approx. … million; 1,688 (H) x 1,248 (V)
CCD Total Pixels: approx. 3,340,000 (2,140 [H] x 1,560 [V]); Effective Pixels: approx. 3,240,000 (2,088 [H] x 1,550 [V]); Recording Pixels: approx. 3,145,000 (2,048 [H] x 1,536 [V])
These all came off the same manufacturer’s website!! And this is a very technical domain. Try sofa beds.

107 Product information / comparison shopping, etc.
Need to learn to extract info from online vendors.
Can exploit uniformity of layout, and (partial) knowledge of the domain, by querying with known products.
E.g., Jango Shopbot (Etzioni and Weld): gives convenient aggregation of online content.
Bug: not popular with vendors. A partial solution is for these tools to be personal agents rather than web services.

108 Question answering from text
TREC 8/9 QA competition: an idea originating from the IR community.
With massive collections of on-line documents, manual translation of knowledge is impractical: we want answers from textbases [cf. bioinformatics].
Evaluated output is 5 answers of 50/250-byte snippets of text drawn from a 3 GB text collection, required to contain at least one concept of the semantic category of the expected answer type. (IR thinking; suggests the use of named-entity recognizers.)
You get reciprocal points for the highest correct answer.

109 Pasca and Harabagiu (2000) show the value of sophisticated NLP
Good IR is needed: paragraph retrieval based on SMART.
A large taxonomy of question types and expected answer types is crucial.
A statistical parser (modeled on Collins 1997) is used to parse questions and relevant text for answers, and to build a knowledge base.
Controlled query expansion loops (morphological, lexical synonyms, and semantic relations) are all important.
Answer ranking by a simple ML method.

110 Question Answering Example
How hot does the inside of an active volcano get?
get(TEMPERATURE, inside(volcano(active)))
“lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit”
fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))
volcano ISA mountain; lava ISPARTOF volcano → lava inside volcano; fragments of lava HAVEPROPERTIESOF lava
The needed semantic information is in WordNet definitions, and was successfully translated into a form that can be used for rough ‘proofs’.

111 Conclusion
Complete human-level natural language understanding is still a distant goal, but there are now practical and usable partial NLU systems applicable to many problems.
An important design decision is finding an appropriate match between (parts of) the application domain and the available methods.
But, used with care, statistical NLP methods have opened up new possibilities for high-performance text understanding systems.

112 NLP Research at the Turing Center
Oren Etzioni, Turing Center, University of Washington

113 (image-only slide)

114 Turing Center Foci
Scale MT to 49,000,000 language pairs: a 2,500,000-word translation graph. P(V → F → C)? PanImages.
Accumulate knowledge from the Web.
A new paradigm for Web search.

115 Outline
A new paradigm for search; Open Information Extraction; tractable inference; conclusions.

116 Web Search in 2020?
Type key words into a search box? Social or “human powered” search? The Semantic Web? What about our technology exponentials?
“The best way to predict the future is to invent it!”

117 Intelligent Search
Instead of merely retrieving Web pages, read ’em!
Machine Reading = Information Extraction (IE) + tractable inference.
IE(sentence) = who did what? speaker(Alon Halevy, UW).
Inference = uncover implicit information: Will Alon visit Seattle?

118 Application: Information Fusion
What kills bacteria?
What west coast nano-technology companies are hiring?
Compare Obama’s “buzz” versus Hillary’s.
What is a quiet, inexpensive, 4-star hotel in Vancouver?

119 Opinion Mining
Opine (Popescu & Etzioni, EMNLP ’05): IE over product reviews.
Reviews are informative, abundant but varied, and textual.
Goal: summarize reviews without any prior knowledge of the product category.

120 (image-only slide)

121 (image-only slide)

122 But “Reading” the Web is Tough
Traditional IE is narrow: it has been applied to small, homogeneous corpora.
On the Web: no parser achieves high accuracy, no named-entity taggers, no supervised learning.
How about semi-supervised learning?

123 Semi-Supervised Learning
Few hand-labeled examples → a limit on the number of concepts; concepts are pre-specified; problematic for the Web.
Alternative: self-supervised learning. The learner discovers concepts on the fly and automatically labels examples per concept!

124 Open IE = Self-supervised IE (Banko, Cafarella, Soderland et al., IJCAI ’07)
                Traditional IE                          Open IE
Input:          corpus + hand-labeled data              corpus
Relations:      specified in advance                    discovered automatically
Complexity:     O(D * R) for D documents, R relations   O(D): single pass over the corpus
Text analysis:  parser + named-entity tagger            NP chunker
Traditional IE yields precise, narrow, pre-specified information and requires hand-tagged examples for every relation of interest; it targets small, homogeneous corpora and utilizes “narrow”, brittle technologies. Open IE leaves target concepts unspecified and uses “lightweight” linguistic technologies in a single pass over the corpus.

125 Extractor Overview (Banko & Etzioni, ’08)
Use a simple model of relationships in English to label extractions.
Bootstrap a general model of relationships in English sentences, encoded as a CRF.
Decompose each sentence into one or more (NP1, VP, NP2) “chunks”.
Use the CRF model to retain relevant parts of each NP and VP.
The extractor is relation-independent!

126 TextRunner Extraction
Extract a triple representing a binary relation, (Arg1, Relation, Arg2), from a sentence:
“Internet powerhouse, EBay, was originally founded by Pierre Omidyar.” → (Ebay, Founded by, Pierre Omidyar)
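To make the data flow concrete, a heavily simplified sketch: a single regex stands in for TextRunner's CRF-based chunker, and the relation phrase is pruned of non-essential words (the pattern is ours and only handles this sentence shape):

```python
# Toy (NP1, VP, NP2) triple extraction.
import re

PATTERN = re.compile(
    r'([A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*)'                         # NP1
    r',?\s+((?:was|is|were)\s+\w+(?:\s+\w+ed)?\s+by|\w+ed\s+by)\s+'   # VP
    r'([A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*)')                        # NP2

def extract_triples(sentence):
    triples = []
    for np1, vp, np2 in PATTERN.findall(sentence):
        # drop non-essential words from the relation phrase,
        # e.g. "was originally founded by" -> "founded by"
        rel = ' '.join(w for w in vp.split()
                       if w not in {'was', 'is', 'were', 'originally'})
        triples.append((np1, rel, np2))
    return triples

print(extract_triples(
    "Internet powerhouse, EBay, was originally founded by Pierre Omidyar."))
# -> [('EBay', 'founded by', 'Pierre Omidyar')]
```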

127 Numerous Extraction Challenges
Drop non-essential info: “was originally founded by” → founded by.
Retain key distinctions: “Ebay founded by Pierre” ≠ “Ebay founded Pierre”.
Non-verb relationships: “George Bush, president of the U.S…”
Synonymy & aliasing: Albert Einstein = Einstein ≠ Einstein Bros.

128 TextRunner (the Web’s first Open IE system)
Self-supervised learner: automatically labels example extractions & learns an extractor.
Single-pass extractor: one pass over the corpus, identifying extractions in each sentence.
Query processor: indexes extractions, enabling queries at interactive speeds.

129 TextRunner Demo

130 (image-only slide)

131 (image-only slide)

132 Sample of 9 million Web Pages
Triples: 11.3 million
…with well-formed relation: 9.3 million
…with well-formed entities: 7.8 million
Abstract: 6.8 million (79.2% correct), e.g., (fruit, contain, vitamins)
Concrete: 1.0 million (88.1% correct), e.g., (Oppenheimer, taught at, Berkeley)

133 Tractable Inference
Much of textual information is implicit:
Entity and predicate resolution.
Probability of correctness.
Composing facts to draw conclusions.

134 I. Entity Resolution
Resolver (Yates & Etzioni, HLT ’07): determines synonymy based on relations found by TextRunner (cf. Pantel & Lin ’01).
(X, born in, 1941) and (M, born in, 1941); (X, citizen of, US) and (M, citizen of, US); (X, friend of, Joe) and (M, friend of, Mary).
P(X = M) ~ shared relations.
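The shared-relations intuition in a few lines; plain context overlap stands in for Resolver's actual probabilistic model:

```python
# The more (relation, argument) contexts two strings share, the more
# likely they name the same entity.
def shared_contexts(triples, x, m):
    ctx = lambda e: {(r, a2) for a1, r, a2 in triples if a1 == e}
    return ctx(x) & ctx(m)

triples = [
    ('X', 'born in', '1941'), ('M', 'born in', '1941'),
    ('X', 'citizen of', 'US'), ('M', 'citizen of', 'US'),
    ('X', 'friend of', 'Joe'), ('M', 'friend of', 'Mary'),
]

print(shared_contexts(triples, 'X', 'M'))
# -> {('born in', '1941'), ('citizen of', 'US')}: two shared contexts
```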

135 Relation Synonymy
(1, R, 2), (2, R, 4), (4, R, 8), etc. vs. (1, R’, 2), (2, R’, 4), (4, R’, 8), etc.
P(R = R’) ~ number of shared argument pairs.
An unsupervised probabilistic model; an O(N log N) algorithm, run on millions of docs.

136 II. Probability of Correctness
How likely is an extraction to be correct? Factors to consider include:
Authoritativeness of the source.
Confidence in the extraction method.
Number of independent extractions.

137 III. Compositional Inference (work in progress: Schoenmackers, Etzioni, Weld)
Implicit information: the “2 + 2 = 4” of text.
TextRunner: (Turing, born in, London)
WordNet: (London, part of, England)
Rule: ‘born in’ is transitive through ‘part of’
Conclusion: (Turing, born in, England)
Mechanism: an MLN instantiated on the fly; rules learned from the corpus (future work). Inference demo.
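A purely logical sketch of the transitivity rule; the Markov logic machinery mentioned above would attach probabilities to such derivations rather than treating them as certain:

```python
# Forward chaining: (x, born in, y) and (y, part of, z) => (x, born in, z).
facts = {('Turing', 'born in', 'London'),
         ('London', 'part of', 'England')}

def close_born_in(facts):
    derived = set(facts)
    changed = True
    while changed:                    # iterate to a fixed point
        changed = False
        new = {(x, 'born in', z)
               for x, r1, y in derived if r1 == 'born in'
               for y2, r2, z in derived if r2 == 'part of' and y2 == y}
        if not new <= derived:
            derived |= new
            changed = True
    return derived

print(('Turing', 'born in', 'England') in close_born_in(facts))  # True
```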

138 KnowItAll Family Tree
Systems timeline (the slide shows a tree): Mulder, WebKB, PMI-IR ’01; KnowItAll ’04; Opine ’05, BE ’05, Urns, KnowItNow ’05; Woodward ’06; Resolver ’07, TextRunner ’07, REALM ’07; Inference ’08.

