Machine Reading at Web Scale
Oren Etzioni
www.cs.washington.edu/homes/etzioni

Technological and Societal Context: Moore's Law

Text Explosion

Information Overload

Paradigm Shift: From Retrieval to Reading
"How is the iPad?" → Found 8,900 reviews; 85% positive. Key points are…
(World Wide Web Information Food Chain; KnowItAll)

Information Fusion
– What kills bacteria?
– What west-coast nanotechnology companies are hiring?
– Compare Obama's "buzz" versus Hillary's.
– What is a quiet, inexpensive, 4-star hotel in Vancouver?

Crossing the Structure Chasm

Fundamental Hypotheses
1. Massive, high-quality KBs are invaluable.
2. KBs can be learned automatically.
3. KBs can be learned via Machine Reading.
4. Reading can leverage the Web corpus.

What is Machine Reading?
Self-supervised understanding of text: information extraction + inference.

Outline
I. Motivation (impact: knowledge workers & AI)
II. What is Machine Reading?
III. Open Information Extraction (IE)
IV. Learning Common-Sense Knowledge
   – Argument types via an LDA model
V. Future Work & Conclusions

II. A Generic Machine Reader
Given:
– Corpus of text
– Model of language
– Hand-labeled training examples?
– Ontology?
– Human teacher?
Output: KB

Anatomy of a Machine Reader
Initialize KB, then repeat:
1. Extractor(text, KB) → tuples of the form (Arg1, predicate, Arg2), e.g., (Edison, invented, the light bulb)
2. Integrator(tuples) → KB
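The extract-then-integrate loop above can be sketched in a few lines. The regex-based extractor, the toy corpus, and the tuple-counting integrator are illustrative stand-ins, not the actual TextRunner components.

```python
# Sketch of the loop: Extractor(text, KB) -> tuples; Integrator(tuples) -> KB.
# The regex pattern, verb list, and corpus are illustrative stand-ins.
import re
from collections import Counter

def extract(sentence):
    """Pull (arg1, predicate, arg2) tuples via a toy 'E1 Verb E2' pattern."""
    m = re.match(r"^(\w+(?: \w+)?) (invented|founded|acquired) (.+?)\.?$", sentence)
    return [(m.group(1), m.group(2), m.group(3))] if m else []

def integrate(tuples, kb):
    """Fold new tuples into the KB, counting repeated assertions."""
    for t in tuples:
        kb[t] += 1
    return kb

corpus = [
    "Edison invented the light bulb.",
    "Larry Page founded Google.",
    "Edison invented the light bulb.",
]
kb = Counter()
for sentence in corpus:
    kb = integrate(extract(sentence), kb)

print(kb[("Edison", "invented", "the light bulb")])  # 2
```

Repeated extractions raise a tuple's count, one simple way an integrator can represent its degree of belief in an assertion.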

Extraction Design Decisions
– What are the atoms? Sentences.
– What syntactic processing? NP chunking.
– What semantic processing? Tuple structure.
– Source of training examples? Existing resources.

IE as Supervised Learning (e.g., Riloff '96, Soderland '99)
– Find & label examples of each relation; manual labor is linear in |relations|.
– Learn a relation-specific extractor.
Labeled example: "S. Smith, formerly chairman of XYZ Corporation…" → ManagementSuccession relation with slots Person-In, Person-Out, Organization, Position.

Semi-Supervised Learning
Few hand-labeled examples → a limit on the number of relations.
Relations are pre-specified → limits Macro Reading.
Alternative: self-supervised learning
– Learner discovers relations on the fly (Sekine '06)
– Learner automatically labels examples per relation!

III. Open IE = Self-Supervised IE (Banko et al., IJCAI '07, ACL '08)

              Traditional IE                   Open IE
Input:        Corpus + hand-labeled data       Corpus + existing resources
Relations:    Specified in advance             Discovered automatically
Complexity:   O(D * R)                         O(D)
Output:       Lexicalized, relation-specific   Relation-independent
(D = number of documents, R = number of relations)

Integration Design
How to represent beliefs & dependencies?
– Count tuples (Downey & Etzioni, AIJ '10)
– Infer synonyms (Yates & Etzioni, JAIR '09)
How to generate new beliefs?
– Learn from the extraction set

TextRunner Demo
Extraction run at Google on 500,000,000 high-quality Web pages.


TextRunner Precision (Banko PhD '09)

How is Open IE Possible?
There is a compact set of "relationship expressions" in English, and these expressions are relation-independent (Banko & Etzioni, ACL '08; Russell & Norvig, 3rd ed.).

Relation-Independent Patterns

Category       Pattern                 Example                    Frequency
Verb           E1 Verb E2              X established Y            37.8%
Noun+Prep      E1 NP Prep E2           the X settlement with Y    22.8%
Verb+Prep      E1 Verb Prep E2         X moved to Y               16.0%
Infinitive     E1 to Verb E2           X to acquire Y             9.4%
Modifier       E1 Verb E2 NP           X is Y winner              5.2%
Coordinate_n   E1 (and|,|-|:) E2 NP    X - Y deal                 1.8%
Coordinate_v   E1 (and|,) E2 Verb      X, Y merge                 1.0%
Appositive     E1 NP (:|,)? E2         X hometown: Y              0.8%
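Two of the patterns in the table ("E1 Verb E2" and "E1 Verb Prep E2") can be sketched as matches over POS-tagged token sequences. The hand-tagged inputs and the `extract_open` helper are simplifying assumptions, not the Banko & Etzioni implementation.

```python
# Match POS-tagged token sequences against two patterns from the table:
# "E1 Verb Prep E2" and "E1 Verb E2". In a real system the tags and entity
# spans would come from an upstream chunker.
import re

def extract_open(tagged):
    """tagged: list of (token, POS) pairs; returns (arg1, relation, arg2) or None."""
    tags = " ".join(tag for _, tag in tagged)
    if re.fullmatch(r"NNP VBD IN NNP", tags):   # E1 Verb Prep E2
        return (tagged[0][0], tagged[1][0] + " " + tagged[2][0], tagged[3][0])
    if re.fullmatch(r"NNP VBD NNP", tags):      # E1 Verb E2
        return (tagged[0][0], tagged[1][0], tagged[2][0])
    return None

moved = [("Microsoft", "NNP"), ("moved", "VBD"), ("to", "IN"), ("Redmond", "NNP")]
founded = [("Gates", "NNP"), ("founded", "VBD"), ("Microsoft", "NNP")]
print(extract_open(moved))    # ('Microsoft', 'moved to', 'Redmond')
print(extract_open(founded))  # ('Gates', 'founded', 'Microsoft')
```

Because the patterns mention only syntactic categories, the same matcher covers any relation phrase, which is the relation-independence the slide highlights.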

Observations
95% of the sample → 8 (simplified) patterns!
But applicability conditions are complex. Consider "E1 Verb E2":
1. "Kentucky Fried Chicken" (no relation expressed)
2. "Microsoft announced Tuesday that…" ("Tuesday" is not the second argument)
Effective, but far from perfect!

Sample of Relations
inhibits tumor growth in
has a PhD in
joined forces with
is a person who studies
voted in favor of
won an Oscar for
has a maximum speed of
died from complications of
mastered the art of
gained fame as
granted political asylum to
is the patron saint of
was the first person to
identified the cause of
wrote the book on

Number of Relations
Yago                            92
DBpedia
PropBank                        3,600
VerbNet                         5,000
Wikipedia InfoBoxes (f > 10)    ~5,000
TextRunner                      100,000+ (estimate)
New TextRunner extractor        1,500,000 (phrases)

TextRunner Scalability
– "Any sentence" property
– ~100 sentences per second
– Linear in the size of its corpus
– Rich, "open" vocabulary: 100,000+ relations

Critique of TextRunner
– "Textual representation": Q/A, synonymy, compositional inference
– Limited semantic model
– Lower per-sentence precision/recall, but we "make it up on volume!"
– TextRunner suffers from ADD
What next?

Chase Wikipedia's long tail? Learn common-sense knowledge not in Wikipedia!

IV. Learn Common-Sense Knowledge
What are the relationships in text?
Horn clauses: Prevents(F,D) :- Contains(F,N) ^ Prevents(N,D)
Infer meta-properties of relations:
– Time-dependent? Functional?
– Transitive? Symmetric?
– Mutually exclusive?
– Argument types ("selectional preferences")
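The Horn clause above can be applied with a single forward-chaining step over extracted facts; the food/nutrient facts below are illustrative examples, not extractions from the talk.

```python
# One forward-chaining step for: Prevents(F, D) :- Contains(F, N) ^ Prevents(N, D).
# The facts are illustrative stand-ins for noisy Open IE extractions.
contains = {("oranges", "vitamin C")}
prevents = {("vitamin C", "scurvy")}

# Join Contains and Prevents on the shared nutrient N to derive new Prevents facts.
derived = {(f, d) for (f, n) in contains for (n2, d) in prevents if n == n2}
prevents |= derived

print(("oranges", "scurvy") in prevents)  # True
```

Iterating this step to a fixed point would give full forward chaining; one pass suffices to show how a learned rule turns two extracted facts into a new belief.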

Argument Typing
Example: "P was born in X"
– P is a person
– X is a location or a date
Numerous applications:
– Prune incorrect extractions
– Update the probability of inferred assertions
– Aid syntactic processing: "The couple will meet in Miami, which is located in Florida."
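The first application above, pruning incorrect extractions, might look like the sketch below; the hard-coded type sets are stand-ins for a learned type model such as LDA-SP, not part of the system described here.

```python
# Reject a born_in extraction whose second argument is neither a location
# nor a date. LOCATIONS and DATES stand in for learned type distributions.
LOCATIONS = {"Ulm", "Seattle", "Moscow", "Miami"}
DATES = {"1988", "March", "June"}

def keep_born_in(arg1, arg2):
    """Keep 'P was born in X' only if X looks like a location or a date."""
    return arg2 in LOCATIONS or arg2 in DATES

print(keep_born_in("Einstein", "Ulm"))         # True
print(keep_born_in("Einstein", "relativity"))  # False
```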

Text → Argument Types?
Previous work: Resnik, Pantel, etc.
Approach: utilize generative topic models, in which topics generate the terms of a document; here, a relation plus its extracted arguments plays the role of a "document".

TextRunner Extractions → Relations as Documents
born_in(Einstein, Ulm)
headquartered_in(Microsoft, Redmond)
founded_in(Microsoft, 1973)
born_in(Bill Gates, Seattle)
founded_in(Google, 1998)
headquartered_in(Google, Mountain View)
born_in(Sergey Brin, Moscow)
founded_in(Microsoft, Albuquerque)
born_in(Einstein, March)
born_in(Sergey Brin, 1973)
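The grouping this slide describes, one pseudo-document per relation, can be sketched as follows (using a subset of the extractions above for brevity):

```python
# Build one "document" per relation: the bag of its extracted arg2 values,
# which a topic model can then treat exactly like a bag of words.
from collections import defaultdict

extractions = [
    ("born_in", "Einstein", "Ulm"),
    ("born_in", "Bill Gates", "Seattle"),
    ("founded_in", "Google", "1998"),
    ("founded_in", "Microsoft", "Albuquerque"),
]

docs = defaultdict(list)
for rel, arg1, arg2 in extractions:
    docs[rel].append(arg2)

print(docs["born_in"])  # ['Ulm', 'Seattle']
```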

Argument "Docs" → Type Models

Args can have multiple types

LDA Generative "Story"
For each type, pick a random distribution over words (prior over word distributions):
– Topic 1 (Location): P(New York|T1) = 0.02, P(Moscow|T1) = 0.001, …
– Topic 2 (Date): P(June|T2) = …, P(1988|T2) = …
For each relation, randomly pick a distribution over types (prior over type distributions):
– born_in X: P(Location|born_in) = 0.5, P(Date|born_in) = 0.3, …
For each extraction, first pick a type, then pick an argument based on that type:
– born_in → Location → New York
– born_in → Date → 1988
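The generative story can be simulated directly. The two types, their word distributions, and the born_in probabilities below are illustrative choices, not the values learned in the talk.

```python
# Simulate the story: pick a type from the relation's type distribution,
# then pick an argument word from that type's word distribution.
import random
random.seed(0)

types = {  # per-type word distributions ("topics")
    "Location": {"New York": 0.5, "Moscow": 0.5},
    "Date": {"June": 0.5, "1988": 0.5},
}
rel_types = {"born_in": {"Location": 0.5, "Date": 0.5}}  # per-relation type distribution

def sample(dist):
    """Draw one key from a {item: probability} distribution."""
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item  # guard against floating-point underrun

def generate(relation):
    t = sample(rel_types[relation])  # first pick a type...
    return sample(types[t])          # ...then an argument given that type

arg = generate("born_in")
print(arg in {"New York", "Moscow", "June", "1988"})  # True
```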

Dependencies Between Arguments
Problem: LDA treats each argument independently, yet many type pairs co-occur:
– (Person, Location)
– (Politician, Political Issue)
Solution: LinkLDA (Erosheva et al. '04)
– Both args are generated from a common per-relation type distribution θ
– Reduces sparsity and improves generalization

LinkLDA (Erosheva et al. 2004): plate diagrams contrasting the LDA and LinkLDA graphical models.

LinkLDA (Erosheva et al. 2004)
For each relation, randomly pick a distribution over types:
– X born_in Y: P(Topic1|born_in) = 0.5, P(Topic2|born_in) = 0.3, …
For each extraction, pick a type for a1 and a type for a2 (two separate sets of type distributions): Person born_in Location
Then pick arguments based on those types: Sergey Brin born_in Moscow

Type Modeling in LDA-SP
– Infer distributions and parameters from the data (unsupervised)
– Sparse priors → relatively few types per relation
– Collapsed Gibbs sampling: easy to implement, linear in corpus size!
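Collapsed Gibbs sampling for this kind of model can be sketched compactly. The corpus, the number of types T, the priors, and the iteration count below are toy choices, not the LDA-SP settings.

```python
# Collapsed Gibbs sampling for plain LDA over tiny per-relation "documents".
# Each argument token gets a type assignment z, resampled from the collapsed
# conditional P(z=k | rest) proportional to (n_dk+alpha)*(n_kw+beta)/(n_k+V*beta).
import random
random.seed(0)

docs = {
    "born_in": ["Ulm", "Seattle", "Moscow", "March", "1988"],
    "founded_in": ["1973", "1998", "Albuquerque"],
}
T, alpha, beta = 2, 0.1, 0.1
vocab = sorted({w for ws in docs.values() for w in ws})
V = len(vocab)

nd = {d: [0] * T for d in docs}                 # doc-type counts
nw = [{w: 0 for w in vocab} for _ in range(T)]  # type-word counts
nt = [0] * T                                    # type totals
z = {}
for d, ws in docs.items():                      # random initialization
    for i, w in enumerate(ws):
        k = random.randrange(T)
        z[(d, i)] = k
        nd[d][k] += 1; nw[k][w] += 1; nt[k] += 1

for _ in range(200):                            # Gibbs sweeps
    for d, ws in docs.items():
        for i, w in enumerate(ws):
            k = z[(d, i)]                       # remove current assignment
            nd[d][k] -= 1; nw[k][w] -= 1; nt[k] -= 1
            p = [(nd[d][j] + alpha) * (nw[j][w] + beta) / (nt[j] + V * beta)
                 for j in range(T)]
            r = random.random() * sum(p)        # draw from the conditional
            k, acc = 0, p[0]
            while r > acc:
                k += 1; acc += p[k]
            z[(d, i)] = k                       # restore counts
            nd[d][k] += 1; nw[k][w] += 1; nt[k] += 1

print(sum(nd["born_in"]))  # 5: each token keeps exactly one type assignment
```

Each sweep touches every token once and updates only count tables, which is why the method is linear in corpus size.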

Repository of Types
– Associated 600 LinkLDA types with WordNet (2 hours of manual labor)
– Compiled argument types for 10,000 relations at precision 0.88
Demo:

V. Future Work
– Machine Reading meets structured data
– Develop coherent and complete theories!
– Apply to Web search

Conclusions
– Machine Reading of the Web is a rich platform for NLP and AI (VLSAI)
– Related work at UW, CMU, Stanford, ISI, UIUC, NYU, BBN, SRI, IBM, Cycorp, etc.
– Our focus: relation-rich, Web-scale extraction → common-sense knowledge