Gerhard Weikum Max Planck Institute for Informatics From Information to Knowledge: Harvesting Entities, Relationships,


Gerhard Weikum Max Planck Institute for Informatics From Information to Knowledge: Harvesting Entities, Relationships, and Temporal Facts from Web Sources

Acknowledgements

Goal: Turn Web into Knowledge Base
comprehensive DB of human knowledge
  everything that Wikipedia knows
  everything machine-readable
  capturing entities, classes, relationships
Source: DB & IR methods for knowledge discovery. Communications of the ACM 52(4), 2009

Approach: Harvesting Facts from Web
Politician, Political Party: (Angela Merkel, CDU), (Karl-Theodor zu Guttenberg, CDU), (Christoph Hartmann, FDP), …
Company, CEO: (Google, Eric Schmidt), …
Movie, ReportedRevenue: (Avatar, $ 2,718,444,933), (The Reader, $ 108,709,522), …
PoliticalParty, Spokesperson: (CDU, Philipp Wachholz), (Die Grünen, Claudia Roth), …
Actor, Award: (Christoph Waltz, Oscar), (Sandra Bullock, Oscar), (Sandra Bullock, Golden Raspberry), …
Politician, Position: (Angela Merkel, Chancellor Germany), (Karl-Theodor zu Guttenberg, Minister of Defense Germany), (Christoph Hartmann, Minister of Economy Saarland), …
Company, AcquiredCompany: (Google, YouTube), (Yahoo, Overture), (Facebook, FriendFeed), (Software AG, IDS Scheer), …
YAGO-NAGA, IWP, Cyc, TextRunner, ReadTheWeb, WikiTax2WordNet, SUMO

Knowledge for Intelligence
entity recognition & disambiguation
understanding natural language & speech
knowledge services & reasoning for semantic apps (e.g. deep QA)
semantic search: precise answers to advanced queries (by scientists, students, journalists, analysts, etc.)
  FIFA 2010 finalists who played in a Champions League final?
  Politicians who are also scientists?
  Enzymes that inhibit HIV?
  Influenza drugs for teens with high blood pressure?
  ...
  German football coach when Bastian Schweinsteiger was born?
  Relationships between Manfred Pinkal, Edsger Dijkstra, Michael Dell, and Renee Zellweger?

Outline...
Automatic KB Construction
Growing & Maintaining the KB
Temporal Knowledge
What and Why
Wrap-up

What is Knowledge (in a KB)?
facts / assertions: bornIn (BastianSchweinsteiger, Kolbermoor), hasWon (BastianSchweinsteiger, BronzeFIFAWorldCup2010), playedInFinal (BastianSchweinsteiger, ChampionsLeague2010), …
taxonomic: instanceOf (BastianSchweinsteiger, footballPlayer), subclassOf (footballPlayer, athlete), …
lexical / terminology: means (Big Apple, NewYorkCity), means (Apple, AppleComputerCorporation), means (MS, Microsoft), means (MS, MultipleSclerosis), …
common-sense properties:
  apples are green, red, juicy, sweet, sour … - but not fast, smart …
  balls are round, smooth, slippery … - but not square, funny …
common-sense axioms:
  ∀x: human(x) ⇒ male(x) ∨ female(x)
  ∀x: (male(x) ⇒ ¬female(x)) ∧ (female(x) ⇒ ¬male(x))
  ∀x: animal(x) ⇒ (hasLegs(x) ⇒ isEven(numberOfLegs(x)))
  …
procedural: how to fix/install/prepare/remove …
epistemic / beliefs: believes (Ptolemy, shape(Earth, disc)), believes (Copernicus, shape(Earth, sphere)), …

Tapping on Wikipedia Categories

KBs: Example YAGO (Suchanek et al.: WWW07)
[Figure: excerpt of the YAGO graph. Entities: Max_Planck, Erwin_Planck, Angela Merkel, Kiel, Schleswig-Holstein, Germany, Nobel Prize, Max_Planck Society. Classes: Entity, Person, Scientist, Physicist, Biologist, Politician, Location, City, State, Country, Organization. Names: "Max Planck", "Max Karl Ernst Ludwig Planck", "Angela Dorothea Merkel". Edge labels: instanceOf, subclass, means (with weights 0.9 and 0.1), bornOn (Apr 23, 1858), diedOn (Oct 4, 1947; Oct 23, 1944), bornIn, hasWon, FatherOf, citizenOf, locatedIn.]
Accuracy 95%
2 Mio. entities, classes
40 Mio. RDF triples (facts) (entity1-relation-entity2, subject-predicate-object)
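To make the subject-predicate-object view concrete, here is a minimal Python sketch of a triple store with pattern matching and class lookup along the subclass hierarchy. The triples are illustrative, modeled loosely on the graph excerpt on this slide; the names `TRIPLES`, `match`, and `classes_of` are hypothetical and are not part of YAGO or its tooling.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Illustrative triples modeled on the slide's graph excerpt
# (assumption: hand-written for this sketch, not real YAGO data).
TRIPLES: list[Triple] = [
    ("Max_Planck", "instanceOf", "Physicist"),
    ("Physicist", "subclassOf", "Scientist"),
    ("Scientist", "subclassOf", "Person"),
    ("Max_Planck", "bornIn", "Kiel"),
    ("Max_Planck", "hasWon", "Nobel_Prize"),
]


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return all triples that agree with the given (possibly partial) pattern."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def classes_of(entity: str) -> set[str]:
    """Direct classes of an entity plus all superclasses reachable via subclassOf."""
    found: set[str] = set()
    frontier = [o for _, _, o in match(s=entity, p="instanceOf")]
    while frontier:
        cls = frontier.pop()
        if cls not in found:
            found.add(cls)
            frontier.extend(o for _, _, o in match(s=cls, p="subclassOf"))
    return found
```

A call such as `match(p="bornIn")` plays the role of a one-pattern query; a real KB would use an RDF store and a query language instead of a Python list.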

KBs: Example YAGO (F. Suchanek et al.: WWW07)

KBs: Example DBpedia (Auer, Bizer, et al.: ISWC07)
3 Mio. entities, 1 Bio. facts (RDF triples)
1.5 Mio. entities mapped to hand-crafted taxonomy of 259 classes with 1200 properties

Outline...
Automatic KB Construction
Growing & Maintaining the KB
Temporal Knowledge
What and Why
Wrap-up

French Marriage Problem
facts in KB: married (Hillary, Bill), married (Carla, Nicolas), married (Angelina, Brad)
new facts or fact candidates: married (Cecilia, Nicolas), married (Carla, Benjamin), married (Carla, Mick), married (Michelle, Barack), married (Yoko, John), married (Kate, Leonardo), married (Carla, Sofie), married (Larry, Google)
1) for recall: pattern-based harvesting
2) for precision: consistency reasoning

Pattern-Based Harvesting (Hearst 92, Brin 98, Agichtein 00, Etzioni 04, …)
Facts: (Hillary, Bill), (Carla, Nicolas)
Patterns: X and her husband Y; X and Y on their honeymoon; X and Y and their children; X has been dating with Y; X loves Y; …
Facts & Fact Candidates: (Angelina, Brad), (Hillary, Bill), (Victoria, David), (Carla, Nicolas), (Angelina, Brad), (Yoko, John), (Carla, Benjamin), (Larry, Google), (Kate, Pete), (Victoria, David)
good for recall
noisy, drifting
not robust enough for high precision
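The harvesting step can be sketched as follows: textual patterns with placeholders X and Y are turned into regular expressions and applied to sentences to produce fact candidates. This is a minimal sketch, not the method of the cited systems. The sentences are invented for illustration, and entity recognition is reduced to "one capitalized token", which is an assumption that real extractors do not make.

```python
import re

# Assumption: an entity mention is a single capitalized token.
NAME = r"([A-Z][a-z]+)"

# Patterns taken from the slide; X and Y are placeholders for the two arguments.
PATTERNS = ["X and her husband Y", "X and Y on their honeymoon", "X loves Y"]


def compile_pattern(pattern: str) -> re.Pattern:
    """Turn a textual pattern with placeholders X and Y into a regex."""
    parts = re.split(r"\b([XY])\b", pattern)
    return re.compile("".join(
        NAME if part in ("X", "Y") else re.escape(part) for part in parts))


def harvest(sentences: list[str]) -> set[tuple[str, str]]:
    """Collect (X, Y) candidates from every pattern match in every sentence."""
    compiled = [compile_pattern(p) for p in PATTERNS]
    candidates = set()
    for sentence in sentences:
        for regex in compiled:
            for m in regex.finditer(sentence):
                candidates.add((m.group(1), m.group(2)))
    return candidates


# Hypothetical input sentences.
SENTENCES = [
    "Hillary and her husband Bill attended the gala.",
    "Carla and Nicolas on their honeymoon in Egypt.",
    "Larry loves Google.",
]
```

The last sentence yields the candidate (Larry, Google), which shows the noise that the slide warns about: the pattern fires, but the pair is not a marriage.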

Reasoning about Fact Candidates
Use consistency constraints to prune false candidates
ground atoms:
  spouse(Hillary,Bill), spouse(Carla,Nicolas), spouse(Cecilia,Nicolas), spouse(Carla,Ben), spouse(Carla,Mick), spouse(Carla,Sofie)
  f(Hillary), f(Carla), f(Cecilia), f(Sofie), m(Bill), m(Nicolas), m(Ben), m(Mick)
FOL rules (restricted):
  spouse(x,y) ∧ diff(y,z) ⇒ ¬spouse(x,z)
  spouse(x,y) ∧ diff(w,x) ⇒ ¬spouse(w,y)
  spouse(x,y) ⇒ f(x)
  spouse(x,y) ⇒ m(y)
  spouse(x,y) ⇒ (f(x) ∧ m(y)) ∨ (m(x) ∧ f(y))
Rules reveal inconsistencies
Find consistent subset(s) of atoms (possible world(s), the truth)
Rules can be weighted (e.g. by fraction of ground atoms that satisfy a rule)
uncertain / probabilistic data
compute prob. distr. of subset of atoms being the truth
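A small Python sketch of how such rules reveal inconsistencies among the ground atoms of this slide. It assumes the rules are read as "a person has at most one spouse" (in both argument positions) and "the first argument is female, the second male"; the function names are hypothetical.

```python
from itertools import combinations

# Ground atoms from the slide.
SPOUSE = [("Hillary", "Bill"), ("Carla", "Nicolas"), ("Cecilia", "Nicolas"),
          ("Carla", "Ben"), ("Carla", "Mick"), ("Carla", "Sofie")]
FEMALE = {"Hillary", "Carla", "Cecilia", "Sofie"}
MALE = {"Bill", "Nicolas", "Ben", "Mick"}


def functional_conflicts(atoms):
    """Pairs of spouse atoms that cannot both be true under the
    one-spouse rules (same x with different y, or same y with different x)."""
    return [(a, b) for a, b in combinations(atoms, 2)
            if (a[0] == b[0] and a[1] != b[1]) or (a[1] == b[1] and a[0] != b[0])]


def type_violations(atoms):
    """Spouse atoms that break the assumed typing f(x) and m(y)."""
    return [(x, y) for x, y in atoms if x not in FEMALE or y not in MALE]
```

Detecting conflicts is only the first half; choosing which atoms to keep is the job of the weighted reasoning discussed on the following slides.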

Markov Logic Networks (MLNs) (M. Richardson / P. Domingos 2006)
Map logical constraints & fact candidates into probabilistic graph model: Markov Random Field (MRF)
Rules:
  s(x,y) ⇒ m(y)
  s(x,y) ⇒ f(x)
  s(x,y) ∧ diff(y,z) ⇒ ¬s(x,z)
  s(x,y) ∧ diff(w,x) ⇒ ¬s(w,y)
  f(x) ⇒ ¬m(x)
  m(x) ⇒ ¬f(x)
Fact candidates: s(Carla,Nicolas), s(Cecilia,Nicolas), s(Carla,Ben), s(Carla,Sofie), …
Grounding:
  ¬s(Ca,Nic) ∨ ¬s(Ce,Nic)
  ¬s(Ca,Nic) ∨ ¬s(Ca,Ben)
  ¬s(Ca,Nic) ∨ ¬s(Ca,So)
  ¬s(Ca,Ben) ∨ ¬s(Ca,So)
  ¬s(Ca,Nic) ∨ m(Nic)
  ¬s(Ce,Nic) ∨ m(Nic)
  ¬s(Ca,Ben) ∨ m(Ben)
  ¬s(Ca,So) ∨ m(So)
Literal → Boolean Var
Literal → binary RV

Markov Logic Networks (MLNs) (M. Richardson / P. Domingos 2006)
Map logical constraints & fact candidates into probabilistic graph model: Markov Random Field (MRF)
Rules:
  s(x,y) ⇒ m(y)
  s(x,y) ⇒ f(x)
  s(x,y) ∧ diff(y,z) ⇒ ¬s(x,z)
  s(x,y) ∧ diff(w,x) ⇒ ¬s(w,y)
  f(x) ⇒ ¬m(x)
  m(x) ⇒ ¬f(x)
Fact candidates: s(Carla,Nicolas), s(Cecilia,Nicolas), s(Carla,Ben), s(Carla,Sofie), …
MRF nodes: m(Ben), m(Nic), m(So), s(Ca,Nic), s(Ce,Nic), s(Ca,Ben), s(Ca,So)
RVs coupled by MRF edge if they appear in same clause
MRF assumption: P[X_i | X_1..X_n] = P[X_i | N(X_i)]
joint distribution has product form over all cliques
Variety of algorithms for joint inference: Gibbs sampling, other MCMC, belief propagation, randomized MaxSat, …
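The product form can be illustrated with a brute-force sketch: each possible world gets probability proportional to exp of the summed weights of the ground clauses it satisfies, which is the standard MLN definition. The clause weights below are invented for illustration, and exhaustive enumeration only works for a handful of atoms; the inference algorithms named on the slide exist precisely because this does not scale.

```python
import math
from itertools import product

ATOMS = ["s(Ca,Nic)", "s(Ce,Nic)", "s(Ca,Ben)"]

# (weight, clause); a clause is a disjunction of (atom, required truth value).
# Weights are hypothetical.
CLAUSES = [
    (2.0, [("s(Ca,Nic)", False), ("s(Ce,Nic)", False)]),  # not both spouses of Nic
    (2.0, [("s(Ca,Nic)", False), ("s(Ca,Ben)", False)]),  # Ca has one spouse
    (1.5, [("s(Ca,Nic)", True)]),                         # evidence for each candidate
    (0.5, [("s(Ce,Nic)", True)]),
    (0.5, [("s(Ca,Ben)", True)]),
]


def score(world: dict) -> float:
    """Sum of weights of the ground clauses satisfied in this world."""
    return sum(w for w, clause in CLAUSES
               if any(world[a] == v for a, v in clause))


def joint() -> dict:
    """Probability of every world: exp(score) normalized by the partition function."""
    worlds = [dict(zip(ATOMS, vals))
              for vals in product([False, True], repeat=len(ATOMS))]
    unnorm = [math.exp(score(w)) for w in worlds]
    z = sum(unnorm)
    return {tuple(w[a] for a in ATOMS): u / z for w, u in zip(worlds, unnorm)}


def marginal(atom: str) -> float:
    """Probability that one atom is true, summed over all worlds."""
    idx = ATOMS.index(atom)
    return sum(p for world, p in joint().items() if world[idx])
```

With these weights the single most probable world keeps s(Ca,Nic) and drops the two atoms that conflict with it.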

Related Alternative Probabilistic Models
Constrained Conditional Models [D. Roth et al. 2007]
  log-linear classifiers with constraint-violation penalty
  mapped into Integer Linear Programs
Factor Graphs with Imperative Variable Coordination [A. McCallum et al. 2008]
  RVs share factors (joint feature functions)
  generalizes MRF, BN, CRF, …
  inference via advanced MCMC
  flexible coupling & constraining of RVs
software tools:
  alchemy.cs.washington.edu
  code.google.com/p/factorie/
  research.microsoft.com/en-us/um/cambridge/projects/infernet/

Reasoning for KB Growth: Direct Route
facts in KB: married (Hillary, Bill), married (Carla, Nicolas), married (Angelina, Brad)
new fact candidates: married (Cecilia, Nicolas), married (Carla, Benjamin), married (Carla, Mick), married (Carla, Sofie), married (Larry, Google)
+ patterns: X and her husband Y; X and Y and their children; X has been dating with Y; X loves Y
Direct approach (F. Suchanek et al.: WWW09):
1. facts are true; fact candidates & patterns → hypotheses; grounded constraints → clauses with hypotheses as vars
2. type signatures of relations greatly reduce #clauses
3. cast into Weighted Max-Sat with weights from pattern stats; customized approximation algorithm
unifies: fact cand consistency, pattern goodness, entity disambig.
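A toy version of step 3: hypotheses become Boolean variables, evidence becomes weighted unit clauses, and consistency constraints become heavily weighted clauses. The sketch below solves the resulting Weighted Max-Sat instance by exhaustive search, which stands in for (and is not) the customized approximation algorithm mentioned on the slide. All weights are hypothetical.

```python
from itertools import product

# Hypotheses (fact candidates); married(Carla,Nicolas) is a KB fact and fixed to true.
HYPOTHESES = [
    "married(Cecilia,Nicolas)",
    "married(Michelle,Barack)",
    "married(Kate,Leonardo)",
    "married(Kate,Pete)",
]

# (weight, clause); a clause is a disjunction of (hypothesis, required truth value).
CLAUSES = [
    # evidence weights, e.g. from pattern statistics (assumed values)
    (0.4, [("married(Cecilia,Nicolas)", True)]),
    (0.9, [("married(Michelle,Barack)", True)]),
    (0.3, [("married(Kate,Leonardo)", True)]),
    (0.6, [("married(Kate,Pete)", True)]),
    # grounded constraints with a large weight (near-hard)
    (10.0, [("married(Cecilia,Nicolas)", False)]),  # clashes with fact married(Carla,Nicolas)
    (10.0, [("married(Kate,Leonardo)", False), ("married(Kate,Pete)", False)]),
]


def solve() -> set:
    """Return the hypotheses set to true in a maximum-weight assignment."""
    best, best_weight = {}, float("-inf")
    for values in product([False, True], repeat=len(HYPOTHESES)):
        world = dict(zip(HYPOTHESES, values))
        weight = sum(w for w, clause in CLAUSES
                     if any(world[h] == v for h, v in clause))
        if weight > best_weight:
            best, best_weight = world, weight
    return {h for h, v in best.items() if v}
```

The candidate that contradicts a KB fact is rejected, the unconflicted one is accepted, and of the two mutually exclusive candidates the one with stronger evidence wins.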

Facts & Patterns Consistency with SOFIE (F. Suchanek et al.: WWW09, N. Nakashole et al.: WebDB10)
constraints to connect facts, fact candidates, patterns
functional dependencies: spouse(X,Y): X → Y, Y → X
relation properties: asymmetry, transitivity, acyclicity, …
type constraints, inclusion dependencies: spouse ⊆ Person × Person; capitalOfCountry ⊆ cityOfCountry
domain-specific constraints:
  bornInYear(x) + 10years ≤ graduatedInYear(x)
  hasAdvisor(x,y) ∧ graduatedInYear(x,t) ∧ graduatedInYear(y,s) ⇒ s < t
pattern-fact duality:
  occurs(p,x,y) ∧ expresses(p,R) ∧ type(x)=dom(R) ∧ type(y)=rng(R) ⇒ R(x,y)
  occurs(p,x,y) ∧ R(x,y) ∧ type(x)=dom(R) ∧ type(y)=rng(R) ⇒ expresses(p,R)
name(-in-context)-to-entity mapping: means(n,e1) means(n,e2) …

Entity Disambiguation Revisited
word/phrase level:
  occurs (divorced from, Madonna, Guy Ritchie) ∧ expresses (divorced from, wasMarriedTo) ⇒ wasMarriedTo (Madonna, Guy Ritchie)
actually is (entity level):
  occurs (divorced from, Madonna, Guy Ritchie) ∧ means (Madonna, Madonna Louise Ciccone) ∧ expresses (divorced from, wasMarriedTo) ⇒ wasMarriedTo (Madonna Louise Ciccone, Guy Ritchie) [0.7]
  occurs (divorced from, Madonna, Guy Ritchie) ∧ means (Madonna, Madonna (Edvard Munch)) ∧ expresses (divorced from, wasMarriedTo) ⇒ wasMarriedTo (Madonna (Edvard Munch), Guy Ritchie) [0.3]
use context-similarity as disambiguation prior
set clause weights accordingly
reduced to normal case

Experimental Results
SOFIE (F. Suchanek et al.: WWW09):
  input: biographies of 400 US senators, 3500 HTML files
  output: birth/death date&place, politicianOf (state)
  run-time: 7 h parsing, 6 h hypotheses, 2 h Max-Sat
  precision: % (except for death place)
  recall: ca. 750 extracted facts (300 politicianOf facts)
PROSPERA (N. Nakashole et al.: WebDB10):
  input: Wikipedia articles and Web homepages of scientists
  output: hasAdvisor, graduatedAt, hasCollaborator, facultyAt, wonAward
  run-time: 1 h total (largely parallelized)
  precision: %
  recall: ca extracted facts (400 hasAdvisor facts)
Now running experiments on ClueWeb09 corpus (500 Mio. English Web pages) with Hadoop cluster of 10x16 cores and 10x48 GB

Outline...
Automatic KB Construction
Growing & Maintaining the KB
Temporal Knowledge
What and Why
Wrap-up

Temporal Knowledge
Which facts for given relations hold at what time point or during which time intervals?
  marriedTo (Madonna, Guy) [ 22Dec2000, Dec2008 ]
  capitalOf (Berlin, Germany) [ 1990, now ]
  capitalOf (Bonn, Germany) [ 1949, 1989 ]
  hasWonPrize (JimGray, TuringAward) [ 1998 ]
  graduatedAt (HectorGarcia-Molina, Stanford) [ 1979 ]
  graduatedAt (SusanDavidson, Princeton) [ Oct 1982 ]
  hasAdvisor (SusanDavidson, HectorGarcia-Molina) [ Oct 1982, forever ]
How can we query & reason on entity-relationship facts in a time-travel manner - with uncertain/incomplete KB?
  US president when Barack Obama was born?
  students of Hector Garcia-Molina while he was at Princeton?
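A time-travel query over facts with validity intervals can be sketched in a few lines. The tuples below copy three facts from this slide at year granularity; the representation (a flat tuple with `None` for an open end such as "now") and the function names are assumptions made for the sketch.

```python
from typing import Optional

# (relation, subject, object, begin_year, end_year); end None means still valid ("now").
TFact = tuple[str, str, str, int, Optional[int]]

T_FACTS: list[TFact] = [
    ("capitalOf", "Berlin", "Germany", 1990, None),
    ("capitalOf", "Bonn", "Germany", 1949, 1989),
    ("hasWonPrize", "JimGray", "TuringAward", 1998, 1998),
]


def holds_at(fact: TFact, year: int) -> bool:
    """True if the fact's validity interval contains the given year."""
    _, _, _, begin, end = fact
    return begin <= year and (end is None or year <= end)


def subjects_at(relation: str, obj: str, year: int) -> list[str]:
    """All subjects s such that relation(s, obj) holds in the given year."""
    return [f[1] for f in T_FACTS
            if f[0] == relation and f[2] == obj and holds_at(f, year)]
```

For example, `subjects_at("capitalOf", "Germany", 1985)` asks for the capital of Germany in 1985. Handling uncertain or incomplete scopes, as the slide asks, needs more than a closed interval per fact.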

French Marriage Problem
facts in KB:
  1: married (Hillary, Bill)
  2: married (Carla, Nicolas)
  3: married (Angelina, Brad)
  validFrom (2, 2008)
new fact candidates:
  4: married (Cecilia, Nicolas)
  5: married (Carla, Benjamin)
  6: married (Carla, Mick)
  7: divorced (Madonna, Guy)
  8: domPartner (Angelina, Brad)
  validFrom (4, 1996), validUntil (4, 2007), validFrom (5, 2010), validFrom (6, 2006), validFrom (7, 2008)

Challenge: Temporal Knowledge
for all people in Wikipedia (…) gather all spouses, incl. divorced & widowed, and corresponding time periods!
>95% accuracy, >95% coverage, in one night
1) recall: gather temporal scopes for base facts
2) precision: reason on mutual consistency
consistency constraints are potentially helpful:
  functional dependencies: husband, time → wife
  inclusion dependencies: marriedPerson ⊆ adultPerson
  age/time/gender restrictions: birthdate + … < marriage < divorce

Difficult Dating

(Even More Difficult) Implicit Dating explicit dates vs. implicit dates relative to other dates

(Even More Difficult) Relative Dating
vague dates, relative dates
narrative text, relative order

Framework for T-Fact Extraction (Theobald et al.: MUD10, Wang et al.: EDBT10; Zhang et al.: WebDB08)
1) represent temporal scopes of facts in the presence of incompleteness and uncertainty
2) gather & filter candidates for t-facts: extract base facts R(e1, e2) first; then focus on sentences with e1, e2 and date or temporal phrase
3) aggregate & reconcile evidence from observations
4) reason on joint constraints about facts and time scopes

1) Representing T-Fact Evidence
event-style and state-style facts; meta-facts to capture temporal scopes:
  1: married(Madonna, Sean), 2: married(Madonna, Guy),
  validSince (1, 16-Aug-1985), validUntil (1, 14-Sep-1989), validSince (2, 22-Dec-2000), validUntil (2, 15-Dec-2008)
  3: wonAward(Sean, AcademyAwardForBestActor), validOn (3, 29-Feb-2004)
different resolutions, later refinement:
  After 4 years of happy marriage, Madonna and Sean got divorced in September …
  1: married(Madonna, Sean), earliestSince (1, 1-Jan-1985), latestSince (1, 31-Dec-1985), earliestUntil (1, 1-Sep-1989), latestUntil (1, 30-Sep-1989)
uncertain & inconsistent evidence → confidence distribution: µ=1987, σ² = …
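The earliest/latest meta-facts can be read as bounds that later evidence narrows down. A minimal sketch, assuming a scope is stored as four dates and that refinement means intersecting the old bounds with the new ones; the class and method names are hypothetical, and the dates are those on this slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TemporalScope:
    """Bounds on the begin and end of a fact's validity (names are hypothetical)."""
    earliest_since: date
    latest_since: date
    earliest_until: date
    latest_until: date

    def refine_since(self, lo: date, hi: date) -> None:
        """Intersect the begin bounds with new evidence [lo, hi]."""
        new_lo = max(self.earliest_since, lo)
        new_hi = min(self.latest_since, hi)
        if new_lo > new_hi:
            raise ValueError("inconsistent evidence for begin of scope")
        self.earliest_since, self.latest_since = new_lo, new_hi


# Coarse scope: married some time in 1985, divorced some time in September 1989.
scope = TemporalScope(date(1985, 1, 1), date(1985, 12, 31),
                      date(1989, 9, 1), date(1989, 9, 30))

# Later, more precise evidence: the wedding date.
scope.refine_since(date(1985, 8, 16), date(1985, 8, 16))
```

Raising an error on empty intersections is a simplification; the slide's confidence distribution suggests that conflicting evidence should be weighed rather than rejected.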

2) Gather & Filter T-Fact Candidates
Choice of sources:
  news-style: date in header; relative temp exprs; simple language; many pronouns
  biography-style: many dates in text; explicit dates, narrative; elaborated language; pronouns for main entity
Example text:
  Bruni met recently divorced president Sarkozy in November 2007 at a dinner party. She has said she is easily "bored with monogamy … A romance is said to have started a few weeks ago between her and Biolay.
Naive approach: use deep NLP (dependency parser) on every sentence, then use classifier (or structured-output learner) to detect t-facts
  too expensive

2) Gather & Filter: Multi-Stage Approach
stage 1: sentences with e1 and e2 from R
  match noun phrases against YAGO means relation
  use disambiguation prior for entity mentions
stage 2: sentences that contain a temporal expression
  use TARSQI tool to extract relative t-expressions and map them to absolute dates or durations
stage 3: sentences where the t-expression refers to R(e1,e2)
  run dependency parser: check shortest path connecting e1, e2, verb, t-expr
  alternatively, consider only sentences with two noun groups & short surface distances of e1, e2, t-expr
Example: Jim married Sue, but later left her and began an affair with Jane in 2005.

3) Aggregate & Reconcile T-Fact Evidence
Ideal input:
  Madonna and Sean were married from 16-Aug-85 until 12-Sep-89.
  Madonna and Sean married on August 16, …
  Madonna and Sean got divorced in September …
Imprecise input:
  Madonna and Sean were married from 1985 through …
  Madonna and Sean were married four years in the late nineties.
  Madonna and Sean got divorced in fall …
Noisy input:
  Madonna and Sean plan their wedding in summer …
  Madonna and Sean just returned from their honeymoon (in Jan 1986).
  Madonna and Sean will be divorced by the end of the year (1989).
  The marriage of Madonna and Sean will not survive this year (1987).

3) Aggregate & Reconcile T-Fact Evidence
Real input:
  …
  Madonna and Sean were chased during their honeymoon … (Jan 19, 1986)
  Madonna and her husband Sean opened the exhibition … (March 7, 1986)
  Madonna and her husband Sean were seen at … (April 1, 1986)
  Madonna and Sean met other couples at … (June 22, 1986)
  Madonna and Sean plan to have children … (July 4, 1986)
  Madonna and Sean would consider adopting a child … (July 14, 1986)
  Sean and his wife Madonna purchase another castle in … (November 5, 1986)
  ...
  Madonna and Sean think about getting divorced … (April 21, 1989)
  The marriage of Madonna and Sean is in deep crisis … (May 11, 1989)
  …

3) Aggregate & Reconcile: Solution
Classifier for t-fact observations: begin vs. during vs. end
Build separate histogram for each class (and each t-fact): event histogram (begin), state histogram (during), event histogram (end)
Combine histograms & derive high-confidence time scope
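A sketch of the histogram idea for one t-fact, assuming the observations have already been classified as begin, during, or end. The observations are invented, the bucket size is one year, and taking the mode of the begin and end histograms is just one simple way to combine them; the slide does not specify the combination rule.

```python
from collections import Counter

# Hypothetical classified observations for married(Madonna, Sean): (year, class).
OBSERVATIONS = [
    (1985, "begin"), (1985, "begin"), (1986, "begin"),
    (1986, "during"), (1987, "during"), (1988, "during"),
    (1989, "end"), (1989, "end"), (1990, "end"),
]


def build_histograms(observations):
    """One histogram (year -> count) per observation class."""
    hist = {"begin": Counter(), "during": Counter(), "end": Counter()}
    for year, label in observations:
        hist[label][year] += 1
    return hist


def derive_scope(observations):
    """Pick the most frequent begin and end year, and report which fraction
    of the 'during' observations falls inside the resulting scope."""
    hist = build_histograms(observations)
    begin = hist["begin"].most_common(1)[0][0]
    end = hist["end"].most_common(1)[0][0]
    during_total = sum(hist["during"].values())
    inside = sum(n for year, n in hist["during"].items() if begin <= year <= end)
    support = inside / during_total if during_total else 0.0
    return begin, end, support
```

The stray 1986 "begin" and 1990 "end" observations are outvoted, and the "during" histogram serves as a consistency check on the derived scope.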

4) Joint Reasoning on Facts and T-Facts
Combine & reconcile t-scopes across different facts
constraint: marriedTo (m) is an injective function at any given point
  ∀X, Y, Z, T1, T2: m(X,Y) ∧ m(X,Z) ∧ validTime(m(X,Y),T1) ∧ validTime(m(X,Z),T2) ⇒ ¬overlaps(T1, T2)
after grounding:
  m(Carla, Nicolas) ∧ m(Cecilia, Nicolas) ⇒ ¬overlaps ([2008,2010], [1996,2007])
  m(Carla, Nicolas) ∧ m(Carla, Benjamin) ⇒ ¬overlaps ([2008,2010], [2009,2011])
evaluating overlaps:
  m(Ca,Nic), m(Ce,Nic): overlaps = false (no conflict)
  m(Ca,Nic), m(Ca,Ben): overlaps = true (conflict)
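The grounded constraint can be checked mechanically. The sketch below uses the three facts and year intervals from this slide, treats intervals as closed, and reports every pair of marriage facts that shares a person and has overlapping scopes; the function names are hypothetical.

```python
from itertools import combinations

# Marriage facts with closed year intervals, as on the slide.
FACTS = {
    ("Carla", "Nicolas"): (2008, 2010),
    ("Cecilia", "Nicolas"): (1996, 2007),
    ("Carla", "Benjamin"): (2009, 2011),
}


def overlaps(t1, t2) -> bool:
    """True if two closed intervals share at least one point in time."""
    return t1[0] <= t2[1] and t2[0] <= t1[1]


def violations(facts):
    """Pairs of marriage facts that share a person and overlap in time."""
    result = []
    for (f1, t1), (f2, t2) in combinations(facts.items(), 2):
        shares_person = f1[0] == f2[0] or f1[1] == f2[1]
        if shares_person and overlaps(t1, t2):
            result.append((f1, f2))
    return result
```

Only the Carla/Nicolas and Carla/Benjamin pair is flagged, matching the slide: the two marriages of Nicolas do not overlap in time.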

4) Joint Reasoning on Facts and T-Facts
Conflict graph:
  m(Ca, Ben) [2009,2011]
  m(Ca, Nic) [2008,2010]
  m(Ce, Nic) [1996,2007]
  m(Ca, Mi) [2004,2008]
  m(Ce, Mi) [1998,2005]
Find maximal independent set: subset of nodes w/o adjacent pairs, with (evidence-) weighted nodes
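A brute-force sketch of this step: build the conflict graph from the five t-facts on the slide (an edge joins two facts that share a person and overlap in time, with closed intervals), then pick the independent set with the largest total node weight. The evidence weights are invented, and exhaustive search over subsets is only feasible for tiny graphs; a real system would need an approximation.

```python
from itertools import combinations

# (wife, husband, (begin, end)) -> evidence weight; the weights are hypothetical.
NODES = {
    ("Ca", "Ben", (2009, 2011)): 0.4,
    ("Ca", "Nic", (2008, 2010)): 0.9,
    ("Ce", "Nic", (1996, 2007)): 0.8,
    ("Ca", "Mi", (2004, 2008)): 0.3,
    ("Ce", "Mi", (1998, 2005)): 0.5,
}


def in_conflict(a, b) -> bool:
    """Two marriages conflict if they share a person and overlap in time."""
    shares_person = a[0] == b[0] or a[1] == b[1]
    time_overlap = a[2][0] <= b[2][1] and b[2][0] <= a[2][1]
    return shares_person and time_overlap


EDGES = [(a, b) for a, b in combinations(NODES, 2) if in_conflict(a, b)]


def best_independent_set():
    """Exhaustively search for the conflict-free subset with maximum weight."""
    best, best_weight = frozenset(), 0.0
    nodes = list(NODES)
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            chosen = set(subset)
            if any(a in chosen and b in chosen for a, b in EDGES):
                continue  # contains an adjacent pair
            weight = sum(NODES[n] for n in subset)
            if weight > best_weight:
                best, best_weight = frozenset(subset), weight
    return best
```

With these weights the result keeps m(Ca, Nic) and m(Ce, Nic), whose scopes do not overlap, and discards the three facts that conflict with them.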

4) Joint Reasoning on Facts and T-Facts
[Figure: timeline of m(Ca, Ben), m(Ca, Nic), m(Ce, Nic), m(Ca, Mi), m(Ce, Mi)]
alternative approach: split t-scopes and reason on consistency of t-fact partitions

Preliminary Results
automatic extraction of t-facts about football/soccer from Wikipedia and news articles
query answering by reasoning on t-facts, e.g. overlaps (T1,T2), teammates(X,Y)

Outline...
Automatic KB Construction
Growing & Maintaining the KB
Temporal Knowledge
What and Why
Wrap-up

KB Building: Where Do We Stand?
Knowledge Bases on
Entities & Classes: strong success story, some problems left:
  large taxonomies of classes with individual entities
  long tail calls for new methods
  entity disambiguation remains grand challenge
Relationships: good progress, but many challenges left:
  recall & precision by patterns & reasoning
  efficiency & scalability
  soft rules, hard constraints, richer logics, …
  open-domain discovery of new relation types
Temporal Knowledge: widely open (fertile) research ground:
  uncertain / incomplete temporal scopes of facts
  joint reasoning on base-facts and time-scopes

Overall Take-Home...
Historic opportunity: revive Cyc vision, make it real & large-scale!
  KB as enabler of macroscopic machine reading
  challenging & risky, but high pay-off
Explore & exploit synergies between semantic, statistical, & social Web methods:
  statistical evidence + logical consistency!
Many interesting research topics for CS (+ CoLi):
  efficiency & scalability
  constraints & reasoning on uncertain data
  NLP for temporal statements
  statistical ranking for semantic search
  knowledge-base life-cycle: growth & maintenance

Thank You !