Interactive Reasoning in Large and Uncertain RDF Knowledge Bases
Martin Theobald, Max Planck Institute for Informatics
Joint work with: Maximilian Dylla, Timm Meiser, Ndapa Nakashole, Christina Teflioudi, Yafang Wang, Mohamed Yahya, Mauro Sozio, and Fabian Suchanek

French Marriage Problem...
marriedTo: person × person
marriedTo_French: person × person
∀x,y,z: marriedTo(x,y) ∧ marriedTo(x,z) → y=z

French Marriage Problem
Facts in KB; new facts or fact candidates:
  marriedTo(Hillary, Bill)
  marriedTo(Carla, Nicolas)
  marriedTo(Angelina, Brad)
  marriedTo(Cecilia, Nicolas)
  marriedTo(Carla, Benjamin)
  marriedTo(Carla, Mick)
  marriedTo(Michelle, Barack)
  marriedTo(Yoko, John)
  marriedTo(Kate, Leonardo)
  marriedTo(Carla, Sofie)
  marriedTo(Larry, Google)
1) For recall: pattern-based harvesting
2) For precision: consistency reasoning
∀x,y,z: marriedTo(x,y) ∧ marriedTo(x,z) → y=z

Agenda
– URDF: Reasoning in Uncertain Knowledge Bases
    Resolving uncertainty at query time
    Lineage of answers
    Propositional vs. probabilistic reasoning
    Temporal reasoning extensions
– UViz: The URDF Visualization Frontend
    Demo!

URDF: Reasoning in Uncertain KBs
[Theobald,Sozio,Suchanek,Nakashole: MPII Tech-Report '10]
Knowledge harvesting from the Web may yield knowledge bases which are
– Incomplete: bornIn(Albert_Einstein,?x) → {}
– Incorrect: bornIn(Albert_Einstein,?x) → {Stuttgart}
– Inconsistent: bornIn(Albert_Einstein,?x) → {Ulm, Stuttgart}
Values shown on the slide: 0.7, 0.2
Combine grounding of first-order logic rules with an additional step of consistency reasoning, at query time!
– Propositional: Constrained Weighted MaxSat
– Probabilistic: Lineage & Possible Worlds Semantics

Soft Rules vs. Hard Constraints
(Soft) inference rules vs. (hard) consistency constraints
People may live in more than one place:
  livesIn(x,y) ∧ marriedTo(x,z) → livesIn(z,y)
  livesIn(x,y) ∧ hasChild(x,z) → livesIn(z,y)
  Weights shown on the slide: [0.6], [0.2]
People are not born in different places/on different dates:
  bornIn(x,y) ∧ bornIn(x,z) → y=z
People are not married to more than one person (at the same time, in most countries?):
  marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y≠z → disjoint(t1,t2)

Soft Rules vs. Hard Constraints (ct'd)
Enforce FDs (e.g., mutual exclusion) as hard constraints:
  hasAdvisor(x,y) ∧ hasAdvisor(x,z) → y=z
Generalize to other forms of constraints:
  Hard constraint: hasAdvisor(x,y) ∧ graduatedInYear(x,t) ∧ graduatedInYear(y,s) → s < t
  Soft constraint: firstPaper(x,p) ∧ firstPaper(y,q) ∧ author(p,x) ∧ author(p,y) ∧ inYear(p) > inYear(q)+5years → hasAdvisor(x,y) [0.6]
Datalog-style grounding (deductive & potentially recursive soft rules):
  livesIn(x,y) ∧ type(y,City) ∧ locatedIn(y,z) ∧ type(z,Country) → livesIn(x,z)
Combine soft and hard constraints:
– No longer regular MaxSat
– Constrained (weighted) MaxSat instead

Deductive Grounding (SLD Resolution/Datalog)
First-order rules (Horn clauses):
  R1: livesIn(?x, ?y) :- marriedTo(?x, ?z), livesIn(?z, ?y)
  R2: livesIn(?x, ?y) :- represents(?x, ?y)
  R3: livesIn(?x, ?y) :- governorOf(?x, ?y)
RDF base facts:
  F1: marriedTo(Bill, Hillary)
  F2: represents(Hillary, New_York)
  F3: governorOf(Bill, Arkansas)
Query: livesIn(Bill, ?x)
[Figure: SLD resolution tree with OR-nodes (\/) over the rules R1–R3 and AND-nodes (/\) over body literals; leaves are labeled with base facts F1–F3 or X]
Answers (derived facts):
  livesIn(Bill, Arkansas)
  livesIn(Bill, New_York)
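A minimal sketch of how the rules R1–R3 and facts F1–F3 produce the two answers. Note the swap: the slide describes top-down SLD resolution, whereas this sketch uses naive bottom-up (forward-chaining) evaluation to a fixpoint, which derives the same facts for this small example. The tuple encoding and the helper names unify, match_body and ground are assumptions made for illustration, not part of URDF.

```python
# Facts and rule atoms are tuples (predicate, arg1, arg2); variables start with "?".
facts = {
    ("marriedTo", "Bill", "Hillary"),      # F1
    ("represents", "Hillary", "New_York"), # F2
    ("governorOf", "Bill", "Arkansas"),    # F3
}

rules = [
    (("livesIn", "?x", "?y"), [("marriedTo", "?x", "?z"), ("livesIn", "?z", "?y")]),  # R1
    (("livesIn", "?x", "?y"), [("represents", "?x", "?y")]),                          # R2
    (("livesIn", "?x", "?y"), [("governorOf", "?x", "?y")]),                          # R3
]


def unify(atom, fact, binding):
    """Extend binding so that atom matches fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match_body(body, known, binding):
    """Yield every binding that satisfies all body atoms against the known facts."""
    if not body:
        yield binding
        return
    for fact in known:
        extended = unify(body[0], fact, binding)
        if extended is not None:
            yield from match_body(body[1:], known, extended)


def ground(base_facts, horn_rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(base_facts)
    changed = True
    while changed:
        changed = False
        for head, body in horn_rules:
            # Materialize the bindings first so the set is not modified while iterated.
            for binding in list(match_body(body, known, {})):
                derived = (head[0],) + tuple(binding.get(t, t) for t in head[1:])
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known


# Query livesIn(Bill, ?x)
answers = sorted(f[2] for f in ground(facts, rules) if f[:2] == ("livesIn", "Bill"))
print(answers)
```

The recursive rule R1 only fires in the second pass, once livesIn(Hillary, New_York) has been derived via R2.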

URDF: Reasoning Example
Rules:
  hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z) [0.4]
  graduatedFrom(x,y) ∧ graduatedFrom(x,z) → y=z
KB: Base Facts
[Figure: KB graph over the nodes Jeff, Surajit, David, Stanford, Princeton, University and Computer Scientist; edge labels: type [1.0], type [1.0], worksAt [0.9], graduatedFrom [0.6], graduatedFrom [0.7], graduatedFrom [0.9], hasAdvisor [0.8], hasAdvisor [0.7]; derived edges are labeled graduatedFrom [?]]
Derived Facts:
  gradFr(Surajit,Stanford)
  gradFr(David,Stanford)

URDF: CNF Construction & MaxSat Solving
[Theobald,Sozio,Suchanek,Nakashole: MPII Tech-Report '10]
Query: graduatedFrom(?x,?y)
CNF:
  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))
  ∧ (¬graduatedFrom(David, Stanford) ∨ ¬graduatedFrom(David, Princeton))
  ∧ (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))
  ∧ (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))
  ∧ worksAt(Jeff, Stanford)
  ∧ hasAdvisor(Surajit, Jeff)
  ∧ hasAdvisor(David, Jeff)
  ∧ graduatedFrom(Surajit, Princeton)
  ∧ graduatedFrom(Surajit, Stanford)
  ∧ graduatedFrom(David, Princeton)
  ∧ graduatedFrom(David, Stanford)
Weights shown on the slide: 0.4 0.9 0.8 0.7 0.6 0.7 0.9 0.0
1) Deductive Grounding
  – Yields only facts and rules which are relevant for answering the query (dependency graph D)
2) Boolean Formula in CNF consisting of
  – Grounded hard rules
  – Grounded soft rules (weighted)
  – Base facts (weighted)
3) Propositional Reasoning
  – Compute truth assignment for all facts in D such that the sum of weights is maximized
  → Compute "most likely" possible world
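To make step 3 concrete, here is a small sketch of constrained weighted MaxSat on the Surajit part of the example only. It enumerates all truth assignments by brute force, which is an assumption made for illustration and not the solver used in URDF. The weights are also assumptions: 0.4 is the soft-rule weight from the reasoning example, and the fact weights 0.9, 0.8, 0.7 and 0.6 follow the annotations on the lineage slide.

```python
from itertools import product

W = "worksAt(Jeff,Stanford)"
A = "hasAdvisor(Surajit,Jeff)"
GP = "graduatedFrom(Surajit,Princeton)"
GS = "graduatedFrom(Surajit,Stanford)"
VARIABLES = [W, A, GP, GS]

# A clause is a list of (variable, polarity) literals.
HARD = [
    [(GS, False), (GP, False)],  # mutual exclusion of the two graduation facts
]
SOFT = [
    (0.4, [(A, False), (W, False), (GS, True)]),  # grounded soft rule
    (0.9, [(W, True)]),
    (0.8, [(A, True)]),
    (0.7, [(GP, True)]),
    (0.6, [(GS, True)]),
]


def satisfied(clause, world):
    """A clause holds if at least one of its literals holds in the world."""
    return any(world[var] == polarity for var, polarity in clause)


def solve(variables, hard, soft):
    """Return the assignment that satisfies all hard clauses and maximizes
    the total weight of satisfied soft clauses."""
    best_world, best_score = None, float("-inf")
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if not all(satisfied(clause, world) for clause in hard):
            continue
        score = sum(weight for weight, clause in soft if satisfied(clause, world))
        if score > best_score:
            best_world, best_score = world, score
    return best_world, best_score


best_world, best_score = solve(VARIABLES, HARD, SOFT)
print(best_world, round(best_score, 3))
```

Under these assumed weights the Stanford fact wins over the Princeton fact, because keeping it also satisfies the grounded soft rule.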

URDF: Lineage & Possible Worlds
Query: graduatedFrom(Surajit,?y)
1) Deductive Grounding
  – Same as before, but trace lineage of query answers
2) Lineage DAG (not CNF!) consisting of
  – Grounded hard rules
  – Grounded soft rules
  – Base facts
  plus: derivation structure
3) Probabilistic Inference
  – Marginalization: aggregate probabilities of all possible worlds where the answer is "true"
  – Drop "impossible worlds"
Lineage of the answers:
  hasAdvisor(Surajit,Jeff) [0.8] /\ worksAt(Jeff,Stanford) [0.9]: 0.8 × 0.9 = 0.72
  graduatedFrom(Surajit,Stanford) [0.6] \/ derived graduatedFrom(Surajit,Stanford): 1 − (1 − 0.72) × (1 − 0.6) = 0.888
  graduatedFrom(Surajit,Princeton) [0.7]
  Answer graduatedFrom(Surajit,Princeton): 0.7 × (1 − 0.888) = 0.078
  Answer graduatedFrom(Surajit,Stanford): (1 − 0.7) × 0.888 = 0.266
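The arithmetic on this slide can be replayed in a few lines. The sketch below assumes independent base facts and uses the confidences shown on the slide; the helper names p_and and p_or are introduced here for illustration only.

```python
import math


def p_and(*probs):
    """Probability that all independent events hold."""
    return math.prod(probs)


def p_or(*probs):
    """Probability that at least one independent event holds."""
    return 1 - math.prod(1 - p for p in probs)


# Derivation of graduatedFrom(Surajit, Stanford) via the soft rule
p_rule = p_and(0.8, 0.9)          # hasAdvisor AND worksAt
# Base fact [0.6] OR the derived fact
p_stanford = p_or(p_rule, 0.6)
p_princeton = 0.7

# Mutual exclusion: an answer holds only in worlds where the other one does not
answer_princeton = p_princeton * (1 - p_stanford)
answer_stanford = (1 - p_princeton) * p_stanford

print(round(p_rule, 3), round(p_stanford, 3))
print(round(answer_princeton, 3), round(answer_stanford, 3))
```

The two answer values correspond to the 0.078 and 0.266 shown on the slide.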

Classes & Complexities
Grounding first-order Horn formulas (Datalog)
  – Decidable
  – EXPTIME-complete, PSPACE-complete (including recursion, but in P w/o recursion)
Max-Sat (Constrained & Weighted)
  – NP-complete
Probabilistic inference in graphical models
  – #P-complete
[Figure: logic classes FOL, OWL, OWL-DL/lite, Horn]

Monte Carlo Simulation (I)
[Karp,Luby,Madras: J.Alg.'89]
Boolean formula: F = X1X2 ∨ X1X3 ∨ X2X3
Naïve sampling:
  cnt = 0
  repeat N times
    randomly choose X1, X2, X3 ∈ {0,1}
    if F(X1, X2, X3) = 1 then cnt = cnt+1
  P = cnt/N
  return P /* Pr'(F) */
Zero/One-estimator theorem: if N ≥ (1/Pr(F)) × (4 ln(2/δ)/ε²), then Pr[ |P/Pr(F) − 1| > ε ] < δ
N may be very big for small Pr(F)
Works for any F (not in PTIME)
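A runnable sketch of the naïve sampler for the example formula. It assumes each variable is true independently with probability 0.5 (the slide only says the values are chosen randomly), and it fixes the random seed so the run is repeatable. Under that assumption the exact value is Pr(F) = 0.5, since F is the majority function of three fair coins.

```python
import random


def formula(x1, x2, x3):
    """F = X1X2 v X1X3 v X2X3."""
    return (x1 and x2) or (x1 and x3) or (x2 and x3)


def naive_estimate(n_samples, rng, p=0.5):
    """Estimate Pr(F) as the fraction of sampled worlds in which F is true."""
    cnt = 0
    for _ in range(n_samples):
        x1, x2, x3 = (rng.random() < p for _ in range(3))
        if formula(x1, x2, x3):
            cnt += 1
    return cnt / n_samples


estimate = naive_estimate(20000, random.Random(0))
print(estimate)
```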

Monte Carlo Simulation (II)
[Karp,Luby,Madras: J.Alg.'89]
Boolean formula in DNF: F = C1 ∨ C2 ∨ ... ∨ Cm
Improved sampling:
  cnt = 0; S = Pr(C1) + … + Pr(Cm)
  repeat N times
    randomly choose i ∈ {1,2,…,m}, with prob. Pr(Ci)/S
    randomly choose X1, …, Xn ∈ {0,1} s.t. Ci = 1
    if C1 = 0 and C2 = 0 and … and Ci-1 = 0 then cnt = cnt+1
  P = S × cnt/N
  return P /* Pr'(F) */
Theorem: if N ≥ m × (4 ln(2/δ)/ε²), then Pr[ |P/Pr(F) − 1| > ε ] < δ
The bound follows from the zero/one-estimator theorem: a sample is counted with probability Pr(F)/S ≥ 1/m, so the required N no longer depends on 1/Pr(F).
Now it's better: only for F in DNF, but in PTIME
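A sketch of the improved (Karp-Luby style) sampler for a DNF of positive conjunctions over independent variables. The formula, the marginal probability 0.5 per variable and the function name karp_luby are assumptions chosen for illustration. The counted fraction estimates Pr(F)/S, so the sketch scales it by S to obtain an estimate of Pr(F).

```python
import math
import random

# F = X1X2 v X1X3 v X2X3, each clause given as the set of variables it requires
CLAUSES = [{"X1", "X2"}, {"X1", "X3"}, {"X2", "X3"}]
PROB = {"X1": 0.5, "X2": 0.5, "X3": 0.5}  # assumed independent marginals


def karp_luby(clauses, prob, n_samples, rng):
    """Estimate Pr(F) for F = C1 v ... v Cm."""
    clause_pr = [math.prod(prob[v] for v in clause) for clause in clauses]
    total = sum(clause_pr)  # S = Pr(C1) + ... + Pr(Cm)
    variables = sorted(prob)
    cnt = 0
    for _ in range(n_samples):
        # Choose clause i with probability Pr(Ci)/S
        i = rng.choices(range(len(clauses)), weights=clause_pr)[0]
        # Sample a world conditioned on Ci being true
        world = {v: v in clauses[i] or rng.random() < prob[v] for v in variables}
        # Count the sample only if no earlier clause is already satisfied
        if not any(all(world[v] for v in clauses[j]) for j in range(i)):
            cnt += 1
    return total * cnt / n_samples


estimate = karp_luby(CLAUSES, PROB, 20000, random.Random(0))
print(estimate)
```

For this formula S = 0.75 and the exact value is 0.5, so about two thirds of the samples are counted regardless of how small Pr(F) is.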

Learning "Soft" Rules
Extend Inductive Logic Programming (ILP) techniques to large and incomplete knowledge bases
Software tools:
  alchemy.cs.washington.edu
  http://www.doc.ic.ac.uk/~shm/progol.html
  http://dtai.cs.kuleuven.be/ml/systems/claudien
Goal: learn livesIn(?x,?y) ← bornIn(?x,?y)
Positive examples: livesIn(?x,?y) ∧ bornIn(?x,?y)
Negative examples: ¬livesIn(?x,?y) ∧ bornIn(?x,?y) ∧ livesIn(?x,?z)
Background knowledge
[Figure: overlapping sets labeled livesIn(x,y), bornIn(x,y) and livesIn(x,z)]
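One possible reading of the positive and negative examples, sketched on a toy knowledge base. Everything here is an assumption for illustration: the names and places are hypothetical, and the confidence measure positives / (positives + negatives) is one plausible way to weight a soft rule; the slide does not state a formula. A person born in y counts as a negative example only if some other residence is known, so people with no livesIn fact at all are ignored rather than counted against the rule.

```python
# Hypothetical toy data, not taken from YAGO
born_in = {("Ann", "Paris"), ("Bob", "Rome"), ("Cem", "Oslo"), ("Dee", "Lima")}
lives_in = {("Ann", "Paris"), ("Bob", "Rome"), ("Cem", "Bonn")}


def rule_examples(born, lives):
    """Split bornIn facts into positive and negative examples for
    the candidate rule livesIn(?x,?y) <- bornIn(?x,?y)."""
    has_known_residence = {person for person, _ in lives}
    positives = {fact for fact in born if fact in lives}
    negatives = {
        (person, place)
        for person, place in born
        if (person, place) not in lives and person in has_known_residence
    }
    return positives, negatives


positives, negatives = rule_examples(born_in, lives_in)
confidence = len(positives) / (len(positives) + len(negatives))
print(sorted(positives), sorted(negatives), round(confidence, 3))
```

In this toy data Dee has no known residence, so the incomplete knowledge base neither supports nor contradicts the rule for her.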

More Variants of Consistency Reasoning
Propositional Reasoning
  – Constrained Weighted MaxSat solver
Lineage & Possible Worlds (independent base facts)
  – Monte Carlo simulations (Luby-Karp)
First-Order Logic & Probabilistic Graphical Models
  – Markov Logic (currently via interface to Alchemy*) [Richardson & Domingos: ML'06]
  – Even more general: Factor Graphs [McCallum et al. 2008]
  – MCMC sampling for probabilistic inference
*Alchemy – Open-Source AI: http://alchemy.cs.washington.edu/

Experiments
URDF: SLD grounding & MaxSat solving
URDF vs. Markov Logic (MAP inference & MC-SAT)
  |C| - # literals in soft rules
  |S| - # literals in hard rules
YAGO Knowledge Base: 2 million entities, 20 million facts
Basic query answering: SLD grounding & MaxSat solving of 10 queries over 16 soft rules (partly recursive) & 5 hard rules (bornIn, diedIn, marriedTo, …)
Asymptotic runtime checks: runtime comparisons for synthetic soft rule expansions

French Marriage Problem (Revisited)
Facts in KB and new fact candidates:
  1: marriedTo(Hillary, Bill)
  2: marriedTo(Carla, Nicolas)
  3: marriedTo(Angelina, Brad)
  4: marriedTo(Cecilia, Nicolas)
  5: marriedTo(Carla, Benjamin)
  6: marriedTo(Carla, Mick)
  7: divorced(Madonna, Guy)
  8: domPartner(Angelina, Brad)
Temporal annotations:
  validFrom(2, 2008)
  validFrom(4, 1996)
  validUntil(4, 2007)
  validFrom(5, 2010)
  validFrom(6, 2006)
  validFrom(7, 2008)
[Figure: timeline with month axis JAN to DEC]

Challenge: Temporal Knowledge Harvesting
For all people in Wikipedia (100,000's), gather all spouses, incl. divorced & widowed, and the corresponding time periods!
>95% accuracy, >95% coverage, in one night!

Difficult Dating

(Even More Difficult) Implicit Dating
explicit dates vs. implicit dates relative to other dates

(Even More Difficult) Implicit Dating
vague dates, relative dates
narrative text, relative order

TARSQI: Extracting Time Annotations
[Verhagen et al: ACL'05] http://www.timeml.org/site/tarsqi/
Example text (the slide marks extraction errors!):
  Hong Kong is poised to hold the first election in more than half a century that includes a democracy advocate seeking high office in territory controlled by the Chinese government in Beijing. A pro-democracy politician, Alan Leong, announced Wednesday that he had obtained enough nominations to appear on the ballot to become the territory's next chief executive. But he acknowledged that he had no chance of beating the Beijing-backed incumbent, Donald Tsang, who is seeking re-election. Under electoral rules imposed by Chinese officials, only 796 people on the election committee – the bulk of them with close ties to mainland China – will be allowed to vote in the March 25 election. It will be the first contested election for chief executive since Britain returned Hong Kong to China in 1997. Mr. Tsang, an able administrator who took office during the early stages of a sharp economic upturn in 2005, is popular with the general public. Polls consistently indicate that three-fifths of Hong Kong's people approve of the job he has been doing. "It is of course a foregone conclusion – Donald Tsang will be elected and will hold office for another five years," said Mr. Leong, the former chairman of the Hong Kong Bar Association.

13 Relations between Time Intervals
[Allen, 1984; Allen & Hayes, 1989]
  A Before B      B After A
  A Meets B       B MetBy A
  A Overlaps B    B OverlappedBy A
  A Starts B      B StartedBy A
  A During B      B Contains A
  A Finishes B    B FinishedBy A
  A Equal B
[Figure: interval diagrams of A and B for each relation]
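The 13 relations can be told apart by comparing interval endpoints. The sketch below assumes intervals are given as (start, end) pairs with start < end and uses the standard endpoint definitions of Allen's relations; the function name allen_relation is introduced here for illustration.

```python
INVERSE = {
    "Before": "After",
    "Meets": "MetBy",
    "Overlaps": "OverlappedBy",
    "Starts": "StartedBy",
    "During": "Contains",
    "Finishes": "FinishedBy",
}


def allen_relation(a, b):
    """Return the name of the relation that holds between intervals a and b."""
    (s1, e1), (s2, e2) = a, b
    if not (s1 < e1 and s2 < e2):
        raise ValueError("intervals must satisfy start < end")
    if (s1, e1) == (s2, e2):
        return "Equal"
    if e1 < s2:
        return "Before"
    if e1 == s2:
        return "Meets"
    if s1 < s2 < e1 < e2:
        return "Overlaps"
    if s1 == s2 and e1 < e2:
        return "Starts"
    if s2 < s1 and e1 < e2:
        return "During"
    if s2 < s1 and e1 == e2:
        return "Finishes"
    # Otherwise one of the six direct relations holds in the other direction.
    return INVERSE[allen_relation(b, a)]


print(allen_relation((2003, 2005), (2004, 2007)))
print(allen_relation((1996, 2007), (2008, 2010)))
print(allen_relation((2000, 2007), (2003, 2005)))
```

Exactly one of the 13 relations holds for any pair of proper intervals, which is why the final fallback always finds a match after swapping the arguments.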

Possible Worlds in Time (I)
[Wang,Yahya,Theobald: VLDB/MUD Workshop '10]
playsFor(Beckham, Real, T1) ∧ playsFor(Ronaldo, Real, T2) ∧ overlaps(T1,T2) → teamMates(Beckham, Ronaldo, T3)
[Figure: time histograms over the years '00 to '07 for the base facts playsFor(Beckham,Real) and playsFor(Ronaldo,Real) (state relations) and for the derived fact (state relation); per-bin confidence labels range from 0.08 to 1.0]

Possible Worlds in Time (II)
[Wang,Yahya,Theobald: VLDB/MUD Workshop '10]
playsFor(Beckham, United, T1) ∧ wonCup(United, ChampionsL, T2) ∧ overlaps(T1,T2) → won(Beckham, ChampionsL, T3)
[Figure: time histograms over the years '95 to '02 for the base facts playsFor(Beckham, United) (state) and wonCup(United, ChampionsLeague) (event) and for the derived facts (event); derived bins are marked as independent or non-independent; per-bin confidence labels range from 0.06 to 1.0]
Need Lineage!
Closed and complete representation model (incl. lineage)
  → Stanford Trio project [Widom: CIDR'05, Benjelloun et al: VLDB'06]
Interval computation remains linear in the number of bins
Confidence computation per bin is #P-complete
  → In general requires possible-worlds-based sampling techniques (Luby-Karp, Gibbs sampling, etc.)
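A sketch of why the interval computation stays linear in the number of bins: two sorted histograms can be intersected with a single two-pointer sweep. The histogram encoding (start, end, confidence) with half-open bins, the sample values and the function name overlap_join are assumptions for illustration. Multiplying the two confidences per bin is only valid in the simple case of independent base facts; in the non-independent case the confidence has to be computed from the lineage instead.

```python
def overlap_join(h1, h2):
    """Intersect two histograms of sorted, non-overlapping bins [start, end).

    Returns one output bin per overlapping pair of input bins, with the
    product of the two confidences (independence assumed).
    """
    out = []
    i = j = 0
    while i < len(h1) and j < len(h2):
        s1, e1, p1 = h1[i]
        s2, e2, p2 = h2[j]
        start, end = max(s1, s2), min(e1, e2)
        if start < end:
            out.append((start, end, p1 * p2))
        # Advance the histogram whose current bin ends first.
        if e1 <= e2:
            i += 1
        else:
            j += 1
    return out


# Hypothetical histograms, not the values from the slide
plays_for_a = [(2003, 2005, 0.4), (2005, 2007, 0.6)]
plays_for_b = [(2000, 2004, 0.2), (2004, 2007, 0.9)]

derived = overlap_join(plays_for_a, plays_for_b)
for start, end, conf in derived:
    print(start, end, round(conf, 2))
```

Each loop iteration advances one of the two pointers, so the number of steps is bounded by the total number of input bins.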

Agenda
– URDF: Reasoning in Uncertain Knowledge Bases
    Resolving uncertainty at query time
    Lineage of answers
    Propositional vs. probabilistic reasoning
    Temporal reasoning extensions
– UViz: The URDF Visualization Frontend
    Demo!

UViz: The URDF Visualization Engine
UViz System Architecture
  – Flash client
  – Tomcat server (JRE)
  – Relational backend (JDBC)
  – Remote Method Invocation & Object Serialization (BlazeDS)

UViz: The URDF Visualization Engine
Demo!
