1  Probabilistic Database

A probabilistic database D^p (compactly) encodes a probability distribution over a finite set of deterministic database instances D_i.

Example distribution: D_1: 0.42, D_2: 0.18, D_3: 0.28, D_4: 0.12

Special cases:
(I)  D^p tuple-independent
(II) D^p block-independent
Note: (I) and (II) are not equivalent!

Query semantics ("marginal probabilities"): run query Q against each instance D_i; for each answer tuple t, sum up the probabilities of all instances D_i in which t exists.
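
To make this query semantics concrete, here is a minimal Python sketch (not from the slides): the instance probabilities 0.42/0.18/0.28/0.12 are taken from the slide, while the instance contents and the toy query are hypothetical illustrations.

```python
# Marginal-probability query semantics: run the query against every instance D_i
# and, for each answer tuple t, sum the probabilities of the instances containing t.
from collections import defaultdict

# Each instance is a set of tuples together with its probability (they sum to 1).
# Instance contents are hypothetical; only the probabilities come from the slide.
instances = [
    ({("Surajit", "Princeton"), ("Surajit", "Stanford")}, 0.42),
    ({("Surajit", "Stanford")}, 0.18),
    ({("Surajit", "Princeton")}, 0.28),
    (set(), 0.12),
]

def query(instance):
    """Toy query: return every tuple whose first attribute is 'Surajit'."""
    return {t for t in instance if t[0] == "Surajit"}

marginals = defaultdict(float)
for instance, prob in instances:
    for t in query(instance):
        marginals[t] += prob  # add P(D_i) for every instance D_i in which t is an answer

print(dict(marginals))
# -> roughly 0.70 for ('Surajit', 'Princeton') and 0.60 for ('Surajit', 'Stanford')
```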

2  Soft Rules vs. Hard Rules
(Soft) Deduction Rules vs. (Hard) Consistency Constraints

- People may live in more than one place:
    livesIn(x,y) ⇐ marriedTo(x,z) ∧ livesIn(z,y)   [0.8]
    livesIn(x,y) ⇐ hasChild(x,z) ∧ livesIn(z,y)    [0.5]
- People are not born in different places / on different dates:
    bornIn(x,y) ∧ bornIn(x,z) ⇒ y = z
    bornOn(x,y) ∧ bornOn(x,z) ⇒ y = z
- People are not married to more than one person (at the same time, in most countries):
    marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y ≠ z ⇒ disjoint(t1,t2)
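
As an illustration of the distinction, the following sketch (my own encoding, not taken from any of the cited systems) represents soft deduction rules and hard consistency constraints uniformly: a rule carrying a weight is soft, a rule without one is hard.

```python
# A minimal, hypothetical encoding of weighted (soft) deduction rules and
# unweighted (hard) consistency constraints.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Atom:
    predicate: str
    args: Tuple[str, ...]           # variables (lowercase) or constants

@dataclass
class Rule:
    head: Atom                      # derived atom, or an equality/constraint atom
    body: Tuple[Atom, ...]          # conjunction of body atoms
    weight: Optional[float] = None  # e.g. 0.8 for a soft rule, None for a hard rule

# Soft rule: livesIn(x,y) <= marriedTo(x,z) AND livesIn(z,y)   [0.8]
soft = Rule(
    head=Atom("livesIn", ("x", "y")),
    body=(Atom("marriedTo", ("x", "z")), Atom("livesIn", ("z", "y"))),
    weight=0.8,
)

# Hard rule: bornIn(x,y) AND bornIn(x,z) => y = z
hard = Rule(
    head=Atom("=", ("y", "z")),
    body=(Atom("bornIn", ("x", "y")), Atom("bornIn", ("x", "z"))),
    weight=None,
)
```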

3  Basic Types of Inference

- MAP Inference
  Find the most likely assignment to the query variables y under a given evidence x.
  Compute: argmax_y P(y | x)   (NP-complete for MaxSAT)

- Marginal/Success Probabilities
  Probability that query y is true in a random world under a given evidence x.
  Compute: ∑_y P(y | x)   (#P-complete already for conjunctive queries)
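
The two tasks can be contrasted with a brute-force sketch over a tiny model of independent Boolean variables; all names, probabilities, the evidence, and the query formula below are made up, and real systems of course avoid this exponential enumeration.

```python
# MAP:      the single most likely world consistent with the evidence.
# Marginal: the total probability of the consistent worlds in which the query holds.
from itertools import product

priors = {"A": 0.7, "B": 0.6, "C": 0.8, "D": 0.9}   # hypothetical tuple probabilities
evidence = {"D": True}                               # hypothetical evidence x
query = lambda w: w["B"] or (w["C"] and w["D"])      # hypothetical query y

def world_prob(w):
    p = 1.0
    for v, val in w.items():
        p *= priors[v] if val else 1.0 - priors[v]
    return p

worlds = [dict(zip(priors, vals)) for vals in product([True, False], repeat=len(priors))]
consistent = [w for w in worlds if all(w[v] == b for v, b in evidence.items())]
z = sum(world_prob(w) for w in consistent)                           # P(x)

map_world = max(consistent, key=world_prob)                          # argmax_y P(y | x)
marginal = sum(world_prob(w) for w in consistent if query(w)) / z    # P(query | x)

print("MAP world:", map_world, " P(query | evidence) =", round(marginal, 4))
```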

4  Deductive Grounding with Lineage (SLD Resolution in Datalog/Prolog)
[Yahya, Theobald: RuleML'11; Dylla, Miliaraki, Theobald: ICDE'13]

Query: graduatedFrom(Surajit, y)

Rules:
  hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z)   [0.4]
  graduatedFrom(x,y) ∧ graduatedFrom(x,z) ⇒ y = z

Base Facts:
  graduatedFrom(Surajit, Princeton)   [0.7]
  graduatedFrom(Surajit, Stanford)    [0.6]
  graduatedFrom(David, Princeton)     [0.9]
  hasAdvisor(Surajit, Jeff)           [0.8]
  hasAdvisor(David, Jeff)             [0.7]
  worksAt(Jeff, Stanford)             [0.9]
  type(Princeton, University)         [1.0]
  type(Stanford, University)          [1.0]
  type(Jeff, Computer_Scientist)      [1.0]
  type(Surajit, Computer_Scientist)   [1.0]
  type(David, Computer_Scientist)     [1.0]

Grounding yields a lineage formula per query answer. With A = graduatedFrom(Surajit, Princeton) [0.7], B = graduatedFrom(Surajit, Stanford) [0.6], C = hasAdvisor(Surajit, Jeff) [0.8], D = worksAt(Jeff, Stanford) [0.9]:
  Q1: graduatedFrom(Surajit, Princeton)   lineage: A
  Q2: graduatedFrom(Surajit, Stanford)    lineage: B ∨ (C ∧ D)
  Grounded hard constraint: ¬(A ∧ (B ∨ (C ∧ D)))
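
A heavily simplified grounding sketch for this example (my own illustration, not the papers' SLD resolution procedure): it matches the single deduction rule against the base facts and collects, for each answer of graduatedFrom(Surajit, y), a DNF-style lineage over the contributing base facts.

```python
# Ground the rule hasAdvisor(x,y) AND worksAt(y,z) => graduatedFrom(x,z) plus the
# matching base facts into a per-answer lineage (a list of AND-clauses, i.e. a DNF).
base = {
    ("graduatedFrom", "Surajit", "Princeton"): 0.7,
    ("graduatedFrom", "Surajit", "Stanford"): 0.6,
    ("hasAdvisor", "Surajit", "Jeff"): 0.8,
    ("worksAt", "Jeff", "Stanford"): 0.9,
}

def ground_graduatedFrom(x):
    """Return {answer: lineage}, where lineage is a list of AND-clauses over base facts."""
    lineage = {}
    # base facts matching graduatedFrom(x, y)
    for fact in base:
        if fact[0] == "graduatedFrom" and fact[1] == x:
            lineage.setdefault(fact[2], []).append((fact,))
    # rule: hasAdvisor(x, y) AND worksAt(y, z) => graduatedFrom(x, z)
    for f1 in base:
        if f1[0] == "hasAdvisor" and f1[1] == x:
            for f2 in base:
                if f2[0] == "worksAt" and f2[1] == f1[2]:
                    lineage.setdefault(f2[2], []).append((f1, f2))
    return lineage

for answer, clauses in ground_graduatedFrom("Surajit").items():
    print(answer, "lineage (DNF):", clauses)
# -> Princeton has one derivation (its base fact); Stanford has two
#    (its base fact, and the hasAdvisor/worksAt derivation), i.e. B OR (C AND D).
```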

5  Lineage & Possible Worlds
[Das Sarma, Theobald, Widom: ICDE'08; Dylla, Miliaraki, Theobald: ICDE'13]

1) Deductive Grounding
   - Dependency graph of the query
   - Trace lineage of individual query answers
2) Lineage DAG (not in CNF), consisting of
   - Grounded soft & hard rules
   - Probabilistic base facts
3) Probabilistic Inference
   - Compute marginals:
     P(Q): sum up the probabilities of all possible worlds that entail the query answer's lineage
     P(Q|H): drop the "impossible worlds"

Marginals in the example lineage DAG (A, B, C, D as on the previous slide):
  P(C ∧ D) = 0.8 × 0.9 = 0.72
  P(B ∨ (C ∧ D)) = 1 - (1 - 0.72) × (1 - 0.6) = 0.888
  0.7 × (1 - 0.888) = 0.078        (1 - 0.7) × 0.888 = 0.266
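
The marginals annotated in the lineage DAG can be reproduced by brute-force enumeration of the 2^4 possible worlds over A, B, C, D, for instance as in this sketch:

```python
# Sum the probabilities of all possible worlds that entail a given lineage formula.
from itertools import product

p = {"A": 0.7, "B": 0.6, "C": 0.8, "D": 0.9}

def prob(event):
    """Total probability of the worlds in which `event` holds (variables independent)."""
    total = 0.0
    for vals in product([True, False], repeat=len(p)):
        w = dict(zip(p, vals))
        if event(w):
            weight = 1.0
            for v in p:
                weight *= p[v] if w[v] else 1.0 - p[v]
            total += weight
    return total

print(round(prob(lambda w: w["C"] and w["D"]), 4))               # 0.72
print(round(prob(lambda w: w["B"] or (w["C"] and w["D"])), 4))   # 0.888
print(round(prob(lambda w: w["A"] and not (w["B"] or (w["C"] and w["D"]))), 4))  # 0.0784
```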

6  Possible Worlds Semantics

Hard rule H: ¬(A ∧ (B ∨ (C ∧ D)))

Total probability mass of all possible worlds: 1.0; mass retained under H: 0.412

  P(Q1) = 0.0784     P(Q1|H) = 0.0784 / 0.412 = 0.1903
  P(Q2) = 0.2664     P(Q2|H) = 0.2664 / 0.412 = 0.6466
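
The conditioning step itself can be sketched by brute force: drop the worlds that violate the hard rule and renormalize, i.e. P(Q|H) = P(Q ∧ H) / P(H). The hard rule is written here as ¬(A ∧ (B ∨ (C ∧ D))), which is my reading of the grounded mutual-exclusion constraint, so take the printed ratios as an illustration of the renormalization principle rather than as the slide's exact figures.

```python
# Conditioning on a hard rule H over the possible worlds of A, B, C, D.
from itertools import product

p = {"A": 0.7, "B": 0.6, "C": 0.8, "D": 0.9}

def prob(event):
    total = 0.0
    for vals in product([True, False], repeat=len(p)):
        w = dict(zip(p, vals))
        if event(w):
            weight = 1.0
            for v in p:
                weight *= p[v] if w[v] else 1.0 - p[v]
            total += weight
    return total

H  = lambda w: not (w["A"] and (w["B"] or (w["C"] and w["D"])))  # assumed hard rule
Q1 = lambda w: w["A"]
Q2 = lambda w: w["B"] or (w["C"] and w["D"])

for name, Q in [("Q1", Q1), ("Q2", Q2)]:
    # drop impossible worlds (those violating H) and renormalize by P(H)
    print(name, round(prob(lambda w: Q(w) and H(w)) / prob(H), 4))
```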

7  Basic Complexity Issue
[Suciu & Dalvi: SIGMOD'05 tutorial on "Foundations of Probabilistic Answers to Queries"]

Theorem [Valiant 1979]: For a Boolean expression E, computing P(E) is #P-complete.

NP = class of problems of the form "is there a witness?"    → SAT
#P = class of problems of the form "how many witnesses?"    → #SAT

The decision problem for 2CNF is in PTIME; the counting problem for 2CNF is already #P-complete.
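
A toy illustration of the decision-versus-counting gap for 2CNF (the formula below is made up): deciding satisfiability is easy, while the naive counter enumerates all assignments and is exponential in the number of variables, which is exactly the blow-up the #P-completeness result concerns.

```python
# Brute-force SAT (decision) and #SAT (counting) for a small 2CNF formula.
from itertools import product

# 2CNF formula (x OR y) AND (NOT x OR z) AND (NOT y OR NOT z);
# each literal is (variable, is_positive).
clauses = [[("x", True), ("y", True)], [("x", False), ("z", True)], [("y", False), ("z", False)]]
variables = sorted({v for clause in clauses for v, _ in clause})

def satisfies(assignment):
    return all(any(assignment[v] == pos for v, pos in clause) for clause in clauses)

count = 0
witness = None
for vals in product([False, True], repeat=len(variables)):
    assignment = dict(zip(variables, vals))
    if satisfies(assignment):
        count += 1
        witness = witness or assignment

print("SAT?", witness is not None)   # decision problem: is there a witness?
print("#SAT =", count)               # counting problem: how many witnesses?
```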

8  Probabilistic Database

Tuple-Independent Probabilistic Database:
A probabilistic database D^p (compactly) encodes a probability distribution over a finite set of deterministic database instances D_i.

Example query: Which professors work at a university that is located in CA?

Theorem: The query answering problem for the above join is #P-hard.

9  Probabilistic & Temporal Database
[Dignös, Gamper, Böhlen: SIGMOD'12; Dylla, Miliaraki, Theobald: PVLDB'13]

A temporal-probabilistic database D^Tp (compactly) encodes a probability distribution over a finite set of deterministic database instances D_i at each time point of a finite time domain Ω_T.

- Sequenced Semantics & Snapshot Reducibility
  - Built-in semantics: reduce temporal-relational operators to their non-temporal counterparts at each snapshot (i.e., time point) of the database.
  - Coalesce/split tuples with consecutive time intervals based on their lineages.
- Non-Sequenced Semantics
  - Queries can freely manipulate timestamps just like regular attributes. A single temporal operator ≤_T supports all of Allen's 13 temporal relations.
  - Deduplicate tuples with overlapping time intervals based on their lineages.
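
A minimal sketch of snapshot reducibility under the sequenced semantics (my own illustration, not the cited systems' implementation; relation contents and the toy schema are hypothetical): a temporal join is evaluated as an ordinary join at every snapshot of a finite time domain.

```python
# Reduce a temporal join to a non-temporal join evaluated per snapshot (time point).
def snapshot(relation, t):
    """Tuples of a temporal relation valid at time point t; intervals are [begin, end)."""
    return {data for (data, begin, end) in relation if begin <= t < end}

def sequenced_join(r, s, time_domain):
    """Per-snapshot join of r(a, b) with s(b, c) on the shared attribute b (toy schema)."""
    result = {}
    for t in time_domain:
        result[t] = {(a, b, c)
                     for (a, b) in snapshot(r, t)
                     for (b2, c) in snapshot(s, t) if b == b2}
    return result

# Hypothetical temporal relations: (tuple, interval_begin, interval_end)
works_at = [(("Jeff", "Stanford"), 2000, 2010)]
located_in = [(("Stanford", "CA"), 1990, 2020)]

print(sequenced_join(works_at, located_in, range(1998, 2003)))
```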

10  Temporal Alignment & Deduplication (Non-Sequenced Semantics)

Example rules:
  marriedTo(x,y)[t_b1, T_max) ⇐ wedding(x,y)[t_b1, t_e1) ∧ ¬divorce(x,y)[t_b2, t_e2)
  marriedTo(x,y)[t_b1, t_e2)  ⇐ wedding(x,y)[t_b1, t_e1) ∧ divorce(x,y)[t_b2, t_e2) ∧ t_e1 ≤_T t_b2

[Figure: timeline over [T_min, T_max) with the time points 1936, 1976, 1988; base facts f_1, f_2 (wedding(DeNiro, Abbott)) and f_3 (divorce(DeNiro, Abbott)); derived facts with interval lineages f_1 ∧ f_3, f_1 ∧ ¬f_3, f_2 ∧ f_3, f_2 ∧ ¬f_3; deduplicated facts with interval lineages (f_1 ∧ f_3) ∨ (f_1 ∧ ¬f_3), (f_1 ∧ f_3) ∨ (f_1 ∧ ¬f_3) ∨ (f_2 ∧ f_3) ∨ (f_2 ∧ ¬f_3), and (f_1 ∧ f_3) ∨ (f_2 ∧ ¬f_3).]
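
The alignment/deduplication idea can be sketched as follows (a simplification, with hypothetical intervals and lineage labels loosely modeled on the wedding/divorce example): split at all interval endpoints so that the set of contributing derivations is constant within each sub-interval, then OR the lineages of the derivations covering that sub-interval.

```python
# Temporal alignment: split derived facts at interval endpoints and deduplicate
# overlapping derivations per sub-interval by OR-ing their lineages.
from collections import defaultdict

# Hypothetical derived facts: (lineage_label, begin, end), intervals are [begin, end).
derived = [
    ("f1 & f3", 1976, 1988),
    ("f1 & ~f3", 1976, 2020),
    ("f2 & ~f3", 1988, 2020),
]

endpoints = sorted({p for _, b, e in derived for p in (b, e)})
aligned = defaultdict(list)
for b, e in zip(endpoints, endpoints[1:]):        # consecutive sub-intervals
    for lineage, fb, fe in derived:
        if fb <= b and e <= fe:                   # this derivation covers the sub-interval
            aligned[(b, e)].append(lineage)

for interval, lineages in sorted(aligned.items()):
    print(interval, " OR ".join("(%s)" % l for l in lineages))
```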

11  Inference in Probabilistic-Temporal Databases
[Wang, Yahya, Theobald: MUD'10; Dylla, Miliaraki, Theobald: PVLDB'13]

Example using the Allen predicate overlaps:
  teamMates(Beckham, Ronaldo, T_3) ⇐ playsFor(Beckham, Real, T_1) ∧ playsFor(Ronaldo, Real, T_2) ∧ overlaps(T_1, T_2, T_3)

[Figure: base facts playsFor(Beckham, Real, T_1) with per-interval confidences 0.6 and 0.4 over intervals between '03 and '07, and playsFor(Ronaldo, Real, T_2) with per-interval confidences 0.1, 0.2, and 0.4 over intervals between '00 and '07; derived facts teamMates(Beckham, Ronaldo, T_3) with per-interval confidences 0.08, 0.12, and 0.16 over intervals between '03 and '07.]
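
A minimal sketch of grounding the rule above (illustration only, with hypothetical intervals and confidences): intersect the interval pairs of the two playsFor facts and, treating the base facts as independent, attach the product of their confidences to each derived interval.

```python
# Derive teamMates intervals via the Allen "overlaps" pattern: intersect intervals
# of the two playsFor facts and multiply their (assumed independent) confidences.
def intersect(i1, i2):
    begin, end = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (begin, end) if begin < end else None

# Hypothetical per-interval confidences; intervals are [begin_year, end_year).
beckham_real = {(2003, 2005): 0.6, (2005, 2007): 0.4}
ronaldo_real = {(2002, 2005): 0.2, (2005, 2007): 0.4}

team_mates = {}
for t1, p1 in beckham_real.items():
    for t2, p2 in ronaldo_real.items():
        t3 = intersect(t1, t2)
        if t3 is not None:
            # Each derived interval here stems from a single pair of base facts, so the
            # confidence is just their product; in general, multiple derivations for the
            # same interval must be combined via lineage (see the following slides).
            team_mates[t3] = p1 * p2

print(team_mates)   # roughly {(2003, 2005): 0.12, (2005, 2007): 0.16}
```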

12  Inference in Probabilistic-Temporal Databases
[Wang, Yahya, Theobald: MUD'10; Dylla, Miliaraki, Theobald: PVLDB'13]

[Figure: base facts playsFor(Beckham, Real, T_1), playsFor(Ronaldo, Real, T_2), and playsFor(Zidane, Real, T_3) with per-interval confidences (0.1 to 0.6) over intervals between '00 and '07; derived facts teamMates(Beckham, Ronaldo, T_4), teamMates(Beckham, Zidane, T_5), and teamMates(Ronaldo, Zidane, T_6).]

The figure marks some of the derived facts as independent and others as non-independent; non-independence arises when derivations share base facts in their lineages.

13  Inference in Probabilistic-Temporal Databases
[Wang, Yahya, Theobald: MUD'10; Dylla, Miliaraki, Theobald: PVLDB'13]

The derived facts teamMates(Beckham, Ronaldo, T_4), teamMates(Beckham, Zidane, T_5), and teamMates(Ronaldo, Zidane, T_6) are partly non-independent because they share base facts: we need lineage!

- Closed and complete representation model (incl. lineage)
- Temporal alignment is polynomial in the number of input intervals
- Confidence computation per interval remains #P-hard
- In general this requires Monte Carlo approximations (Luby-Karp for DNF, MCMC-style sampling), decompositions, or top-k pruning
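
For the DNF case mentioned in the last bullet, here is a compact sketch of the Karp-Luby style estimator (a textbook-style version, not the papers' implementation), applied to a hypothetical lineage with made-up variable names and probabilities.

```python
# Karp-Luby Monte Carlo estimation of P(C_1 OR ... OR C_m) for a DNF over
# independent Boolean variables.
import random

def clause_prob(clause, p):
    """Probability that all literals of the clause hold (independent variables)."""
    prob = 1.0
    for var, positive in clause:
        prob *= p[var] if positive else 1.0 - p[var]
    return prob

def satisfies(world, clause):
    return all(world[var] == positive for var, positive in clause)

def karp_luby(dnf, p, samples=100_000, rng=random):
    weights = [clause_prob(c, p) for c in dnf]
    total = sum(weights)
    hits = 0
    for _ in range(samples):
        # 1) pick a clause proportionally to its probability
        j = rng.choices(range(len(dnf)), weights=weights)[0]
        # 2) sample a world conditioned on clause j being true
        world = {var: (rng.random() < p[var]) for var in p}
        world.update({var: positive for var, positive in dnf[j]})
        # 3) count the sample only if j is the first clause the world satisfies
        if min(k for k, c in enumerate(dnf) if satisfies(world, c)) == j:
            hits += 1
    return total * hits / samples

# Hypothetical DNF lineage of a teamMates-style derived fact over base facts b1, b2, r1, r2.
p = {"b1": 0.6, "b2": 0.4, "r1": 0.2, "r2": 0.4}
dnf = [[("b1", True), ("r1", True)], [("b2", True), ("r2", True)]]
print(karp_luby(dnf, p))   # close to 1 - (1 - 0.12) * (1 - 0.16) = 0.2608
```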

