
1 Efficient Inference Methods for Probabilistic Logical Models. Sriraam Natarajan, Dept. of Computer Science, University of Wisconsin-Madison

2 Take-Away Message: Inference in SRL models is very hard! This talk presents three different yet related inference methods. The methods are independent of the underlying formalism, and they have been applied to different kinds of problems.

3 The World is Inherently Uncertain. Graphical models (here, e.g., a Bayesian network) model uncertainty explicitly by representing the joint distribution. [Figure: a Bayesian network over the random variables Influenza, Fever, and Ache; edges denote direct influences.] Note that this is a propositional model!
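
To make the joint-distribution view concrete, here is a minimal sketch of the three-node network in Python; the CPT values are made up for illustration (the talk gives none):

```python
# Minimal sketch of the Influenza -> {Fever, Ache} network.
# All probability values are invented for illustration.
p_flu = {True: 0.05, False: 0.95}
p_fever_given_flu = {True: 0.90, False: 0.10}  # P(Fever=true | Influenza)
p_ache_given_flu = {True: 0.70, False: 0.20}   # P(Ache=true  | Influenza)

def joint(flu, fever, ache):
    """P(flu, fever, ache) = P(flu) * P(fever | flu) * P(ache | flu)."""
    pf = p_fever_given_flu[flu] if fever else 1 - p_fever_given_flu[flu]
    pa = p_ache_given_flu[flu] if ache else 1 - p_ache_given_flu[flu]
    return p_flu[flu] * pf * pa

# The eight entries sum to 1, as a joint distribution must.
total = sum(joint(f, fe, a) for f in (True, False)
            for fe in (True, False) for a in (True, False))
print(total)  # 1.0 (up to floating-point error)
```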

4 Real-World Data (Dramatically Simplified)

Demographics:
PatientID  Gender  Birthdate
P1         M       3/22/63

Visits:
PatientID  Date    Physician  Symptoms      Diagnosis
P1         1/1/01  Smith      palpitations  hypoglycemic
P1         2/1/03  Jones      fever, aches  influenza

Lab tests:
PatientID  Date    Lab Test       Result
P1         1/1/01  blood glucose  42
P1         1/9/01  blood glucose  45

SNP data:
PatientID  SNP1  SNP2  …  SNP500K
P1         AA    AB       BB
P2         AB    BB       AA

Prescriptions:
PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months

The data is non-i.i.d. and multi-relational, with shared parameters. Solution: first-order logic / relational databases.

5 Logic + Probability = Probabilistic Logic, aka Statistical Relational Learning. [Diagram: starting from logic, add probabilities; starting from probabilistic models, add relations; both arrive at Statistical Relational Learning (SRL).] Uncertainty in SRL models is captured by probabilities, weights, or potential functions.

6 Alphabetic Soup => Endless Possibilities. Problem domains: • Web data (web) • Biological data (bio) • Social network analysis (soc) • Bibliographic data (cite) • Epidemiological data (epi) • Communication data (comm) • Customer networks (cust) • Collaborative filtering problems (cf) • Trust networks (trust) … Courses: Fall 2003 – Dietterich @ OSU, Spring 2004 – Page @ UW, Spring 2007 – Neville @ Purdue, Fall 2008 – Pedro @ CMU. Formalisms: • Probabilistic Relational Models (PRM) • Bayesian Logic Programs (BLP) • PRISM • Stochastic Logic Programs (SLP) • Independent Choice Logic (ICL) • Markov Logic Networks (MLN) • Relational Markov Nets (RMN) • CLP-BN • Relational Bayes Nets (RBN) • Probabilistic Logic Programs (PLP) • ProbLog …

7 Key Problem: Inference. Inference is equivalent to counting 3SAT models, hence #P-complete. The problem is even more pronounced in SRL models, which have a prohibitively large number of objects and relations. Inference has been the biggest bottleneck for the use of SRL models in practice.

8 Grounding / Propositionalization. Clause: Difficulty(C,D), Grade(S,C,G) :- Satisfaction(S). With 1 student s1 and 10 courses, grounding produces Diff(c1,d1), Diff(c2,d1), Diff(c3,d2), Diff(c4,d4), Diff(c5,d1), Diff(c6,d3), Diff(c7,d2), Diff(c8,d2), Diff(c9,d4), Diff(c10,d2) together with Grade(s1,c1,B), Grade(s1,c2,A), Grade(s1,c3,B), Grade(s1,c4,A), Grade(s1,c5,A), Grade(s1,c6,B), Grade(s1,c7,A), Grade(s1,c8,A), Grade(s1,c9,A), Grade(s1,c10,A), all feeding into Satisfaction(s1).
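
A hedged sketch of what propositionalization does, using hypothetical domains (the difficulty and grade assignments above are just one instantiation): substituting every constant for every logical variable multiplies the groundings out.

```python
from itertools import product

# Hypothetical sketch: ground a clause over the cross-product of its
# variables' domains. With 1 student and 10 courses, the clause over
# Difficulty(C,D) and Grade(S,C,G) already yields |C| x |D| x |S| x |G|
# ground instances.
students = ["s1"]
courses = [f"c{i}" for i in range(1, 11)]
difficulties = ["d1", "d2", "d3", "d4"]
grades = ["A", "B"]

groundings = [
    f"Difficulty({c},{d}) ^ Grade({s},{c},{g}) => Satisfaction({s})"
    for c, d, s, g in product(courses, difficulties, students, grades)
]
print(len(groundings))  # 10 * 4 * 1 * 2 = 80 ground clauses
```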

9 Realistic Example – Gene-fold Prediction. [Figure omitted; thanks to Irene Ong.]

10 Recent Advances in SRL Inference
• Preprocessing for inference: FROG – Shavlik & Natarajan (2009)
• Lifted exact inference: Lifted Variable Elimination – Poole (2003), Braz et al. (2005), Milch et al. (2008); Lifted VE + Aggregation – Kisynski & Poole (2009)
• Sampling methods: MCMC techniques – Milch & Russell (2006); Logical Particle Filter – Natarajan et al. (2008), Zettlemoyer et al. (2007); Lazy Inference – Poon et al. (2008)
• Approximate methods: Lifted First-Order Belief Propagation – Singla & Domingos (2008); Counting Belief Propagation – Kersting et al. (2009); MAP Inference – Riedel (2008)
• Bounds propagation: Anytime Belief Propagation – Braz et al. (2009)

11 • Fast Reduction of Grounded MLNs • Counting Belief Propagation • Anytime Lifted Belief Propagation • Conclusion

12 • Fast Reduction of Grounded MLNs • Counting Belief Propagation • Anytime Lifted Belief Propagation • Conclusion

13 Markov Logic Networks (Richardson & Domingos, MLJ 2006)
• Weighted first-order logic
• Standard approach: 1) assume a finite number of constants; 2) create all possible groundings; 3) perform statistical inference (often via sampling)
P(X = x) = (1/Z) exp( Σᵢ wᵢ nᵢ(x) ), where wᵢ is the weight of formula i and nᵢ(x) is the number of true groundings of formula i in world x
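
A small sketch of the distribution an MLN defines, P(X = x) = (1/Z) exp(Σᵢ wᵢ nᵢ(x)); the toy world space and the single weighted formula are invented for illustration:

```python
import math
from itertools import product

# Sketch of the MLN distribution: n_i(x) counts the true groundings of
# formula i in world x, and w_i is that formula's weight.
def score(world, formulas):
    """Unnormalized log-probability: sum of w_i * n_i(world)."""
    return sum(w * count(world) for w, count in formulas)

def probability(world, worlds, formulas):
    z = sum(math.exp(score(x, formulas)) for x in worlds)  # partition function
    return math.exp(score(world, formulas)) / z

# Toy domain: two ground atoms, one weighted formula "both atoms true".
worlds = list(product([False, True], repeat=2))
formulas = [(1.5, lambda x: int(x[0] and x[1]))]
for w in worlds:
    print(w, round(probability(w, worlds, formulas), 3))
```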

14 Counting Satisfied Groundings. There is typically a lot of redundancy in first-order sentences: ∀x, y, z: p(x) ∧ q(x, y, z) ∧ r(z) ⇒ w(x, y, z). If p(John) = false, then the formula is true for all y and z values with x = John.
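
A sketch of the shortcut this redundancy enables (hypothetical domains and evidence): once one antecedent literal is false, every completion of the remaining variables satisfies the implication, so those groundings can be counted in bulk rather than enumerated.

```python
# Illustrative counting shortcut for  forall x, y, z:
#   p(x) ^ q(x,y,z) ^ r(z) => w(x,y,z)
# If p(a) is false, the clause is true for all |Y| * |Z| completions of
# that x value; no need to enumerate them one by one.
X = ["John", "Anna", "Bob"]
Y = list(range(100))
Z = list(range(100))
p = {"John": False, "Anna": True, "Bob": True}

satisfied_in_bulk = sum(len(Y) * len(Z) for x in X if not p[x])
print(satisfied_in_bulk)  # 10000 groundings credited without enumeration
```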

15 Factoring Out the Evidence. Let A = the weighted sum of formula groundings satisfied by the evidence, and let Bᵢ = the weighted sum of formula groundings satisfied in world i but not by the evidence. Then

Prob(world i) = e^(A + Bᵢ) / (e^(A + B₁) + … + e^(A + Bₙ)) = e^(Bᵢ) / (e^(B₁) + … + e^(Bₙ)),

so the common factor e^A cancels out.
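
A numeric illustration (made-up weights) that the evidence term e^A cancels from the normalized distribution:

```python
import math

# A = weighted sum of groundings satisfied by evidence (same in every world);
# B[i] = weighted sum of the remaining satisfied groundings in world i.
A = 50.0
B = [1.0, 2.5, 0.3]

with_a = [math.exp(A + b) for b in B]   # can overflow for large A
without_a = [math.exp(b) for b in B]    # e^A cancels in the ratio

probs1 = [v / sum(with_a) for v in with_a]
probs2 = [v / sum(without_a) for v in without_a]
print(probs1)  # the two distributions are identical
print(probs2)
```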

16 Take-Away Message - I: Efficiently factor out the formula groundings that the evidence satisfies. This can potentially eliminate the need for approximate inference.

17 Worked Example: ∀x, y, z GradStudent(x) ∧ Prof(y) ∧ Prof(z) ∧ TA(x, z) ∧ SameGroup(y, z) ⇒ AdvisedBy(x, y)

The evidence: 10,000 people at some school; 2,000 graduate students; 1,000 professors; 1,000 TAs; 500 pairs of professors in the same group.

Total number of groundings = |x| × |y| × |z| = 10^12

18 GradStudent(x): of the 10,000 people, 2,000 are grad students (GradStudent(P1), GradStudent(P3), …) and 8,000 are not (¬GradStudent(P2), ¬GradStudent(P4), …). Every x with ¬GradStudent(x) satisfies the clause regardless of y and z, so FROG keeps only the grad-student values of x: instead of 10^4 values for x we have 2 × 10^3, and the groundings drop from 10^12 to 2 × 10^11.

19 Prof(y): of the 10,000 people, 1,000 are professors (Prof(P2), …) and 9,000 are not (¬Prof(P1), …). Keeping only the professor values of y drops the groundings from 2 × 10^11 to 2 × 10^10.

20 The same reduction applies to Prof(z), dropping the groundings from 2 × 10^10 to 2 × 10^9. (Clause: GradStudent(x) ∧ Prof(y) ∧ Prof(z) ∧ TA(x,z) ∧ SameGroup(y,z) ⇒ AdvisedBy(x,y).)

21 SameGroup(y, z): of the 10^6 y:z combinations, only 1,000 SameGroup facts are true; the other 10^6 − 1,000 satisfy the clause. With 2,000 values of x and 1,000 y:z combinations, the groundings drop from 2 × 10^9 to 2 × 10^6.

22 TA(x, z): of the 2 × 10^6 remaining combinations, only 1,000 TA facts are true. This leaves ≤ 1,000 values of x and ≤ 1,000 y:z combinations, i.e., ≤ 10^6 groundings.

23 For the clause GradStudent(x) ∧ Prof(y) ∧ Prof(z) ∧ TA(x,z) ∧ SameGroup(y,z) ⇒ AdvisedBy(x,y): original number of groundings = 10^12; final number of groundings ≤ 10^6.
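
A rough sketch of this FROG-style reduction on a scaled-down version of the example; the data structures and evidence here are hypothetical, not the released FROG implementation:

```python
# Hypothetical sketch of FROG-style reduction: process the clause one
# literal at a time, dropping bindings the evidence already satisfies
# (a false antecedent literal satisfies the whole clause) and keeping
# only the "live" values.
people = [f"P{i}" for i in range(10)]
grad = {"P0", "P1"}   # GradStudent evidence
prof = {"P2", "P3"}   # Prof evidence

# GradStudent(x): only x in grad can still falsify the clause.
live_x = [p for p in people if p in grad]
# Prof(y) and Prof(z): the same filtering on y and z.
live_y = [p for p in people if p in prof]
live_z = list(live_y)

remaining = len(live_x) * len(live_y) * len(live_z)
print(f"groundings kept: {remaining} of {len(people) ** 3}")  # 8 of 1000
```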

24 Sample Results: UWash-CSE. [Plot comparing the fully grounded network, FROG's reduced network, and FROG's reduced network without one challenging rule: advisedBy(x,y) ∧ advisedBy(x,z) ⇒ samePerson(y,z).]

25 • Fast Reduction of Grounded MLNs • Counting Belief Propagation • Anytime Lifted Belief Propagation • Conclusion

26 Belief Propagation
• A message-passing algorithm for inference on graphical models, formulated on factor graphs
• Exact if the factor graph is a tree; approximate when it has cycles
• Loopy BP does not guarantee convergence, but is found to be very useful in practice
[Figure: a small factor graph with variables X1, X2, X3 and factors f1, f2.]
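
A minimal sum-product sketch on a two-variable tree, with invented factor values, to show what the messages compute:

```python
import numpy as np

# Sum-product BP on a tiny tree-structured factor graph:
#   f0 -- X1 -- f1 -- X2   (binary variables)
f0 = np.array([0.3, 0.7])            # unary factor on X1
f1 = np.array([[0.9, 0.1],           # pairwise factor f1(x1, x2)
               [0.2, 0.8]])

# Message f0 -> X1 is the factor itself; X1 -> f1 forwards it (leaf variable).
m_x1_to_f1 = f0
# Message f1 -> X2 sums out x1.
m_f1_to_x2 = f1.T @ m_x1_to_f1

belief_x2 = m_f1_to_x2 / m_f1_to_x2.sum()
print(belief_x2)  # exact marginal P(X2), since the graph is a tree

# Sanity check against brute-force marginalization.
joint = f0[:, None] * f1
print(joint.sum(axis=0) / joint.sum())
```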

27 Belief Propagation. [Figure: a factor graph containing identical factors, which send and receive identical messages.]

28 Take-Away Message – II Counting shared factors can result in great efficiency gains for (loopy) belief propagation

29 Counting Belief Propagation • Two steps: 1) compress the factor graph; 2) run modified BP (see the sketch after Step 2 below)

30 Step 1: Compression

31 Step 2: Modified Belief Propagation
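
A hedged sketch of the compression step via color passing (one round shown; the atoms and factors are invented). Nodes that end up with the same color would send and receive identical BP messages, so the modified BP of Step 2 can treat each group as a single clusternode weighted by its count:

```python
from collections import defaultdict

# Sketch of CBP's compression step. Nodes start with a color from their
# evidence; factors then recolor their arguments by the colors of their
# neighbors. Identically colored nodes are merged into one "clusternode".
variables = ["smokes(a)", "smokes(b)", "smokes(c)"]
evidence = {"smokes(c)": True}
factors = [("friends", "smokes(a)", "smokes(b)"),
           ("friends", "smokes(b)", "smokes(a)")]

color = {v: ("obs", evidence[v]) if v in evidence else ("unobs",)
         for v in variables}

# One color-passing round: collect the colors seen through each factor.
signatures = defaultdict(list)
for name, *args in factors:
    for v in args:
        signatures[v].append((name, tuple(sorted(color[a] for a in args))))
new_color = {v: (color[v], tuple(sorted(signatures[v]))) for v in variables}

clusters = defaultdict(list)
for v in variables:
    clusters[new_color[v]].append(v)
for members in clusters.values():
    print(members)  # groups of variables that share all BP messages
```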

32 Factored Frontier (FF)
• Probabilistic inference over time is central to many AI problems
• In contrast to static domains, we need approximation: variables easily become correlated over time by virtue of sharing common influences in the past
• Factored Frontier [Murphy and Weiss 01]: unroll the DBN, then run (loopy) BP
Lifted First-Order FF: use CBP in place of BP (a sketch of the unrolling step follows)
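
A small sketch of the unrolling step (predicate names are illustrative): a 2-slice template is copied T times, and (loopy) BP, or CBP in the lifted first-order version, then runs on the result.

```python
# Sketch of "unroll the DBN" behind Factored Frontier.
intra_slice = [("smokes(x)", "cancer(x)")]   # edges within one time slice
inter_slice = [("smokes(x)", "smokes(x)")]   # persistence between slices

T = 10
ground_factors = []
for t in range(T):
    ground_factors += [(f"{a}@{t}", f"{b}@{t}") for a, b in intra_slice]
    if t + 1 < T:
        ground_factors += [(f"{a}@{t}", f"{b}@{t+1}") for a, b in inter_slice]
print(len(ground_factors))  # 10 intra-slice + 9 inter-slice = 19 factors
```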

33 Lifted First-Order Factored Frontier. [Experimental setup and results plot: 20 people over 10 time steps; maximum number of friends 5; cancer never observed; queried time step randomly selected; successor fluent.]

34 • Fast Reduction of Grounded MLNs • Counting Belief Propagation • Anytime Lifted Belief Propagation • Conclusion

35 The Need for Shattering
• Lifted BP depends on clusters of variables being symmetric, that is, sending and receiving identical messages
• In other words, it is about dividing the random variables into cases, which is called "shattering"

36 Intuition for Anytime Lifted BP. [Model predicates: alarm(House); earthquake(Town), in(House, Town); burglary(House); next(House, Another), lives(Another, Neighbor), saw(Neighbor, Someone), masked(Someone); in(House, Item), missing(Item); partOf(Entrance, House), broken(Entrance).] The alarm can go off due to an earthquake, or due to a burglary; a "prior" factor makes the alarm going off unlikely without those causes.

37 Intuition for Anytime Lifted BP. Given a home in sf, with home2 and home3 next to it with neighbors jim and mary, each seeing person1 and person2; several items in home, including a missing ring and non-missing cash; a broken front but unbroken back entrance to home; and an earthquake in sf: what is the probability that home's alarm goes off? [Same model as on the previous slide.]

38 Lifted Belief Propagation. [Figure: the fully shattered grounding over alarm(home), burglary(home), earthquake(sf), in(home, sf); partOf and broken for the front and back entrances; the next/lives/saw/masked chains for home2/jim/person1 and home3/mary/person2; in and missing for ring, cash, and the remaining Items not in {ring, cash, …}. The model for House ≠ home and Town ≠ sf is not shown.] Complete shattering takes place before belief propagation starts, and messages are passed over the entire model before the query answer is obtained.

39 Intuition for Anytime Lifted BP. [Same shattered network, with the query alarm(home) and the evidence marked.] Given the earthquake evidence, we already have a good lower bound on the query regardless of the burglary branch; the rest of the shattering was wasted effort!

40 Using only a portion of the model
• By using only a portion, we don't have to shatter the other parts of the model
• How can we use only a portion?
• A solution for propositional models already exists: box propagation (Mooij & Kappen, NIPS 2008)

41 Box Propagation: a way of getting bounds on the query without examining the entire network. [Figure: query node A alone, with trivial bounds [0, 1].]

42 Box Propagation. [Figure: A connected through factor f1 to B. With B still unexplored at [0, 1], the bound on A tightens to [0.36, 0.67].]

43 Box Propagation. [Figure: expanding through further factors f2, f3, …: B tightens to [0.05, 0.5] and A to [0.38, 0.50], with frontier bounds [0.32, 0.4], [0.1, 0.6], [0, 1].]

44 Box Propagation. [Figure: another expansion step: B at [0.17, 0.3], A at [0.41, 0.44], with frontier bounds [0.32, 0.4], [0.3, 0.4], [0.2, 0.8], [0, 1].]

45 Box Propagation. [Figure: convergence after all messages are collected: B = 0.21, A = 0.42, with interior values 0.36, 0.32, 0.45, 0.3.]
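
A minimal numeric rendition of the idea (factor values invented): the unexplored part of the network is summarized as an unknown message, and evaluating the extremes of that box bounds the query belief. For a binary message the belief is monotone in the unknown q, so the two extremes suffice:

```python
import numpy as np

# Treat an unexplored neighbor's message as an unknown distribution
# [q, 1-q], q in [0, 1], and bound the query belief over all choices.
f = np.array([[0.9, 0.1],    # factor f(a, b), values made up
              [0.3, 0.7]])

def belief_a(q):
    m = f @ np.array([q, 1 - q])   # message f -> A given B's message
    return (m / m.sum())[1]        # P(A = 1)

bounds = [belief_a(q) for q in (0.0, 1.0)]  # extremes of the box
print(min(bounds), max(bounds))  # interval guaranteed to contain P(A = 1)
```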

46 Take-Away Message - III Anytime BP = Incremental Shattering + Box Propagation

47 Anytime Lifted Belief Propagation. Start from the query alone: alarm(home), with bounds [0, 1]. The algorithm works by picking a cluster variable and including the factors in its blanket.

48 Anytime Lifted Belief Propagation. Expanding the query's blanket brings in burglary(home), earthquake(Town), and in(home, Town), and the query bound tightens to [0.1, 0.9]. The factor φ(alarm(home), in(home,Town), earthquake(Town)) is obtained by unifying alarm(home) with alarm(House) in φ(alarm(House), in(House,Town), earthquake(Town)), producing the constraint House = home; the burglary factor is obtained the same way, through unification. The blanket factors alone can already determine a bound on the query.

49 Anytime Lifted Belief Propagation. The cluster in(home, Town) unifies with in(home, sf) in φ(in(home, sf)), which represents evidence, splitting the cluster around Town = sf: earthquake(sf) and in(home, sf) are now separate from earthquake(Town) and in(home, Town) with Town ≠ sf. The bound remains [0.1, 0.9] because we still haven't considered the evidence on earthquakes.

50 Anytime Lifted Belief Propagation. φ(earthquake(sf)) represents the evidence that there was an earthquake; incorporating it narrows the query bound to [0.8, 0.9]. If the bound is good enough, there is no need to further expand (and shatter) the other branches.

51 Anytime Lifted Belief Propagation. Expanding further, e.g., bringing in partOf(front, home) and broken(front), narrows the bound to [0.85, 0.9]. We can keep expanding at will for narrower bounds…

52 Anytime Lifted Belief Propagation. …until convergence, if desired: with the full network expanded (the burglary chains for home2/jim/person1 and home3/mary/person2, the in/missing factors for ring, cash, and the remaining Items, and the entrance factors), the query probability is 0.8725.
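
The control loop, rendered as a runnable toy on a propositional chain (factor values invented; the real algorithm expands a lifted network and shatters incrementally): expand one factor at a time, bound the query by boxing the unexplored remainder, and stop once the bound is tight enough.

```python
import numpy as np

# Anytime idea on a chain  A - f - B - f - C - ... : use only the first
# d factors, treat the rest as an unknown message, and widen/narrow the
# bounds on P(A = 1) as d grows.
f = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def bounds_at_depth(d):
    """Bounds on P(A = 1) using only the first d factors of the chain."""
    lo, hi = 1.0, 0.0
    for q in (0.0, 1.0):               # extremes of the unknown message
        m = np.array([q, 1 - q])
        for _ in range(d):             # propagate back toward the query
            m = f @ m
        p = m[1] / m.sum()
        lo, hi = min(lo, p), max(hi, p)
    return lo, hi

d = 1
while True:
    lo, hi = bounds_at_depth(d)
    print(f"depth {d}: P(A=1) in [{lo:.4f}, {hi:.4f}]")
    if hi - lo < 0.01:                 # anytime stopping criterion
        break
    d += 1
```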

53 Connection to Resolution Refutation. Incremental shattering corresponds to building a proof tree. [Figure: a tree rooted at alarm(home), with branches earthquake(sf), in(home, sf); earthquake(L) and in(home, L) for L not in {sf}; burglary(home); true; …]

54 • Fast Reduction of Grounded MLNs • Counting Belief Propagation • Anytime Lifted Belief Propagation • Conclusion

55 Conclusion
• Inference is the key issue in several SRL formalisms
• FROG – keeps only the groundings not yet satisfied by the evidence: an order-of-magnitude reduction in the number of groundings, comparing favorably to Alchemy in different domains
• Counting BP – BP plus grouping of nodes sending and receiving identical messages: a conceptually easy, scalable BP algorithm with applications to challenging AI tasks
• Anytime BP – incremental shattering + box propagation: only the most necessary fraction of the model is considered and shattered; status: implementation and evaluation

56 Conclusion
• The algorithms are independent of the representation
• Variety of applications: parameter learning of relational models, social networks, object recognition, link prediction, activity recognition, model counting, bio-medical applications, relational RL

57 Future Work
• FROG: combine with lifted inference; exploit commonality across rules
• CBP: integrate with parameter learning in SRL models; extend to multi-agent RL and lifted pairwise BP
• Anytime BP: heuristics for expanding the network; understand closer connections to resolution
• SRL models: learning dynamic SRL models; structure learning remains an open issue

58 Acknowledgements*
• Babak Ahmadi – Fraunhofer Institute
• Rodrigo de Salvo Braz – SRI International
• Hung Bui – SRI International
• Vitor Santos Costa – U Porto
• Kristian Kersting – Fraunhofer Institute
• Gautam Kunapuli – UW Madison
• David Page – UW Madison
• Stuart Russell – UC Berkeley
• Jude Shavlik – UW Madison
• Prasad Tadepalli – Oregon State University
* Ordered by last name

59 Thanks!

