Download presentation

Presentation is loading. Please wait.

Published byOlivia Hemp Modified over 2 years ago

1
Compiling Graphical Models Adnan Darwiche University of California, Los Angeles UAI06 Tutorial

2
Compilation: Historical Motivation Separate inference into two phases: Offline: Compile model into a structure Online: Use structure to answer queries Goal: Push as much work into offline phase to optimize online inference time Best initial example: Offline: Compile a Bayesian network into a jointree Online: Use jointree to answer multiple queries efficiently

3
Compilation: Modern Motivation Exploit model structure in inference: Global structure: Exhibited in model topology Measured by treewidth Exploited by most (non-compilation) algorithms Local structure: Exhibited in model parameters Type 1: Determinism Type 2: Context-specific independence Local structure is best exploited in the context of compilation: main theme

4
Compilation: Theoretical Implications Unifies inference paradigms Variable elimination Jointree (Tree clustering) Conditioning Compilation as a trace of classical inference

5
Bayesian Networks Battery Age Alternator Fan Belt Battery Charge Delivered Battery Power Starter Radio LightsEngine Turn Over Gas Gauge Gas Fuel Pump Fuel Line Distributor Spark Plugs Engine Start Local Knowledge

6
Bayesian Networks Battery Age Alternator Fan Belt Battery Charge Delivered Battery Power Starter Radio LightsEngine Turn Over Gas Gauge Gas Fuel Pump Fuel Line Distributor Spark Plugs Engine Start ON OFF OK WEAK DEAD Lights Battery Power If Battery Power = OK, then Lights = ON (99%) ….

7
Bayesian Networks Battery Age Alternator Fan Belt Battery Charge Delivered Battery Power Starter Radio LightsEngine Turn Over Gas Gauge Gas Fuel Pump Fuel Line Distributor Spark Plugs Engine Start

8
Global Structure: Treewidth w

9
Battery Age Alternator Fan Belt Battery Charge Delivered Battery Power Starter Radio LightsEngine Turn Over Gas Gauge Gas Fuel Pump Fuel Line Distributor Spark Plugs Engine Start Local Structure: CSI and Determinism

10
Battery Age Alternator Fan Belt Battery Charge Delivered Battery Power Starter Radio LightsEngine Turn Over Gas Gauge Gas Fuel Pump Fuel Line Distributor Spark Plugs Engine Start Context Specific Independence (CSI) Local Structure: CSI and Determinism

11
Battery Age Alternator Fan Belt Battery Charge Delivered Battery Power Starter Radio LightsEngine Turn Over Gas Gauge Gas Fuel Pump Fuel Line Distributor Spark Plugs Engine Start ON OFF OK WEAK DEAD Lights Battery Power If Battery Power = Dead, then Lights = OFF Determinism

12
Todays Models … Characterized by: Richness in local structure (determinism, CSI) Massiveness in size (100,000s variables not uncommon) High connectivity (treewidth > 50, > 100) Enabled by: High level modeling tools: relational, first order New application areas (synthesis): Bioinformatics (e.g. linkage analysis) Sensor networks Exploiting local structure a must!

13
High Order Specifications: Relational Models… burglary(v)=0.005; alarm(v)=(burglary(v):0.95,0.01); calls(v,w)= (neighbor(v,w): (prankster(v)): (alarm(w):0.9,0.05), (alarm(w):0.9,0)),0); alarmed(v)= n-or{calls(w,v)|w:neighbor(w,v)} burglary(v)=0.005; alarm(v)=(burglary(v):0.95,0.01); calls(v,w)= (neighbor(v,w): (prankster(v)): (alarm(w):0.9,0.05), (alarm(w):0.9,0)),0); alarmed(v)= n-or{calls(w,v)|w:neighbor(w,v)} Primula

14

15
Friends and Smokers ( Richardson & Domingos, 2004) M individuals Relations such as smokes(p), cancer(p), friend(p 1,p 2 ) Logical constraints such as: if one of p's friends smokes, then p smokes. Sample Query: probability that given person has cancer

16
Students (Pasula & Russell, 2001) P professors S students Various relations, such as famous(p), well- funded(p), success(s), advises(p,s) Sample Query: probability a professor is well-funded given success of advised students Online Time (sec) 3336,450,23164,32517, , ,936,50433,3539, , ,236,25738,88910, , ,531,23020,2795, , ,46121,1155, , ,41011,0993, , Offline Time (min) AC Edges Cnf Clauses CNF Vars Treewidth w Network params Students- Profs

17
Ordering genes on a chromosome and determining distance between them Useful for predicting and detecting diseases Associating functionality of genes with their location on the chromosome Gene 1 Gene 2 Gene 3 Genetic Linkage Analysis

18
Pedigrees + Phenotype + Genotype

19

20
DBNs from Speech Applications

21
Coding Networks

22
Tutorial Outline Theoretical foundations Online query answering algorithms Offline compilation algorithms Applications Concluding remarks

23
Theoretical Foundations Graphical Model (Bayesian, Markov Networks): Is a Multi-Linear Function (MLF) Compiled Model: Is an Arithmetic Circuit (AC) Compilation process: Factoring MLF into AC

24
Multi-Linear Functions Arithmetic Circuits AB * * ** + ++ * ** * Factoring A Differential Approach to Inference in Bayesian Networks JACM-03 (Darwiche)

25
Factoring Multi-linear Functions (MLFs) a + ad + abd + abcd MLF: * + * abdc1 + Arithmetic Circuit (AC) An MLF has an exponential number of terms, yet it may be represented by an AC with polynomial size! A graphical model defines an MLF Evaluating the MLF for a given evidence gives the probability of evidence The inference problem can be formulated as factoring the MLF of a graphical model Circuit Complexity: Size of smallest AC that computes the MLF

26
Pr(a) = =.3 false B A true false Pr(.) false true Graphical Models as MLFs

27
Pr(~b) = =.41 false B A true false true Pr(.) Graphical Models as MLFs

28
.03λ a λ b +.27λ a λ ~b +.56λ ~a λ b +.14λ ~a λ ~b false B.03 A true false true λ a* λ b *.03 λ a* λ ~ b *.27 λ ~ a* λ b *.56 λ ~ a* λ ~ b*.14 F (λ ~a, λ ~b, λ a, λ b ) = Pr(.) Graphical Models as MLFs

29
=.03λ a λ b +.27λ a λ ~b +.56λ ~a λ b +.14λ ~a λ ~b F (λ ~a, λ ~b, λ a, λ b ) Pr(a,~b) = F (λ ~a :0, λ ~b :1, λ a :1, λ b :0) =.27 Pr(a) = F (λ ~a :0, λ ~b :1, λ a :1, λ b :1) =

30
A B C θ b|a θaθa θ c|a ABCPr(.) abcθ a θ b|a θ c|a ab~cθ a θ b|a θ ~c|a a~bcθ a θ ~b|a θ c|a a~b~cθ a θ ~b|a θ ~c|a...…

31
A B C θ b|a θaθa θ c|a ABCPr(.) abc λ a λ b λ c θ a θ b|a θ c|a ab~c λ a λ b λ ~c θ a θ b|a θ ~c|a a~bc λ a λ ~b λ c θ a θ ~b|a θ c|a a~b~c λ a λ ~b λ ~c θ a θ ~b|a θ ~c|a...…

32
F = λ a λ b λ c θ a θ b|a θ c|a + λ a λ b λ ~c θ a θ b|a θ ~c|a + λ a λ ~b λ c θ a θ ~b|a θ c|a + λ a λ ~b λ ~c θ a θ ~b|a θ ~c|a …. A B C

33
F = λ a λ b λ c λ d θ a θ b|a θ c|a θ d|bc + λ a λ b λ c λ ~d θ a θ b|a θ c|a θ ~d|bc + …. A B C D Each term has 2n variables (n indicators, n parameters) Each variable has degree one (multi-linear function) θaθa θ b|a θ c|a θ d|bc

34
Multi-Linear Functions Arithmetic Circuits AB * * ** + ++ * ** * Factoring

35
Online Query Answering Complexity: Time and space linear in the AC size Queries: Probability of evidence, with evidence flipping/fast retraction Variable and family marginals MPE: most probable explanation Sensitivity analysis (derivatives)

36
Evaluating the Polynomial

37
PR: Probability of Evidence Battery Age Alternator Fan Belt Battery Charge Delivered Battery Power Starter Radio LightsEngine Turn Over Gas Gauge Gas Fuel Pump Fuel Line Distributor Spark Plugs Engine Start Pr(e)

38
The Partial Derivatives

39
PR: Probability of Evidence Flips Battery Age Alternator Fan Belt Battery Charge Delivered Battery Power Starter Radio LightsEngine Turn Over Gas Gauge Gas Fuel Pump Fuel Line Distributor Spark Plugs Engine Start Pr(e) X

40
PR: Probability of Evidence Flips Battery Age Alternator Fan Belt Battery Charge Delivered Battery Power Starter Radio LightsEngine Turn Over Gas Gauge Gas Fuel Pump Fuel Line Distributor Spark Plugs Engine Start Pr(e-X,x) X

41
The Partial Derivatives

42
PR: Family Marginals Battery Age Alternator Fan Belt Battery Charge Delivered Battery Power Starter Radio LightsEngine Turn Over Gas Gauge Gas Fuel Pump Fuel Line Distributor Spark Plugs Engine Start U X Pr(e,x,u)

43
Multi-Linear Functions Arithmetic Circuits AB * * ** + ++ * ** * Factoring

44
* * * * * * * * Circuit Evaluation and Differentiation: Marginals Two passes only: probability of evidence (with evidence flipping) Node marginals Family marginals Sensitivity

45
Efficient Eval/Diff Schemes Assume alternating levels of +/* nodes, with one parent per *node Method A: Two registers per +node (no registers for *nodes) Method B: One register per node (use for values in upward pass, then override with derivatives in downward pass) Method C: One register per node, one bit per *node

46
** ** m mm ** * * Circuit Optimization: MPE * **

47
m * m **

48
Custom Hardware for Evaluating ACs Adharapurapu, Ercegovac (2004)

49
Offline Compilation Factoring MLFs into ACs: Jointree: Embeds AC Variable Elimination: Trace is an AC Recursive Conditioning: Trace is an AC Reduction to Logic: CNF to d-DNNF compilation

50
Compiling using Jointrees Classical Jointree Algorithm: Convert model into jointree Jointree propagation (two-passes) Modern interpretation: Jointree embeds an AC that factors MLF Jointree propagation is evaluating/differentiating embedded AC

51
AB A AB root A Jointree Embeds an AC… ACAD AE AB A AB Inward-pass evaluates circuit Outward-pass differentiates circuit [Hugin, Shenoy Shafer,…] A Differential Semantics to Jointree Algorithms AIJ-04 (with James Park)

52
Efficient Eval/Diff Schemes Assume alternating levels of +/* nodes, with one parent per *node Method A: Two registers per +node (no registers for *nodes) Method B: One register per node (use for values in upward pass, then override with derivatives in downward pass) Method C: One register per node, one bit per *node

53
Jointree Flavors Shenoy-Shafer: Method A Hugin: Method B (looses information) Zero-Conscious Hugin (new): Method C (best of A,B)

54
Compiling using Variable Elimination (VE) VE operates on factors: Mappings from variable instantiations to real numbers VE performs two operations on factors: Multiply two factors Sum-Out a variable from factor Factors have different representations: Tables More structured representations (decision trees/graphs) Overhead problem for structured factors

55
A B true false A.3.7 TATA Tabular Factors false B.1.9 A.8.2 true false TBTB true

56
X Z.1.9 Y.5 Z Structured Factors: Algebraic Decision Diagrams (ADDs)

57
NetworkMax ClustVarsCardTotal Parms%Det%Distinct alarm bm diabetes hailfinder mildew mm munin munin munin munin pathfinder pigs students tcc4f water Networks with Local Structure

58
VE: Tabular vs ADD Representations of Factors TabularADD NetworkTime (ms) Improvement alarm barley30714, bm-5-34, diabetes94933, hailfinder link1,6882, mm , mildew7292, munin11551, munin22043, munin33505, munin44064, pathfinder515, pigs st tcc4f water761,

59
Compiling using Variable Elimination (VE) By using symbolic factors and corresponding operations: VE compiles out an AC VE with tabular factors: Generates ACs similar to those embedded in jointree VE with structured factors: Generates much smaller ACs Overhead pushed into offline phase

60
A B true false A.3.7 TATA Factors false B.1.9 A.8.2 true false TBTB true

61
A B false ATATA θ a * λ a θ ~a * λ ~a false BA true false TBTB true θ ~b|a * λ ~b θ b|~a * λ b θ b|a * λ b θ ~b|~a * λ ~b Symbolic Factors

62
true false ATBTB θ b|a * λ b + θ ~b|a * λ ~b θ b|~a * λ b + θ ~b|~a * λ ~b false BA true false TBTB true θ ~b|a * λ ~b θ b|~a * λ b θ b|a * λ b θ ~b|~a * λ ~b Summing out B Summing out Variable B

63
*= Multiplying Factors true false AT A T B θ a * λ a *( θ b|a * λ b + θ ~b|a * λ ~b ) θ ~ a * λ ~ a *( θ b|~a * λ b + θ ~b|~a * λ ~b ) true false ATBTB θ b|a * λ b + θ ~b|a * λ ~b θ b|~a * λ b + θ ~b|~a * λ ~b true false A TATA θ a *λ a θ ~a *λ ~a

64
θ a * λ a * ( θ b|a * λ b + θ ~b|a * λ ~b ) + θ ~ a * λ ~ a ( θ b|~a * λ b + θ ~b|~a * λ ~b ) true false AT A T B θ a * λ a * ( θ b|a * λ b + θ ~b|a * λ ~b ) θ ~ a * λ ~ a * ( θ b|~a * λ b + θ ~b|~a * λ ~b ) Summing out Variable A

65
VE factors MLF into AC (Bottom up Construction) AB * * ** + ++ * ** * Factoring Time and space complexity of generating AC is similar to Variable Elimination: Exponential only in treewidth Generated ACs similar to those embedded in Jointree Recall: AC can be used to answer multiple queries!

66
X Z.1.9 Y.5 Z Structured Factors: Algebraic Decision Diagrams (ADDs)

67
X ZY Z Symbolic ADD Modify standard ADD operations (multiply, sum-out) to operate on symbolic ADDs Run variable elimination with symbolic ADDs Compile out an AC Asymptotic complexity is no worse than variable elimination Overhead of ADDs is pushed into offline phase Generated AC can be much smaller Online inference can be much faster

68
NetworkMax ClustVarsCardTotal Parms%Det%Distinct alarm bm diabetes hailfinder mildew mm munin munin munin munin pathfinder pigs students tcc4f water Networks with Local Structure

69
TabularADD NetworkTime (ms) Improvement alarm barley30714, bm-5-34, diabetes94933, hailfinder link1,6882, mm , mildew7292, munin11551, munin22043, munin33505, munin44064, pathfinder515, pigs st tcc4f water761, Tabular vs ADD: Standard VE

70
Time (s)AC size NetworkAceADD-VEImprov.Tabular-VEADD-VEImprov. alarm ,5343, barley8, ,467,77724,653, bm ,591,75014, diabetes1, ,728,95717,219,0422 hailfinder ,75525, link ,262,77789,097, mildew3, ,094,5923,352, mm ,635,566108, munin11, ,260,407,123 31,409, munin ,295,4265,662, munin ,987,0883,503, munin ,028,5326,869, pathfinder ,58844, pigs ,925,3882,558, st ,374,93422, tcc4f ,40822, water ,996,054170, Tabular vs ADD: VE Compilations

71
NetworkJointreeADD-VEImprov. alarm barley65,22635, bm-5-389, diabetes29,31620, hailfinder link223,542175, mildew10,0774, mm , munin1669,91537, munin217,8577, munin313,3514, munin442,7548, pathfinder1, pigs3,0202, st-3-217, tcc4f water16, ADD-VE vs Jointree: Online Inference Time (ms) Computing all marginals, for 16 pieces of random evidence Work on structured representations of factors is now much more relevant and practical.

72
Compiling by Reduction to Logic Algebraic: MLFs / ACs Logical: CNF / d-DNNF Factoring MLF into AC can be reduced to factoring CNF into d-DNNF CNF to d-DNNF compilers are very powerful (natural for exploiting determinism and CSI)

73
Compiler: d-DNNF CNF Multi-Linear Function Arithmetic Circuit Encode Decode Reduction to Logic

74
a c + a b c + c Multi-linear function:Propositional theory: c ^ (a b) Encode c b 1 a 1 Arithmetic Circuit Decode c b b a a Smooth d-DNNF Compile MLFs ACs CNFs d-DNNF

75
or and A A or and B C or and D E or BD and, Deterministic, Decomposable NNF

76
or and A A or and B C or and D E or BD and, Deterministic, Decomposable NNF Deterministic: Disjuncts are logically disjoint

77
or and A A or and B C or and D E or BD and Deterministic, Decomposable NNF B C B D E D Decomposable: Conjuncts share no variables Compiling CNFs into d-DNNFs AAAI-02, ECAI-04 Compiler at

78
A B C A D E Recursive Conditioning for Compilation or B C D E B C A and A B C D E B C D E

79
and 0 A B 1 DB C D E And-Or Graphs

80
Propositional Encoding of Multi-Linear Functions Propositional theory: Δ = c ^ (a b) Encodes: F = a c + a b c + c abcEncodes TTT abc TTFab TFT ac TFFa FTTbc FTFb FFT c FFF1

81
A B F = λ a λ b θ a θ b|a + λ a λ ~b θ a θ ~b|a + λ ~a λ b θ ~a θ b|~a + λ ~a λ ~b θ ~a θ ~b|~a λ a λ ~a ¬ λ a ¬ λ ~a λ b λ ~b ¬ λ b ¬ λ ~b λ a θ a λ ~a θ ~a λ a ^ λ b θ b|a λ a ^ λ ~ b θ ~ b|a λ ~ a ^ λ b θ b|~a λ ~ a ^ λ ~ b θ ~ b|~a Encoding Model as CNF

82
Why Logic? Encoding local structure is easy: Determinism encoded by adding clauses: CSI encoded by collapsing variables: A natural environment to exploit local structure: DD-backtracking, clause learning, … Non-structural decomposition Non-structural (formula) caching

83
ABC S 0.95 c abc A Pr(S|A,B,C) BC a a a a a a a b b b b b b b c c c c c c Tabular CPT -Functional constraints -Context-specific independence s|abe Local Structure

84
0.95 c abc A Pr(S|A,B,E) BC a a a a a a a b b b b b b b c c c c c c Tabular CPT λ ~a λ b λ c λ s θ s|~abc ¬ λ ~a ¬ λ b ¬ λ c ¬ λ s Determinism

85
0.95 c abc A Pr(S|A,B,C) BC a a a a a a a b b b b b b b c c c c c c Tabular CPT λ a λ b λ s θ s|ab λ a λ b λ c λ s θ s|abc λ a λ b λ ~c λ s θ s|ab~c Context-Specific Independence

86
X Y Belief network x x x y x|y …. CNF Smooth d-DNNF x y x|y x x y x|y Arithmetic Circuit The Ace System:

87
Time (s)AC size NetworkAceADD-VEImprov.Tabular-VEADD-VEImprov. alarm ,5343, barley8, ,467,77724,653, bm ,591,75014, diabetes1, ,728,95717,219,0422 hailfinder ,75525, link ,262,77789,097, mildew3, ,094,5923,352, mm ,635,566108, munin11, ,260,407,123 31,409, munin ,295,4265,662, munin ,987,0883,503, munin ,028,5326,869, pathfinder ,58844, pigs ,925,3882,558, st ,374,93422, tcc4f ,40822, water ,996,054170, ADD-VE vs Logic (Ace): Compile Times

88
NetworkNodesParametersMax Cluster mastermind_04_08_ mastermind_06_08_ mastermind_10_08_ mastermind_03_08_ mastermind_04_08_ mastermind_03_08_ students_03_ students_03_ students_04_ students_05_ students_06_ blockmap_05_ blockmap_10_ blockmap_15_ blockmap_20_ blockmap_22_ ADD-VE vs Logic (Ace)

89
Network Offline Time (min) AC NodesAC Edges Online Inference Time (s) mastermind_04_08_03171,666541, mastermind_06_08_031258,2281,523, mastermind_10_08_0331,293,3234,315, mastermind_03_08_042186,3514,859, mastermind_04_08_045932,35519,457, mastermind_03_08_05101,359,39155,417, students_03_0217,92737, students_03_12124,219113, students_04_163181,166815, students_05_2071,319,8345,236, students_06_24339,922,23336,450, blockmap_05_0312,83320, blockmap_10_03217,749974, blockmap_15_03647,4757,643, blockmap_20_ ,60240,172, blockmap_22_ ,13676,649, ADD-VE vs Logic (Ace)

90
Effect of Local Structure Local Structure Encoded PathfinderWaterMunin4 None981,17813,777,166116,136,985 Det + CSI42,810 (4%) 134,140 (1%) 5,762,690 (5%) Det130,380 (13%) 138,501 (1%) 9,997,267 (9%) CSI200,787 (20%) 11,111,104 (81%) 17,612,036 (15%)

91
Compilation vs Direct Inference Grid problems here…

92
Compilation vs Direct Inference Grid size Treewidth w DetCachet (sec) Ace offline (sec) Ace online (sec) Offline/ Online 16x162550% x223675% x346090% Average over 10 random instances for each grid Ace available at

93
Applications Relational Models Diagnosis Genetic Linkage Analysis

94
burglary(v)=0.005; alarm(v)=(burglary(v):0.95,0.01); calls(v,w)= (neighbor(v,w): (prankster(v)): (alarm(w):0.9,0.05), (alarm(w):0.9,0)),0); alarmed(v)= n-or{calls(w,v)|w:neighbor(w,v)} burglary(v)=0.005; alarm(v)=(burglary(v):0.95,0.01); calls(v,w)= (neighbor(v,w): (prankster(v)): (alarm(w):0.9,0.05), (alarm(w):0.9,0)),0); alarmed(v)= n-or{calls(w,v)|w:neighbor(w,v)} Primula/Ace: Upcoming Release

95

96

97

98

99
Friends and Smokers ( Richardson & Domingos, 2004) M individuals Relations such as smokes(p), cancer(p), friend(p 1,p 2 ) Logical constraints such as: if one of p's friends smokes, then p smokes. Sample Query: probability that given person has cancer

100
Friends & Smokers MNetwork params Treewidth w CNF Vars Cnf Clauses AC Edges Online Time (sec) Offline Time (sec) , , ,714361,9956,9161, ,760705,56519,5253, , ,93442,1337, , ,91277,65613, , ,309129,01022, , ,935199,11134, , ,600290,87549, , ,114407,21869, , ,614451,96577,

101
Students (Pasula & Russell, 2001) P professors S students Various relataios, such as famous(p), well- funded(p), success(s), advises(p,s) Sample Query: probability a professor is well-funded given success of advised students

102
Students Students- Profs Network params Treewidth w CNF Vars Cnf Clauses AC Edges Online Time (sec) Offline Time (min) ,566723,09911,099445, , ,85921,115815, , ,62420,2792,531, , ,73438,8895,236, , ,20933,35316,936, , ,69364,32536,450,

103
Diagnosis QMR-like: Effect of Encoding Evidence 600 diseases (D) and 4100 features (F) Feature F j is a noisy-or of parent diseases D i (11 parents chosen randomly) Sample Query: probability of disease given partial evidence on features. D1D1 D2D2 D3D3 DmDm … F1F1 F2F2 FnFn …

104
Treewidth: CNF variables: 94,900 CNF clauses: 188,600 No. True Features AC Edges Online Time (sec) Offline Time (sec) 048, , , , , , , , , ,141, ,691, ,352, , Diagnosis QMR-like: Effect of Encoding Evidence

105
Ordering genes on a chromosome and determining distance between them Useful for predicting and detecting diseases Associating functionality of genes with their location on the chromosome Gene 1 Gene 2 Gene 3 Genetic Linkage Analysis

106
Pedigrees + Phenotypes + Genotypes

107

108
Arithmetic Circuit Gene 1 Gene 2

109
State of the Art Linkage PedigreeOffline (sec) AC EdgesOnline (sec) Superlink 1.4 (sec) EE ,070, , EE ,855, , EE ,997, EE ,986, EE ,632,

110
Model Compilation: Factoring MLFs into ACs Classical algorithms factor MLFs into ACs: Jointree embeds AC Variable elimination constructs AC bottom up Recursive conditioning constructs AC top down Factoring MLFs into ACs can be reduced to logical reasoning Exploiting local structure to build smaller ACs: Compiling models with very high treewidth is common place Boundary between exact and approximate inference is much changed Public systems now available!

111

112

Similar presentations

© 2016 SlidePlayer.com Inc.

All rights reserved.

Ads by Google