
1 CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING. Continuous Probability Distributions and Bayesian Networks with Continuous Variables

2 AGENDA
- Continuous probability distributions
- Common families: the Gaussian distribution
- Linear Gaussian Bayesian networks

3 CONTINUOUS PROBABILITY DISTRIBUTIONS
- Let X be a random variable in R, and P(X) a probability distribution over X
- P(x) ≥ 0 for all x, and the distribution "sums" (integrates) to 1
- Challenge: (most of the time) P(X = x) = 0 for any single value x

4 CDF AND PDF
- Probability density function (pdf) f(x): nonnegative, ∫ f(x) dx = 1
- Cumulative distribution function (cdf) g(x): g(x) = P(X ≤ x)
  g(-∞) = 0, g(∞) = 1, g(x) = ∫(-∞,x] f(y) dy, monotonic
- f(x) = g'(x)
[Figure: a pdf f(x) and its cdf g(x)]
Both cdfs and pdfs are complete representations of the probability space over X, but pdfs are usually more intuitive to work with.
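A quick numerical illustration of the f(x) = g'(x) relationship (a minimal sketch, assuming numpy/scipy are available; the standard normal and the point x = 1.3 are purely illustrative):

```python
# Minimal sketch: the pdf equals the derivative of the cdf (standard normal example).
from scipy.stats import norm

x, h = 1.3, 1e-5
pdf_at_x = norm.pdf(x)                                       # f(x)
cdf_slope = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)    # numerical g'(x)
print(pdf_at_x, cdf_slope)                                   # both approximately 0.1714
```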

5 CAVEATS
- pdfs may exceed 1
- Deterministic values, or variables taking on only a few discrete values, can be represented with the Dirac delta function δa(x), an improper pdf:
  δa(x) = 0 if x ≠ a
  δa(x) = ∞ if x = a
  ∫ δa(x) dx = 1

6 COMMON DISTRIBUTIONS
[Figure: pdfs of two uniform distributions, U(a1, b1) and U(a2, b2)]

7 MULTIVARIATE CONTINUOUS DISTRIBUTIONS
- Consider the c.d.f. g(x,y) = P(X ≤ x, Y ≤ y)
  g(-∞, y) = 0, g(x, -∞) = 0
  g(∞, ∞) = 1
  g(x, ∞) = P(X ≤ x), g(∞, y) = P(Y ≤ y)
  g monotonic
- Its joint density is given by the p.d.f. f(x,y) iff g(p,q) = ∫(-∞,p] ∫(-∞,q] f(x,y) dy dx
  i.e. P(ax ≤ X ≤ bx, ay ≤ Y ≤ by) = ∫[ax,bx] ∫[ay,by] f(x,y) dy dx

8 MARGINALIZATION WORKS OVER PDFS
Marginalizing f(x,y) over y: if h(x) = ∫(-∞,∞) f(x,y) dy, then h(x) is the p.d.f. of X (its c.d.f. is P(X ≤ x)).
Proof:
  P(X ≤ a) = P(X ≤ a, Y ≤ ∞) = g(a, ∞) = ∫(-∞,a] ∫(-∞,∞) f(x,y) dy dx
  h(a) = d/da P(X ≤ a) = d/da ∫(-∞,a] ∫(-∞,∞) f(x,y) dy dx (definition)
       = ∫(-∞,∞) f(a,y) dy (fundamental theorem of calculus)
So the joint density contains all the information needed to reconstruct the density of each individual variable.
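A small numerical check of this marginalization (a sketch, assuming numpy/scipy; the bivariate Gaussian parameters and the evaluation point a = 0.5 are illustrative choices, not from the slides):

```python
# Sketch: integrating the joint density over y at a fixed x recovers the marginal density of X.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
joint = multivariate_normal(mu, Sigma)

a = 0.5                                    # evaluate the marginal h(a)
ys = np.linspace(-10, 10, 4001)
dy = ys[1] - ys[0]
pts = np.column_stack([np.full_like(ys, a), ys])
h_a = joint.pdf(pts).sum() * dy            # Riemann-sum approximation of the integral of f(a, y) over y
exact = norm.pdf(a, loc=mu[0], scale=np.sqrt(Sigma[0, 0]))   # known Gaussian marginal of X
print(h_a, exact)                          # should agree closely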

9 CONDITIONAL DENSITIES
The conditional density of Y given X = x is f(y|x) = f(x,y) / h(x), defined wherever the marginal density h(x) > 0.

10 TRANSFORMATIONS OF CONTINUOUS RANDOM VARIABLES
If Y = t(X) with t invertible and differentiable, the density transforms as f_Y(y) = f_X(t^-1(y)) |d t^-1(y)/dy| (change-of-variables rule).

11 NOTES
- In general, continuous multivariate distributions are hard to handle exactly
- But there are specific classes that lead to efficient exact inference techniques, in particular Gaussians
- Other distributions usually require resorting to Monte Carlo approaches

12 MULTIVARIATE GAUSSIANS
X ~ N(μ, Σ), with mean vector μ and symmetric positive definite covariance matrix Σ; the density is
  f(x) = (2π)^(-n/2) |Σ|^(-1/2) exp(-1/2 (x - μ)^T Σ^-1 (x - μ))

13 INDEPENDENCE IN GAUSSIANS
For a multivariate Gaussian, components Xi and Xj are independent iff the covariance entry Σij = 0; uncorrelated implies independent in the Gaussian case.

14 LINEAR TRANSFORMATIONS
If X ~ N(μ, Σ) and Y = AX + b, then Y ~ N(Aμ + b, AΣA^T) (this is the rule recalled on slide 17).

15 MARGINALIZATION AND CONDITIONING
If (X,Y) ~ N([μ_X; μ_Y], [Σ_XX, Σ_XY; Σ_YX, Σ_YY]), then:
- Marginalization: summing out Y gives X ~ N(μ_X, Σ_XX)
- Conditioning: on observing Y = y, X ~ N(μ_X + Σ_XY Σ_YY^-1 (y - μ_Y), Σ_XX - Σ_XY Σ_YY^-1 Σ_YX)
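A sketch of these two operations in code (assuming numpy; condition_gaussian and the example numbers are illustrative, not part of the lecture):

```python
# Sketch: Gaussian marginalization and conditioning via the block formulas above.
import numpy as np

def condition_gaussian(mu_x, mu_y, S_xx, S_xy, S_yy, y_obs):
    """Mean and covariance of X | Y = y_obs for jointly Gaussian (X, Y)."""
    gain = S_xy @ np.linalg.inv(S_yy)            # Sigma_XY Sigma_YY^-1
    mu = mu_x + gain @ (y_obs - mu_y)
    cov = S_xx - gain @ S_xy.T                   # Sigma_XX - Sigma_XY Sigma_YY^-1 Sigma_YX
    return mu, cov

# Marginalizing out Y just keeps (mu_x, S_xx); conditioning uses the function above.
mu_x, mu_y = np.array([1.0]), np.array([0.0])
S_xx, S_xy, S_yy = np.array([[2.0]]), np.array([[0.8]]), np.array([[1.0]])
print(condition_gaussian(mu_x, mu_y, S_xx, S_xy, S_yy, np.array([1.5])))
```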

16 LINEAR GAUSSIAN MODELS
A conditional linear Gaussian model has P(Y | X = x) = N(μ0 + Ax, Σ0), with parameters μ0, A, and Σ0.

17 LINEAR GAUSSIAN MODELS
(Recall the linear transformation rule.) If X ~ N(μ, Σ) and y = Ax + b, then Y ~ N(Aμ + b, AΣA^T).
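The rule can be checked empirically (a sketch assuming numpy; A, b, μ, and Σ are arbitrary illustrative values):

```python
# Sketch: Y = AX + b has mean A mu + b and covariance A Sigma A^T; verify with samples.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
A = np.array([[2.0, 0.0], [1.0, 1.0]])
b = np.array([0.5, 0.0])

mu_y = A @ mu + b                 # predicted mean
Sigma_y = A @ Sigma @ A.T         # predicted covariance

xs = rng.multivariate_normal(mu, Sigma, size=200_000)
ys = xs @ A.T + b
print(mu_y, ys.mean(axis=0))                  # should be close
print(Sigma_y, np.cov(ys, rowvar=False))      # should be close
```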

18 CLG BAYESIAN NETWORKS
If all variables in a Bayesian network have Gaussian or CLG CPDs, inference can be done efficiently!
Example network with nodes X1, X2, Y, Z (X1, X2 → Y; X1, Y → Z):
  P(X1) = N(μ1, σ1)
  P(X2) = N(μ2, σ2)
  P(Y | x1, x2) = N(a·x1 + b·x2, σy)
  P(Z | x1, y) = N(c + d·x1 + e·y, σz)
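For concreteness, here is an ancestral-sampling sketch of this four-node network (assuming numpy; all parameter values are made up for illustration):

```python
# Sketch: ancestral sampling from the CLG network X1, X2 -> Y; X1, Y -> Z.
import numpy as np

rng = np.random.default_rng(1)
mu1, s1 = 0.0, 1.0                 # P(X1) = N(mu1, s1^2)
mu2, s2 = 3.0, 0.5                 # P(X2) = N(mu2, s2^2)
a, b, sy = 1.0, -2.0, 0.3          # P(Y | x1, x2) = N(a*x1 + b*x2, sy^2)
c, d, e, sz = 0.5, 1.0, 2.0, 0.4   # P(Z | x1, y) = N(c + d*x1 + e*y, sz^2)

def sample_once():
    x1 = rng.normal(mu1, s1)
    x2 = rng.normal(mu2, s2)
    y = rng.normal(a * x1 + b * x2, sy)
    z = rng.normal(c + d * x1 + e * y, sz)
    return x1, x2, y, z

samples = np.array([sample_once() for _ in range(10_000)])
print(samples.mean(axis=0))        # the joint of a CLG network is itself multivariate Gaussian
```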

19 CANONICAL REPRESENTATION
All factors in a CLG Bayes net can be represented as C(x; K, h, g) with
  C(x; K, h, g) = exp(-1/2 x^T K x + h^T x + g)
Example: if P(Y|x) = N(μ0 + Ax, Σ0), then
  P(y|x) = 1/Z exp(-1/2 (y - Ax - μ0)^T Σ0^-1 (y - Ax - μ0))
         = 1/Z exp(-1/2 (y,x)^T [I -A]^T Σ0^-1 [I -A] (y,x) + μ0^T Σ0^-1 [I -A] (y,x) - 1/2 μ0^T Σ0^-1 μ0)
which is of the form C((y,x); K, h, g) with
  K = [I -A]^T Σ0^-1 [I -A]
  h = [I -A]^T Σ0^-1 μ0
  g = log(1/Z) - 1/2 μ0^T Σ0^-1 μ0
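A sketch of this conversion in code (assuming numpy; linear_gaussian_to_canonical and the example numbers are illustrative):

```python
# Sketch: convert P(Y|x) = N(mu0 + A x, Sigma0) into canonical form C((y, x); K, h, g).
import numpy as np

def linear_gaussian_to_canonical(mu0, A, Sigma0):
    dy = A.shape[0]
    M = np.hstack([np.eye(dy), -A])                     # [I  -A], so that M @ (y, x) = y - A x
    P = np.linalg.inv(Sigma0)                           # noise precision Sigma0^-1
    K = M.T @ P @ M
    h = M.T @ P @ mu0
    logZ = 0.5 * (dy * np.log(2 * np.pi) + np.linalg.slogdet(Sigma0)[1])
    g = -logZ - 0.5 * mu0 @ P @ mu0                     # log(1/Z) - 1/2 mu0^T Sigma0^-1 mu0
    return K, h, g

K, h, g = linear_gaussian_to_canonical(np.array([0.2]),
                                       np.array([[1.5, -0.5]]),
                                       np.array([[0.1]]))
print(K, h, g)
```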

20 PRODUCT OPERATIONS
C(x; K1, h1, g1) · C(x; K2, h2, g2) = C(x; K, h, g) with
  K = K1 + K2
  h = h1 + h2
  g = g1 + g2
If the scopes of the two factors are not identical, first extend the K's with zero rows and columns, and the h's with zero entries, so that each row/column refers to the same variable in both factors.
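A sketch of the product and the scope-extension step (assuming numpy; the helper extend and the index convention are illustrative):

```python
# Sketch: canonical-factor product. extend() pads a factor over a subset of variables
# with zero rows/columns so that both factors refer to the same joint scope.
import numpy as np

def extend(K, h, idx, n):
    """Embed a factor over variable positions `idx` into an n-variable scope."""
    K_big, h_big = np.zeros((n, n)), np.zeros(n)
    K_big[np.ix_(idx, idx)] = K
    h_big[idx] = h
    return K_big, h_big

def canonical_product(K1, h1, g1, K2, h2, g2):
    return K1 + K2, h1 + h2, g1 + g2

# Example: a factor over (x0, x1) times a factor over (x1,) in a 2-variable scope.
Ka, ha = np.array([[2.0, -0.5], [-0.5, 1.0]]), np.array([1.0, 0.0])
Kb, hb = extend(np.array([[3.0]]), np.array([0.5]), idx=[1], n=2)
print(canonical_product(Ka, ha, 0.0, Kb, hb, 0.0))
```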

21 SUM OPERATION
∫ C((x,y); K, h, g) dy = C(x; K', h', g') with
  K' = K_XX - K_XY K_YY^-1 K_YX
  h' = h_X - K_XY K_YY^-1 h_Y
  g' = g + 1/2 (log |2π K_YY^-1| + h_Y^T K_YY^-1 h_Y)
Using these two operations we can implement inference algorithms developed for discrete Bayes nets:
- Top-down inference, variable elimination (exact)
- Belief propagation (approximate)
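A sketch of the sum operation (assuming numpy; the index-block convention for selecting the X and Y parts of K and h is illustrative):

```python
# Sketch: integrate out the Y block of a canonical factor C((x, y); K, h, g).
import numpy as np

def canonical_marginalize(K, h, g, ix, iy):
    """ix, iy: index lists selecting the X and Y blocks of K and h."""
    Kxx, Kxy = K[np.ix_(ix, ix)], K[np.ix_(ix, iy)]
    Kyx, Kyy = K[np.ix_(iy, ix)], K[np.ix_(iy, iy)]
    hx, hy = h[ix], h[iy]
    Kyy_inv = np.linalg.inv(Kyy)
    K_new = Kxx - Kxy @ Kyy_inv @ Kyx
    h_new = hx - Kxy @ Kyy_inv @ hy
    g_new = g + 0.5 * (np.linalg.slogdet(2 * np.pi * Kyy_inv)[1] + hy @ Kyy_inv @ hy)
    return K_new, h_new, g_new
```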

23 MONTE CARLO WITH GAUSSIANS
- Assume sampling X ~ N(0,1) is given as a primitive RandN()
- To sample X ~ N(μ, σ²), simply set x ← μ + σ·RandN()
- How to generate a random multivariate Gaussian variable N(μ, Σ)?
  - Take the Cholesky decomposition Σ^-1 = L L^T (L is invertible if Σ is positive definite)
  - Let y = L^T (x - μ); then P(y) ∝ exp(-1/2 (y1² + ... + yN²)) is isotropic, and each yi is independent
  - Sample each component of y at random: yi ← RandN()
  - Set x ← L^-T y + μ
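A sketch of this construction (assuming numpy; here the Cholesky factor is taken of Σ itself, Σ = L L^T, which gives the equivalent form x ← μ + Lz with z ~ N(0, I); the parameter values are illustrative):

```python
# Sketch: sample from N(mu, Sigma) using a Cholesky factor of Sigma (Sigma = L L^T).
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])

L = np.linalg.cholesky(Sigma)            # lower triangular, Sigma = L @ L.T
z = rng.standard_normal((100_000, 2))    # independent N(0, 1) components
x = mu + z @ L.T                         # each row is a sample from N(mu, Sigma)

print(x.mean(axis=0))                    # approximately mu
print(np.cov(x, rowvar=False))           # approximately Sigma
```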

24 MONTE CARLO WITH LIKELIHOOD WEIGHTING
Monte Carlo with rejection has probability 0 of hitting a continuous value given as evidence, so likelihood weighting must be used.
Example network X → Y with evidence Y = y:
  P(X) = N(μ_X, Σ_X)
  P(Y|x) = N(Ax + μ_Y, Σ_Y)
Step 1: sample x ~ N(μ_X, Σ_X)
Step 2: weight the sample by P(y|x)
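A sketch of these two steps for a scalar version of the network (assuming numpy/scipy; A, the means, and the variances are illustrative):

```python
# Sketch: likelihood weighting for X -> Y with evidence Y = y_obs.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu_x, sigma_x = 0.0, 1.0            # P(X) = N(mu_x, sigma_x^2)
A, mu_y, sigma_y = 2.0, 0.0, 0.5    # P(Y | x) = N(A*x + mu_y, sigma_y^2)
y_obs = 1.0

xs = rng.normal(mu_x, sigma_x, size=100_000)             # step 1: sample from the prior
w = norm.pdf(y_obs, loc=A * xs + mu_y, scale=sigma_y)    # step 2: weight by P(y | x)

posterior_mean = np.sum(w * xs) / np.sum(w)
print(posterior_mean)   # compare with the exact answer from the Gaussian conditioning formula
```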

25 HYBRID NETWORKS
- Hybrid networks combine discrete and continuous variables
- Exact inference techniques are hard to apply:
  - posteriors become Gaussian mixtures
  - inference is NP-hard even in polytree networks
- Monte Carlo techniques apply in a straightforward way
- Belief approximation can be applied (e.g., collapsing Gaussian mixtures to single Gaussians)

26 ISSUES
- Non-Gaussian distributions
- Nonlinear dependencies
- More in future lectures on particle filtering

