Bayesian Framework Finding the best model Minimizing model complexity

1 Bayesian Framework
Finding the best model
  Maximum likelihood
  Maximum a posteriori
  Posterior mean estimator
Minimizing model complexity
  Ockham’s razor
  Minimum Description Length
Parametrizing models
Lecture 5, CS567

2 Anatomy of a model
Model = Parameter scheme + values for parameters
Model for DNA sequence: 4 parameters, one for each character
  Model $M(w_1)$: P(A) = P(T) = P(G) = P(C) = 0.25
  Model $M(w_2)$: P(A) = P(G) = 0.3; P(T) = P(C) = 0.2

3 Maximum Likelihood
Likelihood: Given a particular model, how likely is it that this data would have been observed?
  $L(M(w_i)) = P(D \mid M(w_i))$
Maximum likelihood: Given a number of models, which one has the highest likelihood?
  Maximum value of $L(M)$
  $w_{\max} = \arg\max_{w_i} P(D \mid M(w_i))$
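The definition above can be tried on the two DNA models from the "Anatomy of a model" slide. The following is a minimal sketch, assuming characters are independent; the function names and the example sequence are hypothetical and chosen only for illustration.

```python
import math

# The two parameter sets from the "Anatomy of a model" slide.
MODELS = {
    "M(w1)": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25},
    "M(w2)": {"A": 0.3, "G": 0.3, "T": 0.2, "C": 0.2},
}

def log_likelihood(sequence, probs):
    """log P(D | M(w)), assuming each character is drawn independently."""
    return sum(math.log(probs[ch]) for ch in sequence)

def max_likelihood_model(sequence, models):
    """Name of the model with the highest likelihood for the sequence."""
    return max(models, key=lambda name: log_likelihood(sequence, models[name]))

# Hypothetical data: rich in A and G, so M(w2) should be preferred.
dna = "GGAGAAGC"
best = max_likelihood_model(dna, MODELS)
print(best)
```

Working with log-likelihoods rather than raw products avoids numerical underflow on long sequences and does not change which model attains the maximum.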

4 Maximum Likelihood Example
Data: HHHTTT (sequence, i.e., as permutation)
Model: Binomial with parameter p(H)
  Parameter set 1: p(H) = 0.5; p(T) = 1 - p(H)
  Parameter set 2: p(H) = 0.25; p(T) = 1 - p(H)
Likelihoods:
  $P(D \mid M(w_1)) = (0.5)^3 (0.5)^3 \approx 0.0156$
  $P(D \mid M(w_2)) = (0.25)^3 (0.75)^3 \approx 0.0066$
Maximum likelihood estimate = $M(w_1)$
In fact, $L(M(w_1)) > L(M(w_i))$ for all $i \neq 1$
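The two likelihoods on this slide can be checked with a few lines of Python. This is a small sketch, assuming independent tosses; `sequence_likelihood` is a hypothetical helper name, not something from the lecture.

```python
def sequence_likelihood(sequence, p_heads):
    """P(D | M(w)) for an ordered sequence of independent coin tosses."""
    likelihood = 1.0
    for toss in sequence:
        likelihood *= p_heads if toss == "H" else 1.0 - p_heads
    return likelihood

data = "HHHTTT"
l1 = sequence_likelihood(data, 0.5)   # parameter set 1
l2 = sequence_likelihood(data, 0.25)  # parameter set 2
print(l1, l2, "set 1 wins" if l1 > l2 else "set 2 wins")
```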

5 Maximum Likelihood Example
Data: HTTT (sequence, i.e., as permutation)
Model: Binomial with parameter p(H)
  Parameter set 1: p(H) = 0.5; p(T) = 1 - p(H)
  Parameter set 2: p(H) = 0.25; p(T) = 1 - p(H)
Likelihoods:
  $P(D \mid M(w_1)) = (0.5)(0.5)^3 = 0.0625$
  $P(D \mid M(w_2)) = (0.25)(0.75)^3 \approx 0.1055$
Maximum likelihood estimate = $M(w_2)$!
In fact, $L(M(w_2)) > L(M(w_i))$ for all $i \neq 2$
So, is something wrong with this coin?
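The claim that no other parameter value beats p(H) = 0.25 for HTTT follows from a short derivation. As a sketch, write $k$ for the number of heads and $n$ for the number of tosses (symbols introduced here, not on the slide), and assume $0 < k < n$:

```latex
\begin{align*}
L(p) &= p^{k}(1-p)^{n-k} \\
\log L(p) &= k \log p + (n-k)\log(1-p) \\
\frac{d \log L}{dp} &= \frac{k}{p} - \frac{n-k}{1-p} = 0
\quad\Longrightarrow\quad \hat{p} = \frac{k}{n}
\end{align*}
```

For HTTT this gives $\hat{p} = 1/4 = 0.25$, and for HHHTTT it gives $\hat{p} = 3/6 = 0.5$, matching the winning parameter set in each example.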

6 Maximum Likelihood
The maximum likelihood estimate is unreliable when the data set is small
  A prior is important in dealing with such errors
As the data sample gets larger (more representative)
  The maximum likelihood estimate of the parameters tends to the ‘true’ value
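A quick simulation illustrates how the estimate settles down as the sample grows. This is a sketch under the assumption of a fair coin (true p(H) = 0.5); the helper names and the seed are arbitrary choices made for the example.

```python
import random

def simulate(true_p, n, rng):
    """Draw n independent tosses; 1 = heads, 0 = tails."""
    return [1 if rng.random() < true_p else 0 for _ in range(n)]

def ml_estimate(tosses):
    """Maximum likelihood estimate of p(H): the observed fraction of heads."""
    return sum(tosses) / len(tosses)

rng = random.Random(0)  # fixed seed so the run is repeatable
true_p = 0.5            # assumed 'true' value for the simulation
for n in (4, 40, 400, 4000, 40000):
    estimate = ml_estimate(simulate(true_p, n, rng))
    print(f"n={n:6d}  estimate={estimate:.4f}  error={abs(estimate - true_p):.4f}")
```

With only four tosses, estimates such as 0.25 or 0.75 are common even for a fair coin, which is the situation in the HTTT example.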

7 Maximum a posteriori
Need to factor in the prior in the maximum likelihood estimate
Posterior $\propto$ (Likelihood) (Prior) = $P(D \mid M(w_i)) \, P(w_i \mid M)$
Maximum a posteriori: $w_{\mathrm{MAP}} = \arg\max_{w_i} P(D \mid M(w_i)) \, P(w_i \mid M)$
From Bayes' theorem: $P(w_i \mid M, D) = \frac{P(D \mid M(w_i)) \, P(w_i \mid M)}{P(D \mid M)}$
As $P(D \mid M)$ does not depend on $w_i$, it does not affect the maximum of the LHS; the numerator is sufficient to find the MAP
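For the coin example, the effect of a prior can be sketched with a Beta prior on p(H). The Beta family and the specific values a = b = 5 are assumptions made for illustration; the lecture does not specify a prior. With a Beta(a, b) prior, the posterior is Beta(heads + a, tails + b), and its mode has a closed form.

```python
def map_estimate(heads, tails, a, b):
    """Mode of the Beta(heads + a, tails + b) posterior.

    Valid when heads + a > 1 and tails + b > 1 (interior maximum).
    The Beta(a, b) prior is an illustrative assumption.
    """
    return (heads + a - 1) / (heads + tails + a + b - 2)

# Data from the HTTT example: one head, three tails.
ml = 1 / 4
map_fair_prior = map_estimate(1, 3, a=5, b=5)  # prior concentrated near 0.5
map_flat_prior = map_estimate(1, 3, a=1, b=1)  # flat prior reproduces the ML estimate
print(ml, map_fair_prior, map_flat_prior)
```

A prior that favors fair coins pulls the estimate from 0.25 toward 0.5, while a flat prior leaves the maximum likelihood answer unchanged.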

8 Posterior Mean Estimator
Instead of using the maximum value, use the expectation of the model parameters
  $w_{\mathrm{PME}} = \int w_i \, P(w_i \mid n) \, dw$ where n = number of parameter combinations
Makes sense when there is no clearly optimal choice (no sharp peak in parametric space)
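As a sketch of the idea, the posterior mean of p(H) for the HTTT data can be approximated by numerical integration. The flat prior, the midpoint rule, and the function name are assumptions made for this example; the expectation is taken over the posterior given the data.

```python
def posterior_mean(heads, tails, steps=10_000):
    """Posterior mean of p(H) under an assumed flat prior, via the midpoint rule."""
    width = 1.0 / steps
    grid = [(i + 0.5) * width for i in range(steps)]
    # Unnormalized posterior = likelihood x flat prior.
    weights = [p ** heads * (1.0 - p) ** tails for p in grid]
    evidence = sum(weights)
    return sum(p * w for p, w in zip(grid, weights)) / evidence

pme = posterior_mean(heads=1, tails=3)
print(pme)
```

Under these assumptions the exact answer is (heads + 1) / (heads + tails + 2) = 1/3, which sits between the maximum likelihood estimate of 0.25 and the fair-coin value of 0.5.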

9 Dealing with Model Complexity
Ockham’s razor: “Car is stopping at cross-walk to allow me to cross, not to shoot a bullet at me”
  Go for the simplest explanation that matches the facts (probabilistically, of course)
  Introduce priors that penalize complex models
  Simpler models assign higher likelihoods
Minimum Description Length (kind of similar): Economical specification of model
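One way to see how a simpler model can assign a higher likelihood is to compare the probability each model gives the data once its free parameters are averaged out. The sketch below is an illustration, not the lecture's method: it compares a fair-coin model with no free parameter against a model with a free p(H) under an assumed flat prior, for which the integral of p^h (1-p)^t over [0, 1] equals h! t! / (h + t + 1)!.

```python
from math import factorial

def evidence_fair(heads, tails):
    """P(D | fair-coin model): no free parameter to average over."""
    return 0.5 ** (heads + tails)

def evidence_free(heads, tails):
    """P(D | free-p model) with an assumed flat prior on p(H)."""
    return factorial(heads) * factorial(tails) / factorial(heads + tails + 1)

for data in ("HHHTTT", "HHHHHH"):
    h, t = data.count("H"), data.count("T")
    print(data, evidence_fair(h, t), evidence_free(h, t))
```

For HHHTTT the simpler fair-coin model scores higher, because the flexible model spreads its probability over many possible values of p(H). For HHHHHH the flexible model wins, since the extra parameter is then needed to match the facts.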

10 Graphical Models
Real world = Massive network of dependencies
Model = Sparsely connected network (Reduction of dimensionality)
Graph representation
  Edge = Dependency; No edge = Independence
  Directed/Undirected/Mixed (Chain independence)
Goal: Factor graph into clusters of local probabilities
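Factoring into local probabilities can be sketched on a tiny directed chain A -> B -> C over binary variables. The graph and all the numbers below are hypothetical; the point is only that the joint distribution is a product of small local tables.

```python
from itertools import product

# Hypothetical local probability tables for the chain A -> B -> C.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # p_c_given_b[b][c]

def joint(a, b, c):
    """P(A, B, C) = P(A) P(B | A) P(C | B): no edge between A and C."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
print(total)  # the factored joint still sums to 1 (up to rounding)

# Free parameters: a full table over 3 binary variables needs 2**3 - 1,
# the factored form needs 1 (for A) + 2 (for B | A) + 2 (for C | B).
full_params, factored_params = 2 ** 3 - 1, 1 + 2 + 2
```

The saving is small for three variables but grows quickly as the network gets larger while staying sparsely connected.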

11 Graphical Models
Undirected graphs
  Markov networks/random fields, Boltzmann machines
  Symmetric
  Statistical mechanics, Image processing
Directed graphs
  Bayesian/Belief/Causal/Influence networks
  Temporal Causality
  Expert systems
  Neural networks, Hidden Markov models

12 Graphical Models
Neighborhood
  For a single variable
  For a set of inter-dependent variables (Boundary)
Hidden variables (Use Expectation Maximization algorithm)
Hierarchy
  Different time scales/length scales
  Hyperparameters ($\alpha$)
    $P(w) = \int P(w \mid \alpha) \, P(\alpha) \, d\alpha$
    Prior = $P(\alpha)$
    Computationally easier
Mixture/Hybrid modeling
  $P = \sum_{i=1}^{n} \lambda_i P_i$
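Mixture modeling can be sketched by reading the slide's formula as a weighted sum of component distributions. That reading, the equal weights, and the choice of the two DNA models from the "Anatomy of a model" slide as components are assumptions made for this example.

```python
def mixture(components, weights):
    """Weighted sum of discrete distributions; weights are assumed to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    symbols = components[0].keys()
    return {
        s: sum(w * comp[s] for w, comp in zip(weights, components))
        for s in symbols
    }

m_w1 = {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}
m_w2 = {"A": 0.3, "G": 0.3, "T": 0.2, "C": 0.2}

# Hypothetical equal weights for the two components.
mixed = mixture([m_w1, m_w2], [0.5, 0.5])
print(mixed)
```

Because the weights sum to 1 and each component is itself a distribution, the mixture is again a valid distribution over the four characters.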

