Bayesian Framework: Finding the best model, minimizing model complexity

- Maximum likelihood
- Maximum a posteriori
- Posterior mean estimator
- Minimizing model complexity: Ockham’s razor, Minimum Description Length
- Parametrizing models
Lecture 5, CS567

Anatomy of a model
Model = parameter scheme + values for the parameters
Model for a DNA sequence: 4 parameters, one for each character
- Model M(w1): P(A) = P(T) = P(G) = P(C) = 0.25
- Model M(w2): P(A) = P(G) = 0.3; P(T) = P(C) = 0.2
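The following is a minimal Python sketch of how these two parameter settings score a DNA sequence. It assumes each position is drawn independently from the same four probabilities. The sequence `GATTACA` and the function name `sequence_likelihood` are illustrative choices and do not come from the lecture.

```python
import math

# The two parameter settings from the slide: one probability per nucleotide.
M_W1 = {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}
M_W2 = {"A": 0.3, "G": 0.3, "T": 0.2, "C": 0.2}


def sequence_likelihood(seq: str, model: dict[str, float]) -> float:
    """P(seq | model), assuming each position is drawn independently from `model`."""
    return math.prod(model[base] for base in seq)


if __name__ == "__main__":
    seq = "GATTACA"  # hypothetical example sequence
    for name, model in [("M(w1)", M_W1), ("M(w2)", M_W2)]:
        print(name, sequence_likelihood(seq, model))
```

Here M(w2) gives GATTACA a slightly higher likelihood (0.3^4 · 0.2^3 versus 0.25^7), because four of its seven bases are A or G.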

Maximum Likelihood
Likelihood: given a particular model, how likely is it that this data would have been observed?
$L(M(w_i)) = P(D \mid M(w_i))$
Maximum likelihood: given a number of models, which one has the highest likelihood?
$w_{\max} = \arg\max_{w_i} P(D \mid M(w_i))$
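The argmax can be sketched directly for a coin model over a hypothetical finite grid of candidate values for p(H). The sketch maximizes the log-likelihood instead of the likelihood. Because the logarithm is monotonic, the argmax is the same, and sums of logs avoid numerical underflow on long sequences. `log_likelihood_coin` and `max_likelihood` are hypothetical helper names.

```python
import math
from typing import Callable, Sequence


def log_likelihood_coin(data: str, p_heads: float) -> float:
    """log P(D | M(w)) for independent tosses, with w = p(H) and p(T) = 1 - p(H)."""
    heads = data.count("H")
    tails = data.count("T")
    return heads * math.log(p_heads) + tails * math.log(1.0 - p_heads)


def max_likelihood(data: str, candidates: Sequence[float],
                   log_lik: Callable[[str, float], float]) -> float:
    """Return the candidate w_i with the largest (log-)likelihood for `data`."""
    return max(candidates, key=lambda w: log_lik(data, w))


if __name__ == "__main__":
    grid = [i / 100 for i in range(1, 100)]  # hypothetical candidates 0.01 .. 0.99
    print(max_likelihood("HHHTTT", grid, log_likelihood_coin))
    print(max_likelihood("HTTT", grid, log_likelihood_coin))
```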

Maximum Likelihood Example:
Data: HHHTTT (a sequence, i.e., one particular ordering)
Model: Binomial with parameter p(H); p(T) = 1 - p(H)
- Parameter set 1: p(H) = 0.5
- Parameter set 2: p(H) = 0.25
Likelihoods:
- $P(D \mid M(w_1)) = (0.5)^3 (0.5)^3 = 0.015625$
- $P(D \mid M(w_2)) = (0.25)^3 (0.75)^3 \approx 0.00659$
Maximum likelihood estimate = M(w1)
In fact, $L(M(w_1)) > L(M(w_i))$ for all $i \neq 1$.
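The data are treated as a specific sequence, so the likelihood is the product of the per-toss probabilities. The sketch below checks the two values above. It also computes the binomial probability of the head count in any order. The two likelihoods differ only by the constant factor C(n, k), so they rank parameter settings identically. The helper names are hypothetical.

```python
import math


def sequence_likelihood(data: str, p_heads: float) -> float:
    """Probability of the exact ordered sequence, assuming independent tosses."""
    heads = data.count("H")
    return p_heads**heads * (1.0 - p_heads) ** (len(data) - heads)


def count_likelihood(data: str, p_heads: float) -> float:
    """Binomial probability of seeing this many heads, in any order."""
    n, k = len(data), data.count("H")
    return math.comb(n, k) * sequence_likelihood(data, p_heads)


if __name__ == "__main__":
    for p in (0.5, 0.25):
        print(p, sequence_likelihood("HHHTTT", p), count_likelihood("HHHTTT", p))
```

Both parameter sets are scaled by C(6, 3) = 20, so M(w1) remains the maximum likelihood choice under either reading of the data.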

Maximum Likelihood Example:
Data: HTTT (a sequence, i.e., one particular ordering)
Model: Binomial with parameter p(H); p(T) = 1 - p(H)
- Parameter set 1: p(H) = 0.5
- Parameter set 2: p(H) = 0.25
Likelihoods:
- $P(D \mid M(w_1)) = (0.5)(0.5)^3 = 0.0625$
- $P(D \mid M(w_2)) = (0.25)(0.75)^3 \approx 0.1055$
Maximum likelihood estimate = M(w2)!
In fact, $L(M(w_2)) > L(M(w_i))$ for all $i \neq 2$.
So, is something wrong with this coin?
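Setting the derivative of the log-likelihood $k \log p + (n - k)\log(1 - p)$ to zero gives the standard closed-form estimate $\hat{p}(H) = k/n$ for $k$ heads in $n$ tosses. The sketch below uses hypothetical helper names. It confirms that HTTT yields exactly 0.25, which is parameter set 2. The maximum likelihood answer simply reproduces the observed frequency of only four tosses.

```python
def sequence_likelihood(data: str, p_heads: float) -> float:
    """P(D | M(w)) for an ordered sequence of independent tosses, with w = p(H)."""
    likelihood = 1.0
    for toss in data:
        likelihood *= p_heads if toss == "H" else 1.0 - p_heads
    return likelihood


def mle_p_heads(data: str) -> float:
    """Closed-form maximum likelihood estimate of p(H): heads / tosses."""
    return data.count("H") / len(data)


if __name__ == "__main__":
    print(sequence_likelihood("HTTT", 0.5), sequence_likelihood("HTTT", 0.25))
    print(mle_p_heads("HTTT"))
```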

Maximum Likelihood
The maximum likelihood estimate is unreliable when the data set is small.
A prior is important in dealing with such errors.
As the data sample gets larger (more representative), the maximum likelihood estimate of the parameters tends to the ‘true’ value.
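One common way to bring in a prior is the maximum a posteriori (MAP) estimate. This sketch assumes a Beta(a, b) prior on p(H) and returns the mode of the resulting posterior, Beta(a + k, b + n - k). The prior Beta(5, 5) and the counts of 250 heads in 1000 tosses are hypothetical values chosen for illustration.

```python
def ml_estimate(heads: int, tosses: int) -> float:
    """Maximum likelihood estimate of p(H): heads / tosses."""
    return heads / tosses


def map_estimate(heads: int, tosses: int, a: float, b: float) -> float:
    """MAP estimate of p(H) under a Beta(a, b) prior, assuming a, b > 1.

    The posterior is Beta(a + heads, b + tosses - heads); this returns its mode.
    """
    return (heads + a - 1) / (tosses + a + b - 2)


if __name__ == "__main__":
    a, b = 5.0, 5.0  # hypothetical prior: the coin is believed to be roughly fair
    for heads, tosses in [(1, 4), (250, 1000)]:
        print(tosses, ml_estimate(heads, tosses), map_estimate(heads, tosses, a, b))
```

With four tosses, the prior pulls the estimate from 0.25 toward 0.5 (5/12 ≈ 0.417). With 1000 tosses, the MAP and maximum likelihood estimates differ by about 0.002, so the data dominate the prior.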

Posterior Mean Estimator
Instead of using the maximum value, use the expectation of the model parameters:
$w_{\text{PME}} = \int w \, P(w \mid D) \, dw$, or, over $n$ discrete parameter combinations, $w_{\text{PME}} = \sum_{i=1}^{n} w_i \, P(w_i \mid D)$
This makes sense when there is no clearly optimal choice (no sharp peak in parameter space).
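The following is a numerical sketch of the posterior mean for the coin. It assumes a uniform prior on p(H), so the posterior is proportional to the likelihood. The integral is approximated with the midpoint rule on a grid. `posterior_mean` and `grid_size` are hypothetical names.

```python
def posterior_mean(heads: int, tosses: int, grid_size: int = 10_000) -> float:
    """E[w | D] = ∫ w P(w | D) dw for w = p(H), assuming a uniform prior.

    The posterior is proportional to w**heads * (1 - w)**(tosses - heads);
    both integrals are approximated with the midpoint rule on [0, 1].
    """
    tails = tosses - heads
    numerator = 0.0
    evidence = 0.0  # normalizing constant ∫ P(D | w) P(w) dw (up to grid spacing)
    for i in range(grid_size):
        w = (i + 0.5) / grid_size
        weight = w**heads * (1.0 - w) ** tails
        numerator += w * weight
        evidence += weight
    return numerator / evidence  # the common grid spacing 1/grid_size cancels


if __name__ == "__main__":
    print(posterior_mean(1, 4))  # data HTTT
    print(posterior_mean(3, 6))  # data HHHTTT
```

For HTTT, the estimate is about 1/3, compared with the maximum likelihood value of 0.25. The exact posterior is Beta(2, 4), whose mean is 2/6. Averaging over the whole posterior tempers the small-sample extreme.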

Dealing with Model Complexity
Ockham’s razor: “The car is stopping at the crosswalk to let me cross, not to shoot a bullet at me.”
- Go for the simplest explanation that matches the facts (probabilistically, of course).
- Introduce priors that penalize complex models.
- Simpler models assign higher likelihoods to the data they can explain, because they spread their probability over fewer possible data sets.
Minimum Description Length (a similar idea): economical specification of the model
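One standard way to make "simpler models assign higher likelihoods" concrete is to compare marginal likelihoods. This is the evidence P(D | M), with the parameters integrated out under their prior. It is used here as an illustration and is not taken from the lecture. The sketch compares a fair-coin model with no free parameters against a hypothetical model with p(H) uniform on [0, 1].

```python
from math import factorial


def evidence_fair_coin(heads: int, tosses: int) -> float:
    """P(D | simple model): p(H) fixed at 0.5, no free parameters."""
    return 0.5**tosses


def evidence_uniform_p(heads: int, tosses: int) -> float:
    """P(D | flexible model): p(H) unknown, with a uniform prior on [0, 1].

    ∫ p^k (1 - p)^(n - k) dp = k! (n - k)! / (n + 1)!  (a Beta function).
    """
    tails = tosses - heads
    return factorial(heads) * factorial(tails) / factorial(tosses + 1)


if __name__ == "__main__":
    for heads, tosses in [(3, 6), (10, 10)]:  # HHHTTT, then ten heads in a row
        print(heads, tosses, evidence_fair_coin(heads, tosses),
              evidence_uniform_p(heads, tosses))
```

For HHHTTT, the fair coin wins (1/64 versus 1/140), even though the flexible model contains p(H) = 0.5. The flexible model loses because it spends prior probability on parameter values the data do not support. For ten heads in a row, the flexible model wins (1/11 versus 1/1024), so the extra parameter is accepted only when the data call for it.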

Graphical Models
Real world = massive network of dependencies
Model = sparsely connected network (reduction of dimensionality)
Graph representation:
- Edge = dependency; no edge = independence
- Directed/undirected/mixed (chain independence)
Goal: factor the graph into clusters of local probabilities
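The sketch below illustrates the dimensionality reduction in two ways. First, it counts free parameters for binary variables. Second, it evaluates a hypothetical three-node chain A → B → C from its local conditional probabilities alone. All probability values are made up for illustration.

```python
from itertools import product


def num_params_full_joint(n_vars: int) -> int:
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2**n_vars - 1


def num_params_bayes_net(parent_counts: list[int]) -> int:
    """Free parameters of a directed model over binary variables.

    Each variable with k binary parents needs one number, P(X = 1 | parents),
    for each of its 2**k parent configurations.
    """
    return sum(2**k for k in parent_counts)


# Hypothetical chain A -> B -> C: P(A, B, C) = P(A) P(B | A) P(C | B).
P_A = {1: 0.6, 0: 0.4}
P_B_given_A = {(1, 1): 0.7, (0, 1): 0.3, (1, 0): 0.2, (0, 0): 0.8}  # key: (b, a)
P_C_given_B = {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.5, (0, 0): 0.5}  # key: (c, b)


def chain_joint(a: int, b: int, c: int) -> float:
    """Joint probability computed only from the local (conditional) tables."""
    return P_A[a] * P_B_given_A[(b, a)] * P_C_given_B[(c, b)]


if __name__ == "__main__":
    total = sum(chain_joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
    print(total)  # the factorized joint still sums to 1
    print(num_params_full_joint(10), num_params_bayes_net([0] + [1] * 9))
```

For ten binary variables, an unrestricted joint needs 2^10 - 1 = 1023 numbers, while a chain needs 1 + 9 · 2 = 19.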

Graphical Models
Undirected graphs
- Markov networks/random fields, Boltzmann machines
- Symmetric
- Statistical mechanics, image processing
Directed graphs
- Bayesian/belief/causal/influence networks
- Temporal, causality
- Expert systems, neural networks, hidden Markov models

Graphical Models
Neighborhood
- For a single variable
- For a set of inter-dependent variables (boundary)
Hidden variables (use the Expectation Maximization algorithm)
Hierarchy
- Different time scales/length scales
- Hyperparameters (α): $P(w) = \int P(w \mid \alpha)\, P(\alpha)\, d\alpha$, with prior $P(\alpha)$ on the hyperparameters; computationally easier
Mixture/hybrid modeling: $P = \sum_{i=1}^{n} \lambda_i P_i$
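The following is a minimal sketch of mixture modeling, $P = \sum_i \lambda_i P_i$. It reuses the two DNA models from the "Anatomy of a model" slide as components, with equal weights as a hypothetical choice. When a hyperparameter α takes only finitely many values, the integral $P(w) = \int P(w \mid \alpha) P(\alpha)\, d\alpha$ reduces to the same kind of weighted sum, with $P(\alpha)$ as the weights.

```python
# The two DNA models from the "Anatomy of a model" slide, used as components.
M_W1 = {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}
M_W2 = {"A": 0.3, "G": 0.3, "T": 0.2, "C": 0.2}


def mixture(components: list[dict[str, float]], weights: list[float]) -> dict[str, float]:
    """P(x) = sum_i lambda_i P_i(x); weights must be non-negative and sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 for w in weights):
        raise ValueError("mixture weights must be non-negative and sum to 1")
    symbols = components[0].keys()
    return {s: sum(w * comp[s] for w, comp in zip(weights, components)) for s in symbols}


if __name__ == "__main__":
    print(mixture([M_W1, M_W2], [0.5, 0.5]))  # hypothetical equal weights
```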
