Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees. Comput. Genomics, lecture 6a.

Presentation transcript:

Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees. Comput. Genomics, lecture 6a. Presentation taken from Nir Friedman’s HU course. Changes made by Dan Geiger, Ydo Wexler, and finally by Benny Chor.

3 The Setting
- We have a probabilistic model, M, of some phenomenon. We know the structure of M exactly, but not the values of its probabilistic parameters, $\theta$.
- Each “execution” of M produces an observation, $x[i]$, according to the (unknown) distribution induced by M.
- Goal: after observing $x[1], \dots, x[n]$, estimate the model parameters, $\theta$, that generated the observed data.

4 Maximum Likelihood Estimation (MLE)
- The likelihood of the observed data, given the model parameters $\theta$, is the conditional probability that the model M, with parameters $\theta$, produces $x[1], \dots, x[n]$: $L(\theta) = \Pr(x[1], \dots, x[n] \mid \theta, M)$.
- In MLE we seek the model parameters, $\theta$, that maximize the likelihood.

5 Maximum Likelihood Estimation (MLE)
- In MLE we seek the model parameters, $\theta$, that maximize the likelihood.
- The MLE principle applies in a wide variety of fields, from speech recognition, through natural language processing, to computational biology.
- We will start with the simplest example: estimating the bias of a coin. We will then apply MLE to inferring phylogenetic trees.
- (We will later talk about MAP - Bayesian inference.)

6 Example: Binomial Experiment
- When tossed, a thumbtack can land in one of two positions: Head (H) or Tail (T).
- We denote by $\theta$ the (unknown) probability $P(H)$.
- Estimation task: given a sequence of toss samples $x[1], x[2], \dots, x[M]$, we want to estimate the probabilities $P(H) = \theta$ and $P(T) = 1 - \theta$.

7 Statistical Parameter Fitting (restatement)
- Consider instances $x[1], x[2], \dots, x[M]$ such that:
  - The set of values that x can take is known
  - Each is sampled from the same distribution
  - Each is sampled independently of the rest
  That is, the samples are i.i.d. (why??)
- The task is to find a parameter vector $\theta$ that generated the given data. This parameter vector $\theta$ can be used to predict future data.

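8 The Likelihood Function
- How good is a particular $\theta$? It depends on how likely it is to generate the observed data.
- The likelihood for the sequence H, T, T, H, H is $L(\theta) = \theta \cdot (1 - \theta) \cdot (1 - \theta) \cdot \theta \cdot \theta = \theta^3 (1 - \theta)^2$.
[Figure: plot of $L(\theta)$ against $\theta$]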
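
As a minimal sketch of this slide, the snippet below evaluates the likelihood of the toss sequence H, T, T, H, H for a few candidate values of θ. The function name `likelihood` and the candidate values are illustrative assumptions, not part of the original slides.

```python
def likelihood(theta, tosses):
    """Probability of an i.i.d. toss sequence when P(H) = theta."""
    result = 1.0
    for toss in tosses:
        # Each head contributes theta, each tail contributes (1 - theta).
        result *= theta if toss == "H" else 1.0 - theta
    return result


tosses = ["H", "T", "T", "H", "H"]
for theta in (0.2, 0.5, 0.6, 0.8):  # arbitrary candidates for illustration
    print(f"theta={theta:.1f}  L(theta)={likelihood(theta, tosses):.5f}")
```

Among these four candidates the likelihood is largest at θ = 0.6, the fraction of heads in the sequence.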

9 Sufficient Statistics
- To compute the likelihood in the thumbtack example we only require $N_H$ and $N_T$ (the number of heads and the number of tails).
- $N_H$ and $N_T$ are sufficient statistics for the binomial distribution.

10 Sufficient Statistics
- A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood.
- Formally, $s(D)$ is a sufficient statistic if for any two datasets $D$ and $D'$: $s(D) = s(D') \Rightarrow L_D(\theta) = L_{D'}(\theta)$.
[Figure: mapping from datasets to statistics]
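
The definition can be checked on a small example. The sketch below uses two made-up toss sequences with the same counts and confirms that they give the same likelihood. The helper names `sufficient_stats` and `likelihood_from_counts` are hypothetical.

```python
import math
from collections import Counter


def sufficient_stats(tosses):
    """Return (N_H, N_T), the number of heads and tails."""
    counts = Counter(tosses)
    return counts["H"], counts["T"]


def likelihood_from_counts(theta, n_heads, n_tails):
    """Binomial likelihood written only in terms of the counts."""
    return theta ** n_heads * (1.0 - theta) ** n_tails


d1 = "HTTHH"  # made-up dataset
d2 = "HHHTT"  # different order, same counts

stats1 = sufficient_stats(d1)
stats2 = sufficient_stats(d2)
print(stats1, stats2)

# Equal statistics imply equal likelihood for every theta we try.
for theta in (0.1, 0.5, 0.9):
    l1 = likelihood_from_counts(theta, *stats1)
    l2 = likelihood_from_counts(theta, *stats2)
    print(theta, math.isclose(l1, l2))
```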

11 Maximum Likelihood Estimation
MLE Principle: choose parameters that maximize the likelihood function.
- This is one of the most commonly used estimators in statistics.
- Intuitively appealing.
- One usually maximizes the log-likelihood function, defined as $\ell_D(\theta) = \ln L_D(\theta)$.
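
One practical reason for working with the log-likelihood, not stated on the slide, is numerical: a product of many probabilities can underflow to zero in floating point, while the sum of their logarithms stays representable. The sketch below illustrates this with made-up counts. Both function names are hypothetical.

```python
import math


def likelihood(theta, n_heads, n_tails):
    """Binomial likelihood theta^N_H * (1 - theta)^N_T."""
    return theta ** n_heads * (1.0 - theta) ** n_tails


def log_likelihood(theta, n_heads, n_tails):
    """Log-likelihood N_H * ln(theta) + N_T * ln(1 - theta)."""
    return n_heads * math.log(theta) + n_tails * math.log(1.0 - theta)


# Made-up data: 1000 heads and 1000 tails.
print(likelihood(0.5, 1000, 1000))      # underflows in double precision
print(log_likelihood(0.5, 1000, 1000))  # remains a finite number
```

Because the logarithm is monotonically increasing, both functions are maximized by the same θ.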

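12 Example: MLE in Binomial Data
- Taking the derivative of the log-likelihood and equating it to 0, we get $\hat{\theta} = \frac{N_H}{N_H + N_T}$.
- Example: $(N_H, N_T) = (3, 2)$. The MLE estimate is $3/5 = 0.6$ (which coincides with what one would expect).
[Figure: plot of $L(\theta)$]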
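
The derivative step can be spelled out as follows, assuming the binomial likelihood $L_D(\theta) = \theta^{N_H}(1-\theta)^{N_T}$ implied by the sufficient statistics $N_H$ and $N_T$:

```latex
\begin{align*}
\ell_D(\theta) &= \ln L_D(\theta) = N_H \ln\theta + N_T \ln(1-\theta) \\
\frac{d\ell_D}{d\theta} &= \frac{N_H}{\theta} - \frac{N_T}{1-\theta} = 0
  \;\Longrightarrow\; N_H (1-\theta) = N_T\,\theta
  \;\Longrightarrow\; \hat{\theta} = \frac{N_H}{N_H + N_T} \\
\frac{d^2\ell_D}{d\theta^2} &= -\frac{N_H}{\theta^2} - \frac{N_T}{(1-\theta)^2} < 0
  \quad\text{(so the stationary point is a maximum)} \\
(N_H, N_T) = (3, 2) &\;\Longrightarrow\; \hat{\theta} = \frac{3}{3+2} = 0.6
\end{align*}
```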

13 From Binomial to Multinomial
- Now suppose X can have the values 1, 2, …, K (for example, a die has K = 6 sides).
- We want to learn the parameters $\theta_1, \theta_2, \dots, \theta_K$.
- Sufficient statistics: $N_1, N_2, \dots, N_K$, the number of times each outcome is observed.
- Likelihood function: $L_D(\theta) = \prod_{k=1}^{K} \theta_k^{N_k}$
- MLE: $\hat{\theta}_k = \frac{N_k}{\sum_{j=1}^{K} N_j}$ (assignment 3)
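
A minimal sketch of the multinomial estimate for the die example: each parameter is estimated by the observed fraction of that outcome. The rolls are made-up data and the function name `multinomial_mle` is hypothetical.

```python
from collections import Counter


def multinomial_mle(observations, outcomes):
    """Estimate theta_k as N_k divided by the total number of observations."""
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(observations)  # the sufficient statistics N_1, ..., N_K
    total = len(observations)
    return {k: counts[k] / total for k in outcomes}


rolls = [1, 3, 3, 6, 2, 3, 6, 1, 5, 3]  # made-up die rolls
theta_hat = multinomial_mle(rolls, range(1, 7))
for side, estimate in theta_hat.items():
    print(side, estimate)
```

Outcomes that never appear in the data (side 4 here) get an estimate of zero, which is one motivation for the Bayesian treatment the slides defer to later.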

14 Example: Multinomial
- Let a protein sequence be given.
- We want to learn the parameters $q_1, q_2, \dots, q_{20}$ corresponding to the frequencies of the 20 amino acids.
- $N_1, N_2, \dots, N_{20}$: the number of times each amino acid is observed in the sequence.
- Likelihood function: $L_D(q) = \prod_{k=1}^{20} q_k^{N_k}$
- MLE: $\hat{q}_k = \frac{N_k}{\sum_{j=1}^{20} N_j}$
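
To illustrate that the observed frequencies do maximize the likelihood, the sketch below compares the log-likelihood of a sequence under the estimated frequencies with that under uniform frequencies of 1/20. The peptide string is made up, and the names `estimate_frequencies` and `log_likelihood` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def estimate_frequencies(sequence):
    """MLE of q_k: count of amino acid k divided by the sequence length."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def log_likelihood(sequence, q):
    """Sum of N_k * ln(q_k) over the amino acids that actually occur."""
    counts = Counter(sequence)
    return sum(n * math.log(q[aa]) for aa, n in counts.items())


sequence = "MKVLAAGIVALLLAAG"  # made-up sequence for illustration
q_hat = estimate_frequencies(sequence)
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}

print("log-likelihood at MLE:    ", log_likelihood(sequence, q_hat))
print("log-likelihood at uniform:", log_likelihood(sequence, uniform))
```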

15 Inferring Phylogenetic Trees
- Let n sequences (DNA or AA) be given. Assume for simplicity that they all have the same length, $l$.
- We want to learn the parameters of a phylogenetic tree that maximize the likelihood.
- But wait: we should first specify a model.

16 A Probabilistic Model
- Our models will consist of a “regular” tree, where in addition, edges are assigned substitution probabilities.
- For simplicity, assume our “DNA” has only two states, say X and Y.
- If edge e is assigned probability $p_e$, this means that the probability of a substitution ($X \leftrightarrow Y$) across e is $p_e$.

18 A Probabilistic Model (3)
- If edge e is assigned probability $p_e$, then the probability of more involved patterns of substitution across e (e.g. XXYXY → YXYXX) is determined, and easily computed: $p_e^2 (1 - p_e)^3$ for this pattern.
- Q.: What if the pattern on both sides is known, but $p_e$ is not known?
- A.: It makes sense to seek the $p_e$ that maximizes the probability of the observation.
- So far, this is identical to the coin toss example.
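
The single-edge case can be sketched directly: count the positions where the two sequences differ, and by the same argument as for the coin the maximizing value of p_e is the fraction of differing positions. All function names below are hypothetical.

```python
def count_substitutions(seq_a, seq_b):
    """Number of positions at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def edge_likelihood(p_e, seq_a, seq_b):
    """p_e^k * (1 - p_e)^(l - k) for k substitutions among l positions."""
    k = count_substitutions(seq_a, seq_b)
    return p_e ** k * (1.0 - p_e) ** (len(seq_a) - k)


def edge_mle(seq_a, seq_b):
    """Maximizing p_e: the fraction of positions with a substitution."""
    return count_substitutions(seq_a, seq_b) / len(seq_a)


print(count_substitutions("XXYXY", "YXYXX"))  # substitutions in the slide's pattern
print(edge_mle("XXYXY", "YXYXX"))
print(edge_likelihood(edge_mle("XXYXY", "YXYXX"), "XXYXY", "YXYXX"))
```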

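19 A Probabilistic Model (4)
But a single edge is a fairly boring tree…
Now we don’t know the states at the internal node(s), nor the edge parameters $p_{e_1}$, $p_{e_2}$, $p_{e_3}$.
[Figure: a tree whose three leaves XXYXY, YXYXX and YYYYX are joined by edges with parameters $p_{e_1}$, $p_{e_2}$, $p_{e_3}$ to an internal node with unknown states ?????]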

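20 Two Ways to Go
1. Average (sum) over states of internal node(s)
2. Maximize over states of internal node(s)
In both cases, we maximize over edge parameters.
[Figure: a tree whose three leaves XXYXY, YXYXX and YYYYX are joined by edges with parameters $p_{e_1}$, $p_{e_2}$, $p_{e_3}$ to an internal node with unknown states ?????]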

21 Two Ways to Go
In the first version (average, or sum over states of internal nodes) we are looking for the “most likely” setting of tree edges. This is called maximum likelihood (ML) inference of phylogenetic trees. ML is probably the most widely (wildly) used inference method.
[Figure: a tree whose three leaves XXYXY, YXYXX and YYYYX are joined by edges with parameters $p_{e_1}$, $p_{e_2}$, $p_{e_3}$ to an internal node with unknown states ?????]

22 Two Ways to Go
In the second version (maximize over states of internal nodes) we are looking for the “most likely” ancestral states. This is called ancestral maximum likelihood (AML). In some sense AML is “between” MP (maximum parsimony, which also assigns ancestral states) and ML (because the goal is still to maximize likelihood).
[Figure: a tree whose three leaves XXYXY, YXYXX and YYYYX are joined by edges with parameters $p_{e_1}$, $p_{e_2}$, $p_{e_3}$ to an internal node with unknown states ?????]
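
The two objectives (summing versus maximizing over internal states) can be compared on the three-leaf tree from the slides. This is a rough sketch under stated assumptions, not the lecture's algorithm: it assumes a single internal node, sites that evolve independently, a uniform prior of 1/2 on each ancestral state, and edge probabilities fixed at an arbitrary value of 0.2 rather than optimized. All names are hypothetical.

```python
STATES = ("X", "Y")
ROOT_PRIOR = 0.5  # assumption: both ancestral states are equally likely a priori


def site_term(ancestor, leaf_states, edge_probs):
    """Joint probability of one site's leaves and a given ancestral state."""
    prob = ROOT_PRIOR
    for leaf, p_e in zip(leaf_states, edge_probs):
        # A substitution on the edge has probability p_e, no change 1 - p_e.
        prob *= p_e if leaf != ancestor else 1.0 - p_e
    return prob


def ml_likelihood(leaves, edge_probs):
    """ML objective: sum over the ancestral state at every site."""
    total = 1.0
    for column in zip(*leaves):
        total *= sum(site_term(a, column, edge_probs) for a in STATES)
    return total


def aml_likelihood(leaves, edge_probs):
    """AML objective: keep only the best ancestral state at every site."""
    total = 1.0
    ancestral = []
    for column in zip(*leaves):
        best = max(STATES, key=lambda a: site_term(a, column, edge_probs))
        ancestral.append(best)
        total *= site_term(best, column, edge_probs)
    return total, "".join(ancestral)


leaves = ["XXYXY", "YXYXX", "YYYYX"]
edge_probs = [0.2, 0.2, 0.2]  # arbitrary fixed values, not fitted

ml_value = ml_likelihood(leaves, edge_probs)
aml_value, ancestral_states = aml_likelihood(leaves, edge_probs)
print("ML objective: ", ml_value)
print("AML objective:", aml_value, "with ancestral states", ancestral_states)
```

In either version, the full method would additionally maximize over the edge parameters, for example by wrapping these functions in a numerical optimizer. Since a maximum never exceeds the corresponding sum of non-negative terms, the AML value is at most the ML value for the same edge parameters.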
