
1. Algorithms in Computational Biology – 236522 (http://webcourse.cs.technion.ac.il/236522)
Tutorial: Tuesday 9:30–10:20, Taub 4. TA: Ilan Gronau, office: Taub 700, phone: 829(4894), email: ilangr@cs.technion.ac.il
Tutorial/lecture slides: can be downloaded from the course website one day before each tutorial/lecture. The slides are supplementary material only!
Homework: 5 assignments; submission is mandatory, to the TA's mailbox (#110 on floor 5). Late submissions and other problems: notify the TA well before the submission deadline.

2. Algorithms in Computational Biology – the full story: given biological data, search a space of hypotheses for the best biological explanation. The problems: the size of the hypothesis space, and complex scoring functions.

3. Parameter Estimation Using Likelihood Functions – Tutorial #1. © Ilan Gronau. Based on the original slides of Ydo Wexler & Dan Geiger.

4. The Problem: given a data set and a probabilistic model with parameters $\Theta = \theta_1, \theta_2, \theta_3, \dots$, find the best explanation for the observed data. A good explanation also helps predict the behavior of similar data sets.

5. An example: binomial experiments. Each toss comes up Heads with probability P(H) and Tails with probability 1−P(H). The unknown parameter: θ = P(H). The data: a series of experiment results, e.g. D = H H T H T T T H H … Main assumption: each experiment is independent of the others.

6. Maximum Likelihood Estimation (MLE): the likelihood of a given value of θ is $L_D(\theta) = P(D \mid \theta)$. We wish to find the value of θ which maximizes the likelihood. Example: the likelihood of 'HTTHH' is $L_{HTTHH}(\theta) = P(HTTHH \mid \theta) = \theta(1-\theta)(1-\theta)\theta\theta = \theta^3(1-\theta)^2$. Note that we only need to know N(H) and N(T) (the numbers of Heads and Tails); these are sufficient statistics: $L_D(\theta) = \theta^{N(H)}(1-\theta)^{N(T)}$.
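As an illustration (not part of the original slides), here is a minimal Python sketch that evaluates this likelihood directly from the toss sequence:

```python
def likelihood(theta, data):
    """L_D(theta) = P(D | theta): the product of per-toss probabilities,
    relying on the assumption that the tosses are independent."""
    p = 1.0
    for toss in data:
        p *= theta if toss == "H" else (1 - theta)
    return p

print(likelihood(0.6, "HTTHH"))  # 0.6^3 * 0.4^2 = 0.03456
```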

7. Sufficient Statistics: a sufficient statistic is a function of the data that summarizes the relevant information for the likelihood. Formally, s(D) is a sufficient statistic if for any two datasets D and D': $s(D) = s(D') \;\Rightarrow\; L_D(\theta) = L_{D'}(\theta)$. The likelihood may therefore be calculated on the statistics alone.
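A small illustrative check of this definition in Python: two datasets with the same head/tail counts yield identical likelihoods for every θ:

```python
from collections import Counter

def likelihood_from_stats(theta, data):
    # The likelihood depends on D only through the counts N(H) and N(T).
    c = Counter(data)
    return theta ** c["H"] * (1 - theta) ** c["T"]

# 'HTTHH' and 'THHTH' share the sufficient statistics N(H)=3, N(T)=2,
# so their likelihood curves coincide:
for theta in (0.1, 0.5, 0.9):
    assert likelihood_from_stats(theta, "HTTHH") == likelihood_from_stats(theta, "THHTH")
```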

8. An example: binomial experiments – MLE (cont.). Reminder: $L_D(\theta) = \theta^{N(H)}(1-\theta)^{N(T)}$. We wish to maximize the likelihood, or equivalently the log-likelihood: $l_D(\theta) = \log L_D(\theta) = N(H)\log\theta + N(T)\log(1-\theta)$. Maximization: solving $l_D'(\theta) = \frac{N(H)}{\theta} - \frac{N(T)}{1-\theta} = 0$ yields $\hat\theta = \frac{N(H)}{N(H)+N(T)}$. [Plot on slide: $L(\theta)$ for D = H T T H H over θ ∈ [0, 1], peaking at $\hat\theta = 3/5 = 0.6$.]
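A quick numerical sanity check (illustrative; uses the counts N(H)=3, N(T)=2 from the example above): the closed-form estimator agrees with a grid search over the log-likelihood:

```python
import numpy as np

def mle(n_heads, n_tails):
    # Closed-form maximizer of l_D(theta): N(H) / (N(H) + N(T))
    return n_heads / (n_heads + n_tails)

# Grid search over the log-likelihood for D = 'HTTHH' (3 heads, 2 tails)
thetas = np.linspace(0.001, 0.999, 999)
loglik = 3 * np.log(thetas) + 2 * np.log(1 - thetas)
print(mle(3, 2), thetas[np.argmax(loglik)])  # both ~0.6
```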

9. Multinomial experiments: still a series of independent experiments, but now each experiment has K possible results. Examples: a die toss (K=6), a DNA sequence (K=4), a protein sequence (K=20). We want to learn the parameters $\theta_1, \theta_2, \dots, \theta_K$. Sufficient statistics: $N_1, N_2, \dots, N_K$, the number of times each outcome is observed. Likelihood: $L_D(\theta_1,\dots,\theta_K) = \prod_{k=1}^{K}\theta_k^{N_k}$. MLE: $\hat\theta_k = \frac{N_k}{\sum_{j=1}^{K} N_j}$.
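A minimal illustrative sketch of the multinomial MLE for a DNA sequence (K=4): the estimated parameters are simply the observed frequencies:

```python
from collections import Counter

def multinomial_mle(sequence, alphabet="ACGT"):
    """MLE for multinomial parameters: theta_k = N_k / sum_j N_j,
    i.e. the observed frequency of each outcome."""
    counts = Counter(sequence)
    n = len(sequence)
    return {a: counts[a] / n for a in alphabet}

print(multinomial_mle("ACGTACGGGA"))
# {'A': 0.3, 'C': 0.2, 'G': 0.4, 'T': 0.1}
```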

10. Is ML all we need? The likelihood is $P(D \mid \Theta)$, but what we would actually like to maximize is the posterior probability $P(\Theta \mid D)$. Bayes' rule: $P(\Theta \mid D) = \frac{P(D \mid \Theta)\,P(\Theta)}{P(D)}$ (posterior = likelihood × prior / evidence). Due to the law of total probability, $P(D) = \sum_{\Theta} P(D \mid \Theta)\,P(\Theta)$ is a constant that does not depend on $\Theta$, so in our case: $P(\Theta \mid D) \propto P(D \mid \Theta)\,P(\Theta)$. The prior probability $P(\Theta)$ captures our subjective prior knowledge (prejudice) about the model parameters.

11. Bayesian inference: the prior probability captures our subjective prior knowledge (prejudice) about the model parameters. Example: after the coin tosses 'HTTHH', what would you bet on for the next toss? Most of us would still bet close to even odds, because we a priori assume the coin is balanced with high probability. Possible prior assumptions in molecular biology: some amino acids have similar frequencies; some amino acids have low frequencies; etc.
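One standard way to encode such a prior for coin tosses, shown here purely as an illustration (the specific Beta prior is not from the slides), is a symmetric Beta(α, α) prior on θ; the posterior mean then shrinks the ML estimate toward 1/2:

```python
def posterior_mean(n_heads, n_tails, alpha=10.0):
    """Posterior mean of theta under a symmetric Beta(alpha, alpha) prior.
    A large alpha encodes a strong prior belief that the coin is balanced."""
    return (n_heads + alpha) / (n_heads + n_tails + 2 * alpha)

# After 'HTTHH' (3 heads, 2 tails) the MLE is 3/5 = 0.6, but with a
# strong 'balanced coin' prior the estimate stays near 0.5:
print(posterior_mean(3, 2))  # 0.52
```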

12. Bayesian inference – example: the dishonest casino. A casino uses two kinds of dice: 99% are fair, and 1% are loaded so that a 6 comes up 50% of the time. We pick a die at random, roll it 3 times, and get 3 consecutive sixes. What is the probability that the die is loaded?

13. Bayesian inference – example: the dishonest casino (cont.). Pr[loaded | 3 sixes] = ? Use Bayes' rule!
- Pr[3 sixes | loaded] = (0.5)^3 = 0.125 (the likelihood of 'loaded')
- Pr[loaded] = 0.01 (the prior)
- Pr[3 sixes] = Pr[3 sixes | loaded]·Pr[loaded] + Pr[3 sixes | fair]·Pr[fair] = (0.5)^3·0.01 + (1/6)^3·0.99 ≈ 0.00125 + 0.00458 = 0.00583
- Pr[loaded | 3 sixes] = 0.00125 / 0.00583 ≈ 0.21
Still unlikely to be loaded!
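The same computation as a short Python sketch (numbers as on the slide):

```python
# Posterior probability that the die is loaded, given three sixes.
p_loaded = 0.01
lik_loaded = 0.5 ** 3          # Pr[3 sixes | loaded]
lik_fair = (1 / 6) ** 3        # Pr[3 sixes | fair]
evidence = lik_loaded * p_loaded + lik_fair * (1 - p_loaded)
posterior = lik_loaded * p_loaded / evidence
print(evidence, posterior)     # ~0.00583, ~0.21: still unlikely loaded
```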

14. Bayesian inference – biological example: protein amino-acid composition. Extracellular proteins have a slightly different amino-acid composition than intracellular proteins. From a large enough protein database (e.g. SwissProt) we can obtain the following statistics:
- P(int): the probability that an amino-acid sequence is intracellular
- P(ext): the probability that an amino-acid sequence is extracellular
- P(a_i | int): the frequency of amino acid a_i in intracellular proteins
- P(a_i | ext): the frequency of amino acid a_i in extracellular proteins
What is the probability that a given new protein sequence X = x_1 x_2 … x_n is extracellular?

15. Bayesian inference – biological example: proteins (cont.). What is the probability that a given new protein sequence X = x_1 x_2 … x_n is extracellular? Assuming that every sequence is either extracellular or intracellular (but not both), we have P(ext) = 1 − P(int) and $P(X) = P(X \mid ext)\,P(ext) + P(X \mid int)\,P(int)$. Furthermore, assuming the positions in the sequence are independent: $P(X \mid ext) = \prod_{i=1}^{n} P(x_i \mid ext)$, and similarly for P(X | int). By Bayes' theorem, the posterior is: $P(ext \mid X) = \frac{P(X \mid ext)\,P(ext)}{P(X)}$.
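A sketch of this classification in Python; the prior and frequency tables below are hypothetical placeholders (real values would be estimated from a database such as SwissProt), and the computation is done in log space to avoid numerical underflow on long sequences:

```python
import math

# Hypothetical placeholder values; real ones would come from a protein
# database such as SwissProt (one frequency entry per amino acid).
P_EXT, P_INT = 0.4, 0.6
FREQ_EXT = {"A": 0.08, "C": 0.03, "D": 0.06}
FREQ_INT = {"A": 0.07, "C": 0.01, "D": 0.05}

def posterior_extracellular(seq):
    """P(ext | X) by Bayes' theorem, assuming independent positions:
    P(X | class) = product over i of P(x_i | class)."""
    log_ext = math.log(P_EXT) + sum(math.log(FREQ_EXT[a]) for a in seq)
    log_int = math.log(P_INT) + sum(math.log(FREQ_INT[a]) for a in seq)
    m = max(log_ext, log_int)  # stabilize before exponentiating
    return math.exp(log_ext - m) / (math.exp(log_ext - m) + math.exp(log_int - m))

print(posterior_extracellular("ACDA"))
```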

16. Summary: given a data set and a model with parameters $\Theta = \theta_1, \theta_2, \theta_3, \dots$, the basic paradigm is to estimate the model parameters by maximizing either:
- the likelihood: $P(D \mid \Theta)$, or
- the posterior probability: $P(\Theta \mid D) \propto P(D \mid \Theta)\,P(\Theta)$.
In the absence of a significant (informative) prior, the two criteria are equivalent.

