Maximum likelihood estimation of relative transcript abundances Advanced bioinformatics 2012.

Fish population
(Figure: a pond with blue fish at concentration Pb and red fish at concentration Pr.)

We want to use our observation, the catch (1 red and 3 blue fish), to estimate the concentrations of red (Pr) and blue (Pb) fish in the pond. In other words, we search for the Pr and Pb that give the highest (maximum) probability (likelihood) of catching exactly 1 red and 3 blue fish. To do that, we need a function (model) that returns the probability of catching exactly 1 red and 3 blue fish when the concentrations of blue and red fish in the pond are Pb and Pr, respectively.

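Catch sequence & likelihood function
Probability of one specific catch sequence (for example red, blue, blue, blue): $P_r (P_b)^3$
Likelihood function (the red fish can be at any of the 4 positions in the sequence): $4 P_r (P_b)^3$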

Fish population
Question: for which parameter combination do I have the highest probability of catching exactly 1 red and 3 blue fish?
– Make a Perl function (1) that returns the likelihood of fishing 3 blue and 1 red fish given a Pr and Pb.
– Calculate the likelihood for Pr and Pb in the range 0.01 to 0.99. Remember that Pr + Pb = 1. Hint: run the actual loop from 1 to 99 and only divide by 100 when you call the likelihood function.
– Find the combination of Pr and Pb values that gives the highest likelihood of fishing 1 red and 3 blue fish.
– If you have time left, make a plot of Pr against the likelihood in R.
We can determine the population structure by searching for the parameters Pr and Pb that maximize the likelihood of fishing 3 blue and 1 red fish (according to our model).

Fish population
When Pr = 0.25 and Pb = 0.75 (that is, 1 - Pr), you have the highest probability of catching 1 red and 3 blue fish. This is the maximum likelihood estimate.
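The grid search described in the exercise can be sketched as follows. This is a minimal illustration in Python rather than the Perl asked for in the exercise; the function names `catch_likelihood` and `best_red_fraction` are hypothetical, and the factor 4 assumes the likelihood counts all four orderings of the catch.

```python
def catch_likelihood(p_red, p_blue):
    """Likelihood of catching exactly 1 red and 3 blue fish in 4 draws."""
    # The red fish can sit at any of 4 positions in the catch sequence.
    return 4 * p_red * p_blue ** 3


def best_red_fraction(steps=100):
    """Grid search over Pr = 1/steps .. (steps-1)/steps, with Pb = 1 - Pr."""
    best_p_red, best_likelihood = None, -1.0
    # Loop over integers and divide only when calling the likelihood function.
    for i in range(1, steps):
        p_red = i / steps
        likelihood = catch_likelihood(p_red, 1 - p_red)
        if likelihood > best_likelihood:
            best_p_red, best_likelihood = p_red, likelihood
    return best_p_red, best_likelihood


p_red, likelihood = best_red_fraction()
print(f"Pr = {p_red:.2f}, Pb = {1 - p_red:.2f}, likelihood = {likelihood:.4f}")
```

On this grid the maximum falls at Pr = 0.25, in agreement with the slide. The constant factor 4 does not change where the maximum lies, so leaving it out would give the same estimate.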

What does this fishing experiment have to do with transcriptome sequencing?
At first glance, quite a lot!
– Let's simply replace the pond with fish by a tube containing a mixture of mRNA molecules.
– Next, we can roughly replace “fishing” by “sequencing”.
– Finally, we replace the “catch” with our “mapped reads”.
(Figure: unknown transcript population, sequencing, mapped reads (catch).)

Main question
What are the relative abundances of the transcripts in our sample? We have to estimate these relative abundances using:
– the read alignments
– knowledge of the transcript structures

Is there any difference between sequencing and fishing?
With fishing, you usually catch whole fish. With current technology, you can only get high throughput for short reads.
– These reads represent transcript fragments derived not only from different genes but also from different alternative splicing (AS) isoforms.
– Hence, it is like fishing in a pond with fish fragments instead of entire fish.

What does the likelihood function calculate for mRNA-seq experiments?
The likelihood of observing this specific set of reads ($R$) given a specific distribution of transcript abundances in the sample. The function is the product of the probabilities of observing each of the individual reads ($r_j$), given a particular transcript abundance distribution.

Probability of observing a specific read (I)
Components:
1. The fact ($K_{jt}$) that read $j$ is compatible with transcript $t$ or not. $K_{jt}$ is either 1 or 0.
2. The probability of selecting a read that originates from a specific transcript:
a) Corresponds to the fraction of reads in the read pool that originate from the transcript. This fraction is proportional to the product of the relative abundance ($a_t$) and the effective length of the transcript ($l_t$).
(Figure: a transcript, a read, and the effective length $l_t$.)
The probability of observing a specific read is the sum of the probabilities that the read has originated from each of the individual transcripts in the sample.

Probability of observing a specific read (II)
2. The probability that a read originates from exactly a certain position on transcript $t$. Without positional bias this probability is uniform along the sequence, thus: $1 / l_t$

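Probability of observing a specific read (III)
The probability combines three factors, summed over the transcripts:
– Does read $j$ map to transcript $t$? ($K_{jt}$)
– Probability of selecting the read from the read pool
– Probability of originating from a specific position on the transcript ($1 / l_t$)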

Why the summation?
The probability of observing a read that maps to multiple transcripts (for example, AS isoforms of the same gene) is obtained by summing the probabilities of the read mapping to each of the individual transcripts.
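The three factors and the summation described in the preceding slides can be combined into a single expression. The sketch below assumes that the selection probability is the product of abundance and effective length, normalized over all transcripts $k$. That normalization is an assumption added here so that the selection probabilities sum to 1; it is not stated in the slide text.

```latex
P(r_j) = \sum_{t} K_{jt} \cdot \frac{a_t \, l_t}{\sum_{k} a_k \, l_k} \cdot \frac{1}{l_t}

% Special case: if every transcript has the same effective length l
% and the abundances sum to 1, the expression reduces to
P(r_j) = \frac{1}{l} \sum_{t} K_{jt} \, a_t
```

In the equal-length case, the factor $1/l$ is the same for every read, so it does not affect which abundances maximize the likelihood.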

Probability of observing the entire set of reads ($R$)
The product of the probabilities of observing each individual read: $P(R) = \prod_j P(r_j)$

Log likelihood
Multiplying a large number of probabilities (each between 0 and 1) in sequence is computationally difficult, because the product quickly becomes too small to represent. A good solution is to replace the multiplication of the probabilities by a summation of the logarithms of the probabilities.
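The difficulty with long products can be seen in a small Python sketch. The read probabilities below are made-up values chosen only to trigger the effect, and the function names are hypothetical.

```python
import math


def likelihood_product(read_probabilities):
    """Multiply all read probabilities directly (prone to underflow)."""
    result = 1.0
    for p in read_probabilities:
        result *= p
    return result


def log_likelihood(read_probabilities):
    """Sum the logarithms of the read probabilities instead."""
    return sum(math.log(p) for p in read_probabilities)


# Hypothetical example: 2000 reads, each observed with probability 0.5.
read_probabilities = [0.5] * 2000
print(likelihood_product(read_probabilities))  # underflows to 0.0 in double precision
print(log_likelihood(read_probabilities))      # stays a finite negative number
```

Because the logarithm is monotonically increasing, the abundances that maximize the log likelihood are the same ones that maximize the likelihood itself.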

Assignment
(Figure: three transcripts T1, T2 and T3, shown as AS isoforms and a single-isoform gene, with relative abundances $a_1$, $a_2$ and $a_3$.)
Important rule: $a_1 + a_2 + a_3 = 1$

Assignment
You will get a file that represents a compatibility matrix.
– Each line has 3 values, such as 1 1 0 (indicating that the read maps to transcripts 0 and 1 but not 2).
Part of the code is already filled in. Build the missing functions for the program and estimate the abundances of the 3 transcripts.
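One possible shape for such a program is sketched below in Python. It is not the course's partially filled-in code, which is not reproduced here, and all function names are hypothetical. The sketch assumes equal effective lengths, so the probability of a read reduces to the sum of the abundances of its compatible transcripts. It reuses the grid search idea from the fish example, stepping over all abundance triples that sum to 1.

```python
import math


def parse_compatibility_matrix(text):
    """Turn lines such as '1 1 0' into rows of integers, one row per read."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            rows.append([int(value) for value in fields])
    return rows


def log_likelihood(compatibility, abundances):
    """Sum over reads of log(sum over transcripts of K_jt * a_t)."""
    total = 0.0
    for row in compatibility:
        p_read = sum(k * a for k, a in zip(row, abundances))
        if p_read <= 0:
            return -math.inf  # a read compatible with no transcript
        total += math.log(p_read)
    return total


def estimate_abundances(compatibility, steps=100):
    """Grid search over (a1, a2, a3) with a1 + a2 + a3 = 1 and all a_t > 0."""
    best, best_ll = None, -math.inf
    for i in range(1, steps - 1):
        for j in range(1, steps - i):
            k = steps - i - j  # remaining share, always at least 1
            abundances = (i / steps, j / steps, k / steps)
            ll = log_likelihood(compatibility, abundances)
            if ll > best_ll:
                best, best_ll = abundances, ll
    return best, best_ll


# Hypothetical input: reads that map to one or to several transcripts.
example = "1 1 0\n1 0 0\n1 0 0\n0 1 0\n0 0 1\n1 1 0\n"
print(estimate_abundances(parse_compatibility_matrix(example)))
```

A grid search is easy to follow but scales poorly beyond a few transcripts. Iterative schemes such as expectation-maximization are a common alternative, although nothing in the slides indicates which method the assignment expects.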
