Published by Francine Briggs. Modified over 2 years ago.

1
Lecture 6, CS566: Pairwise Sequence Analysis V
Relatedness
– "Not just important, but everything"
Modeling Alignment Scores
– Coin Tosses
– Unit Distributions
– Extreme Value Distribution
– Lambda and K revealed
– Loose Ends

2
Modeling Expectation
Reduced model: Coin tosses
– Given: $N$ coin tosses, each with probability of heads $p$
– Problem: What is the expected length of the longest run of heads?
– Solution:
  Experimental: Perform several repetitions and record the longest run in each
  Theoretical: $E(\mathrm{Run}_{\max}) = \log_{1/p} N$
– For example, for a fair coin ($p = 1/2$) and 64 tosses, $E(\mathrm{Run}_{\max}) = \log_2 64 = 6$
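The experimental and theoretical routes on this slide can be compared directly. The following is a minimal Python sketch, not part of the original lecture; the function names, trial count, and seed are illustrative assumptions. Because the log-base-1/p formula is a leading-order estimate, the simulated mean should land near 6 for a fair coin and 64 tosses rather than match it exactly.

```python
import math
import random


def longest_run(tosses):
    """Return the length of the longest run of True (heads) values."""
    best = current = 0
    for is_head in tosses:
        current = current + 1 if is_head else 0
        best = max(best, current)
    return best


def simulate_mean_longest_run(n_tosses, p_heads, trials, seed=0):
    """Experimental route: average the longest run over repeated trials."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    total = 0
    for _ in range(trials):
        tosses = [rng.random() < p_heads for _ in range(n_tosses)]
        total += longest_run(tosses)
    return total / trials


def theoretical_longest_run(n_tosses, p_heads):
    """Theoretical route from the slide: log base 1/p of N."""
    return math.log(n_tosses) / math.log(1 / p_heads)


print("simulated  :", simulate_mean_longest_run(64, 0.5, trials=2000))
print("theoretical:", theoretical_longest_run(64, 0.5))
```

Raising the number of tosses while keeping p fixed shows the logarithmic growth: doubling N adds roughly one to the expected longest run of a fair coin.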

3
Random alignment as Coin tosses
Head = Match
Assume
– Score = Run of matches
– Maximum score = Longest run of matches
Therefore
– Same model of expectation
– For example: For DNA sequences of length $N$, with $p$ the probability of a match, $E(\mathrm{MatchLength}_{\max})$ = Expected longest run of matches = $\log_{1/p} N$

4
Local alignment as Coin tosses
Assume
– Score in local alignment = Run of matches
– Maximum score = Longest run of matches
Therefore
– Similar model of expectation
– For DNA sequences of length $n$ and $m$:
  $E(\mathrm{MatchLength}_{\max}) \sim \log_{1/p}(nm)$ (Why not just $n$ or $m$?)
  $E(\mathrm{MatchLength}_{\max}) \sim \log_{1/p}(K' nm)$
  $\mathrm{Var}(\mathrm{MatchLength}_{\max}) = C$ (i.e., a constant, independent of $n$ and $m$)
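One way to see why the product nm appears: a local match may start at any of the n positions in one sequence paired with any of the m positions in the other, so roughly nm starting points are tried. The sketch below is Python and not from the lecture; the function names, sequence lengths, and trial count are assumptions, and bases are drawn uniformly so that p = 1/4. It finds the longest exact-match run over all start pairs and compares the simulated mean with log base 1/p of nm.

```python
import math
import random


def longest_common_run(a, b):
    """Longest run of consecutive matches over all pairs of start positions."""
    best = 0
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous row
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend the diagonal run
                best = max(best, curr[j])
        prev = curr
    return best


def random_dna(length, rng):
    """Uniform random DNA string (assumption: equal base frequencies)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def mean_longest_common_run(n, m, trials, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_common_run(random_dna(n, rng), random_dna(m, rng))
    return total / trials


n, m, p = 100, 100, 0.25
print("simulated     :", mean_longest_common_run(n, m, trials=100))
print("log_{1/p}(nm) :", math.log(n * m, 1 / p))
```

The simulated mean is expected to sit close to, but not exactly on, the logarithmic estimate; the constant K' on the slide absorbs that kind of offset.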

5
Refining Model
$S$ = AS matrix based score between unrelated sequences
$E(S) \sim \log_{1/p}(K' nm) \sim \ln(Knm)/\lambda$ (where $\lambda = \ln(1/p)$)
Holy Grail: Need $P(S > x)$, the probability that a score between unrelated sequences exceeds $x$
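The step from a logarithm in base 1/p to a natural logarithm divided by the scale factor lambda is a change of base. As a small numerical check (a Python sketch with hypothetical function names and example values, not taken from the lecture), both forms give the same number when lambda = ln(1/p) and the same constant is used for K' and K:

```python
import math


def expected_score_base_form(n, m, p, k_prime=1.0):
    """E(S) ~ log base 1/p of (K' n m)."""
    return math.log(k_prime * n * m, 1 / p)


def expected_score_lambda_form(n, m, lam, k=1.0):
    """E(S) ~ ln(K n m) / lambda."""
    return math.log(k * n * m) / lam


p = 0.25                 # example match probability (assumption)
lam = math.log(1 / p)    # lambda = ln(1/p)
print(expected_score_base_form(100, 100, p))
print(expected_score_lambda_form(100, 100, lam))
```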

6
Poisson distribution estimate of $P(S > x)$
Consider Coin Toss Example
Given $x \gg E(\mathrm{Run}_{\max})$
Define Success = ($\mathrm{Run}_{\max} \geq x$)
Define $P_n$ = Probability of $n$ successes
Define $y = E[\mathrm{Success}]$, i.e., Average no. of successes
Then, probability of $n$ successes follows Poisson dist.: $P_n = e^{-y} y^n / n!$
Probability of 0 successes (no score exceeding $x$) is given by $P_0 = e^{-y}$
Then, probability of at least one score exceeding $x$: $P(S > x) = \sum_{i \neq 0} P_i = 1 - P_0 = 1 - e^{-y}$
For Poisson distribution, $y = Kmn\,e^{-\lambda x}$. Therefore, $P(S > x) = 1 - \exp(-Kmn\,e^{-\lambda x})$
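The chain of definitions on this slide translates into a few lines of code. The following Python sketch is illustrative only: the function names are hypothetical, and the values of K, lambda, m, and n in the usage lines are made-up examples, not parameters of any real scoring system.

```python
import math


def poisson_prob(n_successes, y):
    """P_n = e^(-y) * y^n / n! for a Poisson distribution with mean y."""
    return math.exp(-y) * y ** n_successes / math.factorial(n_successes)


def expected_successes(x, k, m, n, lam):
    """y = K * m * n * e^(-lambda * x), the mean number of scores reaching x."""
    return k * m * n * math.exp(-lam * x)


def prob_score_exceeds(x, k, m, n, lam):
    """P(S > x) = 1 - P_0 = 1 - exp(-y)."""
    y = expected_successes(x, k, m, n, lam)
    return -math.expm1(-y)  # 1 - exp(-y), computed accurately for small y


# Hypothetical parameters, for illustration only.
k, lam, m, n = 0.1, 0.3, 200, 200
for x in (20, 30, 40, 50):
    print(x, prob_score_exceeds(x, k, m, n, lam))
```

For large x the mean y becomes small and 1 - exp(-y) is close to y itself, which is why `math.expm1` is used instead of subtracting from 1 directly.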

7
Unit Distributions
Normalize Gaussian and EVD
– Area under curve = 1
– Curve maximum at 0
Then
– For Gaussian: Mean = 0; SD = 1
– For EVD: Mean = 0.577 (Euler's constant); Variance = $\pi^2/6 = 1.645$
  Unit form: $P(S > x) = 1 - \exp(-e^{-x})$
  With scale $\lambda$ and location $u$: $P(S > x) = 1 - \exp(-e^{-\lambda(x-u)})$
– Z-score representation in terms of SDs: $P(Z > z) = 1 - \exp(-e^{-1.2825z - 0.5772})$
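The constants in the Z-score form follow from the unit EVD's mean (0.5772) and standard deviation (pi divided by the square root of 6, about 1.2825). The Python sketch below, with hypothetical function names, evaluates the EVD tail in its unit, scaled, and Z-score forms, and places a Gaussian tail next to it for comparison; it is an illustration, not code from the lecture.

```python
import math

EULER_GAMMA = 0.5772156649           # mean of the unit EVD
EVD_SD = math.pi / math.sqrt(6)      # SD of the unit EVD, about 1.2825


def unit_evd_tail(x):
    """P(S > x) = 1 - exp(-e^(-x)) for the unit EVD (mode at 0)."""
    return -math.expm1(-math.exp(-x))


def evd_tail(x, lam, u):
    """P(S > x) = 1 - exp(-e^(-lambda * (x - u)))."""
    return unit_evd_tail(lam * (x - u))


def evd_tail_from_z(z):
    """P(Z > z) with z measured in SDs above the EVD mean."""
    return unit_evd_tail(EVD_SD * z + EULER_GAMMA)


def gaussian_tail_from_z(z):
    """P(Z > z) for a standard Gaussian, for comparison."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for z in (1, 2, 3, 4):
    print(z, evd_tail_from_z(z), gaussian_tail_from_z(z))
```

At the same number of standard deviations above the mean, the EVD tail is heavier than the Gaussian tail, so a Gaussian assumption would overstate the significance of a high score.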

8
Lambda and K
$\lambda$ = Scale factor for scoring system
– Effectively converts AS matrix values to actual natural log likelihoods
$K$ = Scale factor that reduces search space to compensate for non-independence of local alignments
Estimated by fitting to Poisson approximation or equation for $E(S)$
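The slide says lambda and K are estimated by fitting but does not name a procedure. As one possible illustration, the Python sketch below uses a method-of-moments fit, which is a technique chosen here for simplicity and not necessarily what the lecture intends. It relies on the EVD mean being u + 0.5772/lambda and its SD being pi/(lambda * sqrt(6)), with Kmn = exp(lambda * u). The scores are synthetic, drawn from the EVD itself rather than from real alignments, and all names and parameter values are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649


def sample_evd_scores(lam, k, m, n, count, seed=0):
    """Draw synthetic scores with P(S > x) = 1 - exp(-K m n e^(-lam x))."""
    rng = random.Random(seed)
    u = math.log(k * m * n) / lam
    scores = []
    while len(scores) < count:
        r = rng.random()
        if 0.0 < r < 1.0:  # skip the endpoint so both logs are defined
            scores.append(u - math.log(-math.log(r)) / lam)
    return scores


def fit_lambda_k(scores, m, n):
    """Method-of-moments estimates of lambda and K (assumed technique)."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6))
    u = mean - EULER_GAMMA / lam
    k = math.exp(lam * u) / (m * n)
    return lam, k


# Hypothetical "true" parameters used to generate the synthetic scores.
scores = sample_evd_scores(lam=0.3, k=0.1, m=200, n=200, count=20000)
print(fit_lambda_k(scores, m=200, n=200))
```

Because K enters through an exponential, small errors in the fitted lambda translate into larger relative errors in K, so K is usually the less stable of the two estimates in a fit of this kind.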

9
Treasure Trove of Probabilities
Probability distribution of scores between unrelated sequences: $P(S_{\mathrm{unrel}})$
Probability distribution of the number of scores from $P(S_{\mathrm{unrel}})$ exceeding some cut-off; its mean is the number of scores exceeding the cut-off observed on average
Probability of observing a score of at least $x$ between unrelated sequences: $P(S \geq x)$

10
Loose Ends
What about gap parameters?
– Short answer: No formal theory
– Long answer: Found empirically
Choice of parameters can be used to convert local alignment algorithm into a global alignment
What about gapped alignment?
– Not formally proven, but simulations show statistical behavior similar to ungapped alignment
Effective sequence length $n' = n - E(\mathrm{MatchLength}_{\max})$
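The effective-length correction can be sketched numerically. The slide does not say which estimate of the expected longest match to plug in; the Python sketch below assumes the ln(Kmn)/lambda form from the earlier slide, and the function names and example values are hypothetical.

```python
import math


def expected_match_length(k, m, n, lam):
    """Assumed estimate of E(MatchLength_max): ln(K m n) / lambda."""
    return math.log(k * m * n) / lam


def effective_length(length, k, m, n, lam):
    """n' = n - E(MatchLength_max)."""
    return length - expected_match_length(k, m, n, lam)


# Example values (assumptions): uniform DNA match model, K = 1.
lam = math.log(4)
print(effective_length(100, k=1.0, m=100, n=100, lam=lam))
```

The intuition is that an alignment of the expected length cannot start within that many positions of a sequence's end, so the usable search space is slightly smaller than nm.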
