Bioinformatics PhD. Course Summary (approximate) 1. Biological introduction 2. Comparison of short sequences (<10.000 bps) 4 Sequence assembly 3 Comparison.


1 Bioinformatics PhD. Course Summary (approximate)
1. Biological introduction
2. Comparison of short sequences (<10,000 bp)
3. Comparison of large sequences (up to 250,000,000)
4. Sequence assembly
5. Efficient data search structures and algorithms
6. Proteins...

2 2. Comparison of short sequences (<10,000 bp)
Summary (more or less)
2.1 Dot matrix
2.2 Pairwise alignment
2.3 Hash algorithms
2.4 Multiple alignment

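3 2.1 Dot matrix
Given two sequences, how can we analyse their degree of identity? By searching for the parts that match.
[Figure: matrix with S1 along one axis and S2 along the other. The cell for character x of S1 and character y of S2 holds 1/0: 1 if both characters coincide, 0 otherwise.]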

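4 2.1 Dot matrix
Given two sequences, how can we analyse their degree of identity? By searching for the parts that match.
[Figure: the same S1 by S2 matrix, now showing the stretches x.. of S1 and y..... of S2. Each cell holds 1/0: 1 if both characters coincide. The stretches are marked with a question mark.]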

5 2.1 Dot matrix
What is the cost of the algorithm? When are the matches relevant?
accaccacaccacaacgagcata … acctgagcgatat
acc..t
L = window length
m(i,j) = 1 iff S1(i..i+L-1) = S2(j..j+L-1): exact matching
m(i,j) = 1 iff k out of the L characters coincide: approximate matching
m(i,j) = k iff k out of the L characters coincide: approximate matching (the cell stores the count)
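The window definitions of m(i,j) above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own implementation: the function name `dot_matrix` and the parameters `window` and `min_matches` are hypothetical, and "k out of L coincide" is read here as "at least k positions coincide", which is an assumption.

```python
def dot_matrix(s1, s2, window=1, min_matches=None):
    """Return m with m[i][j] = 1 if the windows of length `window`
    starting at s1[i] and s2[j] agree in at least `min_matches` positions.

    min_matches=None means exact matching (all `window` positions agree).
    Reading "k out of L" as "at least k" is an assumption of this sketch.
    """
    if min_matches is None:
        min_matches = window
    rows = len(s1) - window + 1
    cols = len(s2) - window + 1
    if rows <= 0 or cols <= 0:
        return []
    m = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Count coinciding characters inside the two windows.
            hits = sum(1 for d in range(window) if s1[i + d] == s2[j + d])
            m[i][j] = 1 if hits >= min_matches else 0
    return m


if __name__ == "__main__":
    for row in dot_matrix("acc", "acc"):
        print(row)
    # [1, 0, 0]
    # [0, 1, 1]
    # [0, 1, 1]
```

With `window=1` this is the plain character-by-character dot matrix. The three nested loops (rows, columns, window positions) are where the cost proportional to the two sequence lengths times L comes from.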

6 2.1 Dot matrix: algorithm cost
accaccacaccacaacgagcata … acctgagcgatat
acc..t
Cost: length(S1) * length(S2) * L, in other words $O(n^2 L)$.
Is a cost of length(S1) * length(S2) possible? Can we then say that the cost is $O(n^2)$, independent of L?
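One possible answer to the question on this slide, offered as a sketch rather than as the course's solution: two neighbouring cells on the same diagonal share L-1 character pairs, so the match count can be updated in constant time by dropping one pair and adding one. The function name `window_match_counts` is hypothetical.

```python
def window_match_counts(s1, s2, window):
    """Return k with k[i][j] = number of coinciding positions between the
    windows of length `window` starting at s1[i] and s2[j].

    Each diagonal is scanned with a sliding count, so every cell after the
    first one on its diagonal costs O(1) instead of O(window).
    """
    rows = len(s1) - window + 1
    cols = len(s2) - window + 1
    if rows <= 0 or cols <= 0:
        return []
    k = [[0] * cols for _ in range(rows)]
    # Every diagonal starts on the top row or on the left column.
    starts = [(0, j) for j in range(cols)] + [(i, 0) for i in range(1, rows)]
    for i, j in starts:
        # First window on the diagonal: count directly (O(window)).
        count = sum(1 for d in range(window) if s1[i + d] == s2[j + d])
        k[i][j] = count
        while i + 1 < rows and j + 1 < cols:
            # Slide one step: drop the pair leaving the window,
            # add the pair entering it.
            count -= int(s1[i] == s2[j])
            count += int(s1[i + window] == s2[j + window])
            i += 1
            j += 1
            k[i][j] = count
    return k
```

Thresholding `k[i][j]` afterwards gives either the exact or the approximate matrix. The direct counts cost O(L) for each of the length(S1) + length(S2) - 2L + 1 diagonals, and since L cannot exceed the shorter sequence, that term stays within a constant factor of length(S1) * length(S2).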

7 2.1 Dot matrix: signals
[Figure: three dot-matrix panels. A: transposons. B: S1 = S2. C: random.]
When are signals statistically significant?

8 2.1 Dot matrix: statistical significance
We need to define a random model against which to compare the signals. Given the windows x.. of S1 and y..... of S2, with L = window length, we define the random variable X = number of characters that coincide. Then, with p the probability that two characters coincide,
$\mathrm{Prob}(X = k) = \binom{L}{k} p^k (1-p)^{L-k}$
What is its expected value?
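The random model on this slide is a binomial distribution, so its expected value is L * p. The sketch below checks this numerically. The function names are hypothetical, and p = 0.25 (four equally likely, independent nucleotides) is an assumption made only for the example, not a value given in the slides.

```python
from math import comb


def match_probability(k, window, p):
    """Prob(X = k): exactly k of the `window` positions coincide,
    each independently with probability p."""
    return comb(window, k) * p**k * (1 - p) ** (window - k)


def expected_matches(window, p):
    """E[X], computed directly from the definition of expectation."""
    return sum(k * match_probability(k, window, p) for k in range(window + 1))


def tail_probability(k, window, p):
    """Prob(X >= k): chance of seeing at least k coincidences at random."""
    return sum(match_probability(x, window, p) for x in range(k, window + 1))


if __name__ == "__main__":
    L, p = 10, 0.25  # assumed values, for illustration only
    print(expected_matches(L, p))     # agrees with L * p up to rounding
    print(tail_probability(8, L, p))  # how unlikely 8+ of 10 matches are
```

A small tail probability for the observed number of matches suggests that a window is unlikely to be random noise, which is one way to approach the significance question raised on the previous slide.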

