Sequence Alignment and Dynamic Programming


1 Sequence Alignment and Dynamic Programming
6.047/ Computational Biology: Genomes, Networks, Evolution Sequence Alignment and Dynamic Programming Tue Sept 9, 2008

2 Challenges in Computational Biology
1 Gene finding
2 Sequence alignment
3 Database lookup
4 Genome assembly
5 Regulatory motif discovery
6 Comparative genomics
7 Evolutionary theory
8 Gene expression analysis
9 Cluster discovery
10 Gibbs sampling
11 Protein network analysis
12 Regulatory network inference
13 Emerging network properties
[Figure: the challenges annotated along a DNA-to-RNA-transcript diagram, with an example alignment block: TCATGCTAT / TCGTGATAA / TGAGGATAT / TTATCATAT / TTATGATTT.]

3 Course Outline
Lec / Date / Topic / Homework:
1. Thu Sep 04: Intro: Biology, Algorithms, Machine Learning. Recitation 1: Probability, Statistics, Biology. [PS1]
2. Tue Sep 09: Global/local alignment, dynamic programming. [PS1 due 9/15]
3. Thu Sep 11: String search / BLAST / database search. Recitation 2: Affine-gap alignment / hashing with combs.
4. Tue Sep 16: Clustering basics / gene expression / sequence clustering. [PS2]
5. Thu Sep 18: Classification / feature selection / ROC / SVM. Recitation 4: Microarrays. [PS2 due 9/22]
6. Tue Sep 23: HMMs 1: evaluation / parsing.
7. Thu Sep 25: HMMs 2: posterior decoding / learning. Recitation 3: Posterior decoding review, Baum-Welch learning. [PS3]
8. Tue Sep 30: Generalized HMMs and gene prediction. [PS3 due 10/06]
9. Thu Oct 02: Regulatory motifs / Gibbs sampling / EM. Recitation 5: Entropy, information, background models.
10. Tue Oct 07: Gene evolution: phylogenetic algorithms, NJ, ML, parsimony.
11. Thu Oct 09: Molecular evolution / coalescence / selection / MK / Ka-Ks. Recitation 6: Gene trees, species trees, reconciliation. [PS4]
12. Tue Oct 14: Population genomics: fundamentals. [PS4 due 10/20]
13. Thu Oct 16: Population genomics: association studies. Recitation 7: Population genomics.
14. Tue Oct 21: MIDTERM (review session Mon Oct 20 at 7pm).
15. Thu Oct 23: Genome assembly / Euler graphs. Recitation 8: Brainstorming for final projects (MIT). [Project]
16. Tue Oct 28: Comparative genomics 1: biological signal discovery / evolutionary signatures. [Phase I]
17. Thu Oct 30: Comparative genomics 2: phylogenomics, gene and genome duplication. [Phase I due 11/06]
18. Tue Nov 04: Conditional random fields / gene finding / feature finding.
19. Thu Nov 06: Regulatory networks / Bayesian networks.
Tue Nov 11: Veterans Day holiday (no class).
20. Thu Nov 13: Inferring biological networks / graph isomorphism / network motifs. [Phase II]
21. Tue Nov 18: Metabolic modeling 1: dynamic systems modeling. [Phase II due 11/24]
22. Thu Nov 20: Metabolic modeling 2: flux balance analysis & metabolic control analysis.
23. Tue Nov 25: Systems biology (Uri Alon).
Thu Nov 27: Thanksgiving break (no class).
24. Tue Dec 02: Module networks (Aviv Regev).
25. Thu Dec 04: Synthetic biology (Tom Knight). [Phase III]
26. Tue Dec 09: Final presentations, part I (11am). [Phase III due 12/08]
27. Final presentations, part II (5pm).

4 Reminder: Last lecture / recitation
Schedule for the term: 'Foundations' until the midterm; 'Frontiers' lead to the final project
Duality: basic problems / fundamental techniques
Biology introduction: DNA, RNA, protein, transcription, translation
Why computational biology? Today: comparative genomics is everywhere!
Problem set 1: dating vertebrate whole-genome duplication
Problem set 2: discover genes using their conservation properties
Problem set 3: discover all motifs across the entire yeast genome
Problem set 4: reversing human/mouse genome rearrangements

5 Evolution preserved functional elements!
[Figure: multiple alignment of the GAL1-GAL10 intergenic region across four yeast species (Scer, Spar, Smik, Sbay), with TBP, GAL4, and MIG1 factor footprints and conservation islands marked: classic phylogenetic footprinting.]
The bases bound by the transcription factors are the ones that are conserved; we want a systematic approach.
Motifs are found in islands of conservation within oceans of intergenic background.
Orthologous motifs can be more strongly conserved, although paralogous instances can be degenerate (e.g., the 4th instance of GAL4 does not actually follow the motif, yet footprinting data show that it is bound).
Islands often extend past the true motif: noisy data, short motifs separated by other elements.
We may not be able to distinguish the true motif from its surroundings in a single region, but if we look at all of its instances in the genome, we are able to detect motif signatures.
The GAL4 motif is split into multiple islands. However, if we specifically search for conserved short patterns within the region, finding 3 copies of CGG-11-CCG becomes surprising.
We have seen the conservation of true motifs in known regions. What about true motifs throughout the genome? What can we learn? First of all, small 6-bp patterns occur on average every 4 kb by chance alone. They are found both in genes and in intergenic regions. Where are they conserved? Where else in the genome do they occur? We can 'read' evolution to reveal functional elements.
Kellis et al, Nature 2003

6 Today’s goal: How do we actually align two genes?

7 Comparing Genomes Slide credit: Serafim Batzoglou

8 Genomes change over time
[Figure: a starting sequence (ACGTCATCA) evolves into an end sequence through successive mutation, deletion, and insertion events.]

9 Goal of alignment: Infer edit operations
[Figure: begin sequence ACGTCATCA, end sequence TAGTGTCA; the question mark stands for the unknown series of edit operations we want to infer.]

10 From Bio to CS: Formalizing the problem
Define a set of evolutionary operations (insertion, deletion, mutation); symmetric operations allow time reversibility (part of the design choice).
Define an optimality criterion (minimum number of operations, or minimum cost). Many transformations are possible; we seek the minimum-cost transformation(s). It is impossible to infer the exact series of operations (Occam's razor: find the minimum).
Design an algorithm that achieves that optimality (or approximates it). Tractability of the solution depends on the assumptions in the formulation.
[Figure: human and mouse genomes separated by evolutionary distances x and y; time reversibility lets us treat the total distance x+y in either direction.]
Bio side: relevance, assumptions, special cases. CS side: tractability, tradeoffs, computability, algorithms, implementation, predictability, correctness.
Note: not all decisions are conflicting; some are both relevant and tractable (e.g., Pevzner vs. Sankoff and directionality in chromosomal inversions).

11 Formulation 1: Longest common substring
Given two possibly related strings S1 and S2, what is the longest common substring? (no gaps)
S1 = ACGTCATCA
S2 = TAGTGTCA
[Figure: shared substrings highlighted in both strings, one at offset +1 and one at offset -2.]
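As an illustration, Formulation 1 can be solved with a small dynamic program over end positions; the function name and the two example strings are just for demonstration:

```python
def longest_common_substring(s1, s2):
    """DP over end positions: cur[j] = length of the longest common
    suffix of s1 ending at i and s2 ending at j; track the best cell."""
    best_len, best_end = 0, 0
    prev = [0] * (len(s2) + 1)
    for i in range(1, len(s1) + 1):
        cur = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return s1[best_end - best_len:best_end]

print(longest_common_substring("ACGTCATCA", "TAGTGTCA"))  # prints GTCA
```

On the slide's strings the longest gap-free match is GTCA, four characters long.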

12 Formulation 2: Longest common subsequence
Given two possibly related strings S1 and S2, what is the longest common subsequence? (gaps allowed)
S1 = ACGTCATCA
S2 = TAGTGTCA
[Figure: a longest common subsequence highlighted across both strings.]
Edit distance: the number of changes needed to turn S1 into S2, with a uniform scoring function.
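Formulation 2 is the classic LCS dynamic program over prefix pairs; this is a minimal sketch with an illustrative name, using the uniform scoring mentioned above (every common character counts +1):

```python
def lcs_length(s1, s2):
    """L[i][j] = length of the longest common subsequence (gaps
    allowed) of the prefixes s1[:i] and s2[:j]."""
    L = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1   # extend a common character
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # skip one character
    return L[len(s1)][len(s2)]
```

On the slide's strings, lcs_length("ACGTCATCA", "TAGTGTCA") is 6 (e.g., the common subsequence AGTTCA), while the longest gap-free common substring is only 4 characters.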

13 Formulation 3: Sequence alignment
Allow gaps (fixed penalty): insertion & deletion operations, with unit cost for each character inserted or deleted.
Varying penalties for edit operations:
Transitions (Pyrimidine to Pyrimidine, Purine to Purine)
Transversions (Purine to Pyrimidine changes, and vice versa)
The polymerase confuses A with G and C with T more often.
Scoring function:
Match(x,x) = +1
Mismatch(A,G) = -1/2, Mismatch(C,T) = -1/2 (transitions A/G and C/T are common, so they get a lower penalty)
Mismatch(x,y) = -1 for all other substitutions (transversions between a purine and a pyrimidine)
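The transition/transversion scoring above can be written down directly; the function name is illustrative:

```python
def substitution_score(x, y):
    """Match +1; transition (A/G or C/T) -1/2; transversion -1."""
    if x == y:
        return 1.0
    purines = {"A", "G"}
    if (x in purines) == (y in purines):
        return -0.5   # both purines or both pyrimidines: transition
    return -1.0       # purine vs. pyrimidine: transversion
```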

14 Etc… (e.g. varying gap penalties)

15 How can we compute the best alignment?
Given an additive scoring function:
Cost of mutation (A/G, C/T, other)
Cost of insertion / deletion
Reward for a match
We need an algorithm for inferring the best alignment. Enumeration? How would you do it? How many alignments are there?

16 Can we simply enumerate all possible alignments?
Ways to align two sequences of lengths m and n: C(m+n, n), the number of ways to interleave the two sequences (choose which of the m+n alignment columns hold characters of the first sequence). For two sequences of length n this is C(2n, n), roughly 2^(2n)/sqrt(pi*n): exponential in n, so brute-force enumeration is hopeless.
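A quick sanity check of why enumeration fails: counting the interleavings of the two sequences (a standard lower bound on the number of distinct alignments) already grows like 2^(2n)/sqrt(pi*n):

```python
import math

def alignment_count(m, n):
    """Ways to interleave sequences of lengths m and n: C(m+n, n)."""
    return math.comb(m + n, n)

for n in (10, 20, 100):
    exact = alignment_count(n, n)
    approx = 2 ** (2 * n) / math.sqrt(math.pi * n)  # Stirling estimate
    print(n, exact, round(approx))
```

Already at n = 100 the count has more than 50 digits.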

17 Key insight: score is additive!
Compute the best alignment recursively. For a given aligned pair (i, j), the best alignment is:
Best alignment of S1[1..i] and S2[1..j]
+ Best alignment of S1[i..n] and S2[j..m]
Proof: cut-and-paste argument (see 6.046)
[Figure: S1 = ACGTCATCA and S2 = TAGTGTCA split at positions i and j.]

18 Key insight: re-use computation
Identical sub-problems! We can reuse our work!

19 Solution #1 – Memoization
Create a big dictionary, indexed by aligned sequences. When you encounter a new pair of sequences:
If it is in the dictionary: look up the solution.
If it is not in the dictionary: compute the solution, then insert the solution in the dictionary.
Ensures that there is no duplicated work: each sub-alignment needs to be computed only once!
Top-down approach.
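A minimal sketch of this memoization idea, using Python's functools.lru_cache as the "big dictionary"; the score values (match +1, mismatch -1, gap -2) follow the worked example later in the lecture, and the function names are illustrative:

```python
from functools import lru_cache

def best_score(s, t, match=1, mismatch=-1, gap=-2):
    """Top-down alignment score, memoized on the pair of suffix
    start positions (i, j): each sub-alignment is solved only once."""
    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == len(s):                     # s exhausted: gap out t
            return gap * (len(t) - j)
        if j == len(t):                     # t exhausted: gap out s
            return gap * (len(s) - i)
        sub = match if s[i] == t[j] else mismatch
        return max(solve(i + 1, j + 1) + sub,   # (mis)match
                   solve(i + 1, j) + gap,       # gap in t
                   solve(i, j + 1) + gap)       # gap in s
    return solve(0, 0)
```

For example, best_score("ACGT", "AGT") is 1: three matches (+3) minus one gap (-2). Since each (i, j) pair is cached, the runtime drops from exponential to O(n*m).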

20 Solution #2 – Dynamic programming
Create a big table, indexed by (i,j) Fill it in from the beginning all the way till the end You know that you’ll need every subpart Guaranteed to explore entire search space Ensures that there is no duplicated work Only need to compute each sub-alignment once! Very simple computationally! Bottom up approach

21 A simple introduction to Dynamic Programming
Fibonacci numbers
[Figure: Fibonacci spiral tiled with squares of sides 2, 3, 5, 8, 13, 21, 34, 55.]

22 Fibonacci numbers are ubiquitous in nature
Rabbits per generation, leaves per height, Romanesco spirals, nautilus shell size, coneflower spirals, leaf ordering.

23 Computing Fibonacci numbers: Top down
Goal: compute the nth Fibonacci number.
Fibonacci numbers are defined recursively: F(1) = 1, F(2) = 1, F(n) = F(n-1) + F(n-2)
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, …
Python code:

    def fibonacci(n):
        if n == 1 or n == 2:
            return 1
        return fibonacci(n-1) + fibonacci(n-2)

Analysis: T(n) = T(n-1) + T(n-2) + O(1), which grows exponentially: T(n) = O(2^n)

24 Computing Fibonacci numbers: Bottom up
Bottom-up approach. Python code:

    def fibonacci(n):
        fib_table = {1: 1, 2: 1}
        for i in range(3, n+1):
            fib_table[i] = fib_table[i-1] + fib_table[i-2]
        return fib_table[n]

Analysis: T(n) = O(n)
[Figure: fib_table filled in order: F[1] = 1, F[2] = 1, F[3] = 2, ..., F[11] = 89, F[12] = ?]

25 Lessons from iterative Fibonacci algorithm
What did the iterative solution do?
Reveal identical sub-problems
Order the computation to enable result reuse
Systematically fill in a table of results
Express larger problems from their subparts
Ordering of computations matters:
The naive top-down approach is very slow: results of smaller problems are not available, so work is repeated.
The systematic bottom-up approach succeeds: solve each sub-problem systematically, fill in the table of sub-problem results in order, and look up solutions instead of recomputing.
[Figure: the filled-in fib_table again.]

26 Dynamic Programming in Theory
Hallmarks of Dynamic Programming:
Optimal substructure: the optimal solution to a problem (instance) contains optimal solutions to sub-problems.
Overlapping subproblems: a limited number of distinct subproblems, repeated many, many times.
Typically for optimization problems (unlike the Fibonacci example):
The optimal choice is made locally: max over sub-solution scores.
The score is typically added up along the search space.
A traceback is common: find the optimal path from the individual choices.
Middle of the road in range of difficulty:
Easier: a greedy choice is possible at each step.
DynProg: requires a traceback to find the optimal path.
Harder: no optimal substructure, e.g., subproblem dependencies.

27 Hallmarks of optimization problems
1. Optimal substructure (greedy algorithms and dynamic programming): an optimal solution to a problem (instance) contains optimal solutions to subproblems.
2. Overlapping subproblems (dynamic programming): a recursive solution contains a "small" number of distinct subproblems repeated many times.
3. Greedy choice property (greedy algorithms): locally optimal choices lead to a globally optimal solution. When the greedy choice is not possible, a globally optimal solution requires tracing back through many choices.

28 Dynamic Programming in Practice
Setting up dynamic programming:
1. Find the 'matrix' parameterization (number of dimensions, variables). Make sure the sub-problem space is finite (not exponential)! If not all subproblems are used, you may be better off using memoization; if reuse is not extensive, perhaps DynProg is not the right solution!
2. Traversal order: sub-results must be ready when you need them. Computation order matters! (Bottom-up, but not always obvious.)
3. Recursion formula: larger problems = F(subparts). Typically F() includes a min() or max().
4. Remember choices: need a representation for storing pointers; is this polynomial?
Then start computing:
Systematically fill in the table of results; find the optimal score.
Trace back from the optimal score; find the optimal solution.

29 How do we apply dynamic programming to sequence alignment ?

30 Key insight: score is additive!
Compute the best alignment recursively. For a given aligned pair (i, j), the best alignment is:
Best alignment of S1[1..i] and S2[1..j]
+ Best alignment of S1[i..n] and S2[j..m]
[Figure: S1 = ACGTCATCA and S2 = TAGTGTCA split at positions i and j.]

31 Dynamic Programming for sequence alignment
Setting up dynamic programming:
1. Find the 'matrix' parameterization. Make sure the sub-problem space is finite (not exponential)!
2. Traversal order: sub-results ready when you need them.
3. Recursion formula: larger problems = F(subparts).
4. Remember choices: typically F() includes min() or max().
Then start computing:
Systematically fill in the table of results; find the optimal score.
Trace back from the optimal score; find the optimal solution.

32 (1, 2, 3) Store score of aligning (i,j) in matrix M(i,j)
[Figure: matrix with S[1..i] along one axis and T[1..j] along the other; cell (i, j) holds the score of aligning those prefixes, with S[i..n] and T[j..m] still to be aligned.]
Best alignment = best path through the matrix

33 Duality: sequence alignment = path through the matrix
[Figure: DP matrix with S1 = ACGTCATCA across the top and S2 = TAGTGTCA down the side; a path through the matrix spells out one alignment, including a C/G mismatch cell.]
Goal: find the best path through the matrix.

34 (4) Filling in the dynamic programming matrix
Local update rules: compute the next alignment based on previous alignments.
Just like Fibonacci numbers: F[i] = F[i-1] + F[i-2], a table lookup!
Compute scores for prefixes of increasing length. This allows a single recursion (top-left to bottom-right) instead of two recursions (middle-to-outside, top-down). Only three possibilities for extending by one nucleotide: a gap in one species, a gap in the other, or a (mis)match. When you reach the bottom right, the prefix of length n is the sequence S itself.
Computing the score of a cell from its neighbors:
F(i,j) = max { F(i-1, j) - gap ; F(i-1, j-1) + score(S1[i], S2[j]) ; F(i, j-1) - gap }
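A bottom-up sketch of this fill, using the toy scores of the next few slides (gap -2, mismatch -1, match +1); the function name is illustrative:

```python
def fill_matrix(s, t, match=1, mismatch=-1, gap=-2):
    """F[i][j] = best score for aligning prefixes s[:i] and t[:j];
    each cell is the max over the three neighbor moves."""
    F = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        F[i][0] = F[i - 1][0] + gap        # first column: gaps in t
    for j in range(1, len(t) + 1):
        F[0][j] = F[0][j - 1] + gap        # first row: gaps in s
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,   # (mis)match
                          F[i - 1][j] + gap,       # gap in t
                          F[i][j - 1] + gap)       # gap in s
    return F
```

The bottom-right cell F[len(s)][len(t)] then holds the optimal global score.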

35 0. Setting up the scoring matrix
[Matrix: columns labeled -, A, G, T, C; still empty.]
Initialization: top left = 0
Update rule: A(i,j) = max{ }
Termination: bottom right

36 1. Allowing gaps in s
Initialization: top left = 0
Update rule: A(i,j) = max{ A(i-1, j) - 2 }
Termination: bottom right
[Matrix: first column filled in: 0, -2, -4, -6, -8.]

37 2. Allowing gaps in t
Initialization: top left = 0
Update rule: A(i,j) = max{ A(i-1, j) - 2 ; A(i, j-1) - 2 }
Termination: bottom right
[Matrix: first row and first column filled with gap scores 0, -2, -4, -6, -8, -10, -12, -14.]

38 3. Allowing mismatches
Initialization: top left = 0
Update rule: A(i,j) = max{ A(i-1, j) - 2 ; A(i, j-1) - 2 ; A(i-1, j-1) - 1 }
Termination: bottom right
[Matrix: interior cells now filled using the mismatch score of -1.]

39 4. Choosing optimal paths
Initialization: top left = 0
Update rule: A(i,j) = max{ A(i-1, j) - 2 ; A(i, j-1) - 2 ; A(i-1, j-1) - 1 }
Termination: bottom right
[Matrix: arrows mark which of the three choices achieved each cell's max.]

40 5. Rewarding matches
Initialization: top left = 0
Update rule: A(i,j) = max{ A(i-1, j) - 2 ; A(i, j-1) - 2 ; A(i-1, j-1) +/- 1 }
Termination: bottom right
[Matrix: diagonal moves now score +1 on a match and -1 on a mismatch.]

41 What is missing? (5) Returning the actual path!
We know how to compute the best score: it is simply the number in the bottom-right entry. But we need to remember where it came from: a pointer to the choice we made at each step. Then we retrace the path through the matrix, so we need to remember all the pointers.
[Figure: DP matrix over x1…xM and y1…yN with the traceback path highlighted.]
Time needed: O(m*n). Space needed: O(m*n).
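Putting steps 1-5 together, here is a compact sketch that both fills the matrix and remembers the pointers for the traceback (same toy scores as the worked example; all names illustrative):

```python
def align(s, t, match=1, mismatch=-1, gap=-2):
    """Global alignment: fill F and a pointer matrix, then retrace."""
    n, m = len(s), len(t)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = F[i - 1][0] + gap, "up"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = F[0][j - 1] + gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            F[i][j], ptr[i][j] = max(
                (F[i - 1][j - 1] + sub, "diag"),   # (mis)match
                (F[i - 1][j] + gap, "up"),         # gap in t
                (F[i][j - 1] + gap, "left"))       # gap in s
    # Retrace the stored pointers from the bottom-right corner.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif move == "up":
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b)), F[n][m]
```

For example, align("ACGT", "AGT") gives the alignment ACGT over A-GT with score 1; both time and space are O(m*n), as stated above.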

42 Summary Dynamic programming Sequence alignment
Dynamic programming:
Reuse of computation; order the sub-problems; fill a table of sub-problem results.
Read the table instead of repeating work (ex: Fibonacci).
Sequence alignment:
Edit distance and scoring functions.
Dynamic programming matrix.
Matrix traversal path = optimal alignment.
Thursday: variations on sequence alignment.
Local/global alignment, affine gaps, algorithmic speed-ups.
Semi-numerical alignment, hashing, database lookup.
Recitation: dynamic programming applications; probabilistic derivations of alignment scores.

43

44 Bounded Dynamic Programming
Initialization: F(i, 0), F(0, j) undefined for i, j > k
Iteration:
For i = 1…M
  For j = max(1, i - k)…min(N, i + k)
    F(i, j) = max {
      F(i-1, j-1) + s(xi, yj)
      F(i, j-1) - d, if j > i - k(N)
      F(i-1, j) - d, if j < i + k(N)
    }
Termination: same as before
[Figure: only a band of width k(N) around the diagonal of the x1…xM by y1…yN matrix is filled.]
Slide credit: Serafim Batzoglou
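A sketch of the bounded recurrence, filling only cells with |i - j| <= k (here k is taken as a fixed constant rather than a function k(N), and the name is illustrative):

```python
def banded_score(x, y, k, match=1, mismatch=-1, gap=-2):
    """Banded DP: only cells within k of the diagonal are computed,
    assuming the optimal path stays inside the band."""
    NEG = float("-inf")
    M, N = len(x), len(y)
    F = [[NEG] * (N + 1) for _ in range(M + 1)]
    F[0][0] = 0
    for j in range(1, min(N, k) + 1):          # top edge of the band
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, M + 1):
        if i <= k:                             # left edge of the band
            F[i][0] = F[i - 1][0] + gap
        for j in range(max(1, i - k), min(N, i + k) + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + gap if j < i + k else NEG,
                          F[i][j - 1] + gap if j > i - k else NEG)
    return F[M][N]
```

With a band wide enough to contain the true path, the result matches the full O(MN) DP while only O(k*M) cells are touched.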

45 Linear space alignment
It is easy to compute F(M, N) in linear space:

    Allocate column[1]
    Allocate column[2]
    For i = 1…M
      If i > 1, then:
        Free column[i - 2]
        Allocate column[i]
      For j = 1…N
        F(i, j) = …

What about the pointers?
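The score-only pass above can be sketched by keeping just the previous and current column of F (names and scores illustrative, matching the toy scheme used earlier):

```python
def last_column(s, t, match=1, mismatch=-1, gap=-2):
    """Score-only DP in linear space: returns the final column of F,
    i.e. F(i, N) for all i, keeping only two columns at a time."""
    prev = [i * gap for i in range(len(s) + 1)]   # column j = 0
    for j in range(1, len(t) + 1):
        cur = [j * gap] + [0] * len(s)
        for i in range(1, len(s) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur[i] = max(prev[i - 1] + sub,    # (mis)match
                         prev[i] + gap,        # gap, from column j-1
                         cur[i - 1] + gap)     # gap, within column j
        prev = cur                             # the old column is freed
    return prev
```

Here last_column(s, t)[-1] equals F(M, N); running the same pass on the reversed strings gives the reverse scores needed to recover the path.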

46 Finding the best back-pointer for current column
Now, using 2 columns of space, we can compute, for k = 1…M, F(M/2, k) and Fr(M/2, N-k), PLUS the backpointers. (Fr is the analogous matrix computed on the reversed sequences.)

47 Best forward-pointer for current column
Now we can find the k* maximizing F(M/2, k) + Fr(M/2, N-k). Also, we can trace the path exiting column M/2 from k*.
[Figure: the forward and reverse half-matrices meeting at column M/2, joined at row k*.]

48 Recursively find midpoint for left & right
Iterate this procedure to the left and right!
[Figure: recursion on the two sub-rectangles of sizes M/2 by k* and M/2 by N-k*.]

49 Total time cost of linear-space alignment
Total time: cMN + cMN/2 + cMN/4 + … = 2cMN = O(MN)
Total space: O(N) for the computation, O(N+M) to store the optimal alignment
[Figure: the recursion halves the matrix at each level, so the total work is a geometric series.]

50 Formulation 4: Varying gap cost models (next time)
(Still) varying penalties for edit operations; now allow gaps of varying penalty:
Linear gap penalty: same as before.
Affine gap penalty: a big initial cost for starting or ending a gap, plus a small incremental cost for each additional character.
General gap penalty: any cost function; no longer computable using the same model.
Also: seek duplicated regions, rearrangements, …

