Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.

Slides:



Advertisements
Similar presentations
Global Sequence Alignment by Dynamic Programming.
Advertisements

Pairwise Sequence Alignment Sushmita Roy BMI/CS 576 Sushmita Roy Sep 10 th, 2013 BMI/CS 576.
Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.
Pairwise Sequence Alignment
Sequence allignement 1 Chitta Baral. Sequences and Sequence allignment Two main kind of sequences –Sequence of base pairs in DNA molecules (A+T+C+G)*
Sources Page & Holmes Vladimir Likic presentation: 20show.pdf
Sequence Alignment.
Lecture 8 Alignment of pairs of sequence Local and global alignment
Definitions Optimal alignment - one that exhibits the most correspondences. It is the alignment with the highest score. May or may not be biologically.
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Alignments 1 Sequence Analysis.
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
S. Maarschalkerweerd & A. Tjhang1 Probability Theory and Basic Alignment of String Sequences Chapter
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2005.
Reminder -Structure of a genome Human 3x10 9 bp Genome: ~30,000 genes ~200,000 exons ~23 Mb coding ~15 Mb noncoding pre-mRNA transcription splicing translation.
Alignment methods and database searching April 14, 2005 Quiz#1 today Learning objectives- Finish Dotter Program analysis. Understand how to use the program.
C T C G T A GTCTGTCT Find the Best Alignment For These Two Sequences Score: Match = 1 Mismatch = 0 Gap = -1.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Sequence Analysis Tools
Sequence similarity.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Similar Sequence Similar Function Charles Yan Spring 2006.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Sequence comparison: Local alignment
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Developing Pairwise Sequence Alignment Algorithms
Bioiformatics I Fall Dynamic programming algorithm: pairwise comparisons.
Pair-wise Sequence Alignment What happened to the sequences of similar genes? random mutation deletion, insertion Seq. 1: 515 EVIRMQDNNPFSFQSDVYSYG EVI.
BIOMETRICS Module Code: CA641 Week 11- Pairwise Sequence Alignment.
Pairwise alignments Introduction Introduction Why do alignments? Why do alignments? Definitions Definitions Scoring alignments Scoring alignments Alignment.
Pairwise & Multiple sequence alignments
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Content of the previous class Introduction The evolutionary basis of sequence alignment The Modular Nature of proteins.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Pairwise Sequence Alignment BMI/CS 776 Mark Craven January 2002.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Chapter 3 Computational Molecular Biology Michael Smith
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Pairwise sequence alignment Urmila Kulkarni-Kale Bioinformatics Centre, University of Pune, Pune
8/31/07BCB 444/544 F07 ISU Dobbs #6 - More DP: Global vs Local Alignment1 BCB 444/544 Lecture 6 Try to Finish Dynamic Programming Global & Local Alignment.
8/31/07BCB 444/544 F07 ISU Dobbs #6 - Scoring Matrices & Alignment Stats1 BCB 444/544 Lecture 6 Finish Dynamic Programming Scoring Matrices Alignment Statistics.
Applied Bioinformatics Week 3. Theory I Similarity Dot plot.
Alignment methods April 21, 2009 Quiz 1-April 23 (JAM lectures through today) Writing assignment topic due Tues, April 23 Hand in homework #3 Why has HbS.
Sequence Alignments with Indels Evolution produces insertions and deletions (indels) – In addition to substitutions Good example: MHHNALQRRTVWVNAY MHHALQRRTVWVNAY-
Sequence Alignment.
Sequence Alignment Abhishek Niroula Department of Experimental Medical Science Lund University
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Techniques for Protein Sequence Alignment and Database Searching G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
4.2 - Algorithms Sébastien Lemieux Elitra Canada Ltd.
Substitution Matrices and Alignment Statistics BMI/CS 776 Mark Craven February 2002.
Sequence comparison: Local alignment
Pairwise sequence Alignment.
#7 Still more DP, Scoring Matrices
Pairwise Sequence Alignment
BCB 444/544 Lecture 7 #7_Sept5 Global vs Local Alignment
Find the Best Alignment For These Two Sequences
Pairwise Alignment Global & local alignment
Bioinformatics Lecture 2 By: Dr. Mehdi Mansouri
Presentation transcript:

Pairwise sequence alignment Lecture 02

Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural and functional analysis of newly determined sequences.  The most fundamental comparison process is sequence alignment.  Search for common character patterns  Pairwise sequence alignment is the basis of database similarity searching and multiple sequence alignment

Evolutionary Basis  DNA and proteins are products of evolution.  Linear sequences of the nucleotide bases and amino acids form the primary structure of the DNA and proteins.  They can be considered as molecular fossils that encode the history of million of years of evolution.  During the time period they undergo random changes  Selections  Mutations

Evolutionary Basis cont.…  But the traces of evolution may still exits that will allow to identify the common ancestry.  This is due to the residues that perform key functional structural roles tend to be preserved by natural selection. Others tend to mutate more frequently.  Therefore by sequence alignment, patterns of conversions and variation can be identified.  The degree of sequence alignment reveals the evolutionary relatedness of different sequences.  Variations between the sequences reflects the changes occurred during the evolution in the form of substitution, insertions and deletions,

Why sequence alignment?  It helps to characterize the functions of unknown sequences.  Significant similarity between two sequences indicates that they belongs to the same family.  It is the basis for prediction of structure and functions of uncharacterized sequences.  Also it provides inference for the relatedness (evolutionary relationship) of two sequences under study.  If they share significant similarity it reflects the fact that they must have derived from a common evolutionary origin.

Sequence Homology vs. Sequence Similarity  Sequence homology – is an inference or a conclusion about a common ancestral relationship drawn from sequence similarity comparison. (high degree of similarity)  Sequence similarity – is the percentage of aligned residues that are similar in physiochemical properties such as size, charge and etc..  Sequence similarity can be quantified using percentages; homology is a qualitative statement.  Ex: two sequences share 40% similarity.  The two sequences are either homologous or nonhomologous

Methods  Pairwise sequence alignment  The task of locating equivalent regions of two or more sequences to maximize their overall similarity.  Alignment strategies:  Global Alignment  Local Alignment

Global Alignment  It is assumed that the two sequences to be aligned are generally similar over their entire length.  Alignment is carried out from beginning to the end of both sequences.  Finds out the best possible alignment across the entire length between the two  More applicable for aligning two closely related sequences.  May not be able to generate optimal results

Local Alignment  Finds local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequence regions.  More applicable for aligning more divergent sequences.  Can be used to search conserved patterns in DNA or protein.  The aligning sequences can be in different lengths.

Alignment Algorithms  Fundamentally similar for both global and local alignments  They only differ in the optimization strategy used in aligning similar residues.  Methods:  The dot matrix method  The dynamic programming method  The word method

Dynamic Programming Method  It is a method that determines optimal alignment by matching two sequences for all possible pairs of characters between the two sequences.  Works by first constructing a two-dimensional matrix whose axes are the two sequences to be compared.  The residue matching is according to a particular scoring matrix.  The scores are calculated one row at a time.  The scores are accumulated along the diagonal going from the upper left corner to the lower right corner.  The best score is given by the bottom right corner of the matrix.

Dynamic Programming Method  To find the optimal alignment, back track through the matrix in reverse order from the lower right hand corner of the matrix towards the origin of the matrix in the upper left hand corner.  The best matching path is the one that has the maximum total score.

Gap penalties  Gaps represents insertions and deletions in sequence alignment  There is no precise costs for introducing insertions and deletions  Linear score  γ(g)=-dg, where d is a fixed cost, g is the gap length  Longer the gap higher the penalty  Affine score  γ(g)=-(d +(g-1)e), d and e are fixed penalties and e<d.  d : gap open penalty  e : gap extension penalty

Global alignment (by Needleman & Wunsch)  Given two sequences u and v and a scoring matrix delta, find the alignment with the maximal score.  The number of possible alignments is very big.

Number of possible alignments CAATGAATTGATSequence 1Sequence 2 CAATGA_ _ATTGAT CAATGA ATTGAT CAATGA_ AT_TGAT Alignment 1Alignment 2Alignment 3 Can we find the highest scoring alignment by enumerating all possible alignments and picking the best?

Dynamic programming idea  Let x’ s length be n  Let y’ s length be m  construct an ( n +1)  ( m +1) matrix F  F ( j, i ) = score of the best alignment of x 1 …x i with y 1 …y j y x A A CAG A C score of best alignment of AAA to AG

DP for global alignment with linear gap penalty F(j,i) F(j-1,i-1)F(j-1,i) F(j,i-1) -d s(x i,y j ) DP recurrence relation Score of the best partial alignment between x 1..x i and y 1.. y j

DP algorithm sketch: global alignment  initialize first row and column of matrix  fill in rest of matrix from top to bottom, left to right  for each F ( i, j ), save pointer(s) to cell(s) that resulted in best score  F (m, n) holds the optimal alignment score; trace pointers back from F (m, n) to F (0, 0) to recover alignment

Global alignment example  suppose we choose the following scoring scheme:  d (penalty for aligning with a gap) = 2

Initializing matrix: global alignment with linear gap penalty

Global alignment example

DP comments  works for either DNA or protein sequences, although the substitution matrices used differ  finds an optimal alignment  the exact algorithm (and computational complexity) depends on gap penalty function

Local alignment  so far we have discussed global alignment, where we are looking for best match between sequences from one end to the other  often we want a local alignment, the best match between subsequences of x and y

Local alignment DP algorithm  original formulation: Smith & Waterman, Journal of Molecular Biology, 1981  interpretation of array values is somewhat different:  F ( i, j ) = score of the best alignment of a suffix of x[1…i ] and a suffix of y[1…j ]

Local alignment DP algorithm  the recurrence relation is slightly different than for global algorithm

Local alignment DP algorithm  Initialization:  F(0, j) = F(i, 0) = 0  traceback:  find maximum value of F(i, j); can be anywhere in matrix  stop when we get to a cell with value 0

Local alignment example AAGA T0 T0 A0 A0 G0 Match +1 Mismatch -1 Gap -2

Local alignment example AAGA T00000 T00000 A0 A0 G0

AAGA T00000 T00000 A01101 A0 G0

AAGA T00000 T00000 A01101 A01201 G0

AAGA T00000 T00000 A01101 A01201 G0001 3

Local alignment DP algorithm  No negative values in local alignment DP array  Optimal local alignment will never have a gap on either end  Local alignment: “Smith-Waterman”  Global alignment: “Needleman-Wunsch”