Biological Sequence Comparison / Database Homology Searching
Aoife McLysaght, Summer Intern, Compaq Computer Corporation, Ballybrit Business Park, Galway, Ireland



Database Homology Searching
Use algorithms to increase efficiency and to provide a mathematical basis for searches, which can be translated into statistical significance
Assumes that sequence, structure and function are inter-related
BLAST (Basic Local Alignment Search Tool) and FASTA (short for "FAST-All")
  – heuristic approximations of the Needleman-Wunsch and Smith-Waterman algorithms
  – reduce computation

Needleman-Wunsch Algorithm
General algorithm for sequence comparison
Maximise a similarity score, to give the 'maximum match'
Maximum match = largest number of residues of one sequence that can be matched with another, allowing for all possible deletions
Finds the best GLOBAL alignment of any two sequences
N-W involves an iterative matrix method of calculation
  – All possible pairs of residues (bases or amino acids), one from each sequence, are represented in a 2-dimensional array
  – All possible alignments (comparisons) are represented by pathways through this array

Needleman-Wunsch Algorithm (cont.)
Three main steps
1. Assign similarity values
2. For each cell, look at all possible pathways back to the beginning of the sequence (allowing insertions and deletions) and give that cell the value of the maximum scoring pathway
3. Construct an alignment (pathway) back from the highest scoring cell to give the highest scoring alignment

Needleman-Wunsch Algorithm (cont.)
Similarity values
A numerical value is assigned to every cell in the array depending on the similarity/dissimilarity of the two residues
These may be simple scores or more complicated, e.g. related to chemical similarities or frequency of observed substitutions
The example shown has
  – match = +1
  – mismatch = 0
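As a minimal sketch of this first step, the similarity array for the simple match/mismatch scheme could be built as follows. The function name `similarity_matrix` and the short example sequences are illustrative assumptions, not part of the original slides.

```python
def similarity_matrix(seq_a, seq_b, match=1, mismatch=0):
    """Return a len(seq_a) x len(seq_b) array of per-cell similarity values.

    Cell [i][j] holds `match` when residue i of seq_a equals residue j of
    seq_b, and `mismatch` otherwise (the slide's example uses +1 and 0).
    """
    return [[match if a == b else mismatch for b in seq_b] for a in seq_a]


# Illustrative sequences only; any two strings of residues will do.
for row in similarity_matrix("ABCN", "AJCN"):
    print(row)
```

A substitution table (for example, one based on observed substitution frequencies) would replace the `match if a == b else mismatch` expression with a lookup.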

Needleman-Wunsch Algorithm (cont.)
Score pathways through array
For each cell, want to know the maximum possible score for an alignment ending at that point
Searches subrow and subcolumn, as shown, for the highest score
Adds this to the score for the current cell
Proceeds row by row through the array
Gap penalty for the introduction of gaps in the alignment (presumed insertions or deletions into one sequence) … here = 0
$$H_{ij} = \max\Big\{ H_{i-1,j-1} + s(a_i,b_j),\; \max_{k}\{H_{i-k,j-1} - W_k + s(a_i,b_j)\},\; \max_{l}\{H_{i-1,j-l} - W_l + s(a_i,b_j)\} \Big\}$$
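The recurrence can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the names `nw_fill` and `maximum_match` are hypothetical, the offsets k and l are taken to start at 2 (the slide does not give their range), and row 0 and column 0 are padded with zeros so that leading gaps are free. The test sequences are the pair from the slide's example alignment, read in the opposite direction.

```python
def nw_fill(seq_a, seq_b, match=1, mismatch=0, gap_penalty=lambda k: 0):
    """Fill the score array H row by row.

    H[i][j] is the best score of an alignment ending with residue i of
    seq_a paired with residue j of seq_b (1-based; row/column 0 are zeros).
    """
    m, n = len(seq_a), len(seq_b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            best = H[i - 1][j - 1]                      # direct diagonal
            for k in range(2, i + 1):                   # subcolumn above it
                best = max(best, H[i - k][j - 1] - gap_penalty(k))
            for l in range(2, j + 1):                   # subrow to its left
                best = max(best, H[i - 1][j - l] - gap_penalty(l))
            H[i][j] = best + s                          # add current cell's value
    return H


def maximum_match(H):
    """Largest value in the outer (last) row or column of H."""
    return max(max(H[-1]), max(row[-1] for row in H))


H = nw_fill("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
print(maximum_match(H))
```

Scanning the whole subrow and subcolumn for every cell makes this cubic in sequence length, which is acceptable for a sketch.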

Needleman-Wunsch Algorithm (cont.)
Construct alignment
The alignment score is cumulative: it is obtained by adding the cell values along a path through the array
The best alignment has the highest score, i.e. the maximum match
Maximum match = the largest of the sums obtained by adding the cell values along each pathway
The maximum match will ALWAYS be somewhere in the outer row or column shown
The alignment is constructed by working backwards from the maximum match
    MP-RCLCQR-JNCBA
     | || | | | | |
    -PBRCKC-RNJ-CJA
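The backward construction can be sketched by remembering, for each cell, which predecessor supplied its score and then walking those pointers from the best cell in the outer row or column. The function name `nw_align`, the choice of offsets starting at 2, and the free end gaps are assumptions made for this illustration. When several pathways tie, the sketch keeps the first one found, so the alignment it prints may differ from the one on the slide while having the same score.

```python
def nw_align(seq_a, seq_b, match=1, mismatch=0, gap_penalty=lambda k: 0):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    m, n = len(seq_a), len(seq_b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            cands = [(H[i - 1][j - 1], (i - 1, j - 1))]
            cands += [(H[i - k][j - 1] - gap_penalty(k), (i - k, j - 1))
                      for k in range(2, i + 1)]
            cands += [(H[i - 1][j - l] - gap_penalty(l), (i - 1, j - l))
                      for l in range(2, j + 1)]
            best, back[i][j] = max(cands, key=lambda c: c[0])
            H[i][j] = best + s

    # The maximum match lies in the outer row or column.
    outer = [(m, j) for j in range(1, n + 1)] + [(i, n) for i in range(1, m + 1)]
    i, j = max(outer, key=lambda c: H[c[0]][c[1]])
    score = H[i][j]

    top, bottom = [], []                      # built right to left
    for c in reversed(seq_a[i:]):             # residues after the best cell
        top.append(c); bottom.append("-")
    for c in reversed(seq_b[j:]):
        top.append("-"); bottom.append(c)
    while i > 0 and j > 0:
        top.append(seq_a[i - 1]); bottom.append(seq_b[j - 1])
        pi, pj = back[i][j]
        for x in range(i - 1, pi, -1):        # residues of seq_a skipped by a gap
            top.append(seq_a[x - 1]); bottom.append("-")
        for y in range(j - 1, pj, -1):        # residues of seq_b skipped by a gap
            top.append("-"); bottom.append(seq_b[y - 1])
        i, j = pi, pj
    for x in range(i, 0, -1):                 # leading overhang
        top.append(seq_a[x - 1]); bottom.append("-")
    for y in range(j, 0, -1):
        top.append("-"); bottom.append(seq_b[y - 1])
    return score, "".join(reversed(top)), "".join(reversed(bottom))


score, a_line, b_line = nw_align("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
print(score)
print(a_line)
print(b_line)
```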

Needleman-Wunsch Algorithm (cont.)
Statistical Significance
Maximum match is a function of sequence relationship and composition
Would like to know the probability of obtaining the result (maximum match) from a pair of random sequences
Estimate this experimentally
  – form pairs of random sequences by randomly drawing one member from each set of randomised sequences (i.e. sequences with the same composition as the real proteins)
  – if the value found for the real proteins is significantly different from that for the random proteins, then the difference is a function of the sequences alone and not of their composition
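One way to sketch this experiment is to shuffle each real sequence (which preserves its composition), score the shuffled pairs, and see how often they reach the real score. Everything named here (`max_match`, `shuffled`, `significance`, the 200 trials, the fixed seed) is an assumption for illustration. To stay short, `max_match` uses the standard longest-common-subsequence recurrence, which gives the same maximum match as the slides' method only for the example scores match = 1, mismatch = 0, gap = 0.

```python
import random


def max_match(seq_a, seq_b):
    """Maximum match for match=1, mismatch=0, gap=0 (LCS length)."""
    prev = [0] * (len(seq_b) + 1)
    for a in seq_a:
        cur = [0]
        for j, b in enumerate(seq_b, 1):
            cur.append(max(prev[j], cur[j - 1], prev[j - 1] + (a == b)))
        prev = cur
    return prev[-1]


def shuffled(seq, rng):
    """Random sequence with the same composition as seq."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def significance(seq_a, seq_b, trials=200, seed=0):
    """Return (real score, mean random score, fraction of random >= real)."""
    rng = random.Random(seed)
    real = max_match(seq_a, seq_b)
    scores = [max_match(shuffled(seq_a, rng), shuffled(seq_b, rng))
              for _ in range(trials)]
    mean_random = sum(scores) / trials
    fraction = sum(1 for s in scores if s >= real) / trials
    return real, mean_random, fraction


print(significance("ABCNJRQCLCRPM", "AJCJNRCKCRBP"))
```

A small fraction suggests the real score is unlikely to arise from composition alone; with so few trials it is only a rough estimate.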

Smith-Waterman Algorithm
Instead of looking at each sequence in its entirety, this compares segments of all possible lengths (LOCAL alignments) and chooses whichever maximises the similarity measure
For every cell, the algorithm calculates ALL possible paths leading to it
These paths can be of any length and can contain insertions and deletions

Smith-Waterman Algorithm (cont.)
Only works effectively when gap penalties are used
In example shown
  – match = +1
  – mismatch = -1/3
  – gap penalty W_k = 1 + k/3, i.e. a gap scores -(1 + k/3) (k = extent of gap)
Start with all cell values = 0
Looks in the subcolumn and subrow shown, and in the direct diagonal, for the score that is highest once the alignment score or gap penalty is taken into account
$$H_{ij} = \max\Big\{ H_{i-1,j-1} + s(a_i,b_j),\; \max_{k}\{H_{i-k,j} - W_k\},\; \max_{l}\{H_{i,j-l} - W_l\},\; 0 \Big\}$$
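A minimal sketch of this fill step is given below, assuming gap lengths k and l start at 1 and the gap penalty is 1 + k/3. The name `sw_fill` is hypothetical, `Fraction` is used only so the thirds stay exact, and the two short RNA strings are the ungapped forms of the segments in the slides' example alignment.

```python
from fractions import Fraction


def sw_fill(seq_a, seq_b, match=Fraction(1), mismatch=Fraction(-1, 3),
            gap_penalty=lambda k: 1 + Fraction(k, 3)):
    """Fill the local-alignment array H; row 0 and column 0 stay at zero."""
    m, n = len(seq_a), len(seq_b)
    H = [[Fraction(0)] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            diagonal = H[i - 1][j - 1] + s
            # Best gap of any length k ending here, looking up the subcolumn.
            deletion = max(H[i - k][j] - gap_penalty(k) for k in range(1, i + 1))
            # Best gap of any length l ending here, looking along the subrow.
            insertion = max(H[i][j - l] - gap_penalty(l) for l in range(1, j + 1))
            # The zero lets a new local alignment start at any cell.
            H[i][j] = max(diagonal, deletion, insertion, Fraction(0))
    return H


H = sw_fill("GCCAUUG", "GCCUCG")
print(max(max(row) for row in H))   # highest-scoring cell, anywhere in the array
```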

Smith-Waterman Algorithm (cont.)
Four possible ways of forming a path
For every residue in the query sequence:
1. Align with next residue of db sequence … score is previous score plus similarity score for the two residues
2. Deletion (i.e. match residue of query with a gap) … score is previous score minus gap penalty dependent on size of gap
3. Insertion (i.e. match residue of db sequence with a gap) … score is previous score minus gap penalty dependent on size of gap
4. Stop … score is zero
Choose whichever of these is the highest
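The choice between the four options can be written as a tiny helper. This is only a sketch: `best_move` and its parameter names are hypothetical, and the deletion and insertion arguments are assumed to be the best scores already reduced by the appropriate gap penalty.

```python
def best_move(previous_score, similarity, deletion_score, insertion_score):
    """Return (name, score) of the highest-scoring way to reach a cell.

    previous_score  -- score of the diagonal neighbour
    similarity      -- similarity score of the two residues being aligned
    deletion_score  -- best score reached by gapping a query residue
    insertion_score -- best score reached by gapping a db residue
    """
    moves = {
        "align": previous_score + similarity,
        "deletion": deletion_score,
        "insertion": insertion_score,
        "stop": 0,
    }
    name = max(moves, key=moves.get)   # on ties the first listed option wins
    return name, moves[name]


print(best_move(2, 1, 0.5, 0.2))
print(best_move(0, -1, -2, -2))
```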

Smith-Waterman Algorithm (cont.)
Construct Alignment
The score in each cell is the maximum possible score for an alignment of ANY LENGTH ending at those coordinates
Trace pathway back from highest scoring cell
This cell can be anywhere in the array
Align highest scoring segment
    GCC-UCG
    GCCAUUG
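The traceback can be sketched by storing a back-pointer per cell during the fill and following the pointers from the highest-scoring cell until a cell that chose "stop" is reached. The name `sw_align`, the gap penalty 1 + k/3 with k starting at 1, and the use of `Fraction` for exact thirds are assumptions for this illustration. The input strings are the ungapped forms of the two segments shown on the slide.

```python
from fractions import Fraction


def sw_align(seq_a, seq_b, match=Fraction(1), mismatch=Fraction(-1, 3),
             gap_penalty=lambda k: 1 + Fraction(k, 3)):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    m, n = len(seq_a), len(seq_b)
    H = [[Fraction(0)] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    best_cell = (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            cands = [(Fraction(0), None),                         # stop
                     (H[i - 1][j - 1] + s, (i - 1, j - 1))]       # align
            cands += [(H[i - k][j] - gap_penalty(k), (i - k, j))  # gap in seq_b
                      for k in range(1, i + 1)]
            cands += [(H[i][j - l] - gap_penalty(l), (i, j - l))  # gap in seq_a
                      for l in range(1, j + 1)]
            H[i][j], back[i][j] = max(cands, key=lambda c: c[0])
            if H[i][j] > H[best_cell[0]][best_cell[1]]:
                best_cell = (i, j)

    i, j = best_cell
    score = H[i][j]
    top, bottom = [], []                       # built right to left
    while back[i][j] is not None:
        pi, pj = back[i][j]
        if pi < i and pj < j:                  # aligned pair
            top.append(seq_a[i - 1]); bottom.append(seq_b[j - 1])
        elif pj == j:                          # residues of seq_a against gaps
            for x in range(i, pi, -1):
                top.append(seq_a[x - 1]); bottom.append("-")
        else:                                  # residues of seq_b against gaps
            for y in range(j, pj, -1):
                top.append("-"); bottom.append(seq_b[y - 1])
        i, j = pi, pj
    return score, "".join(reversed(top)), "".join(reversed(bottom))


score, a_line, b_line = sw_align("GCCAUUG", "GCCUCG")
print(score)
print(a_line)
print(b_line)
```

Because the traceback stops at the first cell with no back-pointer, residues outside the highest scoring segment are simply left out of the result.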

Differences
Needleman-Wunsch
1. Global alignments
2. Requires alignment score for a pair of residues to be >= 0
3. No gap penalty required
4. Score cannot decrease between two cells of a pathway
Smith-Waterman
1. Local alignments
2. Residue alignment score may be positive or negative
3. Requires a gap penalty to work effectively
4. Score can increase, decrease or stay level between two cells of a pathway