Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.


Overview
– Introduction/review
– Reading alignment outputs
– Scoring (substitution) matrices
– More on alignment algorithms and dynamic programming
– Useful alignment algorithms
– Examples

Introduction
Sequence alignment is a useful tool with many diverse applications.
Examples of sequence alignments:
– Comparing a new sequence against an established sequence from a database
– When sequencing a new gene, one usually sequences both strands and then aligns them (after reverse-complementing one of them, of course!). This helps ensure accuracy.

Examples of Sequence Alignments (cont.)
– Comparing sequences to look for evolutionary relatedness (homology)
– Identifying the sites of mutations
– Finding regions of overlapping sequence (cosmids or YACs, for example)
– Identifying conserved functional domains in gene products
– Others, to be sure!

Understanding Alignment Outputs
– One sequence is placed above the other, and the aligned vertical pairs are compared (scored).
– Matching pairs are joined with a bar ( | ) to indicate identity.
– A colon ( : ) identifies similar but nonidentical pairs.
  – For nucleotides, IUB ambiguity codes are used (e.g. N pairs with G, C, T or A).
  – Nonidentical amino acids with similar physical properties can also be reported as similar.

Example
  330 CCTTNATTTCCTTTTTGACA 349
      ||||:||| |||||||||||
  991 CCTTAATTCCCTTTTTGACA 972
– Only 20 bases of each sequence are aligned (a local alignment).
– The numbers at each end of the alignment correspond to the nucleotide positions in the original sequences.
  – A 329-nucleotide prefix of the top (query) sequence and a 971-nucleotide prefix of the lower sequence are not part of the alignment.
  – There may have been unaligned suffixes too, or the entered sequences may have been only 349 and 991 bases long, respectively.
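The marking convention in this output can be sketched in Python. This is a minimal illustration, not the code of any particular alignment program: the function name `match_line` and the rule "draw a colon when the two IUB codes share at least one base but the letters differ" are assumptions made for the example.

```python
# IUB/IUPAC nucleotide codes and the bases each one can stand for.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def match_line(top, bottom):
    """Return the middle line of a pairwise nucleotide alignment display."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for a, b in zip(top.upper(), bottom.upper()):
        if a == "-" or b == "-":
            marks.append(" ")          # gap column: nothing is drawn
        elif a == b:
            marks.append("|")          # identical letters
        elif set(IUB.get(a, "")) & set(IUB.get(b, "")):
            marks.append(":")          # compatible through an ambiguity code
        else:
            marks.append(" ")          # true mismatch
    return "".join(marks)

top = "CCTTNATTTCCTTTTTGACA"
bottom = "CCTTAATTCCCTTTTTGACA"
print(top)
print(match_line(top, bottom))
print(bottom)
```

Run on the two 20-base strings from the example, the middle line has a colon under the N and a blank at the T/C pair.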

Example (cont.)
  330 CCTTNATTTCCTTTTTGACA 349
      ||||:||| |||||||||||
  991 CCTTAATTCCCTTTTTGACA 972
– The lower sequence has been reverse-complemented, so its numbering runs downward.
– There are two non-identical pairs:
  – Nucleotides 334 (top) and 987 (bottom) are joined by a colon (:). The nucleotide on the upper strand is an N, indicating that the sequencer was unable to determine its identity.
  – Nucleotides 338 (top) and 983 (bottom) are a T and a C. These do not match, so no bar has been drawn between them. This may be the result of a point mutation, or of a mistake in determining or entering the sequence.
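The strand reversal and the position numbering in this example can be checked with a short Python sketch. The helper names `reverse_complement` and `original_position` are hypothetical, and the coordinate mapping assumes an alignment without gaps, as in the example.

```python
# Complement table for unambiguous bases plus N (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def original_position(start, column, reversed_strand=False):
    """Map a 0-based alignment column to a 1-based sequence coordinate.

    `start` is the number printed at the left end of the displayed line.
    On a reverse-complemented strand the numbering counts downward.
    """
    return start - column if reversed_strand else start + column

# Column 4 (the fifth pair) is the N/A pair; column 8 is the T/C pair.
print(original_position(330, 4), original_position(991, 4, reversed_strand=True))
print(original_position(330, 8), original_position(991, 8, reversed_strand=True))
print(reverse_complement("CCTTAATTCCCTTTTTGACA"))
```

The last call returns TGTCAAAAAGGGAATTAAGG, which is how bases 972 to 991 of the lower sequence would read on the strand as originally entered.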

Scoring Alignments
– Positive values are given for each identical match
– Smaller positive values are given for “conservative substitutions”
– Negative values are given for non-identical, non-conservative pairs
– Gaps are penalized
– The total score is the sum of the individual pairwise scores
– All else being equal, longer alignments give higher scores than shorter ones

Gaps and Scoring
– Gaps may be caused by an insertion in one sequence or a deletion in the other (“indel” events). We don’t know which.
– Gaps in an alignment are indicated by a ‘-’ in one or both of the sequences
– Gaps are penalized in scoring an alignment in two ways:
  – Origination penalty: the scoring penalty for creating a gap of any length (larger)
  – Length penalty: based on the length of the gap (smaller)

A Simple Example of Gap Scoring
If the scoring scheme says:
– Match = +1
– Mismatch = 0
– Gap origination penalty = -2
– Gap length penalty = -1 (for each base)
Calculate the scores for each alignment. Which alignment is best and why?

A Simple Example of Gap Scoring (cont.)
With the same scoring scheme (match = +1, mismatch = 0, gap origination penalty = -2, gap length penalty = -1 for each base), the three alignments score -3, -1 and 1.
The third alignment is best. From an evolutionary standpoint it requires only one genetic event (an indel spanning 2 bases).
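The scoring rules of this example can be sketched as a small Python function. Two things here are assumptions rather than part of the lecture: the function name `score_alignment`, and the reading that a gap of length L costs the origination penalty once plus the length penalty L times. The three alignments shown on the slide are not in this transcript, so the sequences below are invented; they were chosen only because they give the same three scores.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap_open=-2, gap_extend=-1):
    """Sum pairwise scores over the columns of an alignment written with '-'."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    open_gap_row = None                     # which row holds the gap currently open
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if row != open_gap_row:         # a new gap starts here
                score += gap_open
                open_gap_row = row
            score += gap_extend             # charged for every gapped base
        else:
            open_gap_row = None
            score += match if a == b else mismatch
    return score

# Hypothetical alignments of ACGTTGA with ACTGA (not the slide's originals).
print(score_alignment("ACGTTGA", "-ACT-GA"))   # two separate gaps, 3 matches
print(score_alignment("ACGTTGA", "AC-T-GA"))   # two separate gaps, 5 matches
print(score_alignment("ACGTTGA", "AC--TGA"))   # one 2-base gap, 5 matches
```

The last alignment wins because a single 2-base gap pays the origination penalty once (-2 - 2 = -4), whereas two 1-base gaps pay it twice (-3 - 3 = -6).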

Scoring Matrices: How values are assigned for each pair in an alignment
DNA scoring matrices are fairly simple

Scoring Matrices: How values are assigned for each pair in an alignment (cont.)
Protein matrices are far more complex
– There are 20 “letters” v. only 4 in DNA
– Far greater opportunity for conservative substitutions
– Some are based on “observed” substitutions
– Others are based on chemical/physical properties of the amino acids
– Others are based on the genetic code (how easily could a codon specifying one amino acid be changed to a codon specifying a different amino acid?)

Two Common Protein Scoring Matrices
The Point Accepted Mutation (PAM) matrix
– Based on observed substitution rates
– Different variations are used based on assumptions about the length of time since the sequences diverged
  – PAM-1 may be best for comparing two closely related sequences
  – PAM-1000 may be best for comparing sequences with distant relationships
  – PAM-250 is a suitable compromise

A PAM250 Scoring Matrix

Two Common Protein Scoring Matrices (cont.)
BLOSUM matrices are also commonly used
– Constructed by analyzing the substitutions observed in conserved blocks of aligned, related sequences
– Also appended with numbers, but with a different meaning: unlike PAM, higher numbers suit more closely related sequences
  – BLOSUM-62 (built from blocks clustered at 62% identity) is best for comparing sequences with approximately 62% similarity
  – BLOSUM-80 (built from blocks clustered at 80% identity) is best for comparing sequences with approximately 80% similarity
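Matrices based on observed substitutions, PAM and BLOSUM included, store log-odds scores: how much more often a pair of residues is seen aligned than would be expected by chance. The sketch below shows that calculation for a single pair. The function name `log_odds_score` and all frequencies are made up for illustration, and the scale factor of 2 (half-bit units, as commonly used for BLOSUM-style matrices) is an assumption.

```python
import math

def log_odds_score(pair_freq, freq_a, freq_b, scale=2):
    """Rounded log-odds score for one residue pair.

    pair_freq : observed frequency of the aligned pair (hypothetical value)
    freq_a/b  : background frequencies of the two residues (hypothetical)
    """
    expected = freq_a * freq_b              # frequency expected by chance
    return round(scale * math.log2(pair_freq / expected))

# Toy numbers only, not taken from any published matrix.
print(log_odds_score(0.125, 0.25, 0.25))    # pair seen twice as often as chance
print(log_odds_score(0.0625, 0.25, 0.25))   # pair seen exactly as often as chance
print(log_odds_score(0.03125, 0.25, 0.25))  # pair seen half as often as chance
```

Pairs observed more often than chance get positive scores, pairs at chance level score zero, and pairs observed less often get negative scores, which matches the pattern of values described under Scoring Alignments.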

Alignment Algorithms and Dynamic Programming
Computer trickery!
– The straightforward approach (scoring every possible alignment) is too computationally intensive
– For 2 sequences of 95 and 100 nucleotides there are ~55 million possible alignments! (Imagine a database search in this context!)
Dynamic programming breaks the problem into a series of small steps and combines the results of these small steps to solve the whole problem.
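The slide does not say how its figure was counted, and the total depends heavily on the counting convention. The Python sketch below tries two conventions, both of which are assumptions: (1) only the five gap characters needed to stretch the 95-base sequence to 100 columns are placed, and (2) every distinct arrangement of match, gap-in-top and gap-in-bottom columns is counted. The function name `count_alignments` is hypothetical.

```python
import math

def count_alignments(m, n):
    """Count all column arrangements aligning lengths m and n (Delannoy number).

    Each cell adds the counts of the three ways to reach it: a gap in one
    sequence, a gap in the other, or an aligned pair. The count is itself
    built up with dynamic programming, one row at a time.
    """
    prev = [1] * (n + 1)                    # aligning an empty prefix: one way
    for _ in range(m):
        cur = [1]
        for j in range(1, n + 1):
            cur.append(prev[j] + cur[j - 1] + prev[j - 1])
        prev = cur
    return prev[n]

# Convention 1: choose which 5 of the 100 columns hold a gap in the shorter sequence.
print(math.comb(100, 5))

# Convention 2: every possible gapped alignment; the number grows explosively.
print(count_alignments(2, 2), count_alignments(10, 10))
print(len(str(count_alignments(95, 100))), "digits for lengths 95 and 100")
```

Under either convention the number of candidates is far too large to score one by one across a database, which is why dynamic programming is used instead.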

Dynamic Programming (cont.)
– When you run an alignment, a dynamic programming matrix is formed with the two sequences along its sides.
– Scores for each pair are placed in the matrix.
– If the sequences match perfectly, the path starts in the lower right corner and proceeds diagonally to the upper left corner.
  AC--TCG
  ACAGTAG
  Alignment score = 2
– Vertical arrows indicate internal gaps
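A minimal Python sketch of global alignment by dynamic programming (the Needleman-Wunsch approach) follows. The slide does not state its scoring scheme; match = +1, mismatch = 0 and a flat gap penalty of -1 per base are assumptions, chosen because they give the score of 2 shown for this alignment. The function name `needleman_wunsch` is likewise illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + pair,   # diagonal: align the pair
                          F[i - 1][j] + gap,        # gap in b
                          F[i][j - 1] + gap)        # gap in a
    # Trace back from the lower right corner to the upper left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))

score, top, bottom = needleman_wunsch("ACTCG", "ACAGTAG")
print(top)
print(bottom)
print("Alignment score =", score)
```

Each cell is filled from its three neighbours, so the work grows with the product of the two sequence lengths rather than with the number of possible alignments. Off-diagonal steps in the traceback are the gaps.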

Graphical Output: Dot plots and Path Graphs

Comparison
Dot Plots
– Have been popular
– Reveal complex relationships involving multiple regions
– Difficult to interpret, as they may show many alignments
– Hard to see gaps and visualize the “best” alignment
Path Diagrams
– Simpler to interpret
– Show only one alignment (some can show more)
– Gaps appear as horizontal or vertical segments of the path line
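A text-mode dot plot takes only a few lines of Python. This is a bare sketch with an invented function name, `dot_plot`; it marks every identical pair and applies no window or threshold filtering, which real dot plot programs typically add to suppress background noise.

```python
def dot_plot(seq_x, seq_y, dot="*", blank="."):
    """Return one text row per residue of seq_y, one column per residue of seq_x."""
    return ["".join(dot if x == y else blank for x in seq_x) for y in seq_y]

seq_x = "ACGTACGT"
seq_y = "ACGTTCGT"
print("  " + seq_x)
for residue, row in zip(seq_y, dot_plot(seq_x, seq_y)):
    print(residue, row)
```

Matching regions show up as diagonal runs of dots, and a repeated unit appears as additional parallel diagonals. The stray off-diagonal dots in the printout are the clutter that makes a raw dot plot harder to read than a path diagram.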

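Example 1 [Figure: sequence X plotted against sequence Y, 5’ to 3’]

Example 2 [Figure: sequence X plotted against sequence Y, 5’ to 3’]

Example 3 [Figure: sequence X plotted against sequence Y, 5’ to 3’]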

Some Useful Alignment Programs
– BLAST 2 Sequences (NCBI)
– CLUSTALW (Biology Workbench)
– MAP (Multiple Alignment Program) at Baylor, TX
– Many others

A Nice BLAST 2 Sequences Example at: