Graphical comparison of sequences using “Dotplots”.


Basic Principles.

Sequence 1: ACCTGCCCTGTCCAGCTTACATGCATGCTTATAGGGGCATTTTACAT
Sequence 2: ACCTGCCGATTCCATATTACGCATGCTTCTGGGTTACCGTTCAGGGCATTTTACATGTGCTG

A dotplot needs:
A “word size” (11, say)
A “scoring scheme” (1 for a match, 0 for a mismatch, say)
A “cut-off score” (8, say)

[Figure: scoring matrix over A, T, G, C]

Example: the words ATGCTTCTGGG and ATGCTTATAGG match at 9 of 11 positions, so they score 9. This reaches the cut-off score of 8, so a dot is plotted.

Diagonal runs of dots indicate similar regions.

Summary: Dotplots provide a comprehensive overview but NO detail.
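The three ingredients above can be sketched in a few lines of Python. This is only an illustration under the example settings from the slide (word size 11, 1 for a match and 0 for a mismatch, cut-off 8); the function names `word_score` and `dotplot` are made up for this sketch and do not come from any dotplot package.

```python
def word_score(word1, word2):
    """Score two equal-length words: 1 per matching position, 0 per mismatch."""
    return sum(1 for a, b in zip(word1, word2) if a == b)


def dotplot(seq1, seq2, word_size=11, cutoff=8):
    """Return the set of (i, j) word start positions whose score reaches the cut-off."""
    dots = set()
    for i in range(len(seq1) - word_size + 1):
        word1 = seq1[i:i + word_size]
        for j in range(len(seq2) - word_size + 1):
            if word_score(word1, seq2[j:j + word_size]) >= cutoff:
                dots.add((i, j))
    return dots


seq1 = "ACCTGCCCTGTCCAGCTTACATGCATGCTTATAGGGGCATTTTACAT"
seq2 = "ACCTGCCGATTCCATATTACGCATGCTTCTGGGTTACCGTTCAGGGCATTTTACATGTGCTG"
dots = dotplot(seq1, seq2)
```

Each entry of `dots` is one dot: the word starting at position `i` of the first sequence scored at least the cut-off against the word starting at position `j` of the second. A real program would draw these points rather than collect them in a set.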

Scoring Schemes.

DNA: the simplest scheme is the identity matrix.

[Figure: identity matrix over A, T, G, C]

More complex matrices can be used. For example, the default EMBOSS DNA scoring matrix is:

[Figure: default EMBOSS DNA scoring matrix over A, T, G, C]

The use of negative numbers is only pertinent when these matrices are used for computing textual alignments. Using a wider spread of scores eases the expansion of the scoring matrix to sensibly include ambiguity codes.

Scoring Schemes.

[Figure: DNA scoring matrix over A C G T S W R Y K M B V H D N U]

IUB DNA Alphabet

Code   Meaning
A      A
C      C
G      G
T/U    T
M      `aMino`        A|C
R      `puRine`       A|G
W      `Weak`         A|T
S      `Strong`       C|G
Y      `pYrimidine`   C|T
K      `Keto`         G|T
V      `not T`        A|C|G
H      `not G`        A|C|T
D      `not C`        A|G|T
B      `not A`        C|G|T
N      `aNy`          A|C|G|T

Using a wider spread of scores eases the expansion of the scoring matrix to sensibly include ambiguity codes.

For protein sequence dotplots more complex scoring schemes are required. Scores must reflect far more than alphabetic identity.

[Figure: protein scoring matrix over A B C D E F G H I K L M N P Q R S T V W Y Z]
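One way to see why ambiguity codes need graded scores is to expand each code into the set of bases it stands for. The sketch below uses the IUB table above; the two scoring ideas (`compatible` and `match_probability`) are assumptions chosen for illustration and are not the scores used by EMBOSS or any other package.

```python
# Bases represented by each IUB code (U is treated as T).
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def compatible(code1, code2):
    """True if the two codes share at least one possible base."""
    return bool(set(IUB[code1]) & set(IUB[code2]))


def match_probability(code1, code2):
    """Chance that two bases drawn uniformly from each code's set are identical."""
    set1, set2 = set(IUB[code1]), set(IUB[code2])
    return len(set1 & set2) / (len(set1) * len(set2))
```

For example, `match_probability("R", "A")` is 0.5 because a purine is A half of the time under this uniform assumption, while `compatible("R", "Y")` is false because purines and pyrimidines share no base. A scoring matrix with a wide spread of values has room for such intermediate cases between a full match and a full mismatch.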

Faster plots for perfect matches.

To detect perfectly matching words, a dotplot program has a choice of strategies.

1) Select a scoring scheme and a word size (11, say). For every pair of words, compute a word match score in the normal way.

[Figure: scoring matrix over A, T, G, C]

Only if the maximum possible score (11) is achieved, as it is for two identical words: celebrate with a dot.

If the maximum possible score (still 11) is not achieved (for example, ATGCTTATAGG against ATGCTTCTGGG scores 9): do not celebrate with a dot.

Faster plots for perfect matches.

To detect perfectly matching words, a dotplot program has a choice of strategies.

2) OR, for every pair of words, see if the letters are exactly the same.

If they are: celebrate with a dot.

If they are not (for example, ATGCTTATAGG against ATGCTTCTGGG): do not celebrate with a dot.

To detect exactly matching words, fast character string matching can replace laborious computation of match scores to be compared with a cut-off score.

Many packages include a dotplot option specifically for detecting exactly matching words. This is a particular advantage when seeking strong matches in long DNA sequences.
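The slide does not say which string matching method packages use. One common way to avoid comparing every pair of words is to index the words of one sequence in a dictionary and look up each word of the other. The sketch below assumes that approach; `exact_word_dots` is a made-up name.

```python
from collections import defaultdict


def exact_word_dots(seq1, seq2, word_size=11):
    """Return (i, j) start positions where identical words occur in both sequences."""
    # Index every word of seq2 by its text, remembering where it starts.
    index = defaultdict(list)
    for j in range(len(seq2) - word_size + 1):
        index[seq2[j:j + word_size]].append(j)

    # Look up each word of seq1; no per-letter scoring is needed.
    dots = []
    for i in range(len(seq1) - word_size + 1):
        for j in index.get(seq1[i:i + word_size], ()):
            dots.append((i, j))
    return dots
```

With this layout the work grows roughly with the lengths of the two sequences plus the number of dots found, rather than with the product of the lengths, which is where the advantage for long DNA sequences comes from.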

Dotplot parameters.

There are three parameters to consider for a dotplot:
1) The scoring scheme.
2) The cut-off score.
3) The word size.

Dotplot parameters. The Scoring scheme.

DNA: Usually, DNA scoring schemes award a fixed reward for each matched pair of bases and a fixed penalty for each mismatched pair of bases. Choosing between such scoring schemes will affect only the choice of a sensible cut-off score and the way ambiguity codes are treated.

Protein: Protein scoring schemes differ in the evolutionary distance assumed between the proteins being compared. The choice is rarely crucial for dotplot programs.
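To illustrate why the DNA scheme mainly changes the sensible cut-off, the sketch below scores the same pair of words under two reward/penalty schemes. The values 5 and -4 are illustrative assumptions picked for this example, and `make_matrix` and `word_score` are hypothetical helper names.

```python
def make_matrix(match, mismatch, alphabet="ACGT"):
    """Build a scoring matrix with a fixed reward and a fixed penalty."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}


def word_score(word1, word2, matrix):
    """Sum the matrix score of each aligned pair of letters."""
    return sum(matrix[(a, b)] for a, b in zip(word1, word2))


identity = make_matrix(1, 0)       # 1 for a match, 0 for a mismatch
wide_spread = make_matrix(5, -4)   # illustrative reward and penalty

score_identity = word_score("ATGCTTCTGGG", "ATGCTTATAGG", identity)
score_wide = word_score("ATGCTTCTGGG", "ATGCTTATAGG", wide_spread)
```

The words have 9 matches and 2 mismatches, so they score 9 out of a possible 11 under the first scheme and 9 × 5 − 2 × 4 = 37 out of a possible 55 under the second. The ranking of word pairs is unchanged; only the scale on which the cut-off must be chosen differs.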

Dotplot parameters. The Cut-off score.

The lower the cut-off score, the more dots will be plotted. But dots are more likely to indicate a chance match (noise).

The higher the cut-off score, the fewer dots will be plotted. But each dot is more likely to be significant.
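The trade-off can be checked by counting dots at several cut-offs. The sketch below reuses the two example sequences from the Basic Principles slide with a word size of 11 and a 1/0 scoring scheme; the cut-off values tried and the name `count_dots` are arbitrary choices for this illustration.

```python
def count_dots(seq1, seq2, word_size, cutoff):
    """Count word pairs whose number of matching positions reaches the cut-off."""
    total = 0
    for i in range(len(seq1) - word_size + 1):
        for j in range(len(seq2) - word_size + 1):
            score = sum(a == b for a, b in
                        zip(seq1[i:i + word_size], seq2[j:j + word_size]))
            if score >= cutoff:
                total += 1
    return total


seq1 = "ACCTGCCCTGTCCAGCTTACATGCATGCTTATAGGGGCATTTTACAT"
seq2 = "ACCTGCCGATTCCATATTACGCATGCTTCTGGGTTACCGTTCAGGGCATTTTACATGTGCTG"
counts = {cutoff: count_dots(seq1, seq2, 11, cutoff) for cutoff in (6, 8, 10, 11)}
```

Because every word pair that reaches a high cut-off also reaches any lower one, the counts can only grow as the cut-off falls. The extra dots gained at low cut-offs are the ones most likely to be chance matches.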

Dotplot parameters. The Cut-off score.

Scoring scheme: PAM 250. Word size: 25. Cut-off score: lowered from plot to plot.

[Figure: four dotplots of the same sequences]

Plot 1: 4 clear strong regions apparent.
Plot 2: 4 regions become clearer, some other weaker features appear.
Plot 3: More “features”, probably noise, appear, obscuring the original 4 clear regions.
Plot 4: Cut-off now clearly too low. Too much noise to see interesting regions.

Dotplot parameters. The Word size.

Large words can miss small matches. Smaller words pick up smaller features. The smallest “features” are often just “noise”.
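A toy example makes the effect of word size concrete. The two short sequences below are invented for this sketch: they share the 7-letter motif GATTACA and nothing else by design. Exact word matching is used to keep the code short, and `exact_dots` is a hypothetical helper name.

```python
def exact_dots(seq1, seq2, word_size):
    """Return (i, j) start positions where the two sequences share an identical word."""
    return {(i, j)
            for i in range(len(seq1) - word_size + 1)
            for j in range(len(seq2) - word_size + 1)
            if seq1[i:i + word_size] == seq2[j:j + word_size]}


a = "TTTTGATTACATTTT"  # invented sequence containing GATTACA
b = "CCCCGATTACACCCC"  # invented sequence containing GATTACA

large = exact_dots(a, b, 8)   # word longer than the shared motif
exact = exact_dots(a, b, 7)   # word exactly as long as the motif
small = exact_dots(a, b, 3)   # short word
```

With a word size of 8 the shared motif is missed entirely. A word size of 7 gives a single dot at (4, 4). A word size of 3 gives a diagonal run of five dots along the motif plus one stray dot at (10, 5), where the triplet ATT happens to recur; that stray dot is the kind of small “feature” that is really noise.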

Dotplot parameters. The Word size.

For sequences with regions of small matching features: small words pick up small features individually. Larger words show matching regions more clearly. The lack of detail can be an advantage.

Dotplot parameters. The Word size.

Using a relatively large word size of 25, features are drawn with a broad brush. Detail can be missed.

Superimposing a plot with a smaller word size of 11 shows the emergence of extra dots, in this case probably all noise.

Displaying the word size 11 plot alone shows that major features are drawn in more “carefully”. Arguably, this is less useful if a broad overview is the objective.

Other uses of dotplots. Detection of Repeats.
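The slides show repeats only as pictures. The usual approach, assumed here, is to plot a sequence against itself: the main diagonal is trivial, and any dot off that diagonal marks a word that occurs at two different places. The sketch below uses exact word matching and an invented example sequence; `find_repeats` is a made-up name.

```python
def find_repeats(seq, word_size):
    """Return (i, j) with i < j where the same word starts at both positions of seq."""
    repeats = []
    last_start = len(seq) - word_size
    for i in range(last_start + 1):
        word = seq[i:i + word_size]
        # Only look above the main diagonal; the plot is symmetric.
        for j in range(i + 1, last_start + 1):
            if seq[j:j + word_size] == word:
                repeats.append((i, j))
    return repeats


seq = "GATTACA" + "CCC" + "GATTACA"  # invented sequence with one direct repeat
repeats = find_repeats(seq, 7)
```

Here `repeats` holds the single pair (0, 10): the 7-letter word starting at position 0 occurs again at position 10. In a plotted self-comparison this would appear as a short line parallel to the main diagonal, offset by the distance between the two copies.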


Other uses of dotplots. Detection of Stem Loops.
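The slides show stem loops only as pictures. A stem can form where one stretch of a sequence is the reverse complement of a later stretch, so one way to look for candidates, assumed here, is to compare the sequence with its own reverse complement. The sketch below uses exact word matching on an invented DNA sequence; `stem_candidates` is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_candidates(seq, stem_len):
    """Return (left, right) start positions of non-overlapping arms that could pair."""
    rc = reverse_complement(seq)
    n = len(seq)
    hits = []
    for i in range(n - stem_len + 1):
        word = seq[i:i + stem_len]
        for j in range(n - stem_len + 1):
            if rc[j:j + stem_len] == word:
                # Position j in rc corresponds to seq[n - j - stem_len : n - j].
                partner = n - j - stem_len
                if partner >= i + stem_len:  # keep only arms that do not overlap
                    hits.append((i, partner))
    return hits


seq = "GACTC" + "TTTT" + "GAGTC"  # invented: arm, loop, reverse-complement arm
hits = stem_candidates(seq, 5)
```

Here `hits` holds the pair (0, 9): the arm starting at position 0 can pair with the arm starting at position 9, leaving the four T bases as the loop. Real stems tolerate mismatches, so a scored comparison with a cut-off would be used in practice rather than exact matching.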


The End.