Sequence comparisons June 23, 2009 Learning objectives-Understand the concept of sliding window programs. Understand difference between identity, similarity.

Presentation transcript:

Sequence comparisons (June 23, 2009)
Learning objectives: Understand the concept of sliding window programs. Understand the difference between identity, similarity and homology. Appreciate that proteins can be modular.
Workshop: Use a sliding window to compute %G+C as a function of position in a sequence. Compute hydrophobicity as a function of position in a sequence. Become familiar with the Dotter program.

Sliding window
A sliding window gathers information about the properties of nucleotides or amino acids along a sequence such as GCATATGCGCATATCCCGTCAATACCA. A simple example is to calculate the %G+C content within a window, then move the window one nucleotide and repeat the calculation.
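The %G+C calculation described above can be sketched in Python as follows. This is a minimal illustration, not the workshop's program: the function name `gc_percent_windows` and the window size of 9 are assumptions chosen for the example; only the sequence comes from the slide.

```python
def gc_percent_windows(seq, window):
    """Return %G+C for every window of length `window`, moving one base at a time."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / window)
    return values


SLIDE_SEQ = "GCATATGCGCATATCCCGTCAATACCA"  # sequence shown on the slide
profile = gc_percent_windows(SLIDE_SEQ, 9)  # window of 9 is an arbitrary choice
for position, value in enumerate(profile, start=1):
    print(f"window starting at {position:2d}: {value:5.1f} %G+C")
```

Each value belongs to one window position, so plotting the list against the window start gives %G+C as a function of position in the sequence.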

Sliding window
If the window is too small, it is difficult to detect the trend of the measurement. If it is too large, you could miss meaningful data.
[Figure: %G+C plotted against sequence position for a large and a small window size.]
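The effect of window size can be seen by computing the same profile with two window sizes. The sketch below is only an illustration under assumed values: the window sizes 3 and 15 and the helper names `gc_profile` and `spread` are not from the slides.

```python
def gc_profile(seq, window):
    """%G+C of each window, moving one nucleotide at a time."""
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def spread(values):
    """Difference between the highest and lowest value in a profile."""
    return max(values) - min(values)


SEQ = "GCATATGCGCATATCCCGTCAATACCA"  # sequence from the sliding window slide
small = gc_profile(SEQ, 3)    # small window: noisy, jumps between extremes
large = gc_profile(SEQ, 15)   # large window: smoother, local detail averaged out
print(f"window=3:  {len(small)} points, spread {spread(small):.1f}")
print(f"window=15: {len(large)} points, spread {spread(large):.1f}")
```

The small window swings over the full range because single triplets such as ATA or GCG dominate each value, while the large window averages such local features away.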

Sliding window
[Figure adapted from Zhao et al., BMC Genomics Nov 7;8:403.]

Amino acid characteristics

Four levels of protein structure
1) Primary: linear sequence (AGHIPLLQ)
2) Secondary: initial folding patterns (AGHIPLLQ, with each residue annotated by its folding pattern)
3) Tertiary: complex folding patterns
4) Quaternary: interactions between polypeptides

Kyte-Doolittle Hydropathy Plot
Another sliding window routine [J. Mol. Biol. 157: (1982)]. Kyte and Doolittle assigned each amino acid a value on a "hydropathy scale" based on its chemical properties.
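A hydropathy plot can be sketched as a sliding window average over the Kyte-Doolittle scale. The code below is a minimal illustration and not the published program: the function name `hydropathy_profile`, the default window of 7, and the use of the short peptide AGHIPLLQ from the protein structure slide as input are assumptions made for the example.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Average hydropathy of each window, moving one residue at a time."""
    protein = protein.upper()
    return [
        sum(KD_SCALE[aa] for aa in protein[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]


peptide = "AGHIPLLQ"  # toy input; real plots use full-length proteins
for start, value in enumerate(hydropathy_profile(peptide, window=3), start=1):
    print(f"residues {start}-{start + 2}: {value:+.2f}")
```

Stretches with a high average are candidate hydrophobic regions; as with %G+C, the window size controls how much local detail is smoothed away.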

Dot Plot with window = 1
[Figure: dot plot of the sequence ATGCCTAG against itself with window = 1; each of the 16 asterisks marks a pair of identical bases.]
Note that about 25% of the table will be filled due to random chance, because there is a 1 in 4 chance of a match at each position.
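A window = 1 dot plot is simply a table of position-by-position identity. The following sketch rebuilds the table for the slide's sequence; the function names `dot_plot` and `render` are illustrative choices, not part of any named program.

```python
def dot_plot(seq_a, seq_b):
    """Rows follow seq_a, columns follow seq_b; True marks identical residues."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix, seq_a, seq_b):
    """Draw the table as text, with '*' for a match."""
    lines = ["  " + " ".join(seq_b)]
    for residue, row in zip(seq_a, matrix):
        lines.append(residue + " " + " ".join("*" if hit else " " for hit in row))
    return "\n".join(lines)


SEQ = "ATGCCTAG"  # sequence from the slide, compared with itself
matrix = dot_plot(SEQ, SEQ)
dots = sum(hit for row in matrix for hit in row)
print(render(matrix, SEQ, SEQ))
print(f"{dots} of {len(SEQ) ** 2} cells filled ({100 * dots / len(SEQ) ** 2:.0f}%)")
```

For this sequence exactly a quarter of the cells are filled because each base occurs twice; for unrelated sequences 25% is only the expected value, assuming the four bases are equally frequent.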

Dot Plot with window = 3
[Figure: dot plot of the sequence ATGCCTAG against itself with window = 3; 6 asterisks remain.]
The larger the window, the more noise can be filtered out. What is the percent chance of getting a match at random? One in 4^3: (1/4)^3 * 100 = 1.56%.
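The window = 3 case can be sketched by requiring all three positions of a window to match before placing a dot. The helper names below are illustrative, and the random-match estimate assumes four equally frequent, independent bases.

```python
def windowed_dot_plot(seq_a, seq_b, window):
    """Return (i, j) start positions where the two windows are identical."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                hits.append((i, j))
    return hits


def random_match_percent(window, alphabet_size=4):
    """Chance (in percent) that two random windows match at every position."""
    return (1 / alphabet_size) ** window * 100


SEQ = "ATGCCTAG"  # sequence from the slide, compared with itself
hits = windowed_dot_plot(SEQ, SEQ, 3)
print(f"{len(hits)} dots at {hits}")
print(f"random match chance for window 3: {random_match_percent(3):.2f}%")
```

Only the six windows on the main diagonal survive, which is the noise filtering the slide describes: the off-diagonal single-base matches from the window = 1 plot disappear.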

Evolutionary Basis of Sequence Alignment
1. Identity: Quantity that describes how much two sequences are alike in the strictest terms.
2. Similarity: Quantity that relates how much two amino acid sequences are alike.
3. Homology: A conclusion drawn from data suggesting that two genes share a common evolutionary history.

Purpose of finding differences and similarities of amino acids in two proteins:
Infer structural information
Infer functional information
Infer evolutionary relationships

Of the two aligned sequences, one is mouse trypsin and the other is crayfish trypsin. They are homologous proteins. The sequences share 41% identity.
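A figure such as 41% identity is computed from an alignment of the two sequences. The sketch below shows one common convention (identical columns divided by alignment length); other conventions use a different denominator, and the slides do not say which was used. The input strings are hypothetical toy sequences, not the trypsin sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes the sequences are already aligned (equal length, gaps as '-').
    A column containing a gap never counts as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(a == b and a != gap for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy alignment for illustration only.
print(percent_identity("ACDE", "ACDF"))
print(percent_identity("AC-E", "ACDE"))
```

Identity is a measured quantity; homology is the conclusion one may draw from it, which is why the two terms are kept separate.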

Modular nature of proteins
Proteins possess local regions of similarity. Proteins can be thought of as assemblies of modular domains.

Identity Matrix
Simplest type of scoring matrix:
      L  I  C  A
  L   1  0  0  0
  I      1  0  0
  C         1  0
  A            1
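An identity matrix like the one on this slide can be built and used for scoring in a few lines. This is a sketch with illustrative names (`identity_matrix`, `alignment_score`); the second test sequence is a made-up example.

```python
def identity_matrix(alphabet):
    """Score table: 1 when the two residues are identical, 0 otherwise."""
    return {(a, b): int(a == b) for a in alphabet for b in alphabet}


def alignment_score(seq_a, seq_b, matrix):
    """Sum the table scores over paired positions of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


ALPHABET = "LICA"  # the four residues shown on the slide
matrix = identity_matrix(ALPHABET)
print("  " + " ".join(ALPHABET))
for a in ALPHABET:
    print(a + " " + " ".join(str(matrix[(a, b)]) for b in ALPHABET))

print(alignment_score("LICA", "LIAA", matrix))  # hypothetical second sequence
```

The table is symmetric, which is why the slide shows only the upper triangle.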

Similarity
It is easy to score whether an amino acid is identical to another (the score is 1 if identical and 0 if not). However, it is not easy to give a score for amino acids that are somewhat similar.
[Figure: chemical structures of leucine and isoleucine.]
Should they get a 0 (non-identical), a 1 (identical), or something in between?
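One way to give leucine and isoleucine "something in between" is to award partial credit to residues in the same chemical group. The sketch below is purely illustrative: the groups and the value 0.5 are assumptions invented for this example, not a published scoring table.

```python
# Hypothetical grouping for illustration only; published substitution
# matrices assign a separate score to every residue pair.
SIMILAR_GROUPS = [
    set("ILVM"),  # assumed group: aliphatic / hydrophobic side chains
    set("DE"),    # assumed group: acidic side chains
    set("KR"),    # assumed group: basic side chains
]


def similarity_score(a, b, partial=0.5):
    """1.0 for identical residues, `partial` for same-group residues, else 0.0."""
    if a == b:
        return 1.0
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return partial
    return 0.0


print(similarity_score("L", "L"))  # identical
print(similarity_score("L", "I"))  # similar: same assumed group
print(similarity_score("L", "D"))  # dissimilar
```

Replacing the 0/1 identity table with graded scores like these is what turns an identity measurement into a similarity measurement.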

Two proteins that are similar in certain regions (domains): tissue plasminogen activator (PLAT) and coagulation factor 12 (F12).

The Dotter Program
The program consists of three components:
1) A sliding window
2) A table that gives a score for each amino acid match
3) A graph that converts the score to a dot of a certain density (the higher the score, the higher the dot density)
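The three components listed above can be combined in a small text-based sketch. This is not Dotter's actual algorithm or interface: the names `density_plot`, `pair_score` and `SHADES`, the identity score table, and the five-character density ramp are all assumptions made to illustrate the idea.

```python
SHADES = " .:*#"  # hypothetical density ramp, from low score to high score


def pair_score(a, b):
    """Placeholder score table (identity); assumed to return values in [0, 1]."""
    return 1.0 if a == b else 0.0


def density_plot(seq_a, seq_b, window=3, score=pair_score):
    """Slide a window over both sequences and turn each window score into a shade."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            # Component 1 and 2: sum the table scores across the window.
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            # Component 3: map the average score to a dot density.
            level = round(total / window * (len(SHADES) - 1))
            row.append(SHADES[level])
        rows.append("".join(row))
    return rows


plot = density_plot("ATGCCTAG", "ATGCCTAG", window=3)
print("\n".join(plot))
```

Swapping `pair_score` for a graded amino acid table would let partially similar windows show up as lighter dots, which is how regions of similarity between two proteins become visible as diagonal streaks.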

Region of similarity
A single region on F12 is similar to two regions on PLAT.