HomologyIf twp proteins are homologous, they have a common fold and a common ancestor If two proteins have >25% identity across their entire length, they.

Slides:



Advertisements
Similar presentations
FA08CSE182 CSE 182-L2:Blast & variants I Dynamic Programming
Advertisements

Blast outputoutput. How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P.
Hidden Markov Models (1)  Brief review of discrete time finite Markov Chain  Hidden Markov Model  Examples of HMM in Bioinformatics  Estimations Basic.
1 Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene Paralogs: Two or more genes, often thought of.
1 Genome information GenBank (Entrez nucleotide) Species-specific databases Protein sequence GenBank (Entrez protein) UniProtKB (SwissProt) Protein structure.
Sources Page & Holmes Vladimir Likic presentation: 20show.pdf
BLAST Sequence alignment, E-value & Extreme value distribution.
Gapped Blast and PSI BLAST Basic Local Alignment Search Tool ~Sean Boyle Basic Local Alignment Search Tool ~Sean Boyle.
Measuring the degree of similarity: PAM and blosum Matrix
Basics of Comparative Genomics Dr G. P. S. Raghava.
Sequence Similarity Searching Class 4 March 2010.
Heuristic alignment algorithms and cost matrices
Expect value Expect value (E-value) Expected number of hits, of equivalent or better score, found by random chance in a database of the size.
Bioinformatics and Phylogenetic Analysis
Overview of sequence database searching techniques and multiple alignment May 1, 2001 Quiz on May 3-Dynamic programming- Needleman-Wunsch method Learning.
Sequence similarity.
Identifying Functional signatures in Proteins - a computational design approach David Bernick Rohl group 16-Mar-2005.
Similar Sequence Similar Function Charles Yan Spring 2006.
Sequence Alignment III CIS 667 February 10, 2004.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Multiple Sequence Alignments
Rationale for searching sequence databases June 22, 2005 Writing Topics due today Writing projects due July 8 Learning objectives- Review of Smith-Waterman.
Sequence alignment, E-value & Extreme value distribution
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch , Ch 5.1, get what you can.
Inferring function by homology The fact that functionally important aspects of sequences are conserved across evolutionary time allows us to find, by homology.
Sequence Analysis Alignments dot-plots scoring scheme Substitution matrices Search algorithms (BLAST)
Protein Sequence Alignment and Database Searching.
Good solutions are advantageous Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Sequence Alignment Techniques. In this presentation…… Part 1 – Searching for Sequence Similarity Part 2 – Multiple Sequence Alignment.
Sequence analysis: Macromolecular motif recognition Sylvia Nagl.
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
Alignment methods April 26, 2011 Return Quiz 1 today Return homework #4 today. Next homework due Tues, May 3 Learning objectives- Understand the Smith-Waterman.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Local alignment, BLAST and Psi-BLAST October 25, 2012 Local alignment Quiz 2 Learning objectives-Learn the basics of BLAST and Psi-BLAST Workshop-Use BLAST2.
BLAST Anders Gorm Pedersen & Rasmus Wernersson. Database searching Using pairwise alignments to search databases for similar sequences Database Query.
Construction of Substitution Matrices
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Applied Bioinformatics Week 3. Theory I Similarity Dot plot.
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
Lecture 7 CS5661 Heuristic PSA “Words” to describe dot-matrix analysis Approaches –FASTA –BLAST Searching databases for sequence similarities –PSA –Alternative.
Point Specific Alignment Methods PSI – BLAST & PHI – BLAST.
Sequence Alignments and Database Searching 08/20/07.
Doug Raiford Lesson 5.  Dynamic programming methods  Needleman-Wunsch (global alignment)  Smith-Waterman (local alignment)  BLAST Fixed: best Linear:
Sequence Alignment.
Construction of Substitution matrices
Blast 2.0 Details The Filter Option: –process of hiding regions of (nucleic acid or amino acid) sequence having characteristics.
Step 3: Tools Database Searching
August 26, 2011 Biochemistry 201 David Worthylake, 7152 MEB, x5176 Sequence Alignments and Database Searching.
The statistics of pairwise alignment BMI/CS 576 Colin Dewey Fall 2015.
Genome Revolution: COMPSCI 004G 8.1 BLAST l What is BLAST? What is it good for?  Basic.
BLAST: Database Search Heuristic Algorithm Some slides courtesy of Dr. Pevsner and Dr. Dirk Husmeier.
Tutorial 4 Comparing Protein Sequences Intro to Bioinformatics 1.
9/6/07BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST1 BCB 444/544 Lab 3 BLAST Scoring Matrices & Alignment Statistics Sept6.
Sequence similarity, BLAST alignments & multiple sequence alignments
Blast Basic Local Alignment Search Tool
Basics of Comparative Genomics
Identifying templates for protein modeling:
Genome Annotation Continued
Sequence Alignments and Database Searching
Sequence Based Analysis Tutorial
What do you with a whole genome sequence?
Pairwise Sequence Alignment
Bioinformatics Lecture 2 By: Dr. Mehdi Mansouri
Basics of Comparative Genomics
Basic Local Alignment Search Tool
Sequence alignment, E-value & Extreme value distribution
Sequence Analysis Alan Christoffels
Presentation transcript:

HomologyIf twp proteins are homologous, they have a common fold and a common ancestor If two proteins have >25% identity across their entire length, they are likely to be Homologs. However, sometimes true homologs have quite low sequence identity! OrthologsHomologous (and equivalent) proteins from different species. Arise from speciation. ParalogsHomologous (and equivalent) proteins found in same species. Divergence of sequences NOT from speciation. AlignmentsHow to score? Minimum # of mutations?, Physicochemical properties (as perceived by us)?, Or learn from nature? Scoring schemesPAM, BLOSUM Gap penalties, low sequence complexity filtering

E valuesWhat it means in words E = Kmne -λS Alignment algorithmsBLAST (Basic Local Alignment Search Tool) FASTA (Fast Alignment) Smith-Waterman Needleman-Wunsch Why use local alignment algorithm?

IF A is related to B and B is related to C, then A is related to C (use orthologs, paralogs, known homologs PSI BLAST 1 normal BLAST run then subsequent iterations modify the scoring matrix – starts “rewarding” conservation at key positions in the alignment 3D-PSSMDatabase of structures grouped into fold families (homologs) Careful 1D PSSM is created for each fold family. The 1D PSSM is augmented by info from structural alignment of all family members(3D-PSSM). Also, entire library is used to assign solvation potentials (how likely is a glutamate to be only 5% exposed? 10% exposed. Etc.) Query sequence Get 1D-PSSM from nr database. Do 2ndry structure prediction. Now compare the query sequence to each library entry in 3 ways: Use the library Entry’s 1D-PSSM(augmented), see how the query compares to the 3D-PSSM of The library entry. See how the library entry compares to the 1D-PSSM for the query.

Hidden Markhov Models All transitions from one geometric shape to another are governed by probabilities that have been calculated (learned) using sequence alignments of proteins that define the fold. This mathematical engine (small one depicted above) can generate other sequences that obey the “rules” that define the fold (of a given family). The engine may have to be run MANY times before it spews out a sequence identical to one actually used to define the fold. But it can be used in a reverse way to estimate the likelihood that a query sequence could have been generated by the engine. If it is very likely, the query sequence has the same fold!! What’s hidden? One answer is “The original model” – The FIRST ancestral protein (e.g. The original PH domain) that set the mold for all others.