
Homology
If two proteins are homologous, they share a common fold and a common ancestor. If two proteins have >25% identity across their entire length, they are likely to be homologs. However, true homologs sometimes have quite low sequence identity!

Orthologs
Homologous (and equivalent) proteins from different species. They arise from speciation.

Paralogs
Homologous (and equivalent) proteins found in the same species. Their sequences diverged NOT through speciation (typically after a gene duplication).

Alignments
How to score them? By the minimum number of mutations? By physicochemical properties (as perceived by us)? Or by learning from nature?

Scoring schemes
PAM and BLOSUM substitution matrices, gap penalties, low-sequence-complexity filtering.
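To make the scoring questions above concrete, here is a minimal Python sketch that computes percent identity for an already-aligned pair and scores the same pair with a substitution function plus a linear gap penalty. The residue groups and the scores (identity +2, same group +1, otherwise -1, gap -2) are hypothetical, a "physicochemical properties as perceived by us" scheme; PAM and BLOSUM instead learn their values from observed substitutions. Percent identity is taken over all alignment columns, which is only one of several conventions.

```python
# Hypothetical residue groups for a toy "physicochemical" scoring scheme.
# Real matrices (PAM, BLOSUM) are derived from observed substitutions instead.
GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ"]


def toy_substitution(x: str, y: str) -> int:
    """+2 for identity, +1 for residues in the same hypothetical group, else -1."""
    if x == y:
        return 2
    if any(x in g and y in g for g in GROUPS):
        return 1
    return -1


def percent_identity(a: str, b: str) -> float:
    """Identical non-gap columns divided by the total number of alignment columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * same / len(a)


def alignment_score(a: str, b: str, substitution=toy_substitution, gap: int = -2) -> int:
    """Sum substitution scores over aligned columns; any column with a gap costs `gap`."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            score += gap
        else:
            score += substitution(x, y)
    return score


if __name__ == "__main__":
    print(percent_identity("MATWL", "MATPL"))  # 80.0
    print(alignment_score("MATWL", "MATPL"))   # 7  (W/P are in no shared group: -1)
    print(alignment_score("MA-WL", "MATWL"))   # 6  (one gap column: -2)
```

Swapping `toy_substitution` for a function that looks up a real PAM or BLOSUM table is the "learn from nature" option; the rest of the scoring loop stays the same.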

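E values
$E = K m n \, e^{-\lambda S}$
What it means in words: E is the number of alignments with a score of at least S that you would expect to find by chance alone when a query of length m is searched against a database of total length n. K and λ are statistical parameters that depend on the scoring system and the residue composition; E falls exponentially as S rises.

Alignment algorithms
BLAST (Basic Local Alignment Search Tool), FASTA (FAST-All), Smith-Waterman (local), Needleman-Wunsch (global).
Why use a local alignment algorithm?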
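The formula and the local-alignment question can be tried out together in a short Python sketch. It computes a Smith-Waterman local score with hypothetical DNA scores (match +1, mismatch -1, gap -2), solves for λ as the positive root of Σ p·e^(λs) = 1 over the possible pair scores of an ungapped scoring system, and plugs both into E = K m n e^(-λS). Two caveats: the analytic λ strictly applies to ungapped alignments (for gapped scores, λ and K are estimated empirically), and the K value and database length below are placeholders, not values from any real search.

```python
import math


def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman with a linear gap penalty)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 floor is what makes the alignment local: a negative-scoring
            # prefix is dropped instead of dragging the score down (Needleman-Wunsch
            # has no such floor and must align end to end).
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


def solve_lambda(score_probs, lo: float = 1e-9, hi: float = 10.0, iters: int = 200) -> float:
    """Positive root of sum(p * exp(lam * s)) = 1 by bisection.

    `score_probs` is a list of (score, probability) pairs for a random residue pair.
    Assumes a negative expected score and at least one positive score."""
    def f(lam: float) -> float:
        return sum(p * math.exp(lam * s) for s, p in score_probs) - 1.0

    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:  # f is negative between 0 and the root, positive beyond it
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def e_value(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Expected number of chance hits scoring >= score: E = K m n exp(-lam * score)."""
    return K * m * n * math.exp(-lam * score)


if __name__ == "__main__":
    query, subject = "TTACGTTT", "GGACGTGG"
    S = smith_waterman(query, subject)           # 4 (the shared ACGT)
    lam = solve_lambda([(1, 0.25), (-1, 0.75)])  # ln 3, about 1.0986
    db_length = 1_000_000                        # placeholder database size
    K = 0.1                                      # placeholder; normally fitted
    print(S, lam, e_value(S, len(query), db_length, K, lam))  # E = 800000/81, about 9876.5
```

With uniform base frequencies a match occurs with probability 1/4, so λ solves e^λ/4 + 3e^(-λ)/4 = 1, giving λ = ln 3. The resulting E of roughly 10,000 says a 4-base exact match is expected thousands of times by chance in a database of that size, so it is not significant.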

If A is related to B and B is related to C, then A is related to C (use orthologs, paralogs, and known homologs).

PSI-BLAST
One normal BLAST run, then subsequent iterations modify the scoring matrix: the hits from each round are used to build a position-specific scoring matrix (PSSM) that starts "rewarding" conservation at key positions in the alignment.

3D-PSSM
A database of structures grouped into fold families (homologs). A careful 1D-PSSM is created for each fold family. The 1D-PSSM is augmented with information from a structural alignment of all family members (3D-PSSM). The entire library is also used to assign solvation potentials (how likely is a glutamate to be only 5% exposed? 10% exposed? etc.).
Query sequence: get a 1D-PSSM for it from the nr database and run a secondary structure prediction. Then compare the query sequence to each library entry in three ways:
1. Use the library entry's 1D-PSSM.
2. See how the query compares to the library entry's 3D-PSSM (the augmented 1D-PSSM).
3. See how the library entry compares to the query's 1D-PSSM.
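As a rough illustration of what the PSI-BLAST iterations change, the Python sketch below builds a simplified PSSM from gap-free aligned hits: each column gets a log-odds score ln(q/p), where q is the observed residue frequency blended with a pseudocount and p is a background frequency. Uniform background frequencies, a single pseudocount, and natural-log scores are simplifying assumptions; NCBI's PSI-BLAST adds refinements such as sequence weighting, substitution-matrix-based pseudocounts, and integer-scaled scores. The hit sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits, background=None, pseudocount=1.0):
    """Column-wise log-odds scores ln(q / p) from equal-length aligned hits."""
    if background is None:
        # Assumption: uniform background; real tools use observed residue frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for col in range(len(aligned_hits[0])):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for seq in aligned_hits:
            if seq[col] in counts:  # gaps and unknown letters are skipped
                counts[seq[col]] += 1
        n = sum(counts.values())
        column = {}
        for aa in AMINO_ACIDS:
            # Blending in the background keeps unseen residues finite (no log 0):
            # they are penalised rather than forbidden.
            q = (counts[aa] + pseudocount * background[aa]) / (n + pseudocount)
            column[aa] = math.log(q / background[aa])
        pssm.append(column)
    return pssm


def score_with_pssm(pssm, segment: str) -> float:
    """Score a gap-free segment position by position against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


if __name__ == "__main__":
    hits = ["WCK", "WCR", "WAK"]  # hypothetical aligned hits from a first BLAST round
    pssm = build_pssm(hits)
    # Column 1 is an invariant W, so a query with W there outscores one with A:
    # the matrix now rewards conservation at that position.
    print(score_with_pssm(pssm, "WCK") > score_with_pssm(pssm, "ACK"))  # True
```

In an iterative search, this PSSM would replace the general substitution matrix for the next round, new hits would be added to the alignment, and the PSSM would be rebuilt until no new significant hits appear.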

Hidden Markov Models
All transitions from one state (drawn as a geometric shape in the diagram) to another are governed by probabilities that have been calculated (learned) from sequence alignments of the proteins that define the fold. This mathematical engine (a small one was depicted on the slide) can generate other sequences that obey the "rules" that define the fold of a given family. The engine may have to be run MANY times before it emits a sequence identical to one actually used to define the fold. But it can also be used in reverse, to estimate the likelihood that a query sequence could have been generated by the engine. If that likelihood is high (compared with a model of random sequence), the query sequence very likely has the same fold!
What's hidden? One answer is "the original model": the FIRST ancestral protein (e.g. the original PH domain) that set the mold for all others. In the formal sense, what is hidden is the path of states that generated each residue; only the residues themselves are observed.
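The "use it in reverse" idea corresponds to the forward algorithm, which sums the probability of a sequence over every possible state path, and the formal "hidden" part corresponds to the single most probable state path, found with the Viterbi algorithm. The Python sketch below uses a made-up two-state model (buried/exposed) over a reduced alphabet (h = hydrophobic, p = polar) rather than a real profile HMM with match, insert and delete states; every probability in it is hypothetical.

```python
import math

# Hypothetical two-state model; all numbers are made up for illustration.
STATES = ("buried", "exposed")
START = {"buried": 0.5, "exposed": 0.5}
TRANS = {"buried": {"buried": 0.8, "exposed": 0.2},
         "exposed": {"buried": 0.2, "exposed": 0.8}}
EMIT = {"buried": {"h": 0.9, "p": 0.1},
        "exposed": {"h": 0.2, "p": 0.8}}
BACKGROUND = {"h": 0.5, "p": 0.5}  # null model: residues drawn independently


def forward(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """P(seq | model), summed over every hidden state path."""
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for sym in seq[1:]:
        alpha = {t: sum(alpha[s] * trans[s][t] for s in states) * emit[t][sym]
                 for t in states}
    return sum(alpha.values())


def log_odds(seq):
    """ln of P(seq | model) / P(seq | null); > 0 means the model explains seq better."""
    null = math.prod(BACKGROUND[c] for c in seq)
    return math.log(forward(seq) / null)


def viterbi(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Most probable hidden state path for seq."""
    v = {s: start[s] * emit[s][seq[0]] for s in states}
    back = []  # one dict of back-pointers per position after the first
    for sym in seq[1:]:
        ptr, new_v = {}, {}
        for t in states:
            best_s = max(states, key=lambda s: v[s] * trans[s][t])
            ptr[t] = best_s
            new_v[t] = v[best_s] * trans[best_s][t] * emit[t][sym]
        back.append(ptr)
        v = new_v
    path = [max(states, key=lambda s: v[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    print(log_odds("hhhh") > 0)  # True: the model beats the null for this sequence
    print(viterbi("hhhppp"))     # ['buried', 'buried', 'buried', 'exposed', 'exposed', 'exposed']
```

The sketch multiplies raw probabilities, which is fine for short toy sequences; for real protein lengths the products underflow, so practical implementations work in log space or rescale at each step.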