Sequence Based Analysis Tutorial


Sequence Based Analysis Tutorial NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Information Resource at Georgetown University Medical Center

Retrieval, Sequence Search & Classification Methods. Retrieve protein info by text / UID. Sequence Similarity Search: BLAST, FASTA, Dynamic Programming. Family Classification: Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks. Integrated Search and Classification System.

Sequence Similarity Search (I): Based on Pair-Wise Comparisons. Dynamic Programming Algorithms: Global Similarity (Needleman-Wunsch); Local Similarity (Smith-Waterman). Heuristic Algorithms: FASTA (Based on K-Tuples, 2 Amino Acids); BLAST (Triples of Conserved Amino Acids); Gapped-BLAST (Allows Gaps in Segment Pairs); PHI-BLAST (Pattern-Hit Initiated Search); PSI-BLAST (Position-Specific Iterated Search).
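
The dynamic programming approach named on this slide can be illustrated with a short sketch. The function below computes a Smith-Waterman local alignment score with a linear gap penalty. The match, mismatch and gap values are illustrative assumptions, not parameters from the tutorial; a real protein search would score residue pairs with a substitution matrix such as BLOSUM 62.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scores are toy values (assumptions); linear gap penalty only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of any local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The floor at 0 is what makes the alignment local rather than global
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

Dropping the `0` from the `max` and initialising the first row and column with cumulative gap penalties would turn this into a global (Needleman-Wunsch style) score.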

Sequence Similarity Search (II): Similarity Search Parameters. Scoring Matrices, Based on Conserved Amino Acid Substitution: Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity); Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM 62. Gap Penalty. Search Time Comparisons: Smith-Waterman, 10 Min; FASTA, 2 Min; BLAST, 20 Sec.

Feature Representation Features of Amino Acids: Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features Alternative Amino Acids: Classification of Amino Acids To Capture Different Features of Amino Acid Residues

Substitution Matrix: Likelihood of One Amino Acid Mutating into Another Over Evolutionary Time. Negative Score: Unlikely to Happen (e.g., Gly/Trp, -7). Positive Score: Conservative Substitution (e.g., Lys/Arg, +3). High Score for Identical Matches: Rare Amino Acids (e.g., Trp, Cys).
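
Substitution matrix entries of this kind are commonly expressed as log-odds scores: the observed frequency of a residue pair in trusted alignments divided by the frequency expected by chance. The sketch below shows the arithmetic only. The function name and the input frequencies are hypothetical, and the default scale of 2 (half-bit units, as used by BLOSUM 62) is an assumption about which convention the reader wants.

```python
import math

def log_odds_score(q_ab, p_a, p_b, scale=2.0):
    """Log-odds score for aligning residues a and b.

    q_ab  : observed frequency of the pair (a, b) in reference alignments
    p_a/b : background frequencies of each residue
    scale : 2.0 gives half-bit units (assumption, matching BLOSUM-style scaling)
    """
    return scale * math.log2(q_ab / (p_a * p_b))

# Toy numbers (not real amino acid frequencies):
print(log_odds_score(0.0625, 0.25, 0.25))    # pair seen exactly as often as chance
print(log_odds_score(0.25, 0.25, 0.25))      # pair seen more often than chance
print(log_odds_score(0.015625, 0.25, 0.25))  # pair seen less often than chance
```

A pair observed more often than chance gets a positive score and a pair observed less often gets a negative one, which is the sign convention the slide describes.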

BLAST (Basic Local Alignment Search Tool): Extremely fast. Robust. Most frequently used. It finds very short segment pairs ("seeds") between the query and the database sequence. These seeds are then extended in both directions until the maximum possible score for extensions of this particular seed is reached.
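
The seed-and-extend idea can be sketched as follows. This is a simplified illustration, not the BLAST implementation: seeds here are exact word matches (BLAST also accepts similar "neighborhood" words), extension is ungapped, and extension stops once the running score falls more than `x_drop` below the best score seen. All function names, parameters and score values are assumptions made for the example.

```python
def _extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Ungapped extension from (qi, sj) in direction step; return (best_gain, best_length)."""
    best_gain = best_len = run = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        run += match if query[qi] == subject[sj] else mismatch
        length += 1
        if run > best_gain:
            best_gain, best_len = run, length
        elif best_gain - run > x_drop:
            break  # score fell too far below the best seen: stop extending
        qi += step
        sj += step
    return best_gain, best_len

def seed_and_extend(query, subject, w=3, match=1, mismatch=-1, x_drop=2):
    """Return (score, query_start, subject_start, length) of the best ungapped hit, or None."""
    # Index every w-letter word of the subject by its start position
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)

    best = None
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], ()):
            right_gain, right_len = _extend(query, subject, i + w, j + w, 1,
                                            match, mismatch, x_drop)
            left_gain, left_len = _extend(query, subject, i - 1, j - 1, -1,
                                          match, mismatch, x_drop)
            hit = (w * match + left_gain + right_gain,
                   i - left_len, j - left_len, w + left_len + right_len)
            if best is None or hit[0] > best[0]:
                best = hit
    return best
```

For example, `seed_and_extend("MATWLKK", "GGMATWLPP")` reports the shared segment `MATWL`, starting at position 0 in the query and position 2 in the subject.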

BLAST Search From BLAST Search Interface Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment Links to iProClass and UniProtKB reports Link to NCBI taxonomy Link to PIRSF report Click to see SSearch alignment Click to see alignment

Blast Result & Pairwise Alignment: BLAST Alignment

Classification What is classification? Why do we need protein classification? Different levels of classification Basis for functional protein classification How to classify a protein of unknown function?

Classification Databases. Protein motif: A set of conserved amino acid residues that are important for protein function and located within a certain distance from one another. Example pattern: C - x(2,4) - C - x(3) - [LIVMFYWC] - x(8) - H - x(3,5) - H (the 2 C's and the 2 H's are zinc ligands). Protein domain: Structurally compact, independently folded unit that forms a stable 3D structure and shows a certain level of evolutionary conservation; group proteins according to the presence of a common domain. 3-D structure: Group proteins according to common 3D structure. Whole-protein: Group proteins according to common domain architecture and length.
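
The zinc-ligand pattern on this slide uses PROSITE pattern syntax, which maps almost directly onto regular expressions. The helper below is a minimal sketch of that translation; `prosite_to_regex` is a hypothetical name, and the function assumes the common syntax elements only (`x`, `[...]`, `{...}`, repeat counts, and `<` / `>` termini anchors outside brackets). The test sequence is invented for the example.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        element = element.replace("x", ".")                      # x = any residue
        element = element.replace("{", "[^").replace("}", "]")   # {..} = excluded residues
        element = element.replace("(", "{").replace(")", "}")    # (n) or (n,m) = repeat count
        element = element.replace("<", "^").replace(">", "$")    # N- / C-terminal anchors
        parts.append(element)
    return "".join(parts)

regex = prosite_to_regex("C - x(2,4) - C - x(3) - [LIVMFYWC] - x(8) - H - x(3,5) - H")

# Made-up sequence containing one instance of the motif
seq = "AA" + "C" + "PP" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
hit = re.search(regex, seq)
print(regex, hit.start(), hit.group())
```

Searching one query sequence against a pattern database amounts to running each translated pattern against the sequence; searching one pattern against a sequence database runs a single regex over every entry.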

Family Classification Methods Based on Other Classification Information Multiple Sequence Alignment (ClustalW) ProSite Pattern Search Profile Search Hidden Markov Models (HMMs) Domain (Pfam); Whole protein (PIRSF) Neural Networks

How do you build a tree? Pick sequences to align. Align them. Verify the alignment. Keep the parts that are aligned correctly. Build and evaluate a phylogenetic tree. Integrated Analysis

Multiple Sequence Alignment: ClustalW. Progressive Pairwise Approach, Based on Exhaustive Pairwise Alignments. Neighbor Joining: Joining Order Corresponds to a Tree. Alignment Varies Depending on Joining Order.
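
How pairwise distances determine a joining order can be shown with a small sketch. Two simplifications are made and should be read as assumptions: distances are plain p-distances between equal-length sequences rather than scores from full pairwise alignments, and clusters are merged by average linkage (UPGMA-style) rather than by neighbor joining, which is the method the slide names. The sequences and names are invented.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def join_order(seqs):
    """Return the sequence of cluster merges, closest pair first.

    seqs maps a name to an equal-length sequence. Average linkage is used
    here as a simple stand-in for neighbor joining.
    """
    clusters = [(name,) for name in seqs]
    merges = []

    def avg_dist(c1, c2):
        total = sum(p_distance(seqs[a], seqs[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges

toy = {"A": "MATWL", "B": "MATWI", "C": "MKTPI", "D": "QKSPI"}
for left, right in join_order(toy):
    print(left, "+", right)
```

In a progressive aligner, each merge would be followed by aligning the two groups to each other, which is why a different joining order can produce a different final alignment.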

Multiple Alignment and Tree From Text/Sequence Search Result or ClustalW Alignment Interface

Here is an example of two different functions easily separated on a phylogenetic tree. Each functional group is used to build an HMM.

Motif Patterns (Regular Expressions) Signature Patterns for Functional Motifs ProClass Motif Alignments

PIR Pattern Search From Text/Sequence Search Result or Pattern Search Interface One Query Sequence Against PROSITE Pattern Database One Query Pattern (PROSITE or User-Defined) Against Sequence DB

Pattern Search Result (I) One Query Sequence Against PROSITE Pattern Database

Pattern Search Result (II) One Query Pattern Against Sequence Database Links to iProClass and UniProtKB reports Link to NCBI taxonomy Link to PIRSF report Sorting arrows Display the query pattern

Profile Method. Profile: A Table of Scores to Express Family Consensus Derived from Multiple Sequence Alignments. Num of Rows = Num of Aligned Positions; each row contains a score for the alignment with each possible residue. Profile Searching: Summation of Scores for Each Amino Acid Residue along Query Sequence; Higher Match Values at Conserved Positions.
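
A profile of this kind can be sketched as a position-specific scoring table built from an ungapped alignment. The code below is an illustration under stated assumptions: uniform background frequencies, a fixed pseudocount, no gap handling, and a toy three-column alignment. The function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Return one dict per aligned column, mapping residue -> log-odds score."""
    n_seqs = len(alignment)
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    profile = []
    for column in zip(*alignment):       # one row of the profile per aligned position
        denom = n_seqs + pseudocount * len(AMINO_ACIDS)
        row = {aa: math.log2(((column.count(aa) + pseudocount) / denom) / background)
               for aa in AMINO_ACIDS}
        profile.append(row)
    return profile

def best_profile_match(profile, query):
    """Slide the profile along the query; return (best_score, start) or None."""
    width = len(profile)
    best = None
    for start in range(len(query) - width + 1):
        # Sum the per-position scores; unknown residues contribute 0
        score = sum(profile[k].get(query[start + k], 0.0) for k in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best

profile = build_profile(["CAH", "CGH", "CSH"])
print(best_profile_match(profile, "WWCAHWW"))
```

Columns 1 and 3 of the toy alignment are fully conserved, so they carry the highest match scores, while the variable middle column contributes less.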

PIRSF scan: Shows the PIRSF that the query belongs to. Searches One Query Protein Against all the Full-length and Domain HMM models for the fully curated PIRSFs using HMMER. The matched regions and statistics will be displayed: Statistical data for all domains; Statistical data per domain; Alignment with consensus sequence.

Secondary Structure Features. α Helix: Patterns of Hydrophobic Residue Conservation Showing an I, I+3, I+4, I+7 Pattern Are Highly Indicative of an α Helix (Amphipathic). β Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions I, I+2, I+4, I+6.
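
The two periodicity rules on this slide can be checked mechanically. The sketch below scans a single sequence for positions where every offset in a pattern holds a hydrophobic residue. The hydrophobic residue set is one simplified choice (an assumption), and a real analysis would look at conservation across an alignment rather than at one sequence.

```python
HYDROPHOBIC = set("AILMFVWC")   # assumption: a simplified hydrophobic residue set

HELIX_OFFSETS = (0, 3, 4, 7)    # i, i+3, i+4, i+7: amphipathic helix
STRAND_OFFSETS = (0, 2, 4, 6)   # i, i+2, i+4, i+6: half-buried strand

def pattern_hits(seq, offsets):
    """Return start positions i where seq[i + d] is hydrophobic for every offset d."""
    span = max(offsets)
    return [i for i in range(len(seq) - span)
            if all(seq[i + d] in HYDROPHOBIC for d in offsets)]

print(pattern_hits("LKKLLKKLKK", HELIX_OFFSETS))
print(pattern_hits("VKVKVKVK", STRAND_OFFSETS))
```

The first toy sequence places leucines at positions 0, 3, 4 and 7, matching the helix rule; the second alternates valine and lysine, matching the strand rule.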

3D Structure: Proteins share the same fold, suggesting homology (Gamma Crystallin C, Beta B1 Crystallin).

Creation and Curation of PIRSFs

Integrated Bioinformatics System for Function and Pathway Discovery Data Integration Associative Analysis

Analytical Pipeline: Family Classification & Functional Analysis. Diagram labels: Query Sequence; UniProt; BLAST Search; HMM Domain Search; HMM Motif Search; Pattern Search; SignalP/TMHMM; Top-Matched Superfamilies/Domains; Predicted Superfamilies/Domains/Motifs/Sites/Signal Peptides/TMHs; SSEARCH; CLUSTALW; Superfamily/Domain/Motif Alignments; Family Relationships & Functional Features.

Integrated Bioinformatics System Global Bioinformatics Analysis of 1000’s of Genes and Proteins Pathway Discovery, Target Identification

Lab Section

Text Search

Text Search Result (I) Extend your search or start over Choose columns to be displayed Expand view Pre-computed BLAST Results Links to iProClass and UniProtKB reports Link to NCBI taxonomy Link to PIRSF report

Text Search Result (III) Number of Related Seq. at 3 different E-value cut-offs

Text Search Result (II) Extend your search or start over Choose columns to be displayed Curated domain architecture with links to Pfam database Link to PIRSF report Extent of family curation

Peptide Search

Peptide Search & Results Links to iProClass and UniProtKB reports Link to NCBI taxonomy Link to PIRSF report Matching peptide highlighted in the sequence Sorting arrows

Batch Retrieval Results (I) Choose columns to be displayed Links to iProClass and UniProtKB reports Retrieve more sequences

Batch Retrieval Results (II) Choose columns to be displayed Retrieve more families Links to PIRSF reports Curated domain architecture (N- to C- termini) with links to Pfam database

Blast Similarity Search

Blast / Related Sequences Results

Blast Result & Pairwise Alignment: BLAST Alignment

Pairwise Alignment

Multiple Alignment Interactive Phylogenetic Tree and Alignment

Phylogenetic Tree and Alignment View

Pattern Search (I)

Pattern Search (II) Links to iProClass and UniProtKB reports Link to NCBI taxonomy Link to PIRSF report Sorting arrows Display the query pattern

PIRSF scan

PIRSF Report

PIRSF Family Hierarchy

Taxonomic Distribution & Phylogenetic Pattern

Rabbit Alpha Crystallin A Chain An iProClass View of the entry Pre-computed BLAST results See protein synonyms See IDs from different databases

alpha-Crystallin and Related Proteins