Sequence Based Analysis Tutorial NIH Proteomics Workshop Lai-Su Yeh, Ph.D. Protein Information Resource at Georgetown University Medical Center.

Presentation transcript:

Sequence Based Analysis Tutorial NIH Proteomics Workshop Lai-Su Yeh, Ph.D. Protein Information Resource at Georgetown University Medical Center

2 Retrieval, Sequence Search & Classification Methods Retrieve protein info by text / UID Sequence Similarity Search BLAST, FASTA, Dynamic Programming Family Classification Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks Integrated Search and Classification System

3 Sequence Similarity Search (I) Based on Pair-Wise Comparisons Dynamic Programming Algorithms Global Similarity: Needleman-Wunsch Local Similarity: Smith-Waterman Heuristic Algorithms FASTA: Based on K-Tuples (2-Amino Acid) BLAST: Triples of Conserved Amino Acids Gapped-BLAST: Allow Gaps in Segment Pairs PHI-BLAST: Pattern-Hit Initiated Search PSI-BLAST: Position-Specific Iterated Search
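The two dynamic programming approaches named on this slide differ mainly in how the score table is initialised and whether scores may drop below zero. The sketch below is illustrative only: it assumes a simple match/mismatch score and a linear gap penalty (the values 2, -1 and -2 are arbitrary choices, not taken from the slides), whereas real protein searches use a substitution matrix such as PAM or BLOSUM.

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score: both sequences are aligned end to end."""
    # First row: aligning a prefix of b against nothing costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score: best-scoring pair of subsequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 lets an alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Both functions fill a table of size len(a) x len(b), which is why the dynamic programming methods are slower than the heuristic ones listed on the slide.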

4 Sequence Similarity Search (II) Similarity Search Parameters Scoring Matrices – Based on Conserved Amino Acid Substitution Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity) Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM 62 Gap Penalty Search Time Comparisons Smith-Waterman: 10 Min FASTA: 2 Min BLAST: 20 Sec

5 Feature Representation Features of Amino Acids: Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features Alternative Amino Acids: Classification of Amino Acids To Capture Different Features of Amino Acid Residues

6 Substitution Matrix Likelihood of One Amino Acid Mutated into Another Over Evolutionary Time Negative Score: Unlikely to Happen (e.g., Gly/Trp, -7) Positive Score: Conservative Substitution (e.g., Lys/Arg, +3) High Score for Identical Matches: Rare Amino Acids (e.g., Trp, Cys)
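The sign convention on this slide follows from the way substitution scores are usually built: each entry is a log-odds ratio comparing how often a residue pair is observed in trusted alignments with how often it would be expected by chance. The sketch below only illustrates that idea. The frequencies in the usage lines are made-up numbers, and the scale factor of 2 is an assumption, not the scaling of any particular published matrix.

```python
import math


def log_odds_score(observed, expected, scale=2.0):
    """Rounded log-odds score for one residue pair.

    observed: frequency of the pair in trusted alignments (hypothetical here)
    expected: frequency of the pair expected by chance (hypothetical here)
    scale:    arbitrary scaling factor, assumed for illustration
    """
    return round(scale * math.log2(observed / expected))


if __name__ == "__main__":
    # Pair seen four times more often than chance: positive score.
    print(log_odds_score(0.04, 0.01))
    # Pair seen four times less often than chance: negative score.
    print(log_odds_score(0.0025, 0.01))
```

A pair observed exactly as often as chance predicts scores 0, which is the dividing line between the positive and negative entries described above.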

7 Secondary Structure Features α Helix Patterns of Hydrophobic Residue Conservation Showing I, I+3, I+4, I+7 Pattern Are Highly Indicative of an α Helix (Amphipathic) β Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions I, I+2, I+4, I+6
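The two periodicity rules on this slide can be checked mechanically. The sketch below scans a sequence for start positions where every offset in a pattern falls on a hydrophobic residue. The hydrophobic residue set is an assumption made for illustration (classifications differ between sources), and a hit is only a hint, not a secondary structure prediction.

```python
# Assumed hydrophobic set; other classifications of amino acids exist.
HYDROPHOBIC = set("AVILMFWC")

HELIX_OFFSETS = (0, 3, 4, 7)   # I, I+3, I+4, I+7
STRAND_OFFSETS = (0, 2, 4, 6)  # I, I+2, I+4, I+6


def periodic_hits(seq, offsets):
    """Return 0-based start positions where all offsets are hydrophobic."""
    span = max(offsets)
    return [
        i
        for i in range(len(seq) - span)
        if all(seq[i + k] in HYDROPHOBIC for k in offsets)
    ]


if __name__ == "__main__":
    print(periodic_hits("LKKLLKKL", HELIX_OFFSETS))
    print(periodic_hits("VKVKVKV", STRAND_OFFSETS))
```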

8 BLAST BLAST (Basic Local Alignment Search Tool) Extremely fast Robust Most frequently used It finds very short segment pairs (“seeds”) between the query and the database sequence These seeds are then extended in both directions until the maximum possible score for extensions of this particular seed is reached
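The seed-and-extend idea described on this slide can be sketched in a few lines. This is a simplified illustration, not the BLAST implementation: it assumes exact word matches as seeds (BLAST also accepts similar "neighborhood" words), a plain match/mismatch score instead of a substitution matrix, and an ungapped extension that stops once the running score falls more than `x_drop` below the best score seen. The function names and default values are hypothetical.

```python
def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) for every shared word of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions.

    Returns (score, query_start, subject_start, length) of the segment pair.
    """
    def walk(qi, sj, step):
        score = best = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:
                break  # score dropped too far; give up on this direction
            qi += step
            sj += step
        return best, best_len

    right, r_len = walk(i + w, j + w, 1)
    left, l_len = walk(i - 1, j - 1, -1)
    return (w * match + left + right, i - l_len, j - l_len, w + l_len + r_len)


if __name__ == "__main__":
    q, s = "AAKLMNPQ", "GGKLMNPT"
    for qi, sj in find_seeds(q, s):
        print((qi, sj), extend_seed(q, s, qi, sj))
```

Only the positions around a seed are ever scored, which is the main reason this style of search is much faster than filling a full dynamic programming table.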

9 BLAST Search From BLAST Search Interface Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment Link to NCBI taxonomy Click to see alignment Links to iProClass and UniProtKB reports Link to PIRSF report Click to see SSearch alignment

10 BLAST Result & Pairwise Alignment BLAST Alignment

11 How do you build a tree? Pick sequences to align Align them Verify the alignment Keep the parts that are aligned correctly Build and evaluate a phylogenetic tree Integrated Analysis

12 Multiple Sequence Alignment: CLUSTALW Pairwise alignment: Calculate distance matrix Mean number of differences per residue Unrooted Neighbor-Joining Tree Branch length drawn to scale Rooted NJ Tree (guide tree) Root placed at a position where the means of the branch lengths on either side of the root are equal Progressive Alignment guided by the tree Alignment starts from the tips of the tree towards the root Thompson et al., NAR 22, 4675 (1994).
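The first step on this slide, a distance matrix of mean differences per residue, can be sketched as follows. The sketch assumes the sequences have already been aligned pairwise and padded to equal length with "-" for gaps, and it skips any position where either sequence has a gap. The neighbor-joining and progressive alignment steps are not shown, and the function names are hypothetical.

```python
def p_distance(a, b):
    """Mean number of differences per compared residue (gap positions skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0  # nothing to compare; treated as zero distance here
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise distances, in input order."""
    n = len(seqs)
    return [[p_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in distance_matrix(["ACDE", "ACDF", "GCDF"]):
        print(row)
```

The guide tree is then built from this matrix, so the closest pairs are the ones aligned first at the tips of the tree.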

13 PIR Multiple Alignment and Tree From Text/Sequence Search Result or CLUSTAL W Alignment Interface

14

15 PIR Pattern Search From Text/Sequence Search Result or Pattern Search Interface P-[IV]-[WY]-x(3)-H-[MR]-V-x(3,4)-Q-x(1,2)-D-x(4,5)-G-A-N Alignment of a region involved in catalytic activity Create Pattern and search in database: A B O05689 Test sequence against PROSITE database Signature Patterns for Functional Motifs
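A PROSITE-style pattern such as the one on this slide maps almost directly onto a regular expression: `x` means any residue, square brackets list allowed residues, curly braces list excluded residues, and `(n)` or `(n,m)` gives a repeat count. The converter below is a minimal sketch that covers only those elements plus the `<` and `>` anchors outside brackets. The test sequence is invented for illustration and is not a real protein.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = pattern.strip().rstrip(".")
    # Excluded residues first, so the braces introduced for repeats
    # in the next step are not rewritten again.
    regex = regex.replace("{", "[^").replace("}", "]")
    regex = regex.replace("(", "{").replace(")", "}")
    regex = regex.replace("<", "^").replace(">", "$")
    return regex.replace("x", ".").replace("-", "")


if __name__ == "__main__":
    pattern = "P-[IV]-[WY]-x(3)-H-[MR]-V-x(3,4)-Q-x(1,2)-D-x(4,5)-G-A-N"
    regex = prosite_to_regex(pattern)
    print(regex)
    toy_sequence = "PIWAAAHMVAAAQADAAAAGAN"  # invented test sequence
    match = re.search(regex, toy_sequence)
    print(match.span() if match else None)
```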

16 Pattern Search Result (I) A. One Query Pattern Against UniProtKB or UniRef100 DBs Display the query pattern Links to iProClass and UniProtKB reports Link to NCBI taxonomy Link to PIRSF report Indicate pattern sequence region(s)

17 Pattern Search Result (II) B. One Query Sequence Against PROSITE Pattern Database

18 Profile Method Profile: A Table of Scores to Express Family Consensus Derived from Multiple Sequence Alignments Num of Rows = Num of Aligned Positions Each row contains a score for the alignment with each possible residue. Profile Searching Summation of Scores for Each Amino Acid Residue along Query Sequence Higher Match Values at Conserved Positions
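The profile described on this slide can be sketched as a list with one row per aligned position, where each row holds a score for every possible residue. The version below is a deliberately simple illustration: it assumes log-odds scores against a uniform background with a pseudocount of 1, and it slides the profile along the query without gaps. Real profile methods use more careful weighting and gap handling. The alignment and query in the usage lines are invented.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """One row (dict residue -> score) per aligned position."""
    background = 1 / len(AMINO_ACIDS)  # uniform background, an assumption
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, query):
    """Return (score, start) of the best ungapped placement on the query.

    The query must be at least as long as the profile.
    """
    width = len(profile)
    scored = [
        (sum(row.get(res, 0.0) for row, res in zip(profile, query[s:s + width])), s)
        for s in range(len(query) - width + 1)
    ]
    return max(scored)


if __name__ == "__main__":
    profile = build_profile(["CAH", "CGH", "CAH"])  # invented alignment
    print(len(profile))
    print(best_window(profile, "KKCAHKK"))
```

Conserved columns (here the first and third) give the largest scores to their conserved residue, which is the "higher match values at conserved positions" behaviour the slide describes.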

19 Prosite PS50157 profile for Zinc finger C2H2

20 PIRSF Scan Search One Query Protein Against all the Full-length and Domain HMM models for the fully curated PIRSFs by HMMER The matched regions and statistics will be displayed. Shows PIRSF that the query belongs to Statistical data for all domains Statistical data per domain Alignment with consensus sequence

21 Lab Section

22 Rat eye lens phosphoproteomics in normal and cataract Kamei et al., Biol. Pharm. Bull., [Figure: 2D gels of Normal and Cataract samples; axes pI (- to +) and Mw] More phosphorylated spots in cataract sample. Digestion and MS from Spot 16 gave these peptides: MDVTIQHPWFKR ALGPFYPSR CSLSADGMLTFSG YRLPSNVDQSALS We want to identify the protein(s) that contain these peptides Use Peptide Search MDVTIQHPWFKR

23 Peptide Search

24 Links to iProClass and UniProtKB reports Link to NCBI taxonomy Link to PIRSF report Matching peptide highlighted in the sequence Sorting arrows Peptide Search & Results Species restricted search Search in UniProtKB, 23 proteins
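At its core, a peptide search of this kind is an exact substring match of the peptide against every sequence in the database, reporting where each match falls so it can be highlighted. The sketch below shows that idea on a toy in-memory dictionary. The accessions and sequences are invented placeholders, not UniProtKB entries, and the function name is hypothetical.

```python
def peptide_search(peptide, proteins):
    """Return (accession, start, end) for every exact match, 1-based inclusive."""
    hits = []
    for accession, sequence in proteins.items():
        start = sequence.find(peptide)
        while start != -1:
            hits.append((accession, start + 1, start + len(peptide)))
            # Keep looking for further, possibly overlapping, matches.
            start = sequence.find(peptide, start + 1)
    return hits


if __name__ == "__main__":
    # Invented toy sequences that embed two of the peptides from the slides.
    toy_db = {
        "TOY1": "AAAMDVTIQHPWFKRGGG",
        "TOY2": "ALGPFYPSRALGPFYPSR",
        "TOY3": "GGGG",
    }
    print(peptide_search("MDVTIQHPWFKR", toy_db))
    print(peptide_search("ALGPFYPSR", toy_db))
```

A species-restricted search, as mentioned on the slide, would amount to filtering the dictionary by organism before matching.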

25 Batch Retrieval Results (I) Retrieve more sequences Retrieve multiple proteins from iProClass using a specific identifier or a combination of them Provides a means to easily retrieve and analyze proteins when the identifiers come from different databases

26 ID Mapping

27 BLAST Similarity Search >P24623 Perform sequence similarity search What proteins are related to rat CRYAA?

28 BLAST Search Results BLAST (partial) result for CRYAA_RAT in UniProtKB database BLAST alignment with the human protein

29 Pairwise Alignment

30 PIR Text Search UniProtKB Database and unique UniParc sequences PIR protein family classification database Let’s search for human crystallins

31 Refine your search or start over Display PDB ID Let’s look for crystallins which have 3D structure

32 Domain Display allows simultaneous comparison of Pfam domains present in multiple proteins Let’s perform a multiple alignment on the sequences containing PF00030 Share same domain architecture

33 Multiple Alignment

34 Interactive Phylogenetic Tree and Alignment Beta B1 and gamma crystallins share the same domains and SCOP fold and show significant sequence similarity, suggesting that they are related

35 Pattern Search (I) Search for proteins containing this pattern (PS00225) in rat Select P07320 and perform a pattern search

36 Pattern Search Result Beta and gamma Crystallins have multiple copies of this pattern