Functional Annotation of Proteins via the CAFA Challenge
Lee Tien, Duncan Renfrow-Symon, Shilpa Nadimpalli, Mengfei Cao
COMP150PBT | Fall 2010

What’s the problem?
1. Finding a protein’s function from its sequence alone is a huge bottleneck.
2. Incomplete, inaccurate, or inconsistent annotations are difficult to work with and can propagate through the databases.
3. There is no good way to measure the accuracy of an annotation predictor.

What is the CAFA Challenge?

What are Gene Ontology (GO) terms?
GO = a controlled vocabulary of terms describing gene products
Covers three domains:
▫Cellular component
▫Molecular function
▫Biological process
Hierarchy:
▫Broad/general (e.g. “catalytic activity”)
▫Specific (e.g. “leukotriene-C4-synthase activity”)
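
Because GO is hierarchical, annotating a protein with a specific term also implies every broader ancestor term. GO calls this the "true path rule". The sketch below shows that propagation on a deliberately simplified, hypothetical fragment. The terms come from the slide, but the real GO graph has intermediate terms between them that are omitted here.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical, simplified fragment of the GO molecular-function graph
# (child -> parents). Real GO is a DAG with intermediate terms omitted here.
PARENTS: Dict[str, List[str]] = {
    "leukotriene-C4-synthase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def propagate(terms: Iterable[str], parents: Dict[str, List[str]] = PARENTS) -> Set[str]:
    """Return the given terms plus every ancestor reachable through `parents`."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in closed:
            continue  # already visited (GO terms can have several parents)
        closed.add(term)
        stack.extend(parents.get(term, []))
    return closed


print(sorted(propagate(["leukotriene-C4-synthase activity"])))
```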

Outline of Our Approach
[Flowchart: CAFA targets (FASTA sequences) → BLAST, Pfam, SMURF (candidate additions: Betawrap Pro? other secondary structure predictor?) → GO ids for each CAFA target]
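
The pipeline starts from the CAFA targets as FASTA sequences. Below is a minimal FASTA reader sketch. It assumes the target identifier is the first whitespace-delimited token on each ">" header line; the exact CAFA header layout is not shown on the slide. The sequence fragment used here is the first 20 residues of target T38114 from the example result slide.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    Assumes the identifier is the first whitespace-delimited token after '>'.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close the previous record
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


targets = read_fasta(io.StringIO(">T38114\nMDLDMNGGNK\nRVFQRLGGGS\n"))
print(targets)  # {'T38114': 'MDLDMNGGNKRVFQRLGGGS'}
```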

Pfam: Protein Family Database
Collection of protein families represented by:
▫Multiple sequence alignments
▫Hidden Markov models
Two sections of Pfam:
▫Pfam-A: high-quality, manually curated
▫Pfam-B: large, automatically generated
[Figures: sample multiple sequence alignment; sample hidden Markov model]
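
Pfam's HMMs are built from multiple sequence alignments. The sketch below shows only the core idea: turn alignment columns into per-position residue probabilities and score a query against them. It is a simplification, not Pfam's method. It assumes an ungapped alignment and a uniform background, and it has no insert or delete states, so it is closer to a position-specific profile than to a full profile HMM. The toy alignment is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-column residue probabilities from an ungapped alignment (add-pseudocount smoothing)."""
    length = len(alignment[0])
    assert all(len(s) == length for s in alignment), "sequences must be aligned"
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = len(alignment) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_odds_score(profile: List[Dict[str, float]], query: str) -> float:
    """Sum over positions of log2(P(residue | column) / P(residue | uniform background))."""
    background = 1.0 / len(AMINO_ACIDS)
    return sum(math.log2(col[aa] / background) for col, aa in zip(profile, query))


toy_alignment = ["HWRAG", "HWKAG", "HFRAG"]  # hypothetical family members
profile = build_profile(toy_alignment)
print(log_odds_score(profile, "HWRAG"), log_odds_score(profile, "PPPPP"))
```

A positive score means the query fits the family profile better than random sequence does.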

BLAST: Basic Local Alignment Search Tool
Goal: find homologous (i.e. derived from a common ancestor) sequences in a database
Various BLAST programs:
▫blastp = query: protein, database: protein
▫blastn = query: nucleotide, database: nucleotide
▫blastx = query: translated nucleotide, database: protein
▫tblastn = query: protein, database: translated nucleotide
▫tblastx = query: translated nucleotide, database: translated nucleotide
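
The sketch below encodes the program table as a lookup. It also reduces BLAST hits to the best one per query, the kind of row the blast_results table stores. Several things are assumptions, not the team's documented setup: that hits come from BLAST+ tabular output (`-outfmt 6`, default 12 columns, E-value in column 11), and the 1e-5 cut-off.

```python
from typing import Dict, Tuple

# Query type / database type -> BLAST program, as listed on the slide.
BLAST_PROGRAMS: Dict[Tuple[str, str], str] = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}


def best_hits(tabular: str, max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Keep the lowest-E-value subject per query from BLAST+ `-outfmt 6` text.

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore (E-value is index 10).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue  # not significant under the assumed threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


print(BLAST_PROGRAMS[("protein", "protein")])  # blastp
```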

SMURF: Structural Motifs Using Random Fields
Determines whether a protein sequence contains one of the following supersecondary structures:
▫6-bladed propeller
▫7-bladed propeller
▫8-bladed propeller
▫Double blades (e.g. 6-6, 6-7, 6-8…)
Developed at Tufts!
Some propeller functions:
▫Often WD40 repeats – protein-protein interaction
▫Signaling, transcription, cell cycle
[Figure: Smurf! 7-bladed propeller]

Final Database Structure
INPUT
▫cafa_targets: cafa_id, uniprot_id, gi_access_id
RESULTS
▫blast_results: cafa_id, pdb_id, refseq_id, e_value_score
▫pfam_results: cafa_id, pfam_id
▫smurf_results: cafa_id, template_id, p_value_score
MAPPING
▫pdb_id, go_id
▫refseq_id, uniprot_id, go_id
▫pfam_id, go_id
▫template_id, go_id
OUTPUT
▫go_results: cafa_id, go_id, source, confidence
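
Here is one possible SQLite rendering of this schema, with a query that maps each method's hits through the mapping tables into go_results. The column names come from the slide. The mapping-table names (pdb2go, refseq2go, pfam2go, template2go) are hypothetical because the slide shows only their columns. How confidence was computed is not stated, so this sketch leaves it NULL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cafa_targets  (cafa_id TEXT PRIMARY KEY, uniprot_id TEXT, gi_access_id TEXT);
CREATE TABLE blast_results (cafa_id TEXT, pdb_id TEXT, refseq_id TEXT, e_value_score REAL);
CREATE TABLE pfam_results  (cafa_id TEXT, pfam_id TEXT);
CREATE TABLE smurf_results (cafa_id TEXT, template_id TEXT, p_value_score REAL);
-- Mapping tables: hypothetical names, columns as on the slide.
CREATE TABLE pdb2go      (pdb_id TEXT, go_id TEXT);
CREATE TABLE refseq2go   (refseq_id TEXT, uniprot_id TEXT, go_id TEXT);
CREATE TABLE pfam2go     (pfam_id TEXT, go_id TEXT);
CREATE TABLE template2go (template_id TEXT, go_id TEXT);
CREATE TABLE go_results  (cafa_id TEXT, go_id TEXT, source TEXT, confidence REAL);
"""


def populate_go_results(conn: sqlite3.Connection) -> None:
    """Map every method's hits to GO ids; UNION removes duplicate (target, term, source) rows."""
    conn.execute("""
        INSERT INTO go_results (cafa_id, go_id, source)
        SELECT b.cafa_id, m.go_id, 'blast' FROM blast_results b JOIN pdb2go m ON b.pdb_id = m.pdb_id
        UNION
        SELECT b.cafa_id, m.go_id, 'blast' FROM blast_results b JOIN refseq2go m ON b.refseq_id = m.refseq_id
        UNION
        SELECT p.cafa_id, m.go_id, 'pfam' FROM pfam_results p JOIN pfam2go m ON p.pfam_id = m.pfam_id
        UNION
        SELECT s.cafa_id, m.go_id, 'smurf' FROM smurf_results s JOIN template2go m ON s.template_id = m.template_id
    """)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```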

Final Results Statistics
[Chart: distribution of sequence hits by method (PDB BLAST, Pfam, SMURF)]
Of the 8,904 sequences of unknown function:
▫4,265 had at least one hit in PDB BLAST
▫4,824 had at least one hit in Pfam
▫104 had at least one hit in SMURF
In total, 5,694 unique sequences had at least one hit, a 63.9% success rate.
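
The per-method counts overlap: they add to more than 5,694, because many sequences were hit by several methods. The overall figure is therefore the size of the union of hit sets, divided by the number of targets. The sketch below assumes hits are held as a set of target ids per method. The final line checks the slide's arithmetic: 5,694 / 8,904 ≈ 63.9%.

```python
from typing import Dict, Set, Tuple


def coverage(hits: Dict[str, Set[str]], total_targets: int) -> Tuple[Dict[str, int], int, float]:
    """Return per-method hit counts, the number of targets hit by any method, and that as a percentage."""
    per_method = {method: len(targets) for method, targets in hits.items()}
    union = set().union(*hits.values())  # a target hit by several methods counts once
    return per_method, len(union), 100.0 * len(union) / total_targets


print(round(100 * 5694 / 8904, 1))  # 63.9
```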

Example Result
T38114
MDLDMNGGNKRVFQRLGGGSNRPTTDSNQKVCFHWRAGRCNRYPCPYLHRELPGPGSGPVAASSNKRVADESGFAGPSHR
RGPGFSGTANNWGRFGGNRTVTKTEKLCKFWVDGNCPYGDKCRYLHCWSKGDSFSLLTQLDGHQKVVTGIALPSGSDKLY
TASKDETVRIWDCASGQCTGVLNLGGEVGCIISEGPWLLVGMPNLVKAWNIQNNADLSLNGPVGQVYSLVVGTDLLFAGT
QDGSILVWRYNSTTSCFDPAASLLGHTLAVVSLYVGANRLYSGAMDNSIKVWSLDNLQCIQTLTEHTSVVMSLICWDQFL
LSCSLDNTVKIWAATEGGNLEVTYTHKEEYGVLALCGVHDAEAKPVLLCSCNDNSLHLYDLPSFTERGKILAKQEIRSIQ
IGPGGIFFTGDGSGQVKVWKWSTESTPILS
BLAST: matches with PDB structures 2OVP, 3MKS, 2CNX, 1P22, 1NEX, 3N0E
▫Transcription, mitosis, methylation, protein binding
Pfam: match to family PF00642
▫Zinc ion binding, nucleic acid binding
SMURF: match to 7-bladed β-propeller template
▫WD domain (protein binding)

Possible Future Directions
Improving functional annotation for β-propellers identified by SMURF
▫Analyze a training set of propeller proteins with known function to build a probabilistic model of protein function based on propeller type
Addition of other structural prediction tools for motifs with known function
▫G protein-coupled receptors, membrane-bound proteins
Expansion of the BLAST search to the full NCBI nr (non-redundant) database
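
The first future direction could begin as simply as the sketch below. It is not the team's model. For each propeller type, it estimates the probability that a protein of that type carries each GO term, using per-term Bernoulli estimates with add-alpha smoothing. The training pairs, term labels, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def fit(training: Iterable[Tuple[str, List[str]]], alpha: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Estimate P(protein has term | propeller type) with add-alpha smoothing."""
    n_type: Counter = Counter()                        # proteins seen per propeller type
    n_term: Dict[str, Counter] = defaultdict(Counter)  # proteins per (type, term)
    vocab = set()
    for ptype, terms in training:
        n_type[ptype] += 1
        n_term[ptype].update(set(terms))  # count each term at most once per protein
        vocab.update(terms)
    return {
        ptype: {t: (n_term[ptype][t] + alpha) / (n + 2 * alpha) for t in vocab}
        for ptype, n in n_type.items()
    }


def predict(model: Dict[str, Dict[str, float]], ptype: str, threshold: float = 0.5) -> List[str]:
    """Terms whose estimated probability for this propeller type meets the threshold, most likely first."""
    probs = model.get(ptype, {})
    return sorted((t for t, p in probs.items() if p >= threshold), key=lambda t: -probs[t])


# Hypothetical training data: (propeller type, GO term labels of a protein with known function).
model = fit([("7-bladed", ["protein binding"]),
             ("7-bladed", ["protein binding", "term X"]),
             ("6-bladed", ["term X"])])
print(predict(model, "7-bladed"))
```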

Questions?