Functional Annotation (Gene Function Prediction). Haibao Tang (唐海宝), Center for Genomics and Biotechnology, November 23, 2013.


Functional Annotation?

Name that protein?
- C2H2 zinc finger proteins
- Calmodulin and calmodulin-related calcium sensor proteins
- Cellulose synthase gene family
- Cysteine-rich peptides
- Cytochrome P450
- Early auxin-responsive Aux/IAA gene family
- F-box proteins
- Glycosyl hydrolases
- MADS-box family
- Serine proteases
- WRKY family
- ……

Erythropoietin

Myostatin (a muscle growth-limiting factor)

Outline
1. Basic Searches to Run
2. Advanced Assignments
3. Protein Families
4. Naming Genes

1. Basic Searches to Run

Basic Searches to Run
- BLAST (nucleotide or protein homology)
  - Non-redundant protein sequences (nr)
  - UniRef (UniProt: Swiss-Prot, TrEMBL)
  - Trusted genomes (TAIR)
- CDD (NCBI's Conserved Domain Database)
- InterPro (protein families, domains and functional sites)
- HMMER or SAM (searches using statistical descriptions)
  - Pfam (database of protein families and HMMs)
  - TIGRFAMs (protein family-based HMMs)
  - SCOP (structural domains)
  - TMHMM (transmembrane domains)
- SignalP (signal peptide cleavage sites)
- TargetP (subcellular location)
- Many others

Web BLAST
- NCBI BLAST
- WU-BLAST
- UniProt/Swiss-Prot BLAST
- Phytozome
- The Gene Indices
- Sanger projects
- TAIR

CDD
A collection of multiple sequence alignments. It contains protein domain models imported from outside sources, such as Pfam, SMART, COGs (Clusters of Orthologous Groups of proteins) and PRK (PRotein Klusters), as well as domain models curated at NCBI.

InterPro
A database of protein families, domains and functional sites, in which identifiable features found in known proteins can be applied to unknown protein sequences.

Hidden Markov Models
- Databases of HMM domains to search: Pfam, TIGRFAMs, SCOP, TMHMM
- Tools to use: HMMER (hmmpfam)
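HMM-based searches score a query by how probable it is under a family model. As a minimal sketch of that idea (not HMMER's actual algorithm), the forward algorithm below computes the probability of a sequence under a toy two-state HMM. The state names, the two-letter alphabet and all probabilities are hypothetical; real Pfam or TIGRFAMs profile HMMs have match, insert and delete states for each alignment column and report log-odds scores, which this sketch does not.

```python
# Toy two-state HMM; every name and probability here is hypothetical.
STATES = ("membrane", "soluble")
START = {"membrane": 0.5, "soluble": 0.5}
TRANS = {
    "membrane": {"membrane": 0.9, "soluble": 0.1},
    "soluble": {"membrane": 0.1, "soluble": 0.9},
}
EMIT = {
    "membrane": {"L": 0.8, "K": 0.2},  # hydrophobic leucine favoured
    "soluble": {"L": 0.3, "K": 0.7},   # charged lysine favoured
}


def forward(seq, states, start_p, trans_p, emit_p):
    """Return P(seq | model), summing over all hidden state paths."""
    # alpha[s] = P(first i+1 symbols emitted and current state is s)
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


if __name__ == "__main__":
    print(forward("LLLK", STATES, START, TRANS, EMIT))
```

Because forward sums over every state path, the probabilities of all sequences of a fixed length add up to 1, which is a handy sanity check for any hand-built model.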

Pfam
For each family in Pfam you can:
- Look at multiple alignments
- View protein domain architectures
- Examine species distribution
- Follow links to other databases
- View known protein structures

TMHMM
Predicts transmembrane helices in integral membrane proteins using HMMs.
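TMHMM itself uses an HMM. To illustrate what a transmembrane prediction looks for, here is a deliberately simpler, swapped-in technique: a Kyte-Doolittle hydropathy sliding window, using the window of 19 residues and the mean-hydropathy threshold of about 1.6 that Kyte and Doolittle suggested for membrane-spanning segments. It is a rough screen only, assuming a plain one-letter sequence with the 20 standard residues; it does not reproduce TMHMM's output or its topology prediction.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Mean hydropathy of every full-length window (empty if seq is too short)."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """0-based, half-open (start, end) spans of windows at or above threshold.

    Overlapping qualifying windows are merged into one candidate segment.
    """
    segments = []
    for i, score in enumerate(hydropathy_windows(seq, window)):
        if score >= threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend previous segment
            else:
                segments.append((start, end))
    return segments
```

A polar sequence yields no candidates, while a stretch of about 20 hydrophobic residues flanked by charged ones yields one merged segment.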

SignalP
Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms. It is based on a combination of artificial neural networks and HMMs.

TargetP
TargetP predicts the subcellular location of eukaryotic proteins. The location assignment is based on the predicted presence of any of these N-terminal presequences:
- chloroplast transit peptide (cTP)
- mitochondrial targeting peptide (mTP)
- secretory pathway signal peptide (SP)

Gene function evidence

2. Advanced Assignments

Advanced Assignments
- Enzyme Commission (EC) numbers
- Gene Ontology (GO) terms
- Pathways
  - KEGG
  - MetaCyc
  - Pathway Tools

Assigning EC Numbers
The EC classification scheme is a hierarchical numerical classification based on the chemical reactions that enzymes catalyze. Every enzyme code consists of four numbers separated by periods, e.g. EC 1.1.1.1, alcohol dehydrogenase. EC numbers may be assigned computationally, and many tools and methods are available for predicting EC numbers and pathways.
Common problems:
- The computational method may not be specific enough to assign a complete EC number to an enzyme.
- It may be more accurate to assign a gene to an enzyme family than to a specific enzyme. To stay precise, the fourth number is often left blank (e.g. EC 1.1.1.-).
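Because EC codes are hierarchical, two predictions can be compared by how many leading levels they share. The sketch below parses a code into a 4-tuple, treating "-" as an unassigned level, and counts the shared prefix. The function names are hypothetical, and it assumes purely numeric levels, so preliminary codes that contain letters would raise a `ValueError`.

```python
def parse_ec(ec):
    """Parse 'EC 1.1.1.1' or '1.1.1.-' into a 4-tuple; '-' becomes None."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels: {ec!r}")
    return tuple(None if p == "-" else int(p) for p in parts)


def shared_levels(ec_a, ec_b):
    """Count leading EC levels (0-4) on which two codes agree.

    An unassigned level ('-') never counts as a match.
    """
    count = 0
    for x, y in zip(parse_ec(ec_a), parse_ec(ec_b)):
        if x is None or y is None or x != y:
            break
        count += 1
    return count


if __name__ == "__main__":
    print(shared_levels("EC 1.1.1.1", "1.1.1.-"))
```

A prediction of `1.1.1.-` against a curated `1.1.1.1` shares three levels: correct to the enzyme sub-subclass, with the specific enzyme left open.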

GO Terms
The Gene Ontology (Gene Ontology Consortium™) structures biological knowledge using a dynamic, controlled vocabulary that applies across organisms. It has three aspects:
- Molecular function (MF): what the gene product does; think "activity". Example: ion channel activity.
- Biological process (BP): a biological objective. Examples: ion transport, transmembrane transport.
- Cellular component (CC): location in the cell (or a smaller unit), or part of a complex. Examples: membrane, plasma membrane.
You can obtain GO terms for any sequence using tools such as:
- BLAST2GO
- INTERPRO2GO
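GO terms form a directed acyclic graph, so a gene annotated with a specific term is implicitly annotated with all of that term's ancestors. The sketch below propagates direct annotations upward. The hierarchy is a hypothetical, simplified fragment written for illustration (it is not copied from the real ontology), and it ignores relationship types such as is_a versus part_of.

```python
# Hypothetical, simplified molecular-function fragment: child -> parents.
PARENTS = {
    "ion channel activity": ["channel activity"],
    "channel activity": ["transmembrane transporter activity"],
    "transmembrane transporter activity": ["transporter activity"],
    "transporter activity": ["molecular_function"],
    "molecular_function": [],
}


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Map gene -> direct terms to gene -> direct terms plus all ancestors."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


if __name__ == "__main__":
    print(sorted(propagate({"gene1": ["ion channel activity"]}, PARENTS)["gene1"]))
```

Since a term can have several parents, the traversal keeps a `seen` set so each ancestor is visited once.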

View Pathways
A graphical interface that lets users visualize the substrates, final products and steps of a completed pathway, where each step is catalyzed by an enzyme (gene).
- KEGG
- MetaCyc
- Pathway Tools
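Pathway viewers essentially overlay a genome's EC annotations on curated pathway definitions. As a toy sketch of that mapping, the code below checks which steps of a pathway fragment (the first three steps of glycolysis, with their EC numbers) have at least one annotated gene. The gene names are hypothetical, and real tools such as Pathway Tools use much richer pathway models and inference rules than this simple lookup.

```python
# Illustrative fragment: first three steps of glycolysis and their EC numbers.
GLYCOLYSIS_START = [
    ("glucose -> glucose-6-phosphate", "2.7.1.1"),  # hexokinase
    ("glucose-6-phosphate -> fructose-6-phosphate", "5.3.1.9"),  # glucose-6-phosphate isomerase
    ("fructose-6-phosphate -> fructose-1,6-bisphosphate", "2.7.1.11"),  # 6-phosphofructokinase
]


def pathway_coverage(steps, gene_ec):
    """Return (fraction of steps with an annotated gene, step -> sorted genes).

    `gene_ec` maps each gene to the set of EC numbers assigned to it.
    """
    hits = {}
    for reaction, ec in steps:
        hits[reaction] = sorted(g for g, ecs in gene_ec.items() if ec in ecs)
    covered = sum(1 for genes in hits.values() if genes)
    return covered / len(steps), hits


if __name__ == "__main__":
    fraction, hits = pathway_coverage(
        GLYCOLYSIS_START, {"geneA": {"2.7.1.1"}, "geneB": {"5.3.1.9"}}
    )
    print(fraction, hits)
```

A step with no gene (here, the phosphofructokinase step) is either a genuine gap or a missed annotation, which is exactly what a pathway view helps a curator spot.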

Pathway Tools

3. Protein Families

Why Compute Protein Families?
- To group proteins by probable function
- To identify possible gene structure problems
- To identify evolutionary relationships between protein families
- To support gene naming and transposable element assignment

Domain-Based Protein Families (Paralogous Families)
- Identify Pfam domains and all-vs-all BLASTP-based domains in the protein sequences
- Group families based on the type and number of domains

Domain-Based Protein Families (Paralogous Families): example
The 9 members of family para_246 contain the Pfam domains:
- Cyclic nucleotide-binding domain
- Ion transport protein
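Grouping by "type and number of domains" can be sketched by using each protein's sorted multiset of domain names as a dictionary key. This is a minimal illustration, assuming domain calls have already been made (e.g. from a Pfam search); the protein IDs are hypothetical, and domain order along the sequence is deliberately ignored here.

```python
from collections import defaultdict


def group_by_domains(domain_calls):
    """Group proteins whose domain types and counts are identical.

    domain_calls: protein ID -> list of domain names (repeats allowed).
    Returns: sorted tuple of domain names -> sorted list of protein IDs.
    """
    families = defaultdict(list)
    for protein, domains in domain_calls.items():
        key = tuple(sorted(domains))  # same types and same counts -> same key
        families[key].append(protein)
    return {key: sorted(members) for key, members in families.items()}


if __name__ == "__main__":
    calls = {
        "prot1": ["Ion transport protein", "Cyclic nucleotide-binding domain"],
        "prot2": ["Cyclic nucleotide-binding domain", "Ion transport protein"],
        "prot3": ["Ion transport protein"],
    }
    for key, members in group_by_domains(calls).items():
        print(key, members)
```

Using a tuple with repeats preserved means a protein with two copies of a domain lands in a different family from one with a single copy.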

OrthoMCL/TribeMCL Protein Clustering
A Markov clustering method for grouping proteins into families (Nucleic Acids Res., April 1; 30(7): 1575–1584).
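The core of TribeMCL and OrthoMCL is the generic Markov clustering (MCL) algorithm: alternate "expansion" (matrix squaring, letting random-walk flow spread) and "inflation" (raising entries to a power and renormalizing, strengthening strong flows) until the matrix stops changing. Below is a small pure-Python sketch of that generic algorithm. It assumes a symmetric similarity matrix is already built; the real pipelines derive edge weights from all-vs-all BLASTP E-values and run the optimized `mcl` program, and the parameter defaults here are illustrative.

```python
def _normalize_columns(m):
    """Scale each column to sum to 1 (all-zero columns stay zero)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a symmetric, non-negative similarity matrix with MCL."""
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion
        inflated = _normalize_columns([[v ** inflation for v in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Attractor rows (non-zero diagonal) list the nodes that flow into them.
    clusters = set()
    for i in range(n):
        if m[i][i] > tol:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > tol))
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    adj = [[0] * 6 for _ in range(6)]
    for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
        adj[a][b] = adj[b][a] = 1
    print(mcl(adj))
```

Higher inflation values generally produce smaller, tighter clusters; this naive O(n³) matrix product is only suitable for toy inputs.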

4. Naming Genes

Functional Assignments
- Name: a descriptive common name for the protein, with as much specificity as the evidence supports; gene symbol.
- Role: describe what the protein is doing in the cell and why.
- Associated information (supporting evidence):
  - Domains and motifs
  - EC number, if the protein is an enzyme
  - Paralogous family membership

Naming convention

Methods to name gene products
1. Top BLAST hit to database of choice
2. Manually aggregate evidence from multiple sources
3. Automated Assignment of Human Readable Descriptions (AHRD)
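For method 1, a common starting point is BLAST+ tabular output (`-outfmt 6`), whose default twelve columns end with `evalue` and `bitscore`. The sketch below keeps the highest-bitscore subject per query. The query and subject IDs in the example are hypothetical; the chosen subject ID still has to be mapped to its description (e.g. from the database FASTA headers) before it can serve as a name.

```python
def best_hits(lines):
    """Return query -> (subject, bitscore, evalue) for the top-bitscore hit.

    Expects default BLAST+ -outfmt 6 columns: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore, evalue)
    return best


if __name__ == "__main__":
    rows = [
        "\t".join(["q1", "sA", "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-50", "200.0"]),
        "\t".join(["q1", "sB", "95.0", "100", "5", "0", "1", "100", "1", "100", "1e-60", "250.0"]),
    ]
    print(best_hits(rows))
```

The top hit is only as good as the database: if it is "uncharacterized protein", the name carries no information, which is the motivation for aggregating several hits.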

Automated Assignment of Human Readable Descriptions (AHRD)
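AHRD chooses a description by combining evidence from many BLAST hits rather than trusting the single top hit. The sketch below captures only that general idea and is not AHRD's actual scoring scheme: it weights each informative word by the bitscores of the hits that contain it, then picks the description with the highest average support per informative word. The stop-word list is a hypothetical stand-in for AHRD's own configurable filters.

```python
import re
from collections import defaultdict

# Hypothetical list of uninformative words to ignore when scoring.
UNINFORMATIVE = {"protein", "putative", "uncharacterized", "predicted",
                 "hypothetical", "family", "like"}


def tokens(description):
    """Lower-cased informative words in a description."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return {w for w in words if w not in UNINFORMATIVE}


def choose_description(hits):
    """hits: list of (description, bitscore). Return the best-supported description."""
    token_score = defaultdict(float)
    for desc, bitscore in hits:
        for t in tokens(desc):
            token_score[t] += bitscore  # each hit votes for its words

    def score(desc):
        toks = tokens(desc)
        return sum(token_score[t] for t in toks) / len(toks) if toks else 0.0

    # max() keeps the first description among equal scores.
    return max(hits, key=lambda h: score(h[0]))[0]


if __name__ == "__main__":
    hits = [
        ("Uncharacterized protein", 300.0),
        ("Cytochrome P450 71A1", 250.0),
        ("Cytochrome P450 family protein", 200.0),
        ("Putative cytochrome P450", 150.0),
    ]
    print(choose_description(hits))
```

In this example the highest-scoring hit is uninformative and is skipped, and the family-level "Cytochrome P450" wording wins because several hits agree on it, matching the rule of naming with only as much specificity as the evidence supports.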

Exercise
- Given a protein sequence, assign it a name.
- Use online tools to find structural and functional domains.