Functional Annotation: Gene Function Prediction
Haibao Tang, Center for Genomics and Biotechnology
November 23, 2013


Functional Annotation ?

Name that protein?
- C2H2 zinc finger proteins
- Calmodulin and calmodulin-related calcium sensor proteins
- Cellulose Synthase Gene Family
- Cysteine Rich Peptides
- Cytochrome P450
- Early Auxin-responsive Aux/IAA Gene Family
- F-Box Proteins
- Glycosyl Hydrolase
- MADS-box family
- Serine Proteases
- WRKY family
- ...

Erythropoietin (stimulates red blood cell production)

Myostatin (muscle growth-limiting factor)

Outline
1. Basic Searches to Run
2. Advanced Assignments
3. Protein Families
4. Naming Genes

1. Basic Searches to Run

Basic Searches to Run
- BLAST (nucleotide or protein homology)
  - Non-redundant protein sequences (nr)
  - UniRef (UniProt: Swiss-Prot, TrEMBL)
  - Trusted genomes (TAIR)
- CDD (NCBI's Conserved Domain Database)
- InterPro (protein families, domains and functional sites)
- HMMER or SAM (searches using statistical descriptions)
  - Pfam (database of protein families and HMMs)
  - TIGRFAMs (protein family based HMMs)
  - SCOP (structural domains)
  - TMHMM (transmembrane domains)
- SignalP (signal peptide cleavage sites)
- TargetP (subcellular location)
- Many others

Web BLAST
- NCBI BLAST: http://www.ncbi.nlm.nih.gov/blast/
- WU-BLAST: http://genome.wustl.edu/tools/blast/
- UniProt/Swiss-Prot BLAST: http://www.uniprot.org/
- Phytozome: http://www.phytozome.net/search.php
- The Gene Indices: http://compbio.dfci.harvard.edu/tgi/
- Sanger projects: http://www.sanger.ac.uk/DataSearch/
- TAIR: http://www.arabidopsis.org/Blast/index.jsp

CDD
A collection of multiple sequence alignments. It contains protein domain models imported from outside sources, such as Pfam, SMART, COGs (Clusters of Orthologous Groups of proteins), and PRK (PRotein Klusters), together with models curated at NCBI.

InterPro
A database of protein families, domains and functional sites, in which identifiable features found in known proteins can be applied to unknown protein sequences.

Hidden Markov Model
Databases of HMM domains to search:
- Pfam: http://www.sanger.ac.uk/Software/Pfam/
- TIGRFAMs: http://www.jcvi.org/cms/research/projects/tigrfams/overview/
- SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
- TMHMM: http://www.cbs.dtu.dk/services/TMHMM/
Tools to use:
- HMMER, HMMPFAM: http://hmmer.janelia.org/

Pfam
For each family in Pfam you can:
- Look at multiple alignments
- View protein domain architectures
- Examine species distribution
- Follow links to other databases
- View known protein structures

TMHMM
Predicts transmembrane helices in integral membrane proteins using HMMs.

SignalP
Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms. Based on a combination of artificial neural networks and HMMs.

TargetP
TargetP predicts the subcellular location of eukaryotic proteins. The location assignment is based on the predicted presence of any of the N-terminal presequences:
- chloroplast transit peptide (cTP)
- mitochondrial targeting peptide (mTP)
- secretory pathway signal peptide (SP)

Gene function evidence

2. Advanced Assignments

Advanced Assignments
- Enzyme Commission (EC) Number: http://www.chem.qmul.ac.uk/iubmb/enzyme/
- Gene Ontology (GO) Terms
- Pathways
  - KEGG
  - MetaCyc
  - Pathway Tools

Assigning EC Number
The EC classification scheme is a hierarchical numerical classification based on the chemical reactions that enzymes catalyze. Every enzyme code consists of four numbers separated by periods, e.g. EC 1.1.1.1 (alcohol dehydrogenase).
EC numbers may be assigned computationally. There are many available tools and methods for predicting EC numbers and pathways.
Common problems:
- Computational methods are often not specific enough to assign a complete EC number.
- They may reliably place a gene in an enzyme family without identifying the specific enzyme, so the fourth number is often left blank (e.g. EC 1.1.1.-).
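The four-level hierarchy can be made concrete with a short Python sketch. The function names (`parse_ec`, `is_complete`, `same_family`) are illustrative and not part of any annotation tool; the only facts taken from the slide are the four-field format and the convention of leaving the last field blank.

```python
from typing import List, Optional


def parse_ec(ec: str) -> List[Optional[int]]:
    """Split 'EC 1.1.1.1' or '1.1.1.-' into four levels; '-' becomes None."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four fields in {ec!r}")
    levels = [None if p == "-" else int(p) for p in parts]
    # Once a level is unassigned, every deeper level must be unassigned too.
    seen_blank = False
    for level in levels:
        if level is None:
            seen_blank = True
        elif seen_blank:
            raise ValueError(f"assigned level after a blank in {ec!r}")
    return levels


def is_complete(ec: str) -> bool:
    """True when all four levels are assigned (a specific enzyme)."""
    return all(level is not None for level in parse_ec(ec))


def same_family(a: str, b: str, depth: int = 3) -> bool:
    """True when two EC numbers agree on their first `depth` levels."""
    return parse_ec(a)[:depth] == parse_ec(b)[:depth]


print(parse_ec("EC 1.1.1.1"))             # alcohol dehydrogenase, fully specified
print(is_complete("1.1.1.-"))             # family-level assignment only
print(same_family("1.1.1.1", "1.1.1.-"))  # both in the same sub-subclass
```

Comparing only the first three levels is one way to treat a partial assignment as compatible with any specific enzyme in the same family.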

GO Terms
Gene Ontology (Gene Ontology Consortium) is a dynamic controlled vocabulary for structuring biological knowledge across organisms.
- Molecular function (MF): what the gene product does; think "activity". Example: ion channel activity.
- Biological process (BP): a biological objective. Examples: ion transport, transmembrane transport.
- Cellular component (CC): location in the cell (or smaller unit), or part of a complex. Examples: membrane, plasma membrane.
You can obtain GO terms for any sequence using tools like:
- BLAST2GO
- INTERPRO2GO
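As a minimal illustration of the three GO aspects, the Python sketch below groups the example terms from this slide by aspect. The single-letter codes F, P and C are the aspect codes used in GO annotation files; the gene identifier `geneX` and the tuple layout are assumptions made for the example.

```python
from collections import defaultdict

# Single-letter aspect codes as used in GO annotation files.
ASPECT_NAMES = {
    "F": "molecular_function",
    "P": "biological_process",
    "C": "cellular_component",
}


def group_by_aspect(annotations):
    """Group (gene, aspect_code, term_name) tuples by GO aspect name."""
    grouped = defaultdict(list)
    for _gene, aspect, term in annotations:
        grouped[ASPECT_NAMES[aspect]].append(term)
    return dict(grouped)


# Hypothetical annotations for one gene, using the terms named on the slide.
annotations = [
    ("geneX", "F", "ion channel activity"),
    ("geneX", "P", "ion transport"),
    ("geneX", "P", "transmembrane transport"),
    ("geneX", "C", "membrane"),
    ("geneX", "C", "plasma membrane"),
]

for aspect, terms in group_by_aspect(annotations).items():
    print(aspect, terms)
```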

View Pathways
Graphical interfaces let users visualize the substrates, final products and steps of a complete pathway, including the steps catalyzed by a given enzyme (gene).
- KEGG: http://www.genome.jp/kegg/tool/search_pathway.html
- MetaCyc: http://metacyc.org
- Pathway Tools: http://bioinformatics.ai.sri.com/ptools

Pathway Tools

3. Protein Families

Why Compute Protein Families?
- To group proteins by probable function
- To identify possible gene structure problems
- To identify evolutionary relationships between protein families
- Gene naming and Transposable Element assignment

Domain Based Protein Families (Paralogous families)
- Identify domains in the protein sequences using Pfam and all-vs-all BLASTP
- Group families based on the type and number of domains

Domain Based Protein Families (Paralogous families)
Example: family para_246
9 family members contain:
- PF00027: Cyclic nucleotide-binding domain
- PF00520: Ion transport protein
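Grouping by "type and number of domains" can be sketched in a few lines of Python. This is an illustration only, not the pipeline used to build the families shown here: the protein names are hypothetical, and the domain hits are assumed to come from a prior Pfam search.

```python
from collections import Counter, defaultdict


def architecture(domains):
    """Canonical key for a protein: sorted (pfam_id, count) pairs."""
    return tuple(sorted(Counter(domains).items()))


def group_by_architecture(protein_domains):
    """Group proteins that share the same types and numbers of domains."""
    families = defaultdict(list)
    for protein, domains in protein_domains.items():
        if domains:  # proteins without any domain hit are left ungrouped
            families[architecture(domains)].append(protein)
    return dict(families)


# Hypothetical domain hits; the Pfam accessions are the ones on the slide.
hits = {
    "prot1": ["PF00520", "PF00027"],
    "prot2": ["PF00027", "PF00520"],
    "prot3": ["PF00520"],
    "prot4": [],
}

families = group_by_architecture(hits)
for arch, members in families.items():
    print(arch, members)
```

Domain order is ignored here on purpose, since the slide groups by type and number only; a stricter variant could keep the order of domains along the sequence.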

OrthoMCL/TribeMCL Protein Clustering
Markov clustering method for grouping proteins into families
http://doc.bioperl.org/bioperl-run/lib/Bio/Tools/Run/TribeMCL.html
Nucleic Acids Res. 2002 April 1; 30(7): 1575–1584.
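The core of Markov clustering is an alternation of two matrix operations, expansion and inflation, applied to a column-normalized similarity matrix. The pure-Python sketch below shows that idea on a toy matrix; it is not the OrthoMCL or TribeMCL implementation, and the similarity values, inflation parameter and thresholds are assumptions chosen for the example. Real pipelines derive the edge weights from BLAST results.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a symmetric similarity matrix by Markov clustering."""
    n = len(similarity)
    m = [[float(x) for x in row] for row in similarity]
    for i in range(n):
        m[i][i] = max(m[i][i], 1.0)  # self-loops keep every column non-zero
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )  # inflation: strong flow is boosted, weak flow decays
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that retain flow are attractors; their non-zero columns are members.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy similarity matrix: proteins 0-2 hit each other, as do proteins 3-4.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(toy))  # [[0, 1, 2], [3, 4]]
```

A higher inflation value generally yields more, smaller clusters, which is the main tuning knob when families come out too coarse or too fine.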

4. Naming Genes

Functional Assignments
- Name: descriptive common name for the protein, with as much specificity as the evidence supports; gene symbol.
- Role: describe what the protein is doing in the cell and why.
- Associated information:
  - Supporting evidence: domains and motifs
  - EC number, if the protein is an enzyme
  - Paralogous family membership

Naming convention

Methods to name gene products
1. Top BLAST hit to database of choice
2. Manually aggregate evidence from multiple sources
3. Automated Assignment of Human Readable Descriptions (AHRD): https://github.com/groupschoof/AHRD
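The first method, naming by top BLAST hit, can be sketched in Python as follows. The sketch assumes BLAST's 12-column tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The hit rows, the subject descriptions, the E-value cutoff and the "hypothetical protein" fallback are all made up for illustration.

```python
def top_hits(tabular_text, max_evalue=1e-5):
    """Return {query: best_subject}, choosing the highest bit score per query."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # too weak to support a name
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _score) in best.items()}


def name_genes(queries, hits, descriptions, default="hypothetical protein"):
    """Transfer the description of the top hit, or fall back to a default."""
    return {q: descriptions.get(hits.get(q), default) for q in queries}


# Made-up hit rows in 12-column tabular layout.
rows = [
    ["geneA", "subj1", "85.0", "200", "30", "0", "1", "200", "1", "200", "1e-50", "200"],
    ["geneA", "subj2", "95.0", "200", "10", "0", "1", "200", "1", "200", "1e-90", "350"],
    ["geneB", "subj3", "30.0", "50", "35", "2", "1", "50", "1", "50", "0.5", "25"],
]
blast_output = "\n".join("\t".join(row) for row in rows)

# Made-up subject descriptions standing in for a real database lookup.
descriptions = {
    "subj1": "WRKY family transcription factor",
    "subj2": "cellulose synthase",
}

hits = top_hits(blast_output)
names = name_genes(["geneA", "geneB"], hits, descriptions)
print(names)
```

Relying on a single top hit propagates whatever name that hit carries, including uninformative or wrong ones, which is part of the motivation for the manual and AHRD approaches listed above.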

Automated Assignment of Human Readable Descriptions (AHRD)

https://github.com/groupschoof/AHRD

Exercise
Given a known protein sequence, name the protein. Use online tools to find its structural and functional domains.
