Assessment of Genome-wide Protein Function Classification for Drosophila melanogaster Huaiyu Mi Panther Protein Informatics group Celera.

1 Assessment of Genome-wide Protein Function Classification for Drosophila melanogaster Huaiyu Mi mihn@fc.celera.com Panther Protein Informatics group Celera Genomics

2 How to classify proteins in a robust and accurate way?

3 Outline
1. Introduction to PANTHER
2. Comparison of functional classification of Drosophila proteins by FlyBase and PANTHER

4 What is PANTHER?
PANTHER library (PANTHER/LIB): a family tree, a multiple sequence alignment, an HMM
PANTHER index (PANTHER/X): molecular function, biological process

5 Building the library
500,000 protein sequences (filtered GenBank NR)
2200+ protein family clusters (MSA, HMM, tree)
Biologist curation
40,000 subfamilies
Each family and subfamily was labeled with a name and classified by PANTHER/X categories.

6 PANTHER library (PANTHER/LIB)

7 PANTHER index (PANTHER/X)

PANTHER/X:
RECEPTOR
=> G-protein coupled receptor
=> protein kinase receptor
=> => serine/threonine protein kinase receptor
=> => tyrosine protein kinase receptor

GO:
transmembrane receptor protein kinase GO:0019199
=> transmembrane receptor protein serine/threonine kinase GO:0004675
=> => transforming growth factor alpha receptor GO:0005023
=> => transforming growth factor beta receptor GO:0005024
=> => => activin receptor GO:0017002
=> => => => type I activin receptor GO:0016361
=> => => => type II activin receptor GO:0016362
=> => => type I transforming growth factor beta receptor GO:0005025
=> => => => type I activin receptor GO:0016361
=> => => type II transforming growth factor beta receptor GO:0005026
=> => => => type II activin receptor GO:0016362
=> transmembrane receptor protein tyrosine kinase GO:0004714
=> => boss receptor GO:0008288
=> => ephrin receptor GO:0005003
=> => => GPI-linked ephrin receptor GO:0005004
=> => => transmembrane-ephrin receptor GO:0005005
=> => epidermal growth factor receptor GO:0005006
=> => => gurken receptor GO:0008313
=> => fibroblast growth factor receptor GO:0005007
=> => hepatocyte growth factor receptor GO:0005008
=> => insulin receptor GO:0005009
=> => insulin-like growth factor receptor GO:0005010
=> => macrophage colony stimulating factor receptor GO:0005011
=> => macrophage receptor GO:0008019
=> => Neu/ErbB-2 receptor GO:0005012
=> => neurotrophin TRK receptor GO:0005013
=> => => neurotrophin TRKA receptor GO:0005014
=> => => neurotrophin TRKB receptor GO:0005015
=> => => neurotrophin TRKC receptor GO:0005016
=> => platelet-derived growth factor receptor GO:0005017
=> => => platelet-derived growth factor, alpha-receptor GO:0005018
=> => => platelet-derived growth factor, beta-receptor GO:0005019
=> => stem cell factor receptor GO:0005020
=> => vascular endothelial growth factor receptor GO:0005021
=> vascular endothelial growth factor receptor GO:0005021
signal transducer GO:0004871
=> receptor GO:0004872
=> => transmembrane receptor GO:0004888

8 PANTHER Scoring
A FASTA file
Family and subfamily HMMs
Score above threshold?
If yes: classified (name, molecular function, biological process)
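The threshold step on this slide can be illustrated with a minimal sketch. It assumes that each sequence already has one score per family or subfamily HMM, that higher scores are better, and that the single best-scoring model is the one reported; the slide states none of these details. The function name, model names, scores, and threshold value are all hypothetical.

```python
from typing import Dict, Optional


def classify_by_best_hit(hmm_scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-scoring model name if its score is above the threshold.

    hmm_scores maps a (hypothetical) family or subfamily HMM name to the score
    of one query sequence against that model. Returns None when nothing scores
    above the threshold, i.e. the sequence stays unclassified.
    """
    if not hmm_scores:
        return None
    best_model = max(hmm_scores, key=hmm_scores.get)
    if hmm_scores[best_model] > threshold:
        return best_model
    return None


# Hypothetical scores for one query sequence.
example_scores = {"FAMILY_A": 35.0, "FAMILY_A:SF2": 82.5, "FAMILY_B": 12.0}
best = classify_by_best_hit(example_scores, threshold=50.0)
```

In a real pipeline the name, molecular function, and biological process would then be looked up from the model that was returned.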

9 How accurate is PANTHER?
FlyBase: a manually curated database for Drosophila genes
PANTHER: an automated annotation process
Assess the associations

10 Process for comparison
Fly protein sequences
FlyBase annotation with GO terms; PANTHER annotation by scoring against PANTHER
Automated comparison of FlyBase and PANTHER assignments: match or not match
Manual review: correct, incorrect, or inconclusive
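The automated comparison step might look roughly like the sketch below. It assumes that both sources are reduced to sets of GO identifiers per protein and that sharing at least one identifier counts as a match; the slide does not say how matches were decided, and a real comparison would probably also need the ontology's parent and child relations. The function name and protein identifiers are hypothetical; the GO identifiers are taken from the PANTHER/X slide.

```python
from typing import Dict, List, Set, Tuple


def compare_assignments(
    flybase: Dict[str, Set[str]], panther: Dict[str, Set[str]]
) -> Tuple[List[str], List[str]]:
    """Split proteins annotated by both sources into matched and not matched.

    A protein counts as matched when the two sources share at least one GO
    term (an assumption made for this sketch).
    """
    matched: List[str] = []
    not_matched: List[str] = []
    for protein_id in sorted(set(flybase) & set(panther)):
        if flybase[protein_id] & panther[protein_id]:
            matched.append(protein_id)
        else:
            not_matched.append(protein_id)
    return matched, not_matched


# Hypothetical annotations keyed by placeholder protein identifiers.
flybase_terms = {
    "protein_1": {"GO:0005006"},
    "protein_2": {"GO:0004675"},
    "protein_3": {"GO:0005009"},
}
panther_terms = {
    "protein_1": {"GO:0005006", "GO:0004714"},
    "protein_2": {"GO:0004714"},
}
matched, not_matched = compare_assignments(flybase_terms, panther_terms)
```

Proteins in `not_matched` are the ones that would go on to manual review.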

11 Coverage of Drosophila proteins classified by FlyBase and PANTHER.
[Figure, panels A-F: pie charts for molecular function and biological process. FlyBase panels split proteins into "classified to GO" and "not classified to GO" (values shown: 6301, 8031; 11538, 2794). PANTHER panels split proteins into "HMM hits classified to GO", "HMM hits not classified to GO", and "not hit" (values shown: 4862, 3265, 6205; 6205, 4469, 3658). Overlap panels show proteins classified by both FlyBase and PANTHER: 3283 for molecular function and 1159 for biological process.]

12 Assessment of molecular function associations
                 FlyBase   PANTHER
Auto match          2747      2737
Manual match         663       700
Correct              195       345
Incorrect             58        50
Inconclusive          37        35
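As a quick check on these counts, the sketch below sums each source's categories and computes the share of associations that matched, either automatically or after manual review. It assumes the numbers on the slide line up with the category labels in the order given, which is consistent with the totals of 3700 (FlyBase) and 3867 (PANTHER) quoted in the summary. The variable and function names are illustrative.

```python
from typing import Dict

# Counts as read from the slide, assigned to categories in label order.
counts: Dict[str, Dict[str, int]] = {
    "FlyBase": {
        "auto_match": 2747,
        "manual_match": 663,
        "correct": 195,
        "incorrect": 58,
        "inconclusive": 37,
    },
    "PANTHER": {
        "auto_match": 2737,
        "manual_match": 700,
        "correct": 345,
        "incorrect": 50,
        "inconclusive": 35,
    },
}


def match_fraction(category_counts: Dict[str, int]) -> float:
    """Fraction of associations that matched automatically or manually."""
    total = sum(category_counts.values())
    matched = category_counts["auto_match"] + category_counts["manual_match"]
    return matched / total


totals = {source: sum(c.values()) for source, c in counts.items()}
fractions = {source: match_fraction(c) for source, c in counts.items()}
```

Both fractions come out close to 0.9, in line with the "about 90%" figure in the summary.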

13 Types of errors
Homology error: an error caused by incorrect functional prediction based on sequence homology.
Human error: an error on the part of the human curator.
Evidence error: an error caused by relying on incorrect evidence.

14 Analysis of errors
                                          PANTHER   FlyBase
Number of homology errors                       8        35
Number of human errors                         40        23
Number of evidence errors                       2         0
Total number of incorrect associations         50        58
Association error rate (%)                   1.3%      1.6%
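The error rates on this slide can be reproduced with a short calculation. The sketch assumes that the run-together digits split as 8, 40, and 2 errors for PANTHER and 35, 23, and 0 for FlyBase, the only split that adds up to the stated totals of 50 and 58. The denominators, 3867 and 3700 molecular function associations, come from the summary slide. The names used are illustrative.

```python
from typing import Dict

# Error counts per source (assumed split of the digits on the slide).
errors: Dict[str, Dict[str, int]] = {
    "PANTHER": {"homology": 8, "human": 40, "evidence": 2},
    "FlyBase": {"homology": 35, "human": 23, "evidence": 0},
}

# Total molecular function associations per source, from the summary slide.
total_associations: Dict[str, int] = {"PANTHER": 3867, "FlyBase": 3700}


def error_rate_percent(incorrect: int, total: int) -> float:
    """Incorrect associations as a percentage of all associations."""
    return 100.0 * incorrect / total


incorrect_totals = {source: sum(e.values()) for source, e in errors.items()}
rates = {
    source: error_rate_percent(incorrect_totals[source], total_associations[source])
    for source in errors
}
```

Rounded to one decimal place, the rates are 1.3% for PANTHER and 1.6% for FlyBase.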

15 PANTHER function inference in the context of a protein sequence tree
Example of homology error: FBgn0032382 (CG14934)
FlyBase: alpha glucosidase; neutral amino acid transporter
PANTHER: alpha glucosidase
[Tree figure; labels: CG14934, alpha glucosidase, neutral a.a. transporter, alpha amylase]

16 Summary PANTHER is an automated method for classifying proteins robustly. Its accuracy was assessed by comparing its classification of Drosophila proteins with FlyBase's. A total of 3283 Drosophila proteins were associated with at least one molecular function category by both FlyBase and PANTHER (3867 molecular function associations by PANTHER and 3700 by FlyBase). About 90% of these associations match between FlyBase and PANTHER. The total error rate is below 2% for both methods.

17 Acknowledgements
Celera Genomics: Paul Thomas, Jody Vandergriff, Michael Campbell, Apurva Narechania, William Majoros, Karen Diemer, Olivier Doremieux, Nan Guo, Anish Kejariwal, Steven Ladunga, Betty Lazareva, Anushya Muruganujan, Steve Rabkin
FlyBase: Michael Ashburner, Susanna Lewis

