1 Eukaryotic Secretome Prediction and Knowledge-Base Development Xiang-Jia “Jack” Min Ph.D., Assistant Professor 2 nd International Conferences on Proteomics.


1 1 Eukaryotic Secretome Prediction and Knowledge-Base Development. Xiang-Jia “Jack” Min, Ph.D., Assistant Professor. 2nd International Conference on Proteomics & Bioinformatics, Las Vegas, July 2-4, 2012

2 2 DNA → RNA → protein → phenotype

3 3 Genome → (transcription) → Transcriptome → (translation) → Proteome → (secretion) → Secretome. Transcriptome: mRNA (protein-coding DNA sequences). Proteome: protein sequences. Secretome: proteins with a secretory signal peptide.

4 4 Günter Blobel

5 5

6 6

7 7

8 8 Fungi (yeasts, moulds, mushrooms) and their secreted enzymes. Diagram labels: biomaterials, small molecules, bio-fuels, enzymes.

9 9 How to identify secreted proteins? Genome → (transcription) → Transcriptome → (translation) → Proteome → (secretion) → Secretome.
(1) Direct identification using proteomics methods (Tsang et al. 2009)
(2) Computational prediction from the predicted proteome
(3) EST data mining

10 10 Secreted Proteins. Classical secreted proteins have a signal peptide at the N-terminus. Not all proteins that have a signal peptide are secreted: signal peptide ≠ secreted protein.

11 11 Prediction tools
SignalP: predicts whether a protein contains a signal peptide.
Phobius: signal peptide and transmembrane domain prediction.
WolfPsort: a multiple subcellular location predictor.
TargetP: detects proteins targeted to mitochondria.
TMHMM: transmembrane domain prediction.
PS-Scan: detection of ER-retention signals.
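
The tool list above can be read as a series of filters: a signal peptide is required, but membrane, ER-retained and mitochondrial proteins are then excluded. The sketch below is only an illustration of that idea, not the presenter's pipeline. The `ToolCalls` record and its field names are hypothetical, and each field stands for a yes/no call that has already been parsed from the corresponding tool's output.

```python
from dataclasses import dataclass


@dataclass
class ToolCalls:
    """Hypothetical per-protein summary of tool calls (field names are assumptions)."""
    has_signal_peptide: bool  # e.g. from SignalP or Phobius
    has_tm_domain: bool       # e.g. from TMHMM
    has_er_retention: bool    # e.g. from PS-Scan
    mitochondrial: bool       # e.g. from TargetP


def is_predicted_secreted(calls: ToolCalls) -> bool:
    """A signal peptide is necessary but not sufficient for secretion."""
    return (
        calls.has_signal_peptide
        and not calls.has_tm_domain
        and not calls.has_er_retention
        and not calls.mitochondrial
    )


# Two made-up proteins: both carry a signal peptide, one is retained in the ER.
secreted_enzyme = ToolCalls(True, False, False, False)
er_resident = ToolCalls(True, False, True, False)
print(is_predicted_secreted(secreted_enzyme))
print(is_predicted_secreted(er_resident))
```

Keeping each tool's call as a separate field makes it easy to switch individual filters on or off when comparing tool combinations.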

12 12

13 13

14 14 Human cytochrome C oxidase subunit 1 (COX1)

15 15

16 16 Data
Kingdom     Secreted   Non-secreted
Fungi            241          5,992
Animals        5,568         19,048
Plants           216          7,528
Protists          32          1,979

17 17 Method
Sensitivity: $\mathrm{Sn}\,(\%) = \dfrac{TP}{TP + FN} \times 100$
Specificity: $\mathrm{Sp}\,(\%) = \dfrac{TN}{TN + FP} \times 100$
Matthews' Correlation Coefficient (MCC): $\mathrm{MCC}\,(\%) = \dfrac{(TP \times TN - FP \times FN) \times 100}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$

18 18 Table 1. Prediction accuracies of secreted proteins in fungi
Method                                              TP    FP     TN   FN  Sn (%)  Sp (%)  MCC (%)
SignalP                                            232   329   5663    9    96.3    94.5     61.2
Phobius                                            226   203   5789   15    93.8    96.6     68.8
TargetP                                            228   583   5409   13    94.6    90.3     48.6
WolfPsort                                          230   167   5825   11    95.4    97.2     73.1
SignalP/TMHMM                                      228   168   5824   13    94.6    97.2     72.6
Phobius/TMHMM                                      224   200   5792   17    92.9    96.7     68.6
TargetP/TMHMM                                      224   265   5727   17    92.9    95.6     63.5
WolfPsort/TMHMM                                    227   135   5857   14    94.2    97.7     75.8
SignalP/TMHMM/WolfPsort                            226    86   5906   15    93.8    98.6     81.6
SignalP/TMHMM/WolfPsort/Phobius                    222    69   5923   19    92.1    98.8     83.1
SignalP/TMHMM/WolfPsort/Phobius/PS-Scan            222    67   5925   19    92.1    98.9     83.4
SignalP/TMHMM/WolfPsort/Phobius/TargetP/PS-Scan    218    66   5926   23    90.5    98.9     82.6
TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews' correlation coefficient.
Min XJ (2010) JPB 3:143-147.

19 19 Table 2. Prediction accuracies of secreted proteins in animals
Method                                               TP    FP     TN   FN  Sn (%)  Sp (%)  MCC (%)
SignalP                                            5307  4108  14940  261    95.3    78.4     63.5
Phobius                                            5157  1167  17881  411    92.6    93.9     82.8
TargetP                                            5313  5412  13636  255    95.4    71.6     56.5
WolfPsort                                          5135  1762  17286  433    92.2    90.7     77.3
SignalP/TMHMM                                      5217  1383  17665  351    93.7    92.7     81.6
Phobius/TMHMM                                      5148  1142  17906  420    92.5    94.0     82.9
TargetP/TMHMM                                      5222  1369  17679  346    93.8    92.8     81.8
WolfPsort/TMHMM                                    5093  1084  17964  475    91.5    94.3     82.8
Phobius/WolfPsort                                  4959   555  18493  609    89.1    97.1     86.4
Phobius/WolfPsort/TMHMM                            4952   544  18504  616    88.9    97.1     86.5
Phobius/WolfPsort/TMHMM/SignalP                    4952   544  18504  616    88.9    97.1     86.5
Phobius/WolfPsort/TMHMM/TargetP                    4934   505  18543  634    88.6    97.3     86.7
Phobius/WolfPsort/TMHMM/TargetP/PS-Scan            4931   482  18566  637    88.6    97.5     86.9
Phobius/WolfPsort/TMHMM/TargetP/PS-Scan/SignalP    4931   482  18566  637    88.6    97.5     86.9
TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews' correlation coefficient.
Min XJ (2010) JPB 3:143-147.

20 20 Table 3. Prediction accuracies of secreted proteins in plants
Method                                              TP    FP     TN   FN  Sn (%)  Sp (%)  MCC (%)
SignalP                                            199   364   7164   17    92.1    95.2     55.4
Phobius                                            188   638   6890   28    87.0    91.5     41.9
TargetP                                            198   442   7086   18    91.7    94.1     51.3
WolfPsort                                          108    70   7458  108    50.0    99.1     53.9
SignalP/TMHMM                                      197   237   7291   19    91.2    96.9     63.0
Phobius/TMHMM                                      188   636   6892   28    87.0    91.6     42.0
TargetP/TMHMM                                      195   256   7272   21    90.3    96.6     61.1
WolfPsort/TMHMM                                    106    45   7483  110    49.1    99.4     57.7
SignalP/HMM/TargetP                                195   149   7379   21    90.3    98.0     70.6
Phobius/TargetP/TMHMM                              183   122   7406   33    84.7    98.4     70.4
SignalP/TMHMM/WolfPsort                            106    35   7493  110    49.1    99.5     59.9
SignalP/TMHMM/Phobius                              188   183   7345   28    87.0    97.6     65.2
SignalP/HMM/Phobius/TargetP                        183   113   7415   33    84.7    98.5     71.5
SignalP/HMM/Phobius/TargetP/PS-Scan                183   100   7428   33    84.7    98.7     73.2
SignalP/HMM/Phobius/TargetP/WolfPsort/PS-Scan      102    29   7499  114    47.2    99.6     59.8
TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews' correlation coefficient.
Min XJ (2010) JPB 3:143-147.

21 21 Summary. Different prediction tools have different accuracies for secretome prediction in different kingdoms. Combining these tools often increases the prediction accuracy. However, different combinations are needed for species in different kingdoms. Optimal methods are proposed.
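
To make the kingdom-specific point concrete, here is a small sketch that picks the tool combination with the highest MCC for each kingdom. The MCC values are copied from Tables 1-3 of this presentation, but only a few rows per kingdom are included; the dictionary layout and the function name `best_combination` are illustrative assumptions.

```python
# MCC (%) for selected tool combinations, copied from Tables 1-3.
MCC_BY_KINGDOM = {
    "fungi": {
        "SignalP": 61.2,
        "WolfPsort/TMHMM": 75.8,
        "SignalP/TMHMM/WolfPsort/Phobius/PS-Scan": 83.4,
    },
    "animals": {
        "SignalP": 63.5,
        "Phobius/WolfPsort": 86.4,
        "Phobius/WolfPsort/TMHMM/TargetP/PS-Scan": 86.9,
    },
    "plants": {
        "SignalP": 55.4,
        "SignalP/TMHMM": 63.0,
        # Label kept exactly as printed in Table 3.
        "SignalP/HMM/Phobius/TargetP/PS-Scan": 73.2,
    },
}


def best_combination(kingdom: str) -> tuple:
    """Return (combination, MCC) with the highest MCC among the listed rows."""
    scores = MCC_BY_KINGDOM[kingdom]
    name = max(scores, key=scores.get)
    return name, scores[name]


for kingdom in MCC_BY_KINGDOM:
    print(kingdom, best_combination(kingdom))
```

Among these rows the best combination differs for every kingdom, and SignalP alone is never the top scorer.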

22 22

23 23

24 24

25 25 FunSecKB overview (diagram labels). User inputs: gi, accession, UniProt ID, keywords, species. Database: RefSeq, UniProt. Prediction tools: SignalP, Phobius, WolfPsort, TargetP, TMHMM, PS-SCAN, fragAnchor. Other labels: Views, Manual Curation, Subcellular Location, External Links. Lum G & Min XJ (2011) Database.

26 26 Summary of FunSecKB. Currently the database contains:
a total of 478,073 fungal protein sequences;
23,878 predicted and/or curated secreted proteins;
a total of 118 fungal species, including 52 fungal species with a complete proteome.

27 27 Lum G & Min XJ (2011) Database.

28 28 Lum G & Min XJ (2011) Database.

29 29 Lum G & Min XJ (2011) Database.

30 30

31 31

32 32

33 33

34 34

35 35

36 36 Plant secretomes and other subcellular proteins. Values are protein counts, with the percentage of total proteins in parentheses.
Category                      Vitis vinifera   Populus trichocarpa   Arabidopsis thaliana   Oryza sativa    Sorghum bicolor
Total proteins                29836            41794                 32214                  39997           32796
Secreted proteins             1892 (6.3)       2487 (6.0)            2835 (8.8)             3085 (7.7)      2394 (7.3)
Mitochondria, membrane        490 (1.6)        566 (1.4)             415 (1.3)              832 (2.1)       666 (2.0)
Mitochondria, non-membrane    3877 (13.0)      5238 (12.5)           3729 (11.6)            7187 (18.0)     5768 (17.6)
Chloroplast, membrane         565 (1.9)        601 (1.4)             671 (2.1)              720 (1.8)       610 (1.9)
Chloroplast, non-membrane     3675 (12.3)      4850 (11.6)           4865 (15.1)            6318 (15.8)     5385 (16.4)
ER proteins                   29 (0.1)         37 (0.1)              60 (0.2)               32 (0.1)        25 (0.1)
Other membrane proteins       3251 (10.9)      4532 (10.8)           3649 (11.3)            3672 (9.2)      2900 (8.8)
Others (unknown)              16057 (53.8)     23483 (56.2)          15990 (49.64)          18151 (45.4)    15048 (45.9)

37 37

38 38

39 39 Acknowledgements: Gengkon Lum (M.S. Graduate), Jessica Orr (Undergraduate), Docylyne Shelton (Undergraduate), Braden Walters (Undergraduate)

