Minimize Database-Dependence in Proteome Informatics Apr. 28, 2009 Kyung-Hoon Kwon Korea Basic Science Institute.


1 Minimize Database-Dependence in Proteome Informatics Apr. 28, 2009 Kyung-Hoon Kwon Korea Basic Science Institute

2 A story from our lab
1. Sequest search for biomarker discovery after the 1st experiment on a sample
2. Decoy approach, within a 5% error rate
3. Preliminary biological interpretation of the 1st dataset
4. We installed a high-performance Mascot server.
5. Mascot search of the 1st dataset
6. The protein list was very different from the previous Sequest result: the presumed markers disappeared.
7. The biological interpretation had to change for the Mascot result.
8. Two more experiments for confirmation
9. We had to select which search engine to use for the analysis of the 2nd and 3rd datasets.
Which one is the correct interpretation?
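Step 2 of the story, the decoy approach within a 5% error rate, can be illustrated with a minimal sketch. It assumes a separate target and decoy search in which every peptide-spectrum match carries a score (higher is better) and a decoy flag, and it estimates the error rate as decoy hits divided by target hits. The function name and the input layout are hypothetical; the slides do not say which estimator was used.

```python
from typing import List, Optional, Tuple


def score_threshold_for_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.05
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms holds (score, is_decoy) pairs; higher scores are better.
    The FDR at a cutoff is estimated as decoys / targets among all
    matches scoring at or above that cutoff (an assumed estimator).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Accept this cutoff only if the running estimate is within the limit.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical scores: three target hits and one decoy hit.
toy_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
strict_cutoff = score_threshold_for_fdr(toy_psms, max_fdr=0.05)
loose_cutoff = score_threshold_for_fdr(toy_psms, max_fdr=0.5)
print(strict_cutoff, loose_cutoff)
```

Because each search engine scores spectra differently, the set of matches above such a cutoff can differ between Sequest and Mascot even at the same nominal error rate.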

3 Factors that affect proteome analysis
Experimental dependence
▫Instrumentation : ESI/MS, MALDI/MS, …
▫Reagent : enzyme for proteolysis, isotope tag for quantitation, affinity tag for enrichment, …
▫Protocol : MudPIT, IPAS, …
Informatics dependence
▫Software : Mascot, Sequest, PEAKS, X!Tandem, OMSSA, Lutefisk, …
▫Data analysis protocol : decoy, peptideProphet, …
▫Sequence database : SwissProt, IPI, NCBI nr, OWL, …
Different methods give different results.

4 Different experiment P.A. Kirkland et al., J. Proteome Res. 2008, 7(11), 5033-5039.

5 Search engines
[Venn diagram: spectra identified by SEQUEST, X!Tandem, and Mascot; region percentages 9%, 19%, 7%, 34%, 5%, 4%, 22%]
Each search engine identifies about the same number of spectra, but the overlap is surprisingly small. Different search engines match different spectra.
B.C. Searle, Improving Sensitivity by Combining Results from Multiple Search Methodologies, Proteome Software Inc.
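The overlap figures on this slide come from comparing which spectra each engine identifies. A minimal sketch of that comparison follows, assuming each engine's result is available as a set of spectrum identifiers. The identifiers below are hypothetical and do not reproduce the percentages on the slide.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(ids_by_engine: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Count spectra in each exclusive region of the Venn diagram.

    The key of each region is the exact set of engines that identified
    the spectra counted in it.
    """
    all_spectra: Set[str] = set().union(*ids_by_engine.values())
    regions: Dict[FrozenSet[str], int] = {}
    for spectrum in all_spectra:
        engines = frozenset(
            name for name, ids in ids_by_engine.items() if spectrum in ids
        )
        regions[engines] = regions.get(engines, 0) + 1
    return regions


# Hypothetical spectrum identifiers for three engines.
example = {
    "SEQUEST": {"s1", "s2", "s3", "s4"},
    "Mascot": {"s3", "s4", "s5"},
    "X!Tandem": {"s4", "s6"},
}
regions = venn_regions(example)
total = sum(regions.values())
for engines, count in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(engines), f"{100 * count / total:.0f}%")
```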

6 For each spectrum
▫Get Mascot IDs, SEQUEST IDs, and X!Tandem IDs
▫Calculate the Mascot probability, the SEQUEST probability, and the X!Tandem probability
▫Calculate the combined peptide probability
▫Calculate protein probabilities
▫…
Stage labels on the diagram: Peptide Prophet*, Scaffold Merger, Protein Prophet*
Scaffold uses Nesvizhskii's algorithm to convert SEQUEST and Mascot scores to peptide probabilities.
Scaffold uses another algorithm by Nesvizhskii to combine peptide probabilities.
*Nesvizhskii, A. I. et al., Anal. Chem. 2003, 75, 4646-4658
B.C. Searle, Improving Sensitivity by Combining Results from Multiple Search Methodologies, Proteome Software Inc.

7 Factors that affect proteome analysis
Experimental dependence
▫Instrumentation : ESI/MS, MALDI/MS, …
▫Reagent : enzyme for proteolysis, isotope tag for quantitation, affinity tag for filtering, …
▫Protocol : MudPIT, IPAS, …
Data analysis method dependence
▫Software : Mascot, Sequest, PEAKS, X!Tandem, OMSSA, Lutefisk, …
▫Data analysis protocol : decoy, peptideProphet, …
▫Sequence database : SwissProt, IPI, NCBI nr, OWL, …
Solution : integration. This is an expensive route for many small laboratories.

8 Suggestion: group proteins

9 As usual: grouping the identified proteins
X. Li, et al., 'Comparison of alternative analytical techniques for the characterisation of the human serum proteome in HUPO Plasma Proteome Project', Proteomics 2005, 5, 3423-3441.
[Diagram: Peptides 1 to 7 mapped to Proteins A to F]
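One common way to group identified proteins is to place proteins that share at least one identified peptide in the same group. The sketch below assumes that rule; the slide does not state the exact grouping criterion, and the peptide-to-protein mapping used here is hypothetical because the edges of the diagram are not recoverable from the transcript.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_proteins(peptides_by_protein: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group proteins connected through shared peptides (union-find)."""
    parent = {protein: protein for protein in peptides_by_protein}

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first protein seen for each peptide and merge later owners.
    first_owner: Dict[str, str] = {}
    for protein, peptides in peptides_by_protein.items():
        for peptide in peptides:
            if peptide in first_owner:
                union(first_owner[peptide], protein)
            else:
                first_owner[peptide] = protein

    groups: Dict[str, Set[str]] = defaultdict(set)
    for protein in peptides_by_protein:
        groups[find(protein)].add(protein)
    return sorted(groups.values(), key=lambda group: sorted(group))


# Hypothetical mapping of identified peptides to proteins.
hypothetical_hits = {
    "Protein A": {"Peptide 1", "Peptide 2"},
    "Protein B": {"Peptide 2", "Peptide 3"},
    "Protein C": {"Peptide 3"},
    "Protein D": {"Peptide 4"},
    "Protein E": {"Peptide 5", "Peptide 6"},
    "Protein F": {"Peptide 6", "Peptide 7"},
}
protein_groups = group_proteins(hypothetical_hits)
for group in protein_groups:
    print(sorted(group))
```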

10 Idea : optimize the database size (rope walking)
A smaller database is fast but may miss many true sequences, which means fewer true positives.
A larger database will include more true sequences, but it is slower and may also include many false positives.

11 Sequence database
▫IPI human, IPI mouse (EBI) : HUPO recommendation
▫SwissProt (EBI)
▫nr (NCBI)
▫EST database

12 Grouping proteins of the sequence database before the database search
[Diagram: similar sequences are grouped; 1st search, 2nd search, 3rd search]

13 1st search with the IPI human representative database
IPI00472787.1 19kDa protein
IPI00657911.1 Gamma-globin
IPI00465184.3 Guanine deaminase
IPI00644409.1 Guanine deaminase
IPI00334432.3 16kDa protein
IPI00291006.1 Malate dehydrogenase

14 2nd search with groups of the IPI human database selected from the IPI representative DB search
IPI00719462.1 Beta-globin
IPI00654755.1 Hemoglobin [?]
IPI00657911.1 Gamma-globin
IPI00410714.4 Hemoglobin [?]
IPI00334432.3 16kDa protein
IPI00719716.1 Hemoglobin [?]
IPI00220706.1 Hemoglobin [?]-1
IPI00554676.1 Hemoglobin [?]-2
IPI00291006.1 Malate dehydrogenase
IPI00643484.1 Guanine deaminase
IPI00644932.1 Guanine deaminase
IPI00465184.3 Guanine deaminase
IPI00644409.1 Guanine deaminase
IPI00217471.2 Hemoglobin [?]
IPI00472787.1 19kDa protein
IPI00382950.1 Beta-globin gene
IPI00473011.2 Hemoglobin [?]
IPI00719281.1 Hemoglobin Lepore-Baltimore
IPI00657660.1 15kDa protein
IPI00470375.5 Delta-hemoglobin
IPI00030809.1 Gamma-G globin

15 3rd search with groups of the NCBI nr human database selected from the IPI representative DB search
gi|21050583 gi|999562 gi|21050583 gi|71727239 gi|71727241 gi|71727243 gi|71727245 gi|71727247 gi|71727249 gi|71727251 gi|71727253 gi|71727255 gi|71727257 gi|71727259 gi|71727261 gi|71727263 gi|71727265 gi|71727267 gi|71727269 gi|71727271 gi|24159096 gi|24159098 gi|4504349 gi|56749856 gi|61679772 gi|61679774 gi|62738860 gi|24150593 gi|24150591 gi|24150589 gi|24150587 gi|24150585
IPI00472787.1 19kDa protein
gi|71370292
IPI00657911.1 Gamma-globin
gi|55957352 gi|55960274 gi|55960273 gi|55957353
IPI00644409.1 Guanine deaminase
gi|11837778
IPI00465184.3 Guanine deaminase
IPI00334432.3 16kDa protein
IPI00291006.1 Malate dehydrogenase
gi|6648067 gi|12804929 gi|49168580 gi|41472053 gi|2906146 gi|21735621 gi|61679684 gi|80747857 gi|61679686
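The iterative search on slides 12 to 15 can be sketched as follows: search the representative database first, then build a second database from every member of the groups whose representative was identified. This is a minimal sketch under stated assumptions. The group table, the accession names, and the `search` callable are all hypothetical stand-ins; in the talk the searches were run with a real search engine, and the slides do not describe how the sequence groups were built.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed interface: a search takes spectra and a list of database
# accessions and returns the accessions it identified.
Search = Callable[[List[str], List[str]], List[str]]


def select_group_database(
    groups: Dict[str, List[str]], identified_representatives: Iterable[str]
) -> List[str]:
    """Collect all group members of the identified representatives."""
    selected: List[str] = []
    seen = set()
    for representative in identified_representatives:
        for member in groups.get(representative, [representative]):
            if member not in seen:
                seen.add(member)
                selected.append(member)
    return selected


def iterative_search(
    spectra: List[str], groups: Dict[str, List[str]], search: Search
) -> Tuple[List[str], List[str]]:
    """Run the 1st search on representatives, the 2nd on selected groups."""
    representative_db = list(groups)
    first_hits = search(spectra, representative_db)
    group_db = select_group_database(groups, first_hits)
    second_hits = search(spectra, group_db)
    return first_hits, second_hits


# Hypothetical groups: each representative maps to its similar sequences.
toy_groups = {
    "REP_A": ["REP_A", "A2", "A3"],
    "REP_B": ["REP_B", "B2"],
    "REP_C": ["REP_C"],
}


def toy_search(spectra: List[str], database: List[str]) -> List[str]:
    # Stand-in for a search engine: a database entry is "identified"
    # when its accession appears in the toy spectra list.
    return [accession for accession in database if accession in spectra]


first, second = iterative_search(["REP_A", "A3", "REP_C", "B2"], toy_groups, toy_search)
print(first)
print(second)
```

In this toy run the second search recovers `A3`, which the representative database alone could not report, while `B2` stays hidden because its representative was not identified in the first pass.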

16

17 1st search with representative DB; 2nd search with group DB of identified representative proteins
▫Keratin
▫Keratin: type I cytoskeletal, epidermal type I, type I cuticular
▫Cell division protein kinase, tyrosine-protein kinase, serine/threonine-protein kinase, fibroblast growth factor receptor
▫Guanine nucleotide binding protein
▫Septin
▫Ras-related proteins

18 Result of the iterative MS/MS ion search

Database                          | IPI human database | IPI human representative | IPI human selected groups | NCBI nr human selected groups
Number of proteins                | 48,193             | 24,120                   | 6,860                     | 32,916
Redundant proteins identified     | 5,584              | 2,336                    | 5,288                     | 22,895
Non-redundant proteins identified | 2,944              | 2,136                    | 2,934                     | 4,090
Redundant peptides identified     | 10,486             | 5,585                    | 11,066                    | 17,500
Non-redundant peptides identified | 6,124              | 5,177                    | 6,569                     | 6,580

Material : membranous fraction of human brain temporal lobe tissue
Experimental methods : multidimensional separation / LTQ-MS/MS (ThermoFinnigan)
Database analysis : TurboSEQUEST (ThermoFinnigan), DTASelect (Scripps Institute)
Database : IPI.HUMAN.v.3.15.1, NCBI nr human (283,548 proteins)
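The trade-off in this table is easier to see as ratios against the full IPI human database. The short calculation below uses only the database sizes and non-redundant peptide counts from the table; the dictionary layout and the function name are illustrative choices.

```python
from typing import Tuple

# Values copied from the slide 18 table.
results = {
    "IPI human database": {"db_size": 48193, "nr_peptides": 6124},
    "IPI human representative": {"db_size": 24120, "nr_peptides": 5177},
    "IPI human selected groups": {"db_size": 6860, "nr_peptides": 6569},
    "NCBI nr human selected groups": {"db_size": 32916, "nr_peptides": 6580},
}
baseline = results["IPI human database"]


def relative_to_baseline(name: str) -> Tuple[float, float]:
    """Return (database size ratio, non-redundant peptide ratio)."""
    row = results[name]
    return (
        row["db_size"] / baseline["db_size"],
        row["nr_peptides"] / baseline["nr_peptides"],
    )


for name in results:
    size_ratio, peptide_ratio = relative_to_baseline(name)
    print(f"{name}: {size_ratio:.1%} of entries, {peptide_ratio:.1%} of peptides")
```

With these numbers, the IPI selected-groups database holds about 14% of the entries of the full IPI human database yet yields about 7% more non-redundant peptides.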

19 Mascot vs. Sequest
Databases : IPI, Sprot, nr, IPI-representative, Sprot-representative, IPI-IDedGroup, Sprot-IDedGroup

20 Mascot = Sequest Mascot only

21 The representative DB approach works differently. Sequest only

22 Comparison of Mascot and Sequest with the IPI, Sprot, nr, and representative databases (results from MudPIT analysis of one 1D-gel band of a human cell line)
[Figure label: Lower Xcorr]

23 [Figure label: Lower Xcorr]

24 Advantages of the representative DB approach
▫It mines more peptide sequences without requiring more time or more search engines.
▫It can connect different databases.
▫Additionally, we expect that it can give more reliable information on PTMs by selecting the more probable proteins before the PTM search.

25 Thank you. Why have I given this presentation at HUPO PSI?

