Protein Identification by Database Searching John Cottrell Matrix Science.


1 Protein Identification by Database Searching John Cottrell Matrix Science

2 Protein Identification by Database Searching
Three ways to use mass spectrometry data for protein identification
1. Peptide Mass Fingerprint: a set of peptide molecular masses from an enzyme digest of a protein

3 Protein Identification by Database Searching

4

5 PMF Servers on the Web
ASCQ_ME: https://www.genopole-lille.fr/logiciel/ascq_me/
Bupid: http://zlab.bu.edu/Amemee/
Mascot: http://www.matrixscience.com/search_form_select.html
MassSearch: http://www.cbrg.ethz.ch/services/MassSearch_new
MS-Fit (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
PepMAPPER: http://www.nwsr.manchester.ac.uk/mapper/
Profound (Prowl): http://prowl.rockefeller.edu/prowl-cgi/profound.exe
Others: Mowse, PeptideSearch, Protocall, Aldente, XProteo

6 Protein Identification by Database Searching
Search Parameters
  database
  taxonomy
  enzyme
  missed cleavages
  fixed modifications
  variable modifications
  protein MW
  estimated mass measurement error
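
The enzyme, missed-cleavage and mass-error parameters listed above can be illustrated with a minimal peptide mass fingerprint sketch. It assumes trypsin (cleavage after K or R, but not before P), unmodified residues, and observed values given as neutral monoisotopic peptide masses. The function names and the simple match count are illustrative assumptions; real search engines use statistical scoring rather than a raw count.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 places).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def cleave_trypsin(protein):
    """Split after K or R unless the next residue is P (assumed trypsin rule)."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])
    return pieces


def digest(protein, missed_cleavages=1):
    """Return all peptides with up to `missed_cleavages` missed cleavage points."""
    pieces = cleave_trypsin(protein)
    peptides = set()
    for i in range(len(pieces)):
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.add("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, protein, tol_da=0.2, missed_cleavages=1):
    """Count observed masses that lie within tol_da of any theoretical peptide."""
    theoretical = [peptide_mass(p) for p in digest(protein, missed_cleavages)]
    return sum(any(abs(m - t) <= tol_da for t in theoretical) for m in observed)
```

Running `count_matches` against every entry of a sequence database and ranking by the result is the basic idea; each additional missed cleavage enlarges the set of theoretical peptides and so raises the chance of random matches.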

7 Protein Identification by Database Searching

8  Henzel, W. J., Watanabe, C., Stults, J. T., JASMS 2003, 14, 931-942.

9 Protein Identification by Database Searching
Peptide Mass Fingerprint
Fast, simple analysis
High sensitivity
Need database of protein sequences, not ESTs or genomic DNA
Sequence must be present in database, or a close homolog
Not good for mixtures, especially a minor component

10

11 Protein Identification by Database Searching
[Figure: protonated peptide backbone with side chains R1 to R4, showing the N-terminal fragment ions a1-a3, b1-b3, c1-c3 and the C-terminal fragment ions x1-x3, y1-y3, z1-z3]
Roepstorff, P. and Fohlman, J. (1984). Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom 11, 601.
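
As a worked illustration of this nomenclature, the sketch below computes singly charged b and y ion m/z values for an unmodified peptide. It assumes monoisotopic masses and the usual relations b_i = (sum of the first i residues) + proton and y_i = (sum of the last i residues) + water + proton. The function name is illustrative, and the a, c, x and z series are left out for brevity.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 places).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b, y): lists of singly charged m/z values for b1..b(n-1), y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b ions contain the N-terminus: prefix residue sum plus one proton.
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y ions contain the C-terminus: suffix residue sum plus water plus one proton.
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y
```

A useful check on any such calculation is that complementary ions b_i and y_(n-i) sum to the neutral peptide mass plus two protons.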

12 Protein Identification by Database Searching
Three ways to use mass spectrometry data for protein identification
1. Peptide Mass Fingerprint: a set of peptide molecular masses from an enzyme digest of a protein
2. Sequence Query: mass values combined with amino acid sequence or composition data

13 Protein Identification by Database Searching  Mann, M. and Wilm, M., Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66 4390-9 (1994).

14 Protein Identification by Database Searching 1489.430 tag(650.213,GWSV,1079.335)

15 Protein Identification by Database Searching
Sequence Tag Servers on the Web
Mascot: http://www.matrixscience.com/search_form_select.html
MS-Seq (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
MultiIdent (TagIdent, etc.): http://www.expasy.org/tools/multiident/
Others: PeptideSearch, Spider

16 Protein Identification by Database Searching

17

18 Sequence Tag
Rapid search times
  Essentially a filter
Error tolerant
  Match peptide with unknown modification or SNP
Requires interpretation of spectrum
  Usually manual, hence not high throughput
Tag has to be called correctly
  Although ambiguity is OK: 2060.78 tag(977.4,[Q|K][Q|K][Q|K]EE,1619.7)
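
The tag queries shown on these slides can be unpacked with a short sketch. It assumes the syntax `tag(lower mass, residues, upper mass)`, with `[Q|K]` meaning either residue is acceptable, and it assumes the two mass values bracket the tag so that their difference should equal the residue mass sum. The function names and the 0.2 Da default tolerance are illustrative choices, and the precursor mass that precedes `tag(...)` in the full query is not parsed here.

```python
import re
from itertools import product

# Monoisotopic residue masses in Da (standard values, rounded to 5 places).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def parse_tag(text):
    """Return (low, positions, high); positions lists the allowed residues per site."""
    m = re.fullmatch(r"tag\(([\d.]+),([A-Z\[\]|]+),([\d.]+)\)", text.strip())
    if m is None:
        raise ValueError(f"not a tag query: {text!r}")
    low, seq, high = float(m.group(1)), m.group(2), float(m.group(3))
    positions = [
        group.split("|") if group else [single]
        for group, single in re.findall(r"\[([A-Z|]+)\]|([A-Z])", seq)
    ]
    return low, positions, high


def tag_regex(positions):
    """Build a regular expression that can filter candidate peptide sequences."""
    return "".join(p[0] if len(p) == 1 else "[" + "".join(p) + "]" for p in positions)


def gap_is_consistent(low, positions, high, tol_da=0.2):
    """True if some residue choice sums to the mass gap within tol_da."""
    gap = high - low
    return any(
        abs(sum(RESIDUE_MASS[aa] for aa in combo) - gap) <= tol_da
        for combo in product(*positions)
    )
```

Using the regular expression as a cheap pre-filter and only then checking masses reflects the "essentially a filter" point: most database peptides are rejected before any mass arithmetic is done.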

19 Protein Identification by Database Searching
Three ways to use mass spectrometry data for protein identification
1. Peptide Mass Fingerprint: a set of peptide molecular masses from an enzyme digest of a protein
2. Sequence Query: mass values combined with amino acid sequence or composition data
3. MS/MS Ions Search: uninterpreted MS/MS data from a single peptide or from a complete LC-MS/MS run

20 Protein Identification by Database Searching  Eng, J. K., McCormack, A. L. and Yates, J. R., 3rd., An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5 976-89 (1994) SEQUEST

21 Protein Identification by Database Searching
MS/MS Ions Search Servers on the Web
Inspect: http://proteomics.ucsd.edu/LiveSearch/
Mascot: http://www.matrixscience.com/search_form_select.html
MS-Tag (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
Omssa: http://pubchem.ncbi.nlm.nih.gov/omssa/index.htm
PepFrag (Prowl): http://prowl.rockefeller.edu/prowl/pepfrag.html
PepProbe: http://bart.scripps.edu/public/search/pep_probe/search.jsp
RAId_DbS: http://www.ncbi.nlm.nih.gov/CBBResearch/qmbp/RAId_DbS/index.html
Sonar (Knexus): http://hs2.proteome.ca/prowl/knexus.html
X!Tandem (The GPM): http://thegpm.org/TANDEM/index.html
Not on-line: Byonic, Crux, greylag, MassMatrix, Myrimatch, Paragon, Peaks, PepSplice, pFind, Phenyx, ProbID, ProLuCID, ProteinLynx GS, Sequest, SIMS, SpectrumMill

22 Protein Identification by Database Searching

23

24 MS/MS Ions Search
Easily automated for high throughput
Can get matches from marginal data
Can be slow
  No enzyme
  Many variable modifications
  Large database
  Large dataset
MS/MS is peptide identification
  Proteins by inference

25 Protein Identification by Database Searching Search Parameters

26 Protein Identification by Database Searching Search Parameters Sequence Database

27 Protein Identification by Database Searching
Search Parameters
Sequence Database
Swiss-Prot (~500,000 entries)
  High quality, non-redundant
NCBInr, UniRef100 (~19,000,000 entries)
  Comprehensive, non-identical
EST databases (>400,000,000 entries)
  Very large and very redundant
Sequences from a single genome
  A consensus sequence
  Peptides are lost at exon-intron boundaries
(Entry counts are from mid-2012)

28 Protein Identification by Database Searching
Search Parameters
Taxonomy
Swiss-Prot 2010_08
Mammalia (mammals) = 65104
  Primates = 26940
    Homo sapiens (human) = 20292
    Other primates = 6648
  Rodentia (Rodents) = 25473
    Mus. = 16358
      Mus musculus (house mouse) = 16307
    Rattus = 7533
    Other rodentia = 1582
  Other mammalia = 12691

29 Protein Identification by Database Searching
Search Parameters
Mass Tolerances
Most search engines support separate mass tolerances for precursors and fragments
Tolerances may be given in fixed units (Da, mmu) or proportional units (ppm, %)
Some search engines can correct for selection of the 13C peak
Unless the search engine performs some type of re-calibration, you need to provide a conservative estimate of mass accuracy, not precision
This doesn’t have to be a guessing game: run a standard, then look at the error graphs for strong matches
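
The difference between fixed and proportional tolerance units, and the 13C peak correction, can be made concrete with a small sketch. It assumes neutral masses (for m/z values at charge z the isotope spacing would be divided by z) and takes the proportional units relative to the calculated mass. The function names and the `max_13c` parameter are illustrative, not any particular search engine's settings.

```python
C13_SHIFT = 1.003355  # mass difference between 13C and 12C, in Da


def tolerance_in_da(calculated, tol, unit):
    """Convert a tolerance in Da, mmu, ppm or % into Da at the given mass."""
    if unit == "Da":
        return tol
    if unit == "mmu":
        return tol / 1000.0
    if unit == "ppm":
        return calculated * tol / 1e6
    if unit == "%":
        return calculated * tol / 100.0
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def within_tolerance(observed, calculated, tol, unit="Da"):
    """True if the observed mass is within the tolerance of the calculated mass."""
    return abs(observed - calculated) <= tolerance_in_da(calculated, tol, unit)


def matches_with_13c(observed, calculated, tol, unit="Da", max_13c=1):
    """Also accept a match if the precursor was picked 1..max_13c isotope peaks too high."""
    return any(
        within_tolerance(observed - k * C13_SHIFT, calculated, tol, unit)
        for k in range(max_13c + 1)
    )
```

A proportional tolerance widens with mass (10 ppm is 0.01 Da at 1000 Da but 0.03 Da at 3000 Da), which is why it usually suits instruments whose error scales with m/z.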

30 Protein Identification by Database Searching
Search Parameters
Enzyme can be
  Fully specific
  Non-specific (“no enzyme”)
Some search engines support
  Limited number of missed cleavage points
  Semi-specific enzymes
  Enzyme mixtures

31 Protein Identification by Database Searching
Search Parameters
Common peak list formats
  DTA (Sequest)
  PKL (Masslynx)
  MGF (Mascot)
  mzData (.XML)
  mzML (.mzML)
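
As an illustration of what a peak list looks like, here is a minimal reader for a subset of MGF. It assumes each spectrum is delimited by `BEGIN IONS` and `END IONS`, that header lines have the form `KEY=value` (for example `TITLE`, `PEPMASS`, `CHARGE`), and that peak lines hold an m/z value followed by an optional intensity. It is a sketch, not a complete implementation of the format, and `read_mgf` is an illustrative name.

```python
def read_mgf(text):
    """Parse MGF text into a list of {'params': {...}, 'peaks': [(mz, intensity), ...]}."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as PEPMASS=745.22 or CHARGE=2+
                key, value = line.split("=", 1)
                current["params"][key.strip()] = value.strip()
            else:
                # Peak line: m/z, then intensity if present (assumed 0.0 otherwise)
                fields = line.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((float(fields[0]), intensity))
    return spectra
```

Keeping the header values as strings avoids guessing at their types; `PEPMASS`, for instance, may carry an intensity after the m/z value.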

32 Protein Identification by Database Searching
Search Parameters
Modifications
Fixed / static / quantitative modifications cost nothing
Variable / differential / non-quantitative modifications are very expensive
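
Why variable modifications are expensive can be seen by counting the candidate forms of a single peptide. The sketch below assumes each residue is either unmodified or carries exactly one applicable variable modification, with no cap on the number of modified sites. The function name and the example modifications are illustrative; many search engines limit the combinations in practice.

```python
from math import prod


def count_modified_forms(peptide, variable_mods):
    """Count distinct modification arrangements of a peptide.

    variable_mods maps a modification name to the residues it can occupy,
    e.g. {"Oxidation": "M"}. A fixed modification would not add any forms,
    because every occurrence of its residue is always modified.
    """
    choices_per_residue = []
    for aa in peptide:
        applicable = [name for name, residues in variable_mods.items() if aa in residues]
        # Each residue is either left unmodified or takes one applicable modification.
        choices_per_residue.append(1 + len(applicable))
    return prod(choices_per_residue)
```

For the peptide `MAMSTMK`, oxidation of M alone gives 2^3 = 8 forms, and adding phosphorylation of S, T or Y raises this to 32, so every extra variable modification multiplies the number of candidates to be tested for each spectrum.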

33 Protein Identification by Database Searching
Search Parameters
Modifications
Common artefacts
Modification | Delta | Site | Cause
Carbamylation | +43 | N-term, K | Urea in digest buffer
Deamidation | +1 | N | Low pH
Pyro-glutamic acid | -17 | Q at N-term | Low pH
Pyro-carbamidomethyl or carboxymethyl Cys | +40 | C at N-term | Low pH, delta is relative to unmodified C
Oxidation | +16 | M (many other residues also) | Gels
Over alkylation | +57 | N-term, W | Iodoacetamide
Over alkylation | +58 | N-term, W | Iodoacetic acid

34 Protein Identification by Database Searching Site Analysis

35 Protein Identification by Database Searching Site Analysis

36 Protein Identification by Database Searching
Site Analysis
Ascore: Beausoleil S.A., et al. (2006) Nat. Biotechnol. 24, 1285–1292
MaxQuant: Cox J. & Mann M. (2008) Nat. Biotechnol. 26, 1367–1372; Olsen J.V., et al. (2006) Cell 127, 635–48
Inspect, MS-Alignment, PTMFinder: Tanner S., et al. (2008) J. Proteome Res. 7, 170–181; Payne S., et al. (2008) J. Proteome Res. 7, 3373–3381; Tsur D., et al. (2005) Nat. Biotechnol. 23, 1562–1567; Tanner S., et al. (2005) Anal. Chem. 77, 4626–4639
PhosphoScore: Ruttenberg B.E., et al. (2008) J. Proteome Res. 7, 3054–9
Debunker: Lu B., et al. (2007) Anal. Chem. 79, 1301–10
SloMo (ETD/ECD): Bailey C.M., et al. (2009) J. Proteome Res. 8, 1965–71
ModifiComb: Savitski M.M., et al. (2006) Mol. Cell. Proteomics 5, 935–48
Delta Score: Savitski M.M., et al. (2010) Mol. Cell. Proteomics mcp.M110.003830

37 Site Analysis Protein Identification by Database Searching

38 Multi-pass Searches
Implemented under a variety of names
X!Tandem: Model refinement
Mascot: Error tolerant search
Spectrum Mill: Search saved hits, homology mode, unassigned single mass gap
Phenyx: 2-rounds
Paragon: Thorough ID, fraglet-taglet

39 Protein Identification by Database Searching
Scoring
[Figure labels: Score; Total matches; Incorrect matches; Correct matches]

40 Protein Identification by Database Searching Scoring Receiver Operating Characteristic

41 Protein Identification by Database Searching Sensitivity & Specificity

42 Protein Identification by Database Searching
Sensitivity & Specificity
Search a “decoy” database
Decoy entries can be reversed, shuffled, or randomised versions of target entries
Decoy entries can be in a separate database or concatenated to the target entries
Gives a clear estimate of false discovery rate
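
The decoy approach can be sketched in a few lines. This assumes reversed decoys concatenated to the targets and the simple estimate FDR = (decoy matches) / (target matches) above a score threshold. Other estimators exist, and the `DECOY_` accession prefix and the function names are illustrative assumptions.

```python
def make_decoys(targets, prefix="DECOY_"):
    """Build reversed decoy entries from a dict of accession -> sequence."""
    return {prefix + accession: sequence[::-1] for accession, sequence in targets.items()}


def estimate_fdr(matches, threshold):
    """Estimate the false discovery rate at a score threshold.

    matches is a sequence of (score, is_decoy) pairs from a search of the
    concatenated target and decoy database.
    """
    matches = list(matches)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    # Each decoy match above threshold is taken to stand for one incorrect target match.
    return decoys / targets if targets else 0.0
```

Raising the threshold until the estimate falls below a chosen level, such as 1%, is the usual way this number is put to work.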

43 Protein Identification by Database Searching
Sensitivity & Specificity
[Figure labels: Score; Total matches; Incorrect matches; Correct matches]

44 Protein Identification by Database Searching Sensitivity & Specificity

45 Protein Identification by Database Searching
Protein Inference
Protein A: Peptide 1, Peptide 2, Peptide 3
Protein B: Peptide 1, Peptide 3
Protein C: Peptide 2
General approach is to create a minimal list of proteins: the “principle of parsimony” or “Occam’s razor”.
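
One common way to build such a minimal list is a greedy set cover, sketched below. The example mapping is an assumed reading of the slide's diagram (Protein A contains Peptides 1 to 3, Protein B contains Peptides 1 and 3, Protein C contains Peptide 2). The greedy strategy is a swapped-in heuristic that is not guaranteed to find the smallest possible list in every case, and the function name is illustrative.

```python
def parsimonious_proteins(protein_peptides):
    """Greedily pick proteins until every observed peptide is explained.

    protein_peptides maps a protein name to the set of matched peptides
    it contains. Ties are broken alphabetically by protein name.
    """
    uncovered = set().union(*protein_peptides.values())
    chosen = []
    while uncovered:
        # Take the protein that explains the most still-unexplained peptides.
        best = max(
            sorted(protein_peptides),
            key=lambda name: len(protein_peptides[name] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_peptides[best]
    return chosen


example = {
    "Protein A": {"Peptide 1", "Peptide 2", "Peptide 3"},
    "Protein B": {"Peptide 1", "Peptide 3"},
    "Protein C": {"Peptide 2"},
}
```

With this mapping, `parsimonious_proteins(example)` selects only Protein A, since it alone accounts for all three peptides; Proteins B and C may still be present in the sample, but the data give no independent evidence for them.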

46 Protein Identification by Database Searching
Further Reading:
Exercises: http://www.ms-ms.com/exercises/exercises.html

