Analysis of Complex Proteomic Datasets Using Scaffold
Free Scaffold Viewer can be downloaded at: www.proteomesoftware.com

1 Analysis of Complex Proteomic Datasets Using Scaffold
Free Scaffold Viewer can be downloaded at: www.proteomesoftware.com

2 Scaffold: Why do we need it?
Shotgun proteomics → analysis of complex mixtures
Whole cell extract: 10,000+ proteins, 600,000 peptides, 1.2 million spectra
- Beyond the realm of manual interpretation
- How do we determine what is a valid protein identification?

3 Statistical Analysis Using Scaffold
- All search engines use different scoring algorithms → results cannot be compared directly
- Results from many search engines are described by more than one value
Examples:
  Mascot → Ion Score and Identity Score
  SEQUEST → Xcorr and DeltaCn

4 Statistical Analysis Using Scaffold: Peptide Prophet*
- Creates a universal score (discriminant score) for the search engine result (e.g. XCorr and DeltaCn are compressed to one score for SEQUEST results, Ion score and Identity score for Mascot results)
- Plots a histogram of the discriminant scores and calculates a bimodal distribution based on standard statistics to differentiate between correct and incorrect hits
- Computes the probability that the match is correct at a given discriminant score
*Nesvizhskii, A. I. et al., Anal. Chem. 2003, 75, 4646-4658

5 Statistical Analysis Using Scaffold
[Figure: histogram of discriminant scores. x-axis: discriminant score (D), -3.9 to 7.3; y-axis: number of spectra in each bin, 0 to 200]

6 Statistical Analysis Using Scaffold
Assumes a mixture of standard statistical distributions
[Figure: distributions labeled “incorrect” and “correct”]

7 Statistical Analysis Using Scaffold
[Figure: “incorrect” and “correct” distributions with the peptide probability threshold marked]
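
The idea on slides 4 to 7 can be sketched in a few lines of Python. This is a minimal illustration, not the Peptide Prophet or Scaffold implementation: it assumes both the "incorrect" and "correct" populations are Gaussian (the published model differs in its details), fits them with a plain EM loop, and runs on simulated discriminant scores. All function names and the simulated numbers are made up for the example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(x, weight_correct, incorrect, correct):
    """Probability that a match with discriminant score x is correct."""
    p_correct = weight_correct * normal_pdf(x, *correct)
    p_incorrect = (1.0 - weight_correct) * normal_pdf(x, *incorrect)
    total = p_correct + p_incorrect
    return p_correct / total if total > 0.0 else 0.5


def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation (sd is floored to stay positive)."""
    total = max(sum(weights), 1e-12)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, max(math.sqrt(var), 1e-3)


def fit_mixture(scores, iterations=200):
    """Fit a two-Gaussian mixture by EM; needs at least two scores.

    Returns (weight_correct, (mean, sd) incorrect, (mean, sd) correct).
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    # Crude start: lower half is "incorrect", upper half is "correct".
    incorrect = weighted_mean_sd(ordered[:half], [1.0] * half)
    correct = weighted_mean_sd(ordered[half:], [1.0] * (len(ordered) - half))
    weight_correct = 0.5
    for _ in range(iterations):
        # E-step: how strongly each score belongs to the "correct" component.
        resp = [posterior_correct(x, weight_correct, incorrect, correct)
                for x in scores]
        # M-step: update the mixing weight and both components.
        weight_correct = min(max(sum(resp) / len(scores), 1e-6), 1.0 - 1e-6)
        correct = weighted_mean_sd(scores, resp)
        incorrect = weighted_mean_sd(scores, [1.0 - r for r in resp])
    return weight_correct, incorrect, correct


# Simulated data (an assumption for the demo): 700 incorrect, 300 correct hits.
rng = random.Random(0)
scores = [rng.gauss(-1.0, 1.0) for _ in range(700)]
scores += [rng.gauss(4.0, 1.2) for _ in range(300)]

weight_correct, incorrect, correct = fit_mixture(scores)
p_high = posterior_correct(4.0, weight_correct, incorrect, correct)
p_low = posterior_correct(-1.0, weight_correct, incorrect, correct)
```

A peptide probability threshold then becomes a cutoff on `posterior_correct`. With two Gaussians of different widths the posterior is only trustworthy inside the range of observed scores, which is one reason a real implementation is more careful about the shape of each distribution.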

8 Statistical Analysis Using Scaffold: One search engine may not be enough
[Figure: three-way Venn diagram for SEQUEST, X!Tandem and Mascot; regions labeled 9%, 19%, 7%, 34%, 5%, 4% and 22%]
www.proteomesoftware.com

9 Statistical Analysis Using Scaffold: Protein Probability
- Peptide Prophet statistics are applied separately for each search engine result (i.e. Mascot, SEQUEST, and X!Tandem)
- Scaffold Merger combines the peptide probabilities from each search engine to generate a protein probability
Protein probability: the probability of identifying a spectrum + the probability of agreement between search engines

10 Statistical Analysis Using Scaffold: Advantages of using Scaffold
- Allows you to choose a statistical error rate by setting probability thresholds
- Allows you to compare and combine results from different experiments and different search engines
- Allows sharing of raw data and search results
- Accepted as a suitable statistical method to validate large datasets
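
The link between a probability threshold and an error rate can be made concrete with a short sketch. It assumes the probabilities are well calibrated, so that an identification with probability p is wrong with probability 1 - p; the function name and the example values are hypothetical and this is not taken from Scaffold itself.

```python
def expected_error_rate(probabilities, threshold):
    """Expected fraction of wrong identifications among those at or above threshold.

    Assumes calibrated probabilities: each accepted identification
    contributes (1 - p) expected errors.
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


# Hypothetical identification probabilities.
example_probs = [0.99, 0.95, 0.90, 0.60, 0.20]
strict = expected_error_rate(example_probs, 0.90)
loose = expected_error_rate(example_probs, 0.50)
```

Raising the threshold keeps fewer identifications but lowers the expected error rate among them, which is the trade-off the first bullet refers to.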

11 This is the Samples view

12 List of all the proteins found in your samples
Homologous proteins (proteins matched to the same peptides) are shown. You can link out directly to database entries.

13 How does Scaffold deal with peptides that can be assigned to more than one protein?
General rule → explain the spectral data with the smallest set of proteins
Protein A and Protein B share all the same peptides, so they will be grouped together.
[Figure: proteins A and B]

14 How does Scaffold deal with peptides that can be assigned to more than one protein?
General rule → explain the spectral data with the smallest set of proteins
Protein A and Protein B each have one unique peptide → they will be listed separately only if the peptide probability is > 50%.
[Figure: proteins A and B]

15 How does Scaffold deal with peptides that can be assigned to more than one protein?
General rule → explain the spectral data with the smallest set of proteins
Protein B has two unique peptides → it will be listed separately.
[Figure: proteins A and B]
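
The general rule on slides 13 to 15 can be sketched as follows. This is only an illustration under stated assumptions: proteins with identical peptide sets are merged into one group, and a greedy set cover then picks groups until every peptide is explained. Greedy cover is a common approximation of "smallest set", not necessarily what Scaffold does, and the 50% peptide probability rule from slide 14 is left out. Protein and peptide names are hypothetical.

```python
def group_indistinguishable(protein_peptides):
    """Merge proteins that were matched to exactly the same peptides."""
    by_peptides = {}
    for protein, peptides in protein_peptides.items():
        by_peptides.setdefault(frozenset(peptides), []).append(protein)
    return {tuple(sorted(names)): set(peps)
            for peps, names in by_peptides.items()}


def minimal_protein_list(protein_peptides):
    """Greedy set cover: choose groups until all peptides are explained."""
    groups = group_indistinguishable(protein_peptides)
    unexplained = set()
    for peptides in groups.values():
        unexplained |= peptides
    chosen = []
    while unexplained:
        # Take the group explaining the most still-unexplained peptides;
        # sorting makes tie-breaking deterministic.
        best = max(sorted(groups), key=lambda g: len(groups[g] & unexplained))
        chosen.append(best)
        unexplained -= groups[best]
    return chosen


# Hypothetical search results: protein -> peptides matched to it.
hits = {
    "A": {"pep1", "pep2"},
    "B": {"pep1", "pep2"},          # same peptides as A: grouped with A
    "C": {"pep2", "pep3", "pep4"},
    "D": {"pep3"},                  # fully explained by C: not reported
}
reported = minimal_protein_list(hits)
```

Here A and B end up in one group because nothing distinguishes them, and D is dropped because its only peptide is already explained by C.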

16 Scaffold will extract GO terms from NCBI annotations

17 Gene Ontology “GO” terms
- Controlled vocabulary containing consistent descriptions of gene products in different databases
- Describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner
Gene Ontology Project: http://www.geneontology.org/GO.doc.shtml

18 List of samples

19 Color coded to represent the probability that the protein identification is correct
Probability thresholds for peptide and protein identifications and the required number of unique peptides can be defined.

20 This is the Proteins view

21 Spectrum of each peptide, labeled with y and b ions, which can be used for manual validation
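
For readers who want to check a labeled spectrum by hand, the expected positions of the b and y ions can be computed from the peptide sequence. The sketch below assumes an unmodified peptide and singly charged fragments, and uses standard monoisotopic residue masses; the function name is hypothetical and the peptide is the one from the "Good Spectrum" slide.

```python
# Standard monoisotopic residue masses in Da (unmodified amino acids).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton
WATER = 18.010565   # mass of H2O


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as m/z lists for singly charged fragments.

    b_ions[0] is b1 (N-terminal residue); y_ions[0] is y1 (C-terminal residue).
    """
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), built from the N-terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), built from the C-terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


peptide = "IAELAGFSVPENTK"
b_ions, y_ions = fragment_ions(peptide)
```

Complementary fragments add up to the neutral peptide mass plus two protons, which is a quick consistency check when assigning peaks.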

22 Manual Spectrum Evaluation
- Search engine scores → is the peptide found by more than one search engine?
    Mascot: ion score > 40
    SEQUEST: Xcorr > 2 (+2 ion), 2.5 (+3 ion); deltaCn > 0.2
- Good signal-to-noise
- Long stretches of y and/or b ions
- All dominant peaks are assigned as y or b ions
- Fragmentation chemistry
    N-terminal cleavage at P → dominant y-ion
    C-terminal cleavage at D and E → dominant b-ion
    Peptides containing W → abundant y-ions
    S and T → tend to lose water (-18 Da)
    R, N, and Q → tend to lose ammonia (-17 Da)

23 Good Spectrum
Peptide sequence → IAELAGFSVPENTK, +2 charge on parent peptide
SEQUEST: Xcorr = 2.61, deltaCn = 0.4
Mascot: Ion Score = 60.1, Identity Score = 37.3
- Dominant y-ion at N-terminal cleavage of P
- Good coverage of y and b ion series
- Good signal-to-noise

24 Bad Spectrum
Peptide sequence → YPLADYALTPDMAIVDANLVMDMPK, +3 charge on parent peptide
SEQUEST: Xcorr = 2.26, deltaCn = 0.2
Mascot: Ion Score = 9.93, Identity Score = 37.3
- Poor signal-to-noise
- Poor coverage of y and b ion series
- Multiple unassigned peaks
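
The score cutoffs from the Manual Spectrum Evaluation slide can be applied to the two example spectra with a small helper. This is only a sketch of the numeric part of the checklist: signal-to-noise, ion-series coverage and fragmentation chemistry still need a look at the spectrum itself. The function name is hypothetical, and Xcorr cutoffs are defined only for the +2 and +3 charge states given on the slide.

```python
# Xcorr cutoffs by parent charge, as listed on the slide.
XCORR_MIN = {2: 2.0, 3: 2.5}


def score_checks(charge, xcorr, delta_cn, mascot_ion_score):
    """Return which of the slide's score guidelines a match satisfies."""
    if charge not in XCORR_MIN:
        raise ValueError("Xcorr cutoffs are only given for +2 and +3 ions")
    return {
        "sequest_xcorr": xcorr > XCORR_MIN[charge],
        "sequest_delta_cn": delta_cn > 0.2,
        "mascot_ion_score": mascot_ion_score > 40,
    }


# Values from the "Good Spectrum" and "Bad Spectrum" slides.
good = score_checks(charge=2, xcorr=2.61, delta_cn=0.4, mascot_ion_score=60.1)
bad = score_checks(charge=3, xcorr=2.26, delta_cn=0.2, mascot_ion_score=9.93)
```

The good spectrum passes all three checks, while the bad spectrum fails each one: its Xcorr is below the +3 cutoff, its deltaCn does not exceed 0.2, and its Mascot ion score is far below 40.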

25 This is the Statistics view

26 Scaffold Statistics View: Score Histogram
- Blue indicates “incorrect” proteins
- Red indicates “correct” proteins
- A protein is “correct” if it passes the peptide probability, protein probability, and minimum number of peptides filters
Important! There must be enough data to fit two distributions for the statistics to be valid.
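
The "correct" rule on this slide amounts to a three-part filter, sketched below. The class, the field names and the default thresholds are assumptions chosen for illustration (95% and two peptides echo values mentioned elsewhere in the deck); they are not Scaffold's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinHit:
    accession: str                      # hypothetical identifier
    protein_probability: float
    peptide_probabilities: list = field(default_factory=list)  # one per unique peptide


def is_correct(hit, min_protein_prob=0.95, min_peptide_prob=0.95, min_peptides=2):
    """Apply the protein probability, peptide probability and peptide count filters."""
    confident = [p for p in hit.peptide_probabilities if p >= min_peptide_prob]
    return (hit.protein_probability >= min_protein_prob
            and len(confident) >= min_peptides)


# Hypothetical hits.
hits = [
    ProteinHit("PROT_1", 0.99, [0.99, 0.97, 0.60]),  # two confident peptides
    ProteinHit("PROT_2", 0.99, [0.99, 0.80]),        # only one confident peptide
    ProteinHit("PROT_3", 0.80, [0.99, 0.98]),        # protein probability too low
]
accepted = [h.accession for h in hits if is_correct(h)]
```

Only peptides that pass the peptide threshold count toward the minimum number of peptides, so loosening one filter can change how many proteins pass another.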

27 Scaffold Statistics View
With only 1 unique peptide (95% peptide prob), the maximum protein probability is <90%.
With at least 2 unique peptides (95% peptide prob), the maximum protein probability is ~100%.

28 Scaffold Statistics View: SEQUEST only
[Figure label: Missed IDs]

29 Scaffold Statistics View: Mascot only
[Figure label: Missed IDs]

30 Scaffold Statistics View
Using both Mascot and SEQUEST results in more “correct” protein identifications.
[Figure panels: Mascot only, SEQUEST only, Both]

31 This is the Publish View

32 Publication Guidelines for Proteomic Data
Molecular & Cellular Proteomics
http://www.mcponline.org/misc/ParisReport_Final.shtml

33 Publication Guidelines for Proteomic Data: Data Analysis
- Name and version of software used to extract peak list
- Name and version of database searching software (Mascot, SEQUEST, Spectrum Mill, or X!Tandem)
- Values of all search parameters used (enzyme, modifications, mass tolerance, etc.)
- Name and size of the database searched (Swiss-Prot or NCBI and the number of sequence entries)
- Name and version of any additional software used for statistical analysis and an explanation of the analysis (Scaffold, number of peptides required, probability settings)

34 Publication Guidelines for Proteomic Data
Each protein identified:
- Accession number
- Sequence coverage and total number of unique peptides
Each peptide identified:
- Peptide sequence, noting any modifications or missed cleavages
- Parent peptide ion mass and charge
- All search engine scores

