Proteogenomic Novelty in 105 TCGA Breast Tumors

Presentation transcript:

1 Proteogenomic Novelty in 105 TCGA Breast Tumors
Karl Clauser
CPTAC Breast Cancer Analysis Group: Broad Institute of MIT and Harvard; Fred Hutchinson Cancer Research Center; Washington University; New York University
CPTAC Data Jamboree, April 16, 2014, National Institutes of Health, Bethesda, Maryland

2 Tumor-specific protein databases for MS/MS spectra searches
Kelly Ruggles, David Fenyo, NYU

3 QUILTS: treatment of different variant types
[Diagram: variant types (variants, frameshifts, unannotated alternative splicing, partially novel splicing, completely novel expression, fusion genes) are placed in the variants, frameshifts, alternates, and other databases. Translation: novel downstream sequence, 1-frame translation; novel upstream sequence, 6-frame translation.]

4 Proteogenomic mapping: genetic alterations can be observed at the protein level (105 tumors, work in progress)
- Low thresholds applied to genome calls: >1 RNA-seq read; variants with phred-scaled QUAL >2
- High thresholds applied to proteome calls: <0.1% FDR
- [Chart: % of frameshifts, alternative splices & single-AA variants observable by proteomics]
- Not every alteration is observed: the mRNA may not be translated or may be at low abundance, and proteome coverage is incomplete
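The phred scale behind the QUAL threshold makes "low thresholds" concrete. A minimal Python sketch, assuming QUAL follows the standard VCF phred convention (QUAL = -10 log10 of the probability that the call is wrong); the function name `phred_to_error_prob` is hypothetical:

```python
def phred_to_error_prob(qual: float) -> float:
    """Probability that a call is wrong, given its phred-scaled quality.

    Phred scaling defines QUAL = -10 * log10(P_error), so P_error = 10 ** (-QUAL / 10).
    """
    if qual < 0:
        raise ValueError("phred-scaled QUAL must be non-negative")
    return 10 ** (-qual / 10)


if __name__ == "__main__":
    # The deck's genome-call threshold (QUAL > 2) next to stricter cutoffs.
    for qual in (2, 10, 20, 30):
        print(f"QUAL {qual:>2}: P(call is wrong) = {phred_to_error_prob(qual):.3f}")
```

Under this convention, a variant just above QUAL 2 can still carry an estimated error probability of about 0.63, compared with 0.01 at QUAL 20.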

5 Global proteome and phosphoproteome discovery workflow for TCGA breast tumors
- 1 mg total protein per tumor
- Internal reference: equal representation of basal, Her2 and Luminal A/B subtypes

6 Serial Search Strategy with Personalized Databases
- 25,776,160 spectra (105 patients; 36 iTRAQ experiments; 25 LC-MS/MS runs per experiment)
- Search against RefSeq-Hs-7/2013 (31,852 entries): 11,328,955 matched spectra (44% of total, 1% FDR), leaving 14,447,205 spectra
- Leftover spectra searched against concatenated FASTA files of altered proteins from 105 patients, with redundant entries removed
- Example personalized entries (canonical protein: SIGNALINGPATHWAYREGULATOR):
  - Variant, patient 1: SIGNALINGPATHWAHREGULATOR
  - Variant, patient 2: SIKNALINGPATHWAYREGULATOR
  - Alternate splice, patient 1: SIGNALINGREGULATOR
  - Alternate splice, patient 2: SIGNALINGPATHREGULATOR
  - Truncation, patient 1: SIGNALINGPATFRAMESHIF
  - Novel exon insert, patient 2: SIGNALINGPATHWAYINSERTREGULATOR
  - Partial exon deletion, patient 3: SIGNALINGPATHWAYULATOR
- Database sizes: variants 133,241; alternate spliceforms 67,853; frameshifts 19,…; concatenated 252,890
- Low-confidence thresholds for genome calls: variants, phred-scaled QUAL >2; alternative splices and frameshifts, >1 read
- High confidence for proteome IDs: <0.1% FDR peptide-spectrum match
- Matched: 3247 variants; 197 splice junctions; … truncation overlaps; 11 insertion overlaps; 49 deletion junctions
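The step that concatenates patient FASTA files and removes redundant entries can be sketched as follows. This is a minimal illustration, assuming "redundant" means an identical protein sequence (either identical to the canonical protein or shared by several patients); the function names, the accession label `Canonical`, and the duplicate entries for patients 3 and 4 in the demo are hypothetical, and the real pipeline may define redundancy differently.

```python
def build_personalized_db(canonical, patient_entries):
    """Merge per-patient altered protein sequences into one non-redundant database.

    canonical: dict mapping a protein accession to its canonical (reference) sequence.
    patient_entries: iterable of (patient, accession, alteration, sequence) tuples.
    Returns a dict mapping each unique altered sequence to a FASTA header that
    lists every patient/alteration contributing it (insertion order is kept).
    """
    sources = {}
    for patient, accession, alteration, sequence in patient_entries:
        # Sequences identical to the canonical protein were already covered by
        # the canonical (RefSeq) search, so they add nothing new.
        if sequence == canonical.get(accession):
            continue
        sources.setdefault(sequence, []).append(f"{accession} {alteration} {patient}")
    return {seq: ">" + " | ".join(labels) for seq, labels in sources.items()}


def to_fasta(db):
    """Render the database as FASTA text: one header line plus one sequence line per entry."""
    return "".join(f"{header}\n{seq}\n" for seq, header in db.items())


if __name__ == "__main__":
    canonical = {"Canonical": "SIGNALINGPATHWAYREGULATOR"}
    entries = [
        ("Patient 1", "Canonical", "Variant", "SIGNALINGPATHWAHREGULATOR"),
        ("Patient 2", "Canonical", "Variant", "SIKNALINGPATHWAYREGULATOR"),
        ("Patient 3", "Canonical", "Variant", "SIGNALINGPATHWAHREGULATOR"),  # same as patient 1
        ("Patient 1", "Canonical", "Alternate splice", "SIGNALINGREGULATOR"),
        ("Patient 4", "Canonical", "Variant", "SIGNALINGPATHWAYREGULATOR"),  # no protein change
    ]
    print(to_fasta(build_personalized_db(canonical, entries)))
```

Keeping one entry per unique sequence shrinks the database searched with the leftover spectra, while the merged header still records every contributing patient.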

7 Frequency of single-AA variants, alternative splices and frameshifts across patients (genome & transcriptome data)
- Somatic variants are less frequent than germline variants
- Some germline variants are very common
- Rare germline variants are present in RefSeq
- Some alternative splice forms and frameshifts are very common and should be in RefSeq
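Frequencies like those plotted on this slide come down to counting, for each event, how many patients carry it. A minimal sketch, assuming per-patient calls are available as collections of event identifiers; the function names are hypothetical, and the 50% cutoff in `common_events` only illustrates flagging events common enough to be reference (RefSeq) candidates. It is not a threshold from the deck.

```python
from collections import Counter


def event_frequencies(calls_by_patient):
    """Fraction of patients carrying each event.

    calls_by_patient: dict mapping patient id -> iterable of event ids
    (single-AA variants, alternative splices or frameshifts).
    """
    n_patients = len(calls_by_patient)
    if n_patients == 0:
        return {}
    # set() so that an event listed twice for one patient is counted once
    counts = Counter(event for events in calls_by_patient.values() for event in set(events))
    return {event: count / n_patients for event, count in counts.items()}


def common_events(freqs, min_fraction=0.5):
    """Events carried by at least min_fraction of patients, most frequent first."""
    hits = [(f, e) for e, f in freqs.items() if f >= min_fraction]
    return [e for f, e in sorted(hits, key=lambda fe: (-fe[0], fe[1]))]


if __name__ == "__main__":
    freqs = event_frequencies({"P1": ["a", "b"], "P2": ["a"], "P3": ["a", "c"], "P4": ["b"]})
    print(freqs, common_events(freqs))
```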

8 How many RNA-seq reads are needed to yield a proteomics observation of an alternative splice or frameshift?
1 experiment: 3 individual patients + 1 common control (40 patients)
[Charts: 82 frameshifts and 197 alternative splices; annotations "Max # reads 17, observed in >1 experiment" and "Max # reads 19, observed in >1 experiment"]
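Because every experiment also contains the 40-patient common control, an event can appear in more experiments than its RNA-seq support alone would suggest. For example, slides 9 to 11 show frameshifts with a maximum of 1 read, present in a single common-control member, observed in one, nine and three experiments. A minimal sketch of that bookkeeping, assuming each experiment is described by its individual patients and that the same pooled control is run in all experiments; `experiments_with_carrier` and the patient labels are hypothetical.

```python
def experiments_with_carrier(carriers, experiments, common_control):
    """Experiments whose iTRAQ samples include at least one carrier of an event.

    carriers: set of patient ids whose RNA-seq supports the event.
    experiments: dict mapping experiment id -> set of its individual patients.
    common_control: set of patients pooled into the common control channel,
        which is included in every experiment.
    """
    if carriers & common_control:
        # A carrier inside the pooled control contributes to every experiment.
        return sorted(experiments)
    return sorted(exp for exp, patients in experiments.items() if carriers & patients)


if __name__ == "__main__":
    experiments = {1: {"A", "B", "C"}, 2: {"D", "E", "F"}, 3: {"G", "H", "I"}}
    control = {"X", "Y", "Z"}
    print(experiments_with_carrier({"E"}, experiments, control))  # [2]
    print(experiments_with_carrier({"Y"}, experiments, control))  # [1, 2, 3]
```

The function only lists where an event could be detected; whether it is actually observed still depends on translation, abundance and proteome coverage.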

9 Frameshift truncation: Ras-related protein Rab-15
- Observed only in proteomics experiment 3
- E159; max RNA-seq reads: 1; present in only 1 common-control member

10 Frameshift truncation: Cysteine-rich protein 1
- Observed in 9 proteomics experiments
- E159; max RNA-seq reads: 1; present in only 1 common-control member

11 Frameshift truncation: Cullin-2 isoform a
- Observed in 3 proteomics experiments
- E159; max RNA-seq reads: 1; present in only 1 common-control member

12 Many missing observations even when the transcript is present in many common-control members
1 experiment: 3 individual patients + 1 common control (40 patients)
[Charts: frameshifts; alternative splices]

13 The majority of alternative splice junctions and frameshifts are observed in >1 proteomics experiment
1 experiment: 3 individual patients + 1 common control (40 patients)
[Pie charts]
- Alternative splices: 150/197 observed in >1 experiment
- Frameshifts: 44/82 observed in >1 experiment
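The "majority" claim follows directly from the two ratios on this slide. A trivial check, using a hypothetical helper `share_observed`:

```python
def share_observed(multi, total):
    """Fraction of proteomically observed events seen in more than one experiment."""
    if total <= 0:
        raise ValueError("total must be positive")
    return multi / total


if __name__ == "__main__":
    # Counts from this slide: (observed in >1 experiment, total observed)
    counts = {"alternative splices": (150, 197), "frameshifts": (44, 82)}
    for kind, (multi, total) in counts.items():
        print(f"{kind}: {multi}/{total} = {share_observed(multi, total):.1%}")
```

This gives about 76% of alternative splices and about 54% of frameshifts, so both categories clear the majority bar, frameshifts only narrowly.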

14 Next steps
- Examine the "other" category:
  - Fusion genes (junction-spanning)
  - Novel exon splicing (2 sides)
  - Completely novel genes
- Use updated somatic variants from QUILTS
- Define genomic data thresholds suitable for proteomic observations:
  - RNA-seq: minimum read count
  - Variant calling: phred-scaled QUAL score
  - Sort out germline/somatic variant-call mix status across patients

15 Summary of Proteome Re-processing
105 TCGA patients, 36 iTRAQ experiments

16 Changes in Re-processing of TCGA Data (Karl Clauser, Proteomics and Biomarker Discovery)
Extraction
- Centroiding: use Xcalibur instead of SM. iTRAQ ratios are little changed; intensities are ~5x lower (this will more closely match the NIST central analysis pipeline).
- Precursor MH+ range expanded from … to …
Searches
- Replace the database with the RefSeq version used as the reference for personalized database generation. Database content and size are very similar; protein identifiers change from gi numbers to RefSeq accessions.
- Allowed modifications will be expanded, increasing the number of identified spectra by ~10%:
  - From: Full iTRAQ, M-ox, N-deam, q-pyro
  - To: iTRAQ-Full-Lys-only, M-ox, N-deam, q-pyro, c-pyro, Ac-nTermProt
Autovalidation
- Proteome initial processing: peptide FDR per experiment …%, but overall peptide FDR across all 36 experiments ~5.5%
- Phosphoproteome initial processing: peptide FDR per experiment …%, but overall peptide FDR across all 36 experiments ~7.2%
- Changes will aim to bring the overall peptide FDRs down to ~1%:
  - Require multiple observations (protein, P-site) across experiments
  - Raise score thresholds
Quantitation
- Use PIP (precursor ion purity) filtering to exclude spectra from quantitation but not from identification; requiring PIP > 50% excludes ~7.8% of spectra.
- Filtering reduces standard deviations of protein- and phosphosite-level iTRAQ ratios.
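The gap between per-experiment and overall peptide FDR arises because correct peptides tend to recur across experiments while random matches rarely do, so false hits accumulate in the union. A toy target-decoy sketch, assuming FDR is estimated as distinct decoy peptides divided by distinct target peptides, and that every experiment re-identifies the same 100 true peptides plus one random target hit and one random decoy hit. All names and counts are hypothetical and deliberately exaggerated relative to the ~5.5% reported here. `require_repeats` mimics the proposed fix of requiring multiple observations across experiments.

```python
from collections import Counter


def peptide_fdr(targets, decoys):
    """Target-decoy FDR estimate: distinct decoy peptides / distinct target peptides."""
    return len(decoys) / len(targets) if targets else 0.0


def per_experiment_and_pooled(experiments):
    """experiments: list of (target_peptides, decoy_peptides) set pairs, one per experiment.

    Returns (per-experiment FDRs, FDR of the union across all experiments).
    """
    per_exp = [peptide_fdr(t, d) for t, d in experiments]
    all_targets = set().union(*(t for t, _ in experiments))
    all_decoys = set().union(*(d for _, d in experiments))
    return per_exp, peptide_fdr(all_targets, all_decoys)


def require_repeats(experiments, min_experiments=2):
    """Keep only peptides (target or decoy) seen in at least min_experiments experiments."""
    seen = Counter(p for t, d in experiments for p in t | d)
    keep = {p for p, n in seen.items() if n >= min_experiments}
    return [(t & keep, d & keep) for t, d in experiments]


if __name__ == "__main__":
    true_peptides = {f"PEP{i}" for i in range(100)}
    # 36 experiments: same true peptides, plus one random target and one random decoy hit each
    exps = [(true_peptides | {f"FALSE{k}"}, {f"DECOY{k}"}) for k in range(36)]
    per_exp, pooled = per_experiment_and_pooled(exps)
    print(f"{per_exp[0]:.3f} {pooled:.3f}")
    print(per_experiment_and_pooled(require_repeats(exps))[1])
```

In this toy each experiment sits near 1% FDR (1/101) while the pooled estimate reaches about 26% (36/136); requiring a peptide in at least two experiments removes every single-experiment random hit. The same filter also drops genuine peptides seen only once, so it trades some sensitivity for specificity.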

17 Y chromosome frameshift: CD99 antigen
- Observed in 36 proteomics experiments
- Partial exon deletion splice plus frameshift truncation
- E159; max RNA-seq reads: 12; transcript present in 18/40 common-control members

18 Acknowledgments
Washington U./MD Anderson/NYU: Sherri Davies, Matthew Ellis, David Fenyo, Kelly Ruggles, Reid Townsend, Li Ding
Broad Institute/FHCRC: Steve Carr, Karl Clauser, Michael Gillette, Jana Qiao, Philipp Mertins, DR Mani, Eric Kuhn, Sue Abbatiello, Amanda Paulovich, Pei Wang, Sean Wang, Ping Yan
NCI Staff: Emily Boja, Mehdi Mesri, Rob Rivers, Chris Kinsinger, Henry Rodriguez
Funding: National Cancer Institute

19 Single-AA variants may be somatic in some patients, germline in others (genomic vs. proteomic; 81 patients, Nov 2013)
- Chart annotations: "Highly interesting, should correlate with prognosis and/or subtype"; "May correlate with prognosis?"; "Might as well be canonical isoforms?"; "Detectable, but too rare to indicate biology"
- Germline & somatic (G&S) mix genomic variants have the highest observation rate by proteomics
- Genomic variants present in only a single patient are observable by proteomics

20 Not all germline & somatic mix single-AA variants are "essentially" germline (genomic vs. proteomic; 81 patients, Nov 2013)
- Is G&S mix status primarily an artifact of variant-calling accuracy/sensitivity?
- Is there some cancer biology involved for high S/G ratio variants?
- Are patients with the germline form more cancer-prone?
- Does the somatic form correlate with prognosis or development of drug resistance?
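Tallying somatic and germline calls per variant gives both the G&S mix status and the S/G ratio these questions refer to. A minimal sketch, assuming each carrier patient contributes one "somatic" or "germline" label; `classify_variant` is hypothetical, and it does not address whether a mix reflects calling error or biology.

```python
from collections import Counter


def classify_variant(calls):
    """Summarize one single-AA variant's calls across the patients that carry it.

    calls: list of "somatic" / "germline" labels, one per carrier patient.
    Returns (status, somatic-to-germline ratio); the ratio is inf when no
    patient carries the germline form.
    """
    counts = Counter(calls)
    somatic, germline = counts["somatic"], counts["germline"]
    if somatic + germline == 0:
        raise ValueError("need at least one somatic or germline call")
    if somatic and germline:
        status = "G&S mix"
    elif somatic:
        status = "somatic"
    else:
        status = "germline"
    ratio = somatic / germline if germline else float("inf")
    return status, ratio


if __name__ == "__main__":
    print(classify_variant(["somatic", "germline", "germline"]))
```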

21 Wide range of somatic single-AA variants per patient
- Low-confidence thresholds applied to calls: variants, phred-scaled QUAL >2; alternative splices, >1 read
Skip
