Elham Sherafat and Ion Mandoiu


1 PU-Caller: Sensitive somatic variant calling using positive-unlabeled learning
Elham Sherafat and Ion Mandoiu Computer Science & Engineering Department University of Connecticut

2 Outline
Motivation, Positive-unlabeled learning, Results, Ongoing work

3 Ongoing Ovarian Cancer Immunotherapy Trial
Bulk exome & RNA sequencing
LC-MS/MS of eluted peptides
GeNeo suite of Galaxy tools (neo.engr.uconn.edu)
Somatic variant validation and clonal analysis by targeted DNA sequencing using AccessArray
Neoantigen prediction
Peptide vaccine

4 Consensus Caller Cross-Platform

5 AccessArray Validation

6 Somatic Mutation Prevalence
Goal: use machine learning to increase sensitivity without much loss of precision (AccessArray capacity is bounded)

7 Previous Work
Supervised ML approaches:
Need large amounts of training data
Assume matched distributions between training and test data

8 Outline
Motivation, Positive-unlabeled learning, Results, Ongoing work

9 PU Learning
Input: 10s-100s of high-confidence SNVs from CCCP/2CP (“positives”), plus SNV candidates that fail the 2CP filter (“unlabeled”)
Two-step approach:
Step 1: Infer “reliable negatives” from the unlabeled data
Step 2: Train a classifier using the positives and reliable negatives, then classify all points
Robust to patient-to-patient variability
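
The two-step approach above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say how reliable negatives are inferred or which classifier is trained, so the sketch uses distance to the positive centroid for step 1 and a nearest-centroid rule for step 2. The function name `two_step_pu`, the `neg_fraction` parameter, and the toy two-feature data are all hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def two_step_pu(positives, unlabeled, neg_fraction=0.5):
    """Toy two-step PU learner (assumed design, not PU-Caller itself)."""
    pos_c = centroid(positives)

    # Step 1: take the unlabeled points farthest from the positive
    # centroid as "reliable negatives" (an assumed heuristic).
    ranked = sorted(unlabeled, key=lambda p: math.dist(p, pos_c), reverse=True)
    n_neg = max(1, int(len(ranked) * neg_fraction))
    reliable_neg = ranked[:n_neg]

    # Step 2: nearest-centroid classifier trained on positives and
    # reliable negatives, then applied to every unlabeled point.
    neg_c = centroid(reliable_neg)
    labels = [math.dist(p, pos_c) < math.dist(p, neg_c) for p in unlabeled]
    return labels, reliable_neg

# Hypothetical feature vectors; real SNV features are not given in the slides.
positives = [[0.9, 0.8], [1.0, 1.0], [0.8, 0.9]]
unlabeled = [[0.95, 0.9], [0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [0.85, 0.85]]
labels, reliable_neg = two_step_pu(positives, unlabeled, neg_fraction=0.4)
print(labels)
```

In this toy run the two unlabeled points that sit next to the positives are rescued as positive calls, while the remaining three are rejected.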

10 PU-Caller Workflow
Robustness increased by:
Informed undersampling to balance reliable negatives with positives
Use of “spy” positives for threshold selection
Bootstrapping
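
A rough sketch of how these three ideas might fit together follows. It is a simplification under several assumptions, not the PU-Caller implementation: spies are held-out positives scored by distance to the centroid of the remaining positives, plain random undersampling stands in for the "informed" undersampling named on the slide, and repeated random rounds with vote averaging stand in for bootstrapping. The names `spy_threshold` and `pu_votes`, their parameters, and the toy data are hypothetical.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def spy_threshold(spy_scores, keep=0.9):
    """Highest threshold that keeps at least `keep` of the spies at or above it."""
    ranked = sorted(spy_scores, reverse=True)
    k = max(1, math.ceil(keep * len(ranked)))
    return ranked[k - 1]

def pu_votes(positives, unlabeled, n_spies=2, rounds=20, seed=0):
    """Fraction of rounds in which each unlabeled point is called positive."""
    rng = random.Random(seed)
    votes = [0] * len(unlabeled)
    done = 0
    for _ in range(rounds):
        # Hold out a few known positives as spies.
        spy_idx = set(rng.sample(range(len(positives)), n_spies))
        spies = [p for i, p in enumerate(positives) if i in spy_idx]
        rest = [p for i, p in enumerate(positives) if i not in spy_idx]
        c = centroid(rest)
        score = lambda x: -math.dist(x, c)  # higher means more positive-like

        # Unlabeled points scoring below the spy-derived threshold
        # are treated as reliable negatives.
        t = spy_threshold([score(s) for s in spies])
        reliable_neg = [u for u in unlabeled if score(u) < t]
        if not reliable_neg:
            continue

        # Random undersampling (stand-in for informed undersampling)
        # so negatives do not outnumber positives.
        sample = rng.sample(reliable_neg, min(len(reliable_neg), len(positives)))
        pos_c, neg_c = centroid(positives), centroid(sample)
        for i, u in enumerate(unlabeled):
            if math.dist(u, pos_c) < math.dist(u, neg_c):
                votes[i] += 1
        done += 1
    return [v / done for v in votes] if done else [0.0] * len(unlabeled)

# Hypothetical data: two unlabeled points resemble the positives, five do not.
positives = [[1.0, 1.0], [0.9, 1.0], [1.0, 0.9], [0.95, 0.95], [1.05, 1.0], [1.0, 1.05]]
unlabeled = [[0.97, 0.98], [1.02, 0.96], [0.0, 0.1], [0.1, 0.0],
             [0.05, 0.05], [0.1, 0.1], [0.0, 0.0]]
votes = pu_votes(positives, unlabeled)
print(votes)
```

Averaging votes over rounds means a single unlucky choice of spies or negative sample has limited influence on the final call, which is the kind of robustness the slide refers to.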

11 Outline
Motivation, Positive-unlabeled learning, Results, Ongoing work

12 AccessArray Validation
PU-Caller yields 7-17% increase in validated SNVs compared to CCCP/2CP

13 SNV Feature Importance

14 Outline
Motivation, Positive-unlabeled learning, Results, Ongoing work

15 Ongoing Work
PU learning technique is broadly applicable
Currently using PU learning for improving sensitivity of peptide identification from LC-MS/MS data
MS-GF+ database search engine generates 1000s of confident identifications, but leaves 10,000s of spectra unmatched

16 Acknowledgments
Elham Sherafat
Jordan Force

