Novel Peptide Identification using ESTs and Genomic Sequence
Nathan Edwards
Center for Bioinformatics and Computational Biology
University of Maryland, College Park

2 Sample Preparation for Peptide Identification
  - Enzymatic Digest and Fractionation

3 Mass Spectrometer
  - Sample → Ionizer → Mass Analyzer → Detector
  - Ionizer: MALDI, Electro-Spray Ionization (ESI)
  - Mass Analyzer: Time-Of-Flight (TOF), Quadrupole, Ion-Trap
  - Detector: Electron Multiplier (EM)

4 Single Stage MS
  [Figure: MS spectrum, m/z axis]

5 Tandem Mass Spectrometry (MS/MS)
  - Precursor selection
  [Figure: spectrum, m/z axis]

6 Tandem Mass Spectrometry (MS/MS)
  - Precursor selection + collision induced dissociation (CID)
  [Figure: MS/MS spectrum, m/z axis]

7 Peptide Identification
  - For each (likely) peptide sequence:
    1. Compute fragment masses
    2. Compare with spectrum
    3. Retain those that match well
  - Peptide sequences from protein sequence databases: Swiss-Prot, IPI, NCBI's nr,...
  - Automated, high-throughput peptide identification in complex mixtures
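The three steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the scoring used by any particular search engine. It assumes unmodified peptides, singly charged b- and y-ions only, and a plain shared-peak count as the match score. The function names, the 0.5 m/z tolerance, and the `min_matches` threshold are hypothetical choices for the example.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added to y-ions
PROTON = 1.00728   # charge carrier for singly charged ions


def fragment_mzs(peptide):
    """Step 1: singly charged b- and y-ion m/z values of an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal prefix
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal suffix
    return b_ions, y_ions


def shared_peak_count(peptide, peaks, tol=0.5):
    """Step 2: count theoretical fragments with an observed peak within tol."""
    b_ions, y_ions = fragment_mzs(peptide)
    return sum(
        1 for frag in b_ions + y_ions
        if any(abs(frag - mz) <= tol for mz in peaks)
    )


def identify(candidates, peaks, min_matches, tol=0.5):
    """Step 3: retain candidates that match well, best score first."""
    scored = [(shared_peak_count(p, peaks, tol), p) for p in candidates]
    return sorted((s for s in scored if s[0] >= min_matches), reverse=True)
```

Because only peptides present in the candidate list can ever be scored, the choice of sequence database bounds what can be identified.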

8 What goes missing?
  - Known coding SNPs
  - Novel coding mutations
  - Alternative splicing isoforms
  - Alternative translation start-sites
  - Microexons
  - Alternative translation frames

9 Why should we care?
  - Alternative splicing is the norm!
  - Only 20-25K human genes
  - Each gene makes many proteins
  - Proteins have clinical implications
  - Biomarker discovery
  - Evidence for SNPs and alternative splicing stops with transcription
  - Genomic assays, ESTs, mRNA sequence
  - Little hard evidence for translation start site

10 Novel Splice Isoform

11 Novel Splice Isoform

12 Novel Frame

13 Novel Frame

14 Novel Mutation Ala2→Pro associated with familial amyloid polyneuropathy

15 Novel Mutation

16 Genomic Peptide Sequences
  - Genomic DNA: exons & introns, 6 frames, large (3Gb → 6Gb)
  - ESTs: no introns, 6 frames, large (4Gb → 8Gb)
  - Used by gene, protein, and alternative splicing annotation pipelines
  - Highly redundant, nucleotide error rate ~ 1%
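The "6 frames" on this slide refers to translating a nucleotide sequence in three reading frames on each strand. A minimal sketch follows, assuming the standard genetic code and writing any codon that contains an ambiguous base as `X`; the function names are hypothetical. Each frame yields roughly n/3 residues from n nucleotides, so six frames yield roughly 2n residues, which would be consistent with the doubling shown on the slide (3Gb → 6Gb, 4Gb → 8Gb).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[frame:]) for strand in strands for frame in range(3)]


print(six_frame_translations("ATGGCC"))
# ['MA', 'W', 'G', 'GH', 'A', 'P']
```

Stop codons appear as `*` in the output, so open reading frames can be recovered by splitting each frame on `*`.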

17 Compressed EST Database
  - Six-frame translation of all ESTs
  - Optionally, ESTs that map to a gene
  - Eliminate ORFs < 30 amino-acids
  - Amino-acid 30-mers observed in at least two ESTs
  - Represent AA 30-mers in C³ FASTA database: Complete, Correct, Compact
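The filtering steps on this slide can be sketched as follows. This is an illustration under stated assumptions, not the pipeline behind the talk: an "ORF" is taken to be any stop-to-stop stretch of a translated frame, stop codons are written as `*`, and a 30-mer counts as "observed in at least two ESTs" when it occurs in translations of two distinct EST identifiers. The function name and the input layout (a dict mapping EST id to its translated frames) are hypothetical.

```python
from collections import defaultdict


def supported_kmers(est_translations, k=30, min_orf=30, min_ests=2):
    """Return amino-acid k-mers seen in ORFs of at least min_ests distinct ESTs.

    est_translations maps an EST identifier to a list of translated frames,
    each a string in which '*' marks a stop codon.
    """
    seen_in = defaultdict(set)  # k-mer -> set of EST ids containing it
    for est_id, frames in est_translations.items():
        for frame in frames:
            for orf in frame.split("*"):
                if len(orf) < min_orf:
                    continue  # eliminate short ORFs
                for i in range(len(orf) - k + 1):
                    seen_in[orf[i:i + k]].add(est_id)
    return {kmer for kmer, ests in seen_in.items() if len(ests) >= min_ests}
```

A toy call with small `k` and `min_orf` shows the behavior: `supported_kmers({"est1": ["ACDEFG*AC"], "est2": ["CDEFGH"], "est3": ["ACDEFG"]}, k=4, min_orf=4)` keeps `ACDE`, `CDEF`, and `DEFG`, and drops `EFGH`, which is seen in only one EST.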

18 SBH-graph
  - ACDEFGI, ACDEFACG, DEFGEFGI

19 Compressed SBH-graph
  - ACDEFGI, ACDEFACG, DEFGEFGI

20 Sequence Databases & CSBH-graphs
  - Original sequences correspond to paths
  - ACDEFGI, ACDEFACG, DEFGEFGI

21 Sequence Databases & CSBH-graphs
  - All k-mers represented by an edge have the same count
  [Figure: edge counts 2, 2, 1, 2, 1]

22 cSBH-graphs
  - Quickly determine those that occur twice
  [Figure: edge counts 2, 2, 1, 2]

23 Compressed-SBH-graph
  - ACDEFGI
  [Figure: edge counts 2, 2, 1, 2]
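The graph slides can be reproduced on the three example sequences with a short sketch. It assumes a de Bruijn-style construction in which nodes are (k-1)-mers and edges are k-mers, and it compresses the graph by merging chains of nodes that have exactly one incoming and one outgoing edge. The value k = 4 is an assumption (the transcript does not state k), and the function names are hypothetical. With k = 4 this toy input gives five compressed edges whose k-mer counts are 2, 2, 1, 2, 1, the same multiset as the counts in the transcript, and the k-mers that occur twice spell ACDEFGI. Uniform counts along an edge are not guaranteed by this simple construction in general.

```python
from collections import Counter, defaultdict


def kmer_counts(seqs, k):
    """Count every k-mer occurrence across the input sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def compress(kmers):
    """Merge distinct k-mers (edges) into maximal non-branching path strings."""
    kmers = list(kmers)
    out_edges = defaultdict(list)  # (k-1)-mer node -> k-mers leaving it
    in_degree = Counter()          # (k-1)-mer node -> number of k-mers entering
    for km in kmers:
        out_edges[km[:-1]].append(km)
        in_degree[km[1:]] += 1

    def passes_through(node):
        return in_degree[node] == 1 and len(out_edges.get(node, ())) == 1

    used, paths = set(), []

    def walk(first):
        # Extend one edge through every pass-through node that follows it.
        used.add(first)
        path, node = first, first[1:]
        while passes_through(node) and out_edges[node][0] not in used:
            nxt = out_edges[node][0]
            used.add(nxt)
            path += nxt[-1]
            node = nxt[1:]
        return path

    for node in list(out_edges):
        if not passes_through(node):
            paths.extend(walk(km) for km in out_edges[node])
    # Any k-mers still unused lie on isolated cycles.
    paths.extend(walk(km) for km in kmers if km not in used)
    return paths


seqs = ["ACDEFGI", "ACDEFACG", "DEFGEFGI"]
counts = kmer_counts(seqs, 4)
edges = compress(counts)
repeated = compress(km for km, c in counts.items() if c >= 2)
print(sorted(edges))  # ['ACDEF', 'DEFACG', 'DEFG', 'EFGEFG', 'EFGI']
print(repeated)       # ['ACDEFGI']
```

Writing each compressed edge out as one FASTA entry keeps every retained k-mer while avoiding most of the repetition in the original sequences.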

24 Compressed EST Database
  - Gene-centric compressed EST peptide sequence database
  - 20,774 sequence entries
  - ~8 Gb vs 223 Mb: ~35-fold compression
  - 22 hours becomes 15 minutes
  - E-values improve by similar factor!
  - Makes routine EST searching feasible
  - Search ESTs instead of IPI?

25 Conclusions
  - Peptides identify more than just proteins
  - Compressed peptide sequence databases make routine EST searching feasible
  - cSBH-graph + edge counts + C²/C³ enumeration algorithms
  - Minimal FASTA representation of k-mer sets

26 Collaborators
  - Chau-Wen Tseng, Xue Wu (Computer Science)
  - Catherine Fenselau, Crystal Harvey (Biochemistry)
  - Calibrant Biosystems
  - Thanks to PeptideAtlas, X!Tandem
