De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi

Hannu Peltoniemi, hannu.peltoniemi@appliednumerics.fi

De novo vs. database matching
Database matching: the MS2 spectrum of an unknown glycan is matched against a glycan database, and the result is the best-scoring glycan(s) in the DB. Only structures that are in the DB can be found, which is fine if the DB is comprehensive. If the glycan is not in the DB, the result may be the closest-matching (wrong) structure or no result at all.

De novo: candidate structures are generated on the fly and matched against the MS2 spectrum of the unknown glycan, and the result is the best-scoring glycans. No database is needed, so new structures can also be found! The approach is computationally intensive and requires high-quality spectra. Typically there is no definite answer, but a set of high-scoring structures.

De novo structure search
Part of the N-glycopeptide workflow: Joenväärä et al., N-Glycoproteomics - An automated workflow approach, Glycobiology 2008, 18(4):339-349.
Input: protonated, deconvoluted MS2 spectra.
Steps: 1) identification of peptides; 2) identification of N-glycan compositions; 3) identification of de novo N-glycan structures (branching only, no linkage information).

Input data
A spectrum annotated with glycopeptide fragments and glycan composition fragments.

Example data
Peptide: QDQCIYNTTYLNVQR
Glycan composition: 6 Hex 5 HexNAc 3 NeuAc

Same data, different view
[Figure: the measured fragment compositions plotted as Hex vs. HexNAc in separate panels for NeuAc = 0, 1, 2 and 3, shown both for glycan fragments attached to the peptide and for free glycans; total composition: 6 Hex 5 HexNAc 3 NeuAc.]

The puzzle
All measured fragment compositions of an unknown structure with the given total composition are known. Some theoretical fragments may be missing, and some measured fragments may be false. Which structure best explains the data?

Solution
The problem is split into two phases:
1) Generation of possible structures: structures are grown starting from the N-glycan core. The population size is limited by removing the structures with the poorest fit to the peptide+glycan fragments.
2) Scoring: the resulting set of structures is scored against the full data. The final glycopeptide score is the sum of the peptide score and the glycan structure score.

Initialization
The misfit (cost) between a theoretical structure and the measured data is defined as the number of theoretical and measured fragments that do not match each other. [Figure: measured vs. theoretical fragments; example data: peptide + 5 Hex 4 HexNAc.]
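A minimal sketch of this cost in Python, assuming each fragment is encoded as a hypothetical tuple (attached_to_peptide, Hex, HexNAc, NeuAc). Under that encoding the misfit is simply the size of the symmetric difference of the theoretical and measured fragment sets.

```python
from typing import Set, Tuple

# Hypothetical encoding: (attached_to_peptide, n_Hex, n_HexNAc, n_NeuAc)
Fragment = Tuple[bool, int, int, int]


def misfit(theoretical: Set[Fragment], measured: Set[Fragment]) -> int:
    """Number of fragments that appear in only one of the two sets."""
    missing = theoretical - measured      # predicted but not observed
    unexplained = measured - theoretical  # observed but not predicted (e.g. noise)
    return len(missing) + len(unexplained)


theoretical = {(True, 0, 1, 0), (True, 0, 2, 0), (True, 1, 2, 0)}
measured = {(True, 0, 1, 0), (True, 1, 2, 0), (False, 1, 1, 0)}
print(misfit(theoretical, measured))  # 2
```

Both kinds of mismatch count equally here; whether the original method weights missing and unexplained fragments differently is not stated.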

Growing structures
Structures are grown from the start (the core) to the end (the final composition) by adding one unit at a time. If the population grows too large, the structures with the highest cost are removed.
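The growth phase behaves like a beam search. The sketch below assumes a hypothetical glycan encoding, a nested tuple (residue, children) with sorted children, so that identical branchings share one canonical form (topology only, no linkages). MAX_LINKS and ACCEPTED are illustrative placeholders for the connection rules listed among the assumptions, not values from the talk, and `cost` is any function returning the misfit of a structure.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical encoding: a glycan is (residue, children); children are kept sorted
# so that equivalent branchings get one canonical form (topology only, no linkages).
Node = Tuple[str, tuple]

# Illustrative connection rules (placeholders, not values from the talk).
MAX_LINKS: Dict[str, int] = {"HexNAc": 2, "Hex": 3, "NeuAc": 0}
ACCEPTED: Dict[str, Tuple[str, ...]] = {
    "HexNAc": ("HexNAc", "Hex"),
    "Hex": ("Hex", "HexNAc", "NeuAc"),
    "NeuAc": (),
}

# N-glycan core used as the start structure: HexNAc-HexNAc-Hex(Hex)(Hex).
CORE: Node = ("HexNAc", (("HexNAc", (("Hex", (("Hex", ()), ("Hex", ()))),)),))


def composition(node: Node) -> Counter:
    name, children = node
    total = Counter({name: 1})
    for child in children:
        total += composition(child)
    return total


def add_unit(node: Node) -> List[Node]:
    """Every structure obtained by attaching one new residue somewhere in the tree."""
    name, children = node
    grown = []
    if len(children) < MAX_LINKS[name]:  # a free link on this residue
        for new in ACCEPTED[name]:
            grown.append((name, tuple(sorted(children + ((new, ()),)))))
    for i, child in enumerate(children):  # or attach deeper inside one branch
        for new_child in add_unit(child):
            branches = children[:i] + (new_child,) + children[i + 1:]
            grown.append((name, tuple(sorted(branches))))
    return grown


def grow(core: Node, target: Counter, cost: Callable[[Node], float],
         max_population: int) -> Set[Node]:
    """Beam-style growth from the core to the target composition."""
    population = {core}
    steps = sum(target.values()) - sum(composition(core).values())
    for _ in range(steps):
        candidates = set()
        for structure in population:
            for new in add_unit(structure):
                comp = composition(new)
                if all(comp[r] <= target[r] for r in comp):  # never exceed the total
                    candidates.add(new)
        # Prune: keep the lowest-cost structures (ties broken deterministically).
        ranked = sorted(candidates, key=lambda s: (cost(s), s))
        population = set(ranked[:max_population])
    return population


result = grow(CORE, Counter(Hex=3, HexNAc=3), cost=lambda s: 0, max_population=10)
print(len(result))  # 4
```

In the scheme described on the slides, `cost` would be the misfit between a structure's peptide+glycan fragments and the measured ones; the constant cost here only exercises growth, deduplication and pruning.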

Scoring
The score is calculated as -log10(P), where P is the (binomial) probability that a random set of fragments would match as well as or better than the ranked structure. The final glycopeptide score is the sum of the peptide and structure scores. [Figure: ranked structures, from highest scoring to lowest scoring.]
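One way to compute such a score, assuming (the talk does not spell this out) that a structure has n theoretical fragments, k of them are matched, and each would match by chance with probability p. P is then the upper binomial tail P(X >= k) for X ~ Binomial(n, p). The function names are hypothetical.

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random matches."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def structure_score(n_theoretical: int, n_matched: int, p_random: float) -> float:
    """-log10(P); larger means the match is less likely to be random."""
    return -math.log10(binomial_tail(n_theoretical, n_matched, p_random))


def glycopeptide_score(peptide_score: float, glycan_score: float) -> float:
    """Final score as on the slide: peptide score plus structure score."""
    return peptide_score + glycan_score


print(round(structure_score(10, 10, 0.5), 3))  # 3.01
```

Matching all 10 fragments at p = 0.5 gives P = 1/1024, hence a score of about 3.01; how p is estimated from the spectrum is left open here.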

Options
All glycosidic bonds can be broken, and the number of cuts is unlimited.
Assumptions
Monosaccharide names; number of possible connections for each monosaccharide; accepted connections between monosaccharides; start structures (N-glycan cores); maximum population size when growing structures.
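A sketch of what "all glycosidic bonds can be broken, unlimited number of cuts" implies for the theoretical fragments, assuming a hypothetical nested-tuple glycan encoding (residue, children) and counting compositions only. Every connected piece of a tree has a unique top residue, so enumerating the pieces rooted at each residue covers all of them. Treating the peptide-HexNAc bond as cleavable too (giving the bare peptide, and free pieces that contain the reducing-end HexNAc) is an assumption of this sketch.

```python
from typing import Iterator, Set, Tuple

# Hypothetical encodings (not from the talk): a glycan is (residue, children)
# and a composition is a count vector over RESIDUES.
RESIDUES = ("Hex", "HexNAc", "NeuAc")
Comp = Tuple[int, ...]
ZERO: Comp = (0, 0, 0)


def unit(name: str) -> Comp:
    return tuple(int(r == name) for r in RESIDUES)


def add(a: Comp, b: Comp) -> Comp:
    return tuple(x + y for x, y in zip(a, b))


def rooted(node) -> Set[Comp]:
    """Compositions of every connected piece whose top residue is `node`."""
    name, children = node
    combos = {unit(name)}
    for child in children:
        # Either the bond to this child is cut (ZERO) or some piece below it is kept.
        options = rooted(child) | {ZERO}
        combos = {add(a, b) for a in combos for b in options}
    return combos


def residues(node) -> Iterator:
    yield node
    for child in node[1]:
        yield from residues(child)


def fragments(glycan) -> Tuple[Set[Comp], Set[Comp]]:
    """Peptide-attached and free glycan fragment compositions under unlimited cuts."""
    # Pieces keeping the reducing-end residue stay on the peptide; ZERO is the bare peptide.
    attached = {ZERO} | rooted(glycan)
    # Any connected piece can leave as a free glycan.
    free: Set[Comp] = set()
    for node in residues(glycan):
        free |= rooted(node)
    return attached, free


core = ("HexNAc", (("HexNAc", (("Hex", (("Hex", ()), ("Hex", ()))),)),))
attached, free = fragments(core)
print(sorted(attached))  # [(0, 0, 0), (0, 1, 0), (0, 2, 0), (1, 2, 0), (2, 2, 0), (3, 2, 0)]
```

Because only compositions are kept, pieces that differ in topology but share a composition collapse into one fragment, which matches working from composition-annotated spectra.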

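Testing with in silico generated data
The theoretical spectrum of a structure is generated by fragmentation and then randomized by randomly removing fragments and adding noise fragments; the randomized spectrum is the input to the de novo algorithm. [Figure: theoretical and randomized spectra as Hex vs. HexNAc, in panels NeuAc = 0 to 3, for peptide+glycan and glycan fragments.]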
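A minimal sketch of the randomization step, assuming fragments are plain composition tuples (Hex, HexNAc, NeuAc) and noise is drawn uniformly from compositions within given bounds. keep_fraction, n_noise and max_counts are hypothetical parameters; the talk does not say how the noise fragments were actually chosen.

```python
import random
from itertools import product
from typing import Set, Tuple

Comp = Tuple[int, ...]  # hypothetical: (Hex, HexNAc, NeuAc) counts


def randomize_spectrum(theoretical: Set[Comp], keep_fraction: float, n_noise: int,
                       max_counts: Comp, rng: random.Random) -> Set[Comp]:
    """Drop each true fragment with probability 1 - keep_fraction, then add noise."""
    kept = {f for f in sorted(theoretical) if rng.random() < keep_fraction}
    # Noise candidates: every non-empty composition within the bounds that is not a true fragment.
    candidates = [c for c in product(*(range(m + 1) for m in max_counts))
                  if any(c) and c not in theoretical]
    noise = rng.sample(candidates, min(n_noise, len(candidates)))
    return kept | set(noise)


theoretical = {(0, 1, 0), (0, 2, 0), (1, 2, 0), (2, 2, 0), (3, 2, 0)}
spectrum = randomize_spectrum(theoretical, keep_fraction=0.5, n_noise=3,
                              max_counts=(6, 5, 3), rng=random.Random(42))
print(len(spectrum & theoretical), len(spectrum - theoretical))
```

Passing a seeded `random.Random` keeps each simulated run reproducible, so a batch of runs (such as the 100 runs per result in the tests) can be regenerated exactly.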

Results of the in silico tests
If about half of the theoretical fragments are present, the correct structure is among the few highest-scoring ones. Each mark is the result of 100 runs.

Testing with a serum sample
A very complex wet-lab data set: a human serum specimen. The high-abundance proteins were removed prior to LC-MS/MS. 80 spectra had identified peptide and glycan compositions, and putative structures were found for 62 spectra. The structures were mostly typical and mostly small; large ones seem to be hard to catch.

Example serum spectrum

Structures found from the serum sample
[Figure: glycan structures, each labeled with the proteins where it was found, with residue positions in parentheses: ANT3(224,187), FIBG(78), THRB(121), A1AG1(56), FETUA(156), HPT(241), HRG(344), FIBB(394), TRFE(630), IGHA1(144), A1AT(70,107,271), { VINEX(102), HPTR(126) }; FIBG(78), HRG(344), IGHA1(144); VTNC(169); IGHG1(180), IGHG2(176); IGHA1(144); A1AG1(93); IGHG2(176); IGHA1(144); CO2(621), CO3(85); IGHG2(176); IGHA1(144); CO3(85).]

Conclusions
De novo glycan structure identification from intact glycopeptides is possible. High-quality spectra are necessary. Typically there is no definite answer, but a few structures that match equally well, so biological insight is still needed if a single identified structure has to be picked.
