Mass Spectrometry
What are mass spectrometers? They are analytical instruments used to measure the molecular weight of a sample. Accuracy: within 0.01% of the total molecular weight for a large sample (a biomolecule), which is enough to identify amino acid substitutions and post-translational modifications.
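As a rough illustration of the 0.01% figure, the sketch below converts the relative accuracy into an absolute tolerance in daltons and checks whether a given mass shift would exceed it. The 20,000 Da protein is a hypothetical example, and the function names are illustrative only; the mass shifts are standard monoisotopic values.

```python
RELATIVE_ACCURACY = 0.0001  # 0.01 % expressed as a fraction

# Standard monoisotopic mass shifts in daltons.
PHOSPHORYLATION = 79.96633          # addition of HPO3
GLU_TO_VAL = 99.06841 - 129.04259   # substitution E -> V
LYS_TO_GLN = 128.05858 - 128.09496  # substitution K -> Q


def mass_tolerance(molecular_weight_da: float) -> float:
    """Absolute mass uncertainty implied by the relative accuracy."""
    return molecular_weight_da * RELATIVE_ACCURACY


def is_detectable(mass_shift_da: float, molecular_weight_da: float) -> bool:
    """A shift is resolvable only if it exceeds the absolute tolerance."""
    return abs(mass_shift_da) > mass_tolerance(molecular_weight_da)


if __name__ == "__main__":
    protein_mass = 20000.0  # hypothetical intact protein mass in Da
    print("tolerance (Da):", mass_tolerance(protein_mass))
    for name, shift in [("phosphorylation", PHOSPHORYLATION),
                        ("Glu->Val", GLU_TO_VAL),
                        ("Lys->Gln", LYS_TO_GLN)]:
        print(name, is_detectable(shift, protein_mass))
```

Under these assumptions most modifications and substitutions shift the mass by far more than the tolerance, while a near-isobaric substitution such as Lys to Gln would not be resolved on an intact protein of this size.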
Use to Biochemists: accurate molecular weight measurements; determination of sample purity; verification of amino acid substitutions and post-translational modifications; amino acid sequencing; oligonucleotide structure; protein structure determination (protein folding, macromolecular structure determination).
Use in Industry: biotechnology (analysis of proteins, peptides, oligonucleotides); pharmaceuticals (drug discovery, pharmacokinetics, drug metabolism); clinical (neonatal screening, haemoglobin analysis); geological (oil composition); environmental (water quality, food contamination).
A mass spectrometer has 3 parts: ionization source, analyzer, detector.
Matrix-Assisted Laser Desorption/Ionization (MALDI). From lectures by Vineet Bafna, UCSD.
Protein to peptide to fragment ion: Proteins are digested with a protease such as trypsin. Trypsin cleaves the protein backbone after K (lysine) and R (arginine), which are basic residues and readily form positive ions. The mass spectrometer then breaks these peptides further into fragment ions.
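The digestion step can be sketched in silico. This is a minimal sketch, assuming the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) and arginine (R) unless the next residue is proline (P). The function name and the example sequence are illustrative, and missed cleavages are ignored.

```python
def tryptic_digest(protein: str) -> list[str]:
    """Split a protein sequence after every K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Keep whatever remains after the last cleavage site.
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence used only for illustration.
    for peptide in tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRR"):
        print(peptide)
```

Each resulting peptide ends in K or R (apart from the protein's own C-terminal peptide), which is why tryptic peptides carry a basic residue that takes up a proton easily.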
Peptide Cleavage
Glycan Cleavage
Mass Spectrum
Experimental Spectrum / Theoretical Spectrum
Peptide sequencing problem. Goal: find the peptide whose theoretical spectrum best matches the given experimental spectrum.
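To make "theoretical spectrum" concrete, here is a minimal sketch that computes the m/z values of singly charged b and y ions for a peptide. It assumes standard monoisotopic residue masses and considers only the unmodified b and y series; the function name is illustrative.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def theoretical_spectrum(peptide: str) -> dict[str, list[float]]:
    """Return singly charged b- and y-ion m/z values for every cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b_i holds the first i residues; y_i holds the last i residues plus water.
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return {"b": b_ions, "y": y_ions}


if __name__ == "__main__":
    spectrum = theoretical_spectrum("GASK")
    print("b ions:", [round(mz, 4) for mz in spectrum["b"]])
    print("y ions:", [round(mz, 4) for mz in spectrum["y"]])
```

A useful sanity check is that each complementary pair b_i and y_(n-i) sums to the neutral peptide mass plus two protons.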
How can it be done? Database search or de novo search.
Database search: Given an experimental spectrum and the parent mass of the experimental peptide, find the candidate peptides in the database that have the same parent mass, and pick the one whose theoretical spectrum best matches the experimental spectrum.
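The two steps of a database search, filtering by parent mass and then scoring candidates, can be sketched as follows. This is a simplified illustration: it uses a shared peak count as the score, which is a stand-in and not the scoring function of SEQUEST or any other real tool. The tolerances, function names, and the tiny peptide database are all assumptions.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(peptide):
    """Neutral monoisotopic mass of the intact peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_mzs(peptide):
    """Singly charged b- and y-ion m/z values (the theoretical spectrum)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions + y_ions


def shared_peak_count(experimental, theoretical, tol):
    """Count theoretical peaks that have an experimental peak within tol."""
    return sum(
        any(abs(mz - peak) <= tol for peak in experimental)
        for mz in theoretical
    )


def database_search(experimental, parent_mass, database,
                    parent_tol=1.0, fragment_tol=0.5):
    """Return (score, peptide) pairs for mass-matched candidates, best first."""
    candidates = [p for p in database
                  if abs(peptide_mass(p) - parent_mass) <= parent_tol]
    scored = [(shared_peak_count(experimental, fragment_mzs(p), fragment_tol), p)
              for p in candidates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    database = ["GASK", "SAGK", "VLTR"]   # hypothetical peptide database
    observed = fragment_mzs("GASK")       # idealized, noise-free spectrum
    print(database_search(observed, peptide_mass("GASK"), database))
```

Note that "GASK" and "SAGK" have identical parent masses, so only the fragment-level score separates them; this is the reason the parent-mass filter alone is not enough.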
De novo search: Build a spectrum graph, using the peak masses as nodes and the mass differences that correspond to amino acid residues as edges, then find the best path through the graph.
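A toy version of the spectrum graph can be sketched as below. It assumes an idealized input in which the nodes are neutral prefix masses (0, the mass of the first residue, of the first two, and so on), and it takes "best path" to mean the path from the first to the last node that passes through the most nodes. Real de novo tools such as PepNovo score nodes and edges far more carefully; the names and tolerance here are illustrative.

```python
# Monoisotopic residue masses in daltons. "L" stands for both leucine and
# isoleucine, which have the same mass and cannot be told apart this way.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_spectrum_graph(node_masses, tol=0.01):
    """Connect two nodes when their mass difference matches one residue."""
    nodes = sorted(node_masses)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            diff = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, aa))
    return nodes, edges


def longest_path_sequence(nodes, edges):
    """Longest residue string labelling a path from the first to the last node.

    Edges always point from lighter to heavier nodes, so visiting nodes in
    index order is a valid topological order for this dynamic program.
    """
    best = {0: ""}  # best[i] = longest sequence found from node 0 to node i
    for i in range(len(nodes)):
        if i not in best:
            continue  # node i is not reachable from the source
        for j, aa in edges[i]:
            candidate = best[i] + aa
            if j not in best or len(candidate) > len(best[j]):
                best[j] = candidate
    return best.get(len(nodes) - 1)  # None if the sink is unreachable


if __name__ == "__main__":
    prefix_masses = [0.0]
    for aa in "GASK":  # hypothetical peptide used to generate ideal nodes
        prefix_masses.append(prefix_masses[-1] + RESIDUE_MASS[aa])
    nodes, edges = build_spectrum_graph(prefix_masses)
    print(longest_path_sequence(nodes, edges))
```

Even this small example contains an ambiguity: the combined mass of G and A is almost exactly the mass of Q, so the graph has both a two-step path (G, A) and a one-step shortcut (Q) between the same nodes. Choosing between such alternatives is what the path-scoring step has to resolve.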
De novo Search: Sequence
Database vs De Novo Search: Database search is very successful at identifying already known proteins. De novo helps identify proteins that are not in the database. Database search is not as fast as de novo. De novo needs good-quality spectra without modifications to work with, and it is not very accurate. Example tools: SEQUEST (Yates et al.) for database search; PepNovo (Frank and Pevzner) for de novo.
Our Project: a database search tool for peptide identification from mass spectrometry data using a machine learning approach.
What makes it different from the traditional search tools? * Traditional search tools use ad hoc rules and/or unified probabilistic models * Our tool is based on a machine learning approach
What is Machine Learning? * An area of artificial intelligence concerned with developing techniques that allow computers to learn from data. * The researcher feeds a set of training examples to a computer program, which aims to learn the connection between the features of the examples and a specified target concept.
Examples of Machine Learning techniques * Linear Regression * Decision Tree Learning * Artificial Neural Networks * Bayesian Learning * Analytical Learning * Reinforcement Learning, etc.
Our Choice: Artificial Neural Networks. Reason? * Peptide fragmentation is a non-linear problem governed by complex rules.
A brief overview of Neural Networks
Brain Cells
From Human Neurons to Artificial Neurons....
Feed Forward Networks
Work flow of the project
Data Preparation: Protein samples were isolated from rat brains. The samples were digested with trypsin and passed through an LCQ Deca XP ion-trap mass spectrometer, and the spectra of the peptide ions were recorded. All spectra were searched with Mascot against the Rattus protein sequences in the Swiss-Prot database.
Precursor peptides were divided into doubly and triply charged sets. Peak intensities were estimated for the following ion types: precursor-H2O; b, b-H2O, b-NH3, b-H2O-NH3; y, y-H2O, y-NH3, y-H2O-NH3.
Network Training: Features were extracted for each ion. The target is the peak intensity of each ion. Separate ensembles of two-layer feed-forward networks were constructed for each ion type and trained on the data.
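The shape of such a predictor can be sketched as below. This is a minimal sketch under several assumptions that the slides do not specify: "two-layer" is read as one hidden layer plus one output unit, the hidden units use tanh, the output is squashed to (0, 1) by a sigmoid, the ensemble output is the plain average of its members, and the feature vector is a made-up placeholder. The weights are random and untrained, so the sketch shows only the forward pass, not the training procedure.

```python
import math
import random


def make_network(n_inputs, n_hidden, rng):
    """Create one two-layer network with random (untrained) weights."""
    return {
        "w1": [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(network, features):
    """Feed-forward pass: inputs -> tanh hidden layer -> sigmoid output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + bias)
        for row, bias in zip(network["w1"], network["b1"])
    ]
    z = sum(w * h for w, h in zip(network["w2"], hidden)) + network["b2"]
    return 1.0 / (1.0 + math.exp(-z))  # predicted relative intensity in (0, 1)


def ensemble_predict(networks, features):
    """Average the predictions of all networks in the ensemble."""
    return sum(forward(net, features) for net in networks) / len(networks)


if __name__ == "__main__":
    rng = random.Random(0)
    # One hypothetical ensemble for a single ion type (for example, b ions).
    ensemble = [make_network(n_inputs=4, n_hidden=3, rng=rng) for _ in range(5)]
    features = [0.2, 0.5, 0.0, 1.0]  # placeholder feature vector
    print(ensemble_predict(ensemble, features))
```

Averaging several independently initialized networks is one common way to reduce the variance of a single network's prediction; one such ensemble would be kept per ion type.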
Prediction of Spectrum: The predicted fragment spectrum was constructed by combining the outputs of the individual predictors for each ion type. A black box was constructed from the trained neural network models; when presented with a peptide as input, it outputs the predicted spectrum.
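One way such a black box could combine per-ion-type predictors is sketched below. This is an illustration only: the predictor interface (a function of the peptide and the cleavage site), the constant stub predictors, and the restriction to singly charged fragments are all assumptions, and the precursor-H2O ion is left out for brevity. The m/z arithmetic uses standard monoisotopic masses.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
AMMONIA = 17.02655
PROTON = 1.00728

# Mass subtracted from the base b or y ion for each neutral-loss variant.
NEUTRAL_LOSS = {
    "": 0.0,
    "-H2O": WATER,
    "-NH3": AMMONIA,
    "-H2O-NH3": WATER + AMMONIA,
}


def predict_spectrum(peptide, predictors):
    """Combine per-ion-type intensity predictors into one peak list.

    predictors maps an ion type such as "b" or "y-H2O" to a function
    (peptide, site) -> intensity. Returns (m/z, intensity, ion type)
    tuples sorted by m/z.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    peaks = []
    for site in range(1, len(peptide)):
        b_mz = sum(masses[:site]) + PROTON
        y_mz = sum(masses[site:]) + WATER + PROTON
        for series, base_mz in (("b", b_mz), ("y", y_mz)):
            for loss, delta in NEUTRAL_LOSS.items():
                ion_type = series + loss
                if ion_type in predictors:
                    intensity = predictors[ion_type](peptide, site)
                    peaks.append((base_mz - delta, intensity, ion_type))
    return sorted(peaks)


if __name__ == "__main__":
    # Stub predictors standing in for trained models (hypothetical values).
    stub_predictors = {
        "b": lambda peptide, site: 1.0,
        "y": lambda peptide, site: 0.5,
        "y-H2O": lambda peptide, site: 0.1,
    }
    for mz, intensity, ion_type in predict_spectrum("GASK", stub_predictors):
        print(f"{mz:10.4f}  {intensity:4.2f}  {ion_type}")
```

Keeping the predictors behind a single dictionary lookup means the caller only supplies a peptide, which matches the black-box behaviour described on the slide.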
Thank You. Questions?