Mathematical Challenges in Protein Motif Recognition Bonnie Berger MIT.



3 Approaches to Structural Motif Recognition
- Alignments
- Multiple alignments & HMMs
- Threading
- Profile methods (1D, 3D)
- Statistical methods (*)

4 Structural Motif Recognition
1) Collect a database of positive examples of a motif (e.g., coiled coil, beta helix).
2) Devise a method to determine whether an unknown sequence folds as the motif.
3) Verify the predictions in the lab.

5 Our Coiled-Coil Programs
- PairCoil [Berger, Wilson, Wolf, Tonchev, Milla, Kim, 1995] predicts two-stranded coiled coils (CCs). http://theory.lcs.mit.edu/paircoil
- MultiCoil [Wolf, Kim, Berger, 1997] predicts three-stranded CCs. http://theory.lcs.mit.edu/multicoil
- LearnCoil-Histidine Kinase [Singh, Berger, Kim, Berger, Cochran, 1998] predicts CCs in histidine kinase linker domains. http://theory.lcs.mit.edu/learncoil
- LearnCoil-VMF [Singh, Berger, Kim, 1999] predicts CCs in viral membrane fusion proteins. http://theory.lcs.mit.edu/learncoil-vmf

6 Long-Distance Correlations
In beta structures, amino acids that are close together in the folded 3D structure may be far apart in the linear sequence.
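To make "close in 3D, far in sequence" concrete, the sketch below lists residue pairs whose C-alpha atoms lie within a distance cutoff while being at least a minimum number of positions apart in the chain. It assumes C-alpha coordinates are already available (e.g., parsed from a PDB entry); the 6.0 angstrom cutoff and the 10-residue separation are illustrative assumptions, not values from the talk.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def long_range_contacts(ca_coords: List[Coord], cutoff: float = 6.0,
                        min_separation: int = 10) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) that are within `cutoff` angstroms in 3D
    but at least `min_separation` positions apart in the sequence.

    Cutoff and separation defaults are illustrative assumptions.
    """
    contacts = []
    for i in range(len(ca_coords)):
        # Only consider partners far enough downstream in the sequence.
        for j in range(i + min_separation, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts
```

In a beta helix, such pairs would include residues stacked on consecutive rungs, whose sequence separation is one rung length.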

7 Biological Importance of Beta Helices
- Surface proteins in human infectious disease (and in plants, too): virulence factors, adhesins, toxins, allergens
- Amyloid fibrils (e.g., Alzheimer's, Creutzfeldt-Jakob (Mad Cow) disease)
- Potential new materials

8 What Is Known
- Solved beta-helix structures: 12 structures in the PDB, in 7 different SCOP families
- Related work: 1D profile of pectate lyase (Heffron et al., 1998); HMMs (e.g., HMMER); threading (e.g., 3D-PSSM)

9 Key Databases
- Solved structures: Protein Data Bank (PDB), hundreds of non-redundant structures [www.rcsb.org/pdb/]
- Sequence databases: GenBank, hundreds of thousands of protein sequences [www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html]; SWISS-PROT, tens of thousands of protein sequences [www.ebi.ac.uk/swissprot]

10 BetaWrap Program [Bradley, Cowen, Menke, King, Berger: RECOMB 2001]
Performance:
- On the PDB: no false positives and no false negatives.
- Recognizes beta helices in the PDB across SCOP families in cross-validation.
- Identifies many new potential beta helices.
- Runs in linear time (~5 minutes on SWISS-PROT).
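The two evaluation claims on this slide (cross-validation across SCOP families, and zero false positives/negatives) can be illustrated with a small sketch: hold out one family at a time, then count errors at a score threshold. The family grouping, the score threshold, and the function names are hypothetical; the talk does not describe its evaluation code.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_family_out(
        families: Dict[str, List[str]]) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_family, training_proteins, test_proteins) for each family."""
    for held_out in families:
        # Train on every protein outside the held-out family.
        train = [p for fam, prots in families.items() if fam != held_out for p in prots]
        yield held_out, train, list(families[held_out])


def count_errors(scores: Sequence[float], labels: Sequence[bool],
                 threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when calling score >= threshold positive."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return fp, fn
```

Holding out a whole family, rather than single proteins, tests whether the method generalizes to beta helices unlike anything it was trained on.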

11 Histogram of protein scores for beta helices not in the database (12 proteins) and non-beta-helices in the PDB (1346 proteins)

12 Single Rung of a Beta Helix


14 3D Pairwise Correlations
- Residues that stack in adjacent beta strands exhibit strong correlations.
- Residues in the T2 turn have special correlations (asparagine ladder, aliphatic stacking).
[Figure: rung diagram labeled B1, B2, T2, B3]
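One standard way to turn "stacking residues exhibit strong correlations" into numbers is a log-odds table, log(P(a, b) / (P(a) P(b))), estimated from residue pairs observed stacked in adjacent strands. The sketch below assumes such pairs are already extracted (the training data here is hypothetical) and uses a pseudocount of 1; it is an illustration of pairwise log-odds scoring, not BetaWrap's published tables.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_pair_log_odds(pairs: List[Tuple[str, str]],
                        pseudocount: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Estimate log(P(a,b) / (P(a) P(b))) from observed stacked pairs (a, b)."""
    pair_counts = Counter(pairs)
    single_counts: Counter = Counter()
    for a, b in pairs:
        single_counts[a] += 1
        single_counts[b] += 1
    # Denominators include pseudocounts so each distribution sums to 1.
    n_pairs = len(pairs) + pseudocount * len(ALPHABET) ** 2
    n_single = 2 * len(pairs) + pseudocount * len(ALPHABET)
    table = {}
    for a, b in product(ALPHABET, repeat=2):
        p_ab = (pair_counts[(a, b)] + pseudocount) / n_pairs
        p_a = (single_counts[a] + pseudocount) / n_single
        p_b = (single_counts[b] + pseudocount) / n_single
        table[(a, b)] = math.log(p_ab / (p_a * p_b))
    return table


def stacking_score(strand_a: str, strand_b: str,
                   table: Dict[Tuple[str, str], float]) -> float:
    """Sum log-odds over residues stacked position by position in two aligned strands."""
    return sum(table[(x, y)] for x, y in zip(strand_a, strand_b))
```

A positive entry means the pair stacks more often than chance predicts; an asparagine ladder would show up as a favorable ("N", "N") entry relative to other N pairings.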


17 Question: how can we find these correlations when the correlated residues are a variable distance apart in sequence? [Example: tailspike, with a 63-residue turn]

18 Finding Candidate Wraps
- Assume we know the correct location of a single T2 turn (fixed B2 and B3).
- Generate the 5 best-scoring candidates for the next rung.
[Figure: labeled B2, B3, T2, candidate rung]
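A minimal sketch of this step, under simplifying assumptions: the current B2-T2-B3 block is fixed, the next rung is modeled as the same block shifted by a variable offset (the rung length), and each offset is scored by pairing residue i of each current strand with residue i of the shifted strand. The 18-40 residue offset range, the fixed strand and turn lengths, and the `pair_score` callback are hypothetical; the real program allows variable turns elsewhere in the rung.

```python
import heapq
from typing import Callable, List, Tuple

PairScore = Callable[[str, str], float]


def candidate_next_rungs(seq: str, b2_start: int, strand_len: int, t2_len: int,
                         pair_score: PairScore, min_rung: int = 18,
                         max_rung: int = 40, k: int = 5) -> List[Tuple[float, int]]:
    """Return the k best (score, offset) candidates for the next rung.

    Assumes B2 = seq[b2_start : b2_start + strand_len], then a T2 turn of
    t2_len residues, then B3 of strand_len residues. A candidate rung is the
    same block shifted downstream by `offset` residues (hypothetical model).
    """
    b3_start = b2_start + strand_len + t2_len
    block_end = b3_start + strand_len
    scored = []
    for offset in range(min_rung, max_rung + 1):
        if block_end + offset > len(seq):
            break  # candidate rung would run past the end of the sequence
        score = 0.0
        for start in (b2_start, b3_start):
            for i in range(strand_len):
                # Residue i of the current strand stacks on residue i of the shifted strand.
                score += pair_score(seq[start + i], seq[start + offset + i])
        scored.append((score, offset))
    return heapq.nlargest(k, scored)
```

Searching over offsets is what lets the method find correlated residues that sit a variable distance apart in sequence.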

19 Scoring Candidate Wraps (rung-to-rung)
Similar to the earlier probabilistic framework, plus:
- Pairwise probabilities taken from amphipathic beta (not beta-helix) structures in the PDB.
- Additional stacking bonuses on internal pairs.
- Incorporates a distribution on turn lengths.
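The slide lists three ingredients but not how they are combined. The sketch below assumes they are simply added in log space: pairwise log-odds for every stacked pair, a fixed bonus for inward-facing (internal) pairs that are both aliphatic or both asparagine (echoing the aliphatic stacking and asparagine ladder mentioned earlier), and the log-probability of the observed turn length. The bonus value, the residue sets, and the `inward` mask are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

ALIPHATIC = set("AILV")  # hypothetical choice of aliphatic residues


def stacking_bonus_applies(a: str, b: str) -> bool:
    """Hypothetical rule: bonus for aliphatic stacking or an asparagine ladder."""
    return (a in ALIPHATIC and b in ALIPHATIC) or (a == b == "N")


def rung_to_rung_score(strand_a: str, strand_b: str, inward: Sequence[bool],
                       pair_log_odds: Callable[[str, str], float],
                       turn_len: int, turn_len_probs: Dict[int, float],
                       stacking_bonus: float = 0.5) -> float:
    """Score one rung against the next as a sum of log-space terms (assumed scheme)."""
    if not (len(strand_a) == len(strand_b) == len(inward)):
        raise ValueError("strands and inward mask must have equal length")
    p_turn = turn_len_probs.get(turn_len, 0.0)
    if p_turn <= 0.0:
        return float("-inf")  # turn length never observed: reject this candidate
    score = math.log(p_turn)
    for a, b, is_inward in zip(strand_a, strand_b, inward):
        score += pair_log_odds(a, b)
        if is_inward and stacking_bonus_applies(a, b):
            score += stacking_bonus  # extra credit for internal stacked pairs
    return score
```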

20 Scoring Candidate Wraps (5 rungs)
Iterate out to 5 rungs, generating candidate wraps. Score each wrap:
- sum the rung-to-rung scores
- apply the B1 correlations filter
- screen for alpha-helical content
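A sketch of the iteration, assuming each partial wrap branches into the candidates returned for its last rung (with the top 5 and no pruning, 5 rungs give 5^4 = 625 wraps). The `next_candidates` callback and the single `passes_filters` predicate, standing in for the B1-correlation and alpha-helix screens, are hypothetical placeholders rather than the program's actual filters.

```python
from typing import Callable, List, Tuple

Candidates = Callable[[int], List[Tuple[float, int]]]
Wrap = Tuple[float, List[int]]


def best_wraps(start: int, next_candidates: Candidates, n_rungs: int = 5,
               passes_filters: Callable[[List[int]], bool] = lambda rungs: True
               ) -> List[Wrap]:
    """Enumerate wraps of n_rungs rungs starting at position `start`.

    next_candidates(pos) returns (rung_to_rung_score, next_pos) pairs.
    A wrap's score is the sum of its rung-to-rung scores; wraps rejected by
    passes_filters (placeholder for the B1 and alpha-helix screens) are dropped.
    """
    wraps: List[Wrap] = [(0.0, [start])]
    for _ in range(n_rungs - 1):
        extended: List[Wrap] = []
        for total, rungs in wraps:
            # Branch on every candidate for the rung after the current last one.
            for step_score, nxt in next_candidates(rungs[-1]):
                extended.append((total + step_score, rungs + [nxt]))
        wraps = extended
    kept = [w for w in wraps if passes_filters(w[1])]
    return sorted(kept, key=lambda w: w[0], reverse=True)
```

For example, with two candidates per rung (offsets 22 and 25), five rungs yield 16 wraps, and a filter can veto wraps whose rung spacing looks implausible before ranking.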

21 Potential Beta Helices
- Toxins: vacuolating cytotoxin from the human gastric pathogen H. pylori; toxin B from the enterohemorrhagic E. coli strain O157:H7
- Allergens: antigen AMB A II, major allergen from A. artemisiifolia (ragweed); major pollen allergen CRY J II, from C. japonica (Japanese cedar)
- Adhesins: AIDA-I, involved in diffuse adherence of diarrheagenic E. coli
- Other cell surface proteins: outer membrane protein B from Rickettsia japonica; putative outer membrane protein F from Chlamydia trachomatis; toxin-like outer membrane protein from Helicobacter pylori

22 The Problem
Given an amino acid subsequence, does it fold as a coiled coil? A beta helix?
Very difficult to answer experimentally: peptide synthesis (1-2 months), X-ray crystallography or NMR (>1 year), molecular dynamics.
Our goal: predict the folded structure based on a template of positive examples.

23 Collaborators
Math / CS: Mona Singh, Ethan Wolf, Phil Bradley, Lenore Cowen, Matt Menke, David Wilson, Theo Tonchev
Biologists: Peter S. Kim, Jonathan King, Andrea Cochran, James Berger, Mari Milla

