Structural Motif Recognition
1) Collect a database of positive examples of a motif (e.g., coiled coil, beta helix).
2) Devise a method to determine whether an unknown sequence folds as the motif.
3) Verify the predictions in the lab.
Long Distance Correlations
In beta structures, amino acids that are close in the folded 3D structure may be far apart in the linear sequence.
Biological Importance of Beta Helices
- Surface proteins in human infectious disease: virulence factors (plants, too), adhesins, toxins, allergens
- Amyloid fibrils (e.g., Alzheimer’s, Creutzfeldt-Jakob (Mad Cow) disease)
- Potential new materials
What is Known
- Solved beta-helix structures: 12 structures in PDB in 7 different SCOP families
- Related work:
  - ID profile of pectate lyase (Heffron et al. ’98)
  - HMM (e.g., HMMER)
  - Threading (e.g., 3D-PSSM)
Key Databases
- Solved structures:
  - Protein Data Bank (PDB) (100’s of non-redundant structures) [www.rcsb.org/pdb/]
- Sequence databases:
  - GenBank (100’s of thousands of protein sequences) [www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html]
  - SWISS-PROT (10’s of thousands of protein sequences) [www.ebi.ac.uk/swissprot]
BetaWrap Program
Performance:
- On PDB: no false positives and no false negatives.
- Recognizes beta helices in PDB across SCOP families in cross-validation.
- Recognizes many new potential beta helices.
- Runs in linear time (~5 min. on SWISS-PROT).
[Bradley, Cowen, Menke, King, Berger: RECOMB 2001]
Histogram of protein scores for:
- beta helices not in database (12 proteins)
- non-beta helices in PDB (1346 proteins)
Question: How can we find these correlations when they lie a variable distance apart in the sequence?
[Tailspike, 63-residue turn]
Finding Candidate Wraps
- Assume we have the correct locations of a single T2 turn (fixed B2 & B3).
- Generate the 5 best-scoring candidates for the next rung.
[Figure labels: B2, B3, T2, Candidate Rung]
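The candidate-generation step can be illustrated with a minimal Python sketch. It is not the BetaWrap implementation: the function names, the strand length, the gap range, and the toy scoring function (which only counts aligned hydrophobic pairs) are all assumptions made for illustration. The only element taken from the slide is keeping the 5 best-scoring placements of the next rung relative to a fixed strand.

```python
import heapq

# Assumption: a crude hydrophobic alphabet, used only by the toy score below.
HYDROPHOBIC = set("AVILMFWC")

def toy_score(seq, cur, nxt, strand_len):
    """Hypothetical stand-in score: count aligned hydrophobic pairs."""
    pairs = zip(seq[cur:cur + strand_len], seq[nxt:nxt + strand_len])
    return sum(1 for a, b in pairs if a in HYDROPHOBIC and b in HYDROPHOBIC)

def top_candidates(seq, b2_start, score_fn, strand_len=5,
                   min_gap=10, max_gap=40, k=5):
    """Return the k best (score, start) placements for the next rung.

    b2_start is the fixed strand position in the current rung; each gap in
    [min_gap, max_gap] gives one candidate start for the matching strand
    in the next rung. The gap limits are illustrative, not from the source.
    """
    candidates = []
    for gap in range(min_gap, max_gap + 1):
        nxt = b2_start + gap
        if nxt + strand_len > len(seq):
            break  # candidate strand would run off the end of the sequence
        candidates.append((score_fn(seq, b2_start, nxt, strand_len), nxt))
    return heapq.nlargest(k, candidates)

if __name__ == "__main__":
    demo = "VIVIV" + "G" * 10 + "VIVIV" + "G" * 30
    for score, start in top_candidates(demo, 0, toy_score):
        print(score, start)
```

Because the rung spacing is not fixed, every offset in the allowed range has to be scored; keeping only a handful of candidates per rung is what keeps the later multi-rung search small.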
Scoring Candidate Wraps (rung-to-rung)
Similar to probabilistic framework, plus:
- Pairwise probabilities taken from amphipathic beta (not beta helix) structures in PDB.
- Additional stacking bonuses on internal pairs.
- Incorporates distribution on turn lengths.
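One way to picture how these three ingredients might combine is a log-probability sum. The sketch below is an assumption-laden illustration, not the published scoring function: the probability tables, the set of residues that earn a stacking bonus, the choice of inward-facing positions, the bonus size, and the floor for unseen events are all hypothetical toy values.

```python
import math

# Hypothetical toy tables; real values would be estimated from solved
# amphipathic beta structures and observed turn lengths.
TOY_PAIR_PROB = {("V", "V"): 0.5, ("V", "I"): 0.25}
TOY_TURN_LEN_PROB = {2: 0.5}
STACKERS = set("VILFY")  # assumed residues that earn a stacking bonus

def rung_to_rung_score(strand_a, strand_b, turn_len,
                       pair_prob=TOY_PAIR_PROB,
                       turn_len_prob=TOY_TURN_LEN_PROB,
                       inward=(0,), stack_bonus=1.0, floor=1e-3):
    """Score two aligned strands from adjacent rungs.

    Adds log pairwise probabilities for aligned residues, a bonus for
    stacking-prone pairs at inward-facing positions, and the log
    probability of the turn length. Unseen pairs or lengths get `floor`.
    """
    score = 0.0
    for i, (a, b) in enumerate(zip(strand_a, strand_b)):
        # Look the pair up in either order, so the table can be one-sided.
        p = pair_prob.get((a, b), pair_prob.get((b, a), floor))
        score += math.log(p)
        if i in inward and a in STACKERS and b in STACKERS:
            score += stack_bonus
    score += math.log(turn_len_prob.get(turn_len, floor))
    return score

if __name__ == "__main__":
    print(rung_to_rung_score("VV", "VI", 2))
```

Working in log space turns the product of probabilities into a sum, so the stacking bonus and the turn-length term can simply be added to the pairwise contribution.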
Scoring Candidate Wraps (5 rungs)
Iterate out to 5 rungs, generating candidate wraps.
Score each wrap:
- sum the rung-to-rung scores
- B1 correlations filter
- screen for alpha-helical content
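The iteration over rungs can be sketched as a small beam search in Python. This is an illustrative assumption about how the pieces fit together, not the authors' code: `extend_wraps`, `next_candidates`, `accept`, and the `keep` limit are hypothetical names and parameters, "5 rungs" is taken here to include the starting rung, and the B1 correlations filter and alpha-helix screen are represented only by a placeholder predicate because the slide does not specify them.

```python
def extend_wraps(start, next_candidates, n_rungs=5, keep=5,
                 accept=lambda positions: True):
    """Grow wraps rung by rung and return them best first.

    next_candidates(pos) must return a list of (rung_score, next_pos)
    pairs for the rung that follows position `pos`.
    A wrap's score is the sum of its rung-to-rung scores.
    `accept` is a placeholder for the final filters (B1 correlations,
    alpha-helical content), which are not specified here.
    """
    wraps = [(0.0, [start])]
    for _ in range(n_rungs - 1):
        grown = []
        for total, positions in wraps:
            for rung_score, nxt in next_candidates(positions[-1]):
                grown.append((total + rung_score, positions + [nxt]))
        if not grown:
            return []  # no way to extend any wrap to the next rung
        grown.sort(key=lambda w: w[0], reverse=True)
        wraps = grown[:keep]  # assumed pruning: keep only the best wraps
    return [w for w in wraps if accept(w[1])]

if __name__ == "__main__":
    # Toy candidate generator: two possible next rungs at fixed offsets.
    toy = lambda pos: [(1.0, pos + 20), (0.5, pos + 22)]
    for total, positions in extend_wraps(0, toy):
        print(total, positions)
```

Pruning to a fixed number of wraps after each rung keeps the work per starting position bounded, which is one plausible way to stay consistent with a linear running time overall.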
Potential Beta Helices
Toxins:
- Vacuolating cytotoxin from the human gastric pathogen H. pylori
- Toxin B from the enterohemorrhagic E. coli strain O157:H7
Allergens:
- Antigen AMB A II, major allergen from A. artemisiifolia (ragweed)
- Major pollen allergen CRY J II, from C. japonica (Japanese cedar)
Adhesins:
- AIDA-I, involved in diffuse adherence of diarrheagenic E. coli
Other cell surface proteins:
- Outer membrane protein B from Rickettsia japonica
- Putative outer membrane protein F from Chlamydia trachomatis
- Toxin-like outer membrane protein from Helicobacter pylori
The Problem
Given an amino acid residue subsequence, does it fold as a coiled coil? A beta helix?
Very difficult:
- peptide synthesis (1-2 months)
- X-ray crystallography, NMR (>1 year)
- molecular dynamics
Our goal: predict folded structure based on a template of positive examples.
Collaborators
Math / CS: Mona Singh, Ethan Wolf, Phil Bradley, Lenore Cowen, Matt Menke, David Wilson, Theo Tonchev
Biologists: Peter S. Kim, Jonathan King, Andrea Cochran, James Berger, Mari Milla