Protein-DNA Interactions: From interactions to function prediction. Sue Jones, Department of Biochemistry, University of Sussex. 20th Sept 2004, EMBL Lecture Course.
Outline: (1) Protein-DNA Interactions: importance; (2) Structural Data; (3) Predicting DNA Binding Function; (4) Alternative Method & New Perspectives.
Protein-DNA Interactions: Importance. Gene expression: transcription initiation (TATA binding protein); RNA synthesis (RNA polymerase); transcription regulation (MAX protein). DNA repair: DNA glycosylase (oxidative DNA damage).
Protein-DNA Interactions: Importance. DNA packaging (Histone H2A.e). DNA replication (polymerases, ligases, single-stranded binding proteins).
DNA
DNA has structural flexibility. Structure described by Watson & Crick: B-form.
[Figure: B, A and Z forms of DNA]
Feature              B             A
Type of helix        RH
Diameter (nm)        2.37          2.55
Rise per bp (nm)     0.34          0.29
# bp per turn        10            11
Major groove         Wide, deep    Narrow
Minor groove         shallow       Wide, shallow
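As a quick check on the helix parameters above, the helical pitch (the axial length of one full turn) follows from the rise per base pair multiplied by the number of base pairs per turn. The sketch below is illustrative only: it assumes the rise values are in nanometres, and the names `HelixForm` and `pitch_nm` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelixForm:
    """Helix parameters for one DNA form, as listed on the slide."""
    name: str
    rise_per_bp_nm: float  # assumed unit: nanometres
    bp_per_turn: int

    def pitch_nm(self) -> float:
        """Axial length of one full helical turn."""
        return self.rise_per_bp_nm * self.bp_per_turn


B_FORM = HelixForm("B", 0.34, 10)
A_FORM = HelixForm("A", 0.29, 11)

for form in (B_FORM, A_FORM):
    print(f"{form.name}-form pitch: {form.pitch_nm():.2f} nm")
```

With the slide's values this gives about 3.4 nm per turn for the B-form and about 3.19 nm for the A-form.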
Structural Data. NDB (Nucleic Acid Database): assembles and distributes structural information about nucleic acids. 2490 structures (25/08/04). Protein-DNA complexes: double helix 593; single strand 57. http://ndbserver.rutgers.edu Berman et al., 1992. Biophys J 63 p751
Protein-DNA Interactions : Structure
Protein-DNA Interactions: Characteristics. Major and minor groove binding. DNA-binding motifs. Positively charged surface areas. Size (ASA): 618 Å² to 2833 Å². Conformational changes: DNA bending; domain movements, quaternary changes. Nadassy et al., 1999 Biochemistry 38 p1999; Jones et al., 1999 J. Mol. Biol. 287 p877
Predicting DNA Binding Function. Knowing a protein’s function is essential in understanding: cellular location; interactions; biochemical pathways; potential as drug targets. Prediction of the protein's DNA-binding site given the unbound protein structure: electrostatic patches; motifs.
Predicting Function from Structure. Structural genomics: filling in the gaps of protein structure space. Structures are being solved that have low sequence identity (< 30%) and potentially little or no fold similarity to any structure currently in the PDB. This requires algorithms that make fast & reliable function predictions.
Predicting DNA Binding Function. It is easy to make matches between globally homologous structures; this method aims to identify remote matches based on local homology of a specific motif. Helix-Turn-Helix (HTH): C-terminal helix binds in the major groove; found in about 1/3 of DNA-binding protein families (16/54).
HTH Motif Proteins: Hin Recombinase (1hcr); Catabolite Activator Protein (1j59).
HTH Motif Dataflow. [Flow diagram. Data sources: NDB, PDB, literature, PFAM, SMART. Tools: hidden Markov models, SAM-T99, Rasmol. Box labels as extracted: PDB chains; 349 HTH chains; 232 HTH chains; 227 HTH proteins; 26 hidden Markov models; 28 HMMs; 86 NI proteins; 84 NI proteins; 7 HREPS; 29 SREPS; 30 SREPS; 3D templates.]
HTH Template Library. Seven templates (PDB code, chain and residue range): 1hcrA160-181, 1b9mA32-56, 1etoA73-95, 1aisB1267-1293, 1jhgA68-91, 1lmb331-53, 1orc016-36.
Template Scanning. The template library is scanned against 3D structures. One template T (length n) is scanned against a protein P of length m: the optimal gapless superposition is calculated at each of the m-n+1 possible positions in P and scored by RMSD. Superposition based on Kabsch (1976) Acta Cryst A. 32 p922
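The scanning step can be sketched as a sliding window: every run of n consecutive residues in P is optimally superposed on T and the RMSD is recorded. The code below is a minimal sketch under stated assumptions, not the implementation behind the talk. It assumes one (x, y, z) point per residue (for example a C-alpha trace) and returns only the minimum RMSD, using the singular values of the 3x3 covariance matrix with the usual sign correction against mirror images. All function names and the demo coordinates are invented for illustration.

```python
import math


def _centered(points):
    """Shift (x, y, z) points so that their centroid sits at the origin."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _sym3_eigenvalues(m):
    """Eigenvalues of a symmetric 3x3 matrix, largest first (closed form)."""
    off = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if off == 0.0:  # already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p = math.sqrt((sum((m[i][i] - q) ** 2 for i in range(3)) + 2.0 * off) / 6.0)
    b = [[(m[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    half_det = max(-1.0, min(1.0, _det3(b) / 2.0))
    phi = math.acos(half_det) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]


def superposed_rmsd(a, b):
    """Minimum RMSD between two equal-length point lists over rigid motions."""
    if len(a) != len(b) or not a:
        raise ValueError("point lists must be non-empty and of equal length")
    a, b = _centered(a), _centered(b)
    # Covariance matrix R = A^T B; its singular values are sqrt(eig(R^T R)).
    r = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    rtr = [[sum(r[k][i] * r[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    sigma = [math.sqrt(max(e, 0.0)) for e in _sym3_eigenvalues(rtr)]
    if _det3(r) < 0.0:  # forbid reflections: flip the smallest singular value
        sigma[2] = -sigma[2]
    e0 = sum(x * x for p in a for x in p) + sum(x * x for p in b for x in p)
    return math.sqrt(max(e0 - 2.0 * sum(sigma), 0.0) / len(a))


def scan_template(template, protein):
    """Gapless scan: (start index, RMSD) for each of the m-n+1 windows."""
    n, m = len(template), len(protein)
    return [(i, superposed_rmsd(template, protein[i:i + n])) for i in range(m - n + 1)]


# Demo on synthetic coordinates (not real PDB data): an ideal helix template
# hidden, rotated and shifted, at index 6 of a longer chain.
def _helix(n):
    return [(2.3 * math.cos(math.radians(100 * i)),
             2.3 * math.sin(math.radians(100 * i)), 1.5 * i) for i in range(n)]


def _rotate_shift(p):
    x, y, z = p
    x, y = math.cos(0.7) * x - math.sin(0.7) * y, math.sin(0.7) * x + math.cos(0.7) * y
    y, z = math.cos(0.4) * y - math.sin(0.4) * z, math.sin(0.4) * y + math.cos(0.4) * z
    return (x + 5.0, y - 3.0, z + 7.0)


template = _helix(8)
strand = [(3.3 * i, float((-1) ** i), 0.2 * i) for i in range(6)]
protein = (strand + [_rotate_shift(p) for p in template]
           + [(x + 40.0, y, z) for x, y, z in strand])

RMSD_CUTOFF = 1.6  # threshold in angstroms quoted on the slides
hits = scan_template(template, protein)
best_start, best_rmsd = min(hits, key=lambda h: h[1])
print(len(hits), best_start, best_rmsd < RMSD_CUTOFF)
```

A production version would more likely use an SVD routine from a numerical library and would also return the rotation. The closed-form eigenvalue step is used here only to keep the sketch dependency-free.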
RMSD Distributions. [Histogram: frequency against RMSD, threshold at 1.6 Å.] 368/8266 = 3.5% false positives; 5/84 = 1.4% false negatives.
Improving Template Specificity. Extending templates (+2 templates): 61/8264 = 0.7% false positives. Assessing motif accessible surface area (ASA), with an ASA threshold of 990 Å²: 38/8264 = 0.5% false positives. 3 ‘false’ positives were actually real HTH proteins not previously annotated.
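The two-stage filter and the quoted percentages can be illustrated with a short sketch. It assumes that the ASA threshold acts as a minimum (a hit is kept only if the matched motif is at least 990 Å² accessible), which the slide does not state explicitly. The `Hit` class, the helper names and the two "hypothetical" entries are invented for illustration; the APS048 values are the ones reported for the MCSG target later in the deck.

```python
from dataclasses import dataclass

RMSD_CUTOFF = 1.6   # angstroms, from the RMSD distribution slide
ASA_CUTOFF = 990.0  # square angstroms; treated as a MINIMUM here (assumption)


@dataclass(frozen=True)
class Hit:
    """One template match against a scanned structure (illustrative type)."""
    structure: str
    template: str
    rmsd: float
    motif_asa: float


def passes(hit: Hit) -> bool:
    """Keep a hit only if it fits the template and the motif is exposed."""
    return hit.rmsd < RMSD_CUTOFF and hit.motif_asa >= ASA_CUTOFF


def rate_percent(count: int, total: int) -> float:
    """Express count/total as a percentage."""
    return 100.0 * count / total


candidates = [
    Hit("APS048", "1LMB331-53", 1.3, 1695.0),                 # values from the deck
    Hit("hypothetical-buried", "1LMB331-53", 1.2, 400.0),     # invented example
    Hit("hypothetical-poor-fit", "1LMB331-53", 2.4, 1500.0),  # invented example
]
accepted = [h.structure for h in candidates if passes(h)]
print(accepted)
print(f"{rate_percent(61, 8264):.1f}% then {rate_percent(38, 8264):.1f}% false positives")
```

Only the APS048 entry survives both cut-offs, and the two fractions round to the 0.7% and 0.5% quoted above.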
‘New’ HTH Motif 1: DNA Methyltransferase (MGMT), 1mgtA, residues 110-129. C-terminal domain; ‘d’ and ‘e’ helices. Site-directed mutagenesis.
‘New’ HTH Motif 2: Histone acetyltransferase, 1fy7A, residues 368-388. C-terminal domain; zinc finger; N-terminal domain; protein-protein interactions. SCOP: ‘winged helix’.
‘New’ HTH Motif 3: Polymerase I (1taq, 1tau), residues 673-700. ‘Fingers’ subdomain. DNA contacts ‘O’ helix. New HTH precedes ‘O’ helix.
Generic Templates
Generic Templates. [Diagram labels. Sequence: full sequence HMMs (0.001). Structure: RMSD < 1.6.]
Structural Genomics Targets. Scanned template library against 30 target structures from MCSG.
MCSG Target    Template      Location    ASA (Å²)    RMSD (Å)
APS048         1LMB331-53    21-49       1695        1.3
Isocitrate lyase regulator transcription factor (Zhang et al., J. Biol. Chem. 2002).
Summary. Method combined structural data from NDB and PDB with sequence data from PFAM and SMART. Structural template library of 7 HTH motifs. RMSD threshold from optimal superposition. Hit rate of 88% & false positive rate of 0.5%. Recognition across families. Template method independent of global fold similarity. Potential to identify new DNA-binding HTH motifs.
Online Function Prediction http://www.ebi.ac.uk/thornton-srv/databases/PDNA-pred
Alternative Statistical Model. "Statistical models for discerning protein structures containing the DNA-binding HTH motif", McLaughlin and Berman, J. Mol. Biol. 2003 p43. Decision tree model to identify key structural features: geometric measurements of the recognition helix (RH) and of the helices & beta sheets preceding and following it. Key features: high solvent accessibility of RH; hydrophobic interaction between RH & the 2nd helix preceding it. Predicting HTH motifs within the PDB: 98% accuracy & 0.7% false positive rate. Predicted new HTH motifs.
Future Perspectives. Extend method to other DNA-binding motifs: HLH, HhH, β-ribbon. Use electrostatic potentials with motifs to improve method. Spatial templates for proteins that don’t use discrete motifs for DNA recognition.
Acknowledgements. Mario Garcia, Carles Ferrer, Jonathan Barker, Janet Thornton, Hugh Shanahan, Helen Berman. Department of Energy: USA. European Bioinformatics Institute. Rutgers The State University.