PART II. Prediction of functional regions within disordered proteins Zsuzsanna Dosztányi MTA-ELTE Momentum Bioinformatics Group Department of Biochemistry Eotvos Lorand University, Budapest, Hungary
Protein disorder is prevalent LDR (40<) protein, % kingdom B A E
Protein disorder is functional Xie H et al. J Proteome Res. 2007, 6,
Protein disorder is important Prion proteinPrion disease CFTRCystic fibrosis Alzheimer’s -synucleinParkinson’s p53, BRCA1cancer
Functions of intrinsically disordered proteins I Entropic chains II Linkers III Molecular recognition IV Protein modifications (e.g. phosphorylation) V Assembly of large multiprotein complexes
Interactions of IDPs Complex between p53 and MDM2 Coupled folding and binding
Interactions of proteins
Coupled folding and binding Functional advantages Weak transient, yet specific interactions Post-translational modifications Flexible binding regions that can overlap Evolutionary plasticity Signaling Regulation
Coupled folding and binding Experimental difficulties Highly flexible Weak interactions Short half-life Complexes of IDPs in the PDB:~ 200 Known instances:~ 2,000 Estimated number of such interactions in the human proteome:~ 100,000
Interactions of IDPs F19, W23 and L26
p27 Cyclin-dependent kinase (Cdk) inhibitor, p27Kip1 (p27) Cell cycle regulation Binds to cdk-cyclin komplex and inhibitis their activity Fully disordered protein
p27
Partial unfolding enables the phosphorylation of Tyr88, starting a series of signaling events that leads to the beginning of S phase.
Prediction of functional regions within IDPs Disordered binding regions (ANCHOR) Linear motifs (ELM, SlimPred) Morfs (Morfpred) Specific conservation pattern
Disordered protein complexes Complex between p53 and MDM2 Interaction sites are usually linear (consist of only 1 part) enrichment of interaction prone amino acids Sequence Binding sites No need for structure, binding sites can be predicted from sequence alone
Prediction of disordered binding regions – ANCHOR What discriminates disordered binding regions? A cannot form enough favorable interactions with their sequential environment It is favorable for them to interact with a globular protein Based on simplified physical model Based on an energy estimation method using statistical potentials Captures sequential context
ANCHOR Human p53 C –terminal region
ANCHOR and linear motifs NCOA2 transcription co-activator The regions between is disordered Contains 3 receptor binding motifs: xLxxLLx (LIG_NRBOX)
LMs and Disordered Binding Regions Linear motifs and disordered binding regions often overlap Complementary approaches Prediction of disordered binding regions can help to increase likelihood of true instances
LMs and Disordered Binding Regions Linear motifs and disordered binding regions often overlap Complementary approaches Prediction of disordered binding regions can help to increase likelihood of true instances
Machine learning approaches SlimPred: trained on ELM database Morfpred: trained on short chains in complex Very small datasets Negative datasets
Conservation
Conservation patterns of linear motifs No evolutionary constraints to keep the structure Strong constraints on functional site Island-like conservation
SlimPrints Generates sequence alignments of orthologous sequences Relative conservation score per position Filters out less reliable regions Fails if sequences are too divergent, or too similar