Published by Dana Pearson. Modified over 9 years ago.
1
Structural Genomics: Case studies in assigning function from structure
James D Watson
watson@ebi.ac.uk
2
Structural Genomics Collaborators
● MCSG – Midwest Center for Structural Genomics
● SPINE – Structural Proteomics in Europe
● SGC – Structural Genomics Consortium
3
Structural Genomics Aims
● Pathogens and disease
● Human proteins
● Coverage of fold space
● Automation / high throughput
4
Proteins: known sequences and 3D structures
● ~1.3m non-redundant protein sequences (example fragments: MRTKSPGDSKFHEI…, MIVISGENVDIAELTDFLCAA…, PPRIPYSMVGPCCVFLMHH…)
● 5,500 non-redundant structures
● ~260,000 homology models
5
Proteins: known sequences and 3D structures
● 5,500 non-redundant structures
● Homology models
● 3D structures of ~16,000 carefully selected proteins
● ~10% unknown
6
Protein Function
Protein function has many definitions:
● Biochemical function: the biochemical role of the protein, e.g. serine protease
● Biological function: the role of the protein in the cell/organism, e.g. digestion, blood clotting, fertilisation
7
Function through homology
● Sequence similarity
● Structural similarity
● Motif searches
● HTH motifs
● Active site templates
● Surface comparison
8
Template Methodology
Use 3D templates to describe the active site of an enzyme. These are analogous to 1D sequence motifs such as PROSITE, but in 3D (Wallace et al 1997).
A template:
● defines a functional site
● can be used to search a new structure for a functional site
● can be used to search a database of structures for similar clusters
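The idea of searching a structure for a functional site can be sketched as follows. This is a minimal illustration only: it matches residue types and pairwise distances, which is an assumption made for brevity and not the published template-search algorithm. The template coordinates, the tolerance, the function names and the toy structure are all hypothetical.

```python
from itertools import permutations
from math import dist

# Hypothetical 3-residue template: residue type plus one representative
# atom position in Å. The coordinates are made up for illustration.
TEMPLATE = [
    ("SER", (0.0, 0.0, 0.0)),
    ("HIS", (3.0, 0.0, 0.0)),
    ("ASP", (3.0, 4.0, 0.0)),
]

def pair_distances(points):
    """Return the distances for the pairs (0,1), (0,2) and (1,2)."""
    a, b, c = points
    return (dist(a, b), dist(a, c), dist(b, c))

def find_template_matches(residues, template=TEMPLATE, tolerance=0.5):
    """Return residue-id triplets whose types and pairwise distances fit the template."""
    wanted_types = [res_type for res_type, _ in template]
    wanted = pair_distances([xyz for _, xyz in template])
    matches = []
    for triplet in permutations(residues, 3):
        if [r[1] for r in triplet] != wanted_types:
            continue
        found = pair_distances([r[2] for r in triplet])
        if all(abs(f - w) <= tolerance for f, w in zip(found, wanted)):
            matches.append(tuple(r[0] for r in triplet))
    return matches

# Toy "structure": (residue id, residue type, coordinates). Made-up values.
structure = [
    (10, "SER", (10.0, 10.0, 10.0)),
    (57, "HIS", (13.1, 10.0, 10.0)),
    (102, "ASP", (13.1, 14.0, 10.0)),
    (150, "ASP", (30.0, 30.0, 30.0)),
    (200, "GLY", (11.0, 11.0, 11.0)),
]
matches = find_template_matches(structure)
print(matches)
```

Distance matching is independent of orientation, so no superposition is needed, but it cannot tell a site from its mirror image. A real search would normally superpose the atoms and report an rmsd.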
9
SiteSeer’s “reverse” templates
Query structure → 3-residue templates (1 2 3, 4 5 6, 8 7 9, …)
10
Problems with template methods
● Too many hits (hundreds, thousands or even tens of thousands)
● Rmsd alone rarely discriminates true from false positives
● Local distortion in a structure may give a large rmsd
● The top hit is rarely the correct hit, even in “obvious” cases
11
An example
● PDB code: 1hsk
● UDP-N-acetylenolpyruvoylglucosamine reductase (MURB), E.C.1.1.1.158
● Contains the 3D template that characterises this enzyme class (Ser, Arg, Glu)
● Sequence identity to the template’s representative structure (1mbb) is 28%
12
Enzyme active site templates: hits for 1hsk
Hit    E.C. number     Rmsd    Enzyme
1.     E.C.1.3.99.2    0.76Å   Acyl-CoA dehydrogenase
2.     E.C.4.2.1.20    0.76Å   Tryptophan synthase α-subunit
3.     E.C.3.2.1.73    1.19Å   Glycosyl hydrolases, family 17
4.     E.C.3.2.1.73    1.21Å   Glycosyl hydrolases, family 16
5.     E.C.4.1.2.13    1.25Å   Fructose-bisphosphate aldolase (class I)
…      …               …       …
102.   E.C.1.1.1.158   2.19Å   UDP-N-acetylmuramate dehydrogenase
…      …               …       …
386.   …               3.94Å   …
The correct template (Arg, Glu, Ser) is only hit 102, with rmsd=2.19Å.
13
Comparison of template environments
Match to template (Arg, Glu, Ser): template structure 1mbb vs. query structure 1hsk
15
Comparison of template environments
Identical residues in neighbourhood: template structure 1mbb vs. query structure 1hsk
16
Comparison of template environments
Similar residues in neighbourhood (around Arg, Glu, Ser): template structure 1mbb vs. query structure 1hsk
17
Results for 1hsk
Hit   E.C. number     Rmsd (Å)   Score   Enzyme
1.    E.C.1.1.1.158   2.08       209.1   UDP-N-acetylmuramate dehydrogenase
2.    E.C.3.2.1.14    2.13       146.0   Chitinase A (chitodextrinase; 1,4-beta-poly-N-acetylglucosaminidase; poly-beta-glucosaminidase)
3.    E.C.3.2.1.17    1.92       142.4   Turkey lysozyme
4.    E.C.3.2.1.17    1.89       138.7   Hen lysozyme
5.    E.C.3.5.1.26    1.47       132.3   Aspartylglucosylaminidase
6.    E.C.3.2.1.3     1.54       131.1   Glucan 1,4-alpha-glucosidase
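The difference between ranking by rmsd and ranking by score can be checked on the six hits in this results table. A minimal sketch: the numbers are copied from the slide, while the tuple layout and variable names are illustrative and not part of any tool described here.

```python
# (E.C. number, rmsd in Å, score, enzyme) for the six listed hits for 1hsk.
hits = [
    ("1.1.1.158", 2.08, 209.1, "UDP-N-acetylmuramate dehydrogenase"),
    ("3.2.1.14", 2.13, 146.0, "Chitinase A"),
    ("3.2.1.17", 1.92, 142.4, "Turkey lysozyme"),
    ("3.2.1.17", 1.89, 138.7, "Hen lysozyme"),
    ("3.5.1.26", 1.47, 132.3, "Aspartylglucosylaminidase"),
    ("3.2.1.3", 1.54, 131.1, "Glucan 1,4-alpha-glucosidase"),
]

# Lowest rmsd first versus highest score first.
by_rmsd = sorted(hits, key=lambda h: h[1])
by_score = sorted(hits, key=lambda h: h[2], reverse=True)

# 1-based rank of the correct enzyme class under each ordering.
correct = "1.1.1.158"
rank_by_rmsd = [h[0] for h in by_rmsd].index(correct) + 1
rank_by_score = [h[0] for h in by_score].index(correct) + 1
print(rank_by_rmsd, rank_by_score)
```

Among these six hits the correct class is fifth by rmsd but first by score, which is the point of comparing the residue environment around the template match.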
18
ProFunc – function from 3D structure
Inputs to the predicted function:
● Functional sequence motifs, e.g. Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]
● HTH motifs
● Electrostatics
● Surface comparison
● Nests
● Homologous structures of known function
● Homologous sequences of known function
● Template-based methods
● Binding site identification and analysis
● Residue conservation analysis
● … etc
19
Large scale analysis
● Create an edited version of the target database from the PDB, keeping only entries with status “In PDB”
● Extract all PDB codes for each Structural Genomics group
● Extract “prior” knowledge (Header, Title, Jrnl, etc.)
● Find any associated GOA annotation
● Classify each structure by whether its function is “known”, “unknown” or “limited info”
● Run ProFunc in a batch process on all codes (~560)
● Extract summary results from each analysis
● Compare to prior knowledge and estimate success
20
Number of deposits to the TargetDB by Structural Genomics group (Total of 577 unique entries) March 2004
21
PDB Blast
● Run query sequences against the PDB using BLAST
● Filter out matches released after the query sequence; these hits are ignored in subsequent analyses
● Significant matches still remain. Why? Target selection criteria, and matches released within months of the SG target
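The release-date filter described on this slide can be sketched as below. This is an illustration under stated assumptions: the hit records, their field names, the identifiers and the dates are all hypothetical, and no real BLAST output format is being parsed.

```python
from datetime import date

def drop_later_releases(hits, query_release):
    """Keep only hits released on or before the query's own release date."""
    return [hit for hit in hits if hit["released"] <= query_release]

# Hypothetical BLAST hits against the PDB for one structural genomics target.
query_release = date(2003, 6, 1)
hits = [
    {"id": "hitA", "released": date(2001, 2, 14), "evalue": 1e-30},
    {"id": "hitB", "released": date(2003, 4, 20), "evalue": 1e-12},
    {"id": "hitC", "released": date(2004, 1, 9), "evalue": 1e-45},
]

kept = drop_later_releases(hits, query_release)
print([hit["id"] for hit in kept])
```

In this toy data hitB survives the filter even though it was released only weeks before the query, which mirrors the slide's remark that significant matches released within months of the target still appear.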
22
InterPro Scan
● InterPro scan run on proteins of known function
● Cannot “backdate” the InterPro database
● The scan is therefore essentially picking up the query protein itself
23
Function of query structure “known”
24
Limited Functional Info
25
Unknown Function
26
The Good, the Not So Good and the Ugly
Three examples show the varying levels of information that can be retrieved from structures:
1. New functional assignment
2. Possible function identified
3. Function remains unknown
27
The Good: BioH structure (MCSG)
● One very strong hit: Ser-His-Asp catalytic triad of the lipases with rmsd=0.28Å (template cut-off is 1.2Å)
● Experimentally confirmed by hydrolase assays
● Novel carboxylesterase acting on short acyl chain substrates
Function discovered
28
The Not So Good: APC1040 (MCSG)
● Assigned as a probable glutaminase
● Most methods suggest β-lactamase activity
● No match to Prosite patterns:
  Class A: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]
  APC1040: 70 F-T-M-Q-S-I-S-K-V-I-S-F-I-A-A-C 85
Function being assayed
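Why the APC1040 fragment fails the Class A pattern can be checked position by position. A minimal sketch, assuming that 70 and 85 on the slide are the residue numbers of the first and last residues shown; the parser handles only the pattern elements used here (single residues, bracketed sets, x and x(n)) and the function names are illustrative.

```python
import re

def prosite_to_elements(pattern):
    """Expand a simple PROSITE pattern into one allowed-residue set per position.

    None stands for x, meaning any residue is accepted.
    """
    elements = []
    for token in pattern.split("-"):
        m = re.fullmatch(r"(x|\[[A-Z]+\]|[A-Z])(?:\((\d+)\))?", token.strip())
        if m is None:
            raise ValueError(f"unsupported pattern element: {token!r}")
        body, repeat = m.group(1), int(m.group(2) or 1)
        allowed = None if body == "x" else set(body.strip("[]"))
        elements.extend([allowed] * repeat)
    return elements

def mismatches(pattern, fragment, first_residue_number=1):
    """List (residue number, residue) pairs that violate the pattern."""
    elements = prosite_to_elements(pattern)
    if len(elements) != len(fragment):
        raise ValueError("pattern and fragment lengths differ")
    return [
        (first_residue_number + i, aa)
        for i, (allowed, aa) in enumerate(zip(elements, fragment))
        if allowed is not None and aa not in allowed
    ]

PATTERN = "[FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]"
FRAGMENT = "FTMQSISKVISFIAAC"  # APC1040 fragment as printed on the slide

bad = mismatches(PATTERN, FRAGMENT, first_residue_number=70)
print(bad)
```

Under that numbering assumption the fragment differs from the pattern at two positions, both isoleucines, while the conserved S and K positions do fit. That is consistent with most methods suggesting β-lactamase activity despite the failed pattern match.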
29
The Ugly: MT0777 (MCSG)
● Hypothetical protein from Methanobacterium thermoautotrophicum
● Fold associated with many functions (Rossmann fold)
● No sequence motifs
● Residue conservation is poor
● Template methods fail
Function unknown
30
Future Work
● Improvements to the scoring system and additional templates
● Further utilisation of SOAP services as they become available (e.g. KEGG API service)
● Possible adaptation for use as part of a larger workflow or in LIMS systems (Taverna and MyGrid)
● More truly predictive analyses being developed (e.g. electrostatics, ligand prediction, catalytic residue prediction)
31
Detection of DNA-binding proteins (with HTH motif) using structural motifs and electrostatics (Hugh Shanahan)
● Combine electrostatics with HTH structural templates.
● Can detect HTH DNA-binding proteins only.
● 1/3 of DNA-binding protein families have an HTH motif.
● Use a linear predictor as discriminant.
● True positive rate (~80%) is comparable to that of more complicated methods.
● Very low (< 0.01%) false positive rate.
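A linear predictor used as a discriminant can be sketched as a weighted sum of features compared against zero. Everything specific below is an assumption made for illustration: the two features (a template rmsd and an electrostatic score), the weights, the bias and the toy examples are invented, and the resulting rates are not the rates reported on the slide.

```python
def linear_score(features, weights, bias):
    """Weighted sum of the features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def rates(examples, weights, bias):
    """Return (true positive rate, false positive rate) for score > 0 as the call."""
    tp = fp = positives = negatives = 0
    for features, is_binder in examples:
        predicted = linear_score(features, weights, bias) > 0
        if is_binder:
            positives += 1
            tp += predicted
        else:
            negatives += 1
            fp += predicted
    return tp / positives, fp / negatives

# Hypothetical weights: a low template rmsd and a high electrostatic score
# both push the prediction towards "HTH DNA-binding".
weights, bias = (-1.0, 2.0), 0.0

# Toy examples: ((template rmsd, electrostatic score), is DNA-binding).
examples = [
    ((0.5, 1.2), True),
    ((0.8, 0.9), True),
    ((1.5, 0.6), True),
    ((1.0, 0.2), False),
    ((2.0, 0.5), False),
    ((0.6, 0.1), False),
]

tpr, fpr = rates(examples, weights, bias)
print(tpr, fpr)
```

Moving the bias trades true positives against false positives, which is how a very low false positive rate can be chosen at the cost of some sensitivity.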
32
Ligand Prediction
Active site and ligand description/fingerprinting methods: can active site geometry, shape, physical-chemical properties etc. be used to predict the preferred ligand class?
● Spherical Harmonics
● Hybrid Ellipsoids
33
Spherical Harmonics (Richard Morris)
● The computation of Legendre polynomials of high order requires a robust integration scheme
● Spherical t-designs
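For orientation, the Legendre polynomials themselves can be evaluated with the standard three-term (Bonnet) recurrence, (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x). The sketch below shows only this evaluation step; it is not the integration scheme or the spherical t-design approach mentioned on the slide, and the function name is illustrative.

```python
def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) with Bonnet's recurrence."""
    if n < 0:
        raise ValueError("order must be non-negative")
    p_prev, p = 1.0, float(x)  # P_0(x) and P_1(x)
    if n == 0:
        return p_prev
    for k in range(1, n):
        # (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

print(legendre(2, 0.5))   # P_2(x) = (3x^2 - 1) / 2
print(legendre(3, 0.5))   # P_3(x) = (5x^3 - 3x) / 2
print(legendre(50, 1.0))  # P_n(1) = 1 for every n
```

The recurrence avoids expanding high-order polynomials into explicit coefficients, which is where cancellation errors would otherwise appear.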
34
Hybrid Ellipsoids (Rafael Najmanovich)
● Every shape can be modelled by a set of hybrid ellipsoids
● The parameters describe the location and the axes a, b, c of each ellipsoid, plus a smear factor
● Similar parameters mean similar active sites and ligands
35
Predicting Catalytic Residues (Alex Gutteridge)
Aims:
● To predict the location of the active site in an enzyme structure.
● To predict the catalytic residues of an enzyme.
How?
● Train a neural network to identify catalytic residues.
● Cluster high-scoring residues to find the active site.
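The clustering step can be sketched as below. This is a hedged illustration: the per-residue scores stand in for neural network output and are invented, as are the coordinates, both cut-offs and the function name. Single-linkage grouping by distance is an assumed choice, not necessarily the method used in the work described.

```python
from math import dist

def cluster_high_scoring(residues, score_cutoff=0.5, distance_cutoff=8.0):
    """Group high-scoring residues that lie within distance_cutoff of each other.

    residues: list of (residue id, score, (x, y, z)).
    Returns clusters of residue ids, largest cluster first.
    """
    unvisited = [r for r in residues if r[1] >= score_cutoff]
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [r for r in unvisited if dist(r[2], current[2]) <= distance_cutoff]
            for r in near:
                unvisited.remove(r)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(r[0] for r in cluster))
    return sorted(clusters, key=len, reverse=True)

# Toy data: (residue id, predicted score, coordinates in Å). Made-up values.
residues = [
    (25, 0.90, (0.0, 0.0, 0.0)),
    (57, 0.80, (4.0, 0.0, 0.0)),
    (102, 0.70, (4.0, 5.0, 0.0)),
    (180, 0.85, (40.0, 40.0, 40.0)),
    (60, 0.20, (2.0, 1.0, 0.0)),
]

clusters = cluster_high_scoring(residues)
print(clusters)
```

Taking the largest cluster as the predicted active site discards isolated high-scoring residues such as residue 180 in the toy data.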
36
Workflows and Taverna (Tom Oinn)
● Most procedures used now follow a workflow-type scheme
● Taverna allows users to pick elements from services to create their own workflows for automation of complex sets of procedures
● Removes the need to write complex scripts
● Beta 9 release available at: http://taverna.sourceforge.net/
37
Acknowledgements
● Janet Thornton
● Christine Orengo
● Roman Laskowski – ProFunc
● Richard Morris – InterPro search, Spherical Harmonics
● Gail Bartlett, Craig Porter – Enzyme Templates
● Alex Gutteridge – Catalytic Residue Prediction
● Sue Jones – HTH motifs
● Hugh Shanahan – DNA binding, Electrostatics
● Jonathan Barker – JESS
● Hannes Ponstingl – PITA
● Rafael Najmanovich – Hybrid Ellipsoids
● Martin Senger, Siamak Sobhany – SOAP
● Tom Oinn – Taverna
● Annabel Todd and Russell Marsden – UCL
● MCSG consortium for lots of structures, plus many more at EBI and UCL
Work was supported by NIH grant (GM 62414) and by the US DoE under contract (W-31-109-Eng-38)