Structural Genomics: Case studies in assigning function from structure
James D Watson


1 Structural Genomics: Case studies in assigning function from structure
James D Watson, watson@ebi.ac.uk

2 Structural Genomics Collaborators
● MCSG – Midwest Center for Structural Genomics
● SPINE – Structural Proteomics in Europe
● SGC – Structural Genomics Consortium

3 Structural Genomics Aims
● Pathogens and disease
● Human proteins
● Coverage of fold space
● Automation / high throughput

4 Proteins: known sequences and 3D structures
● ~1.3m non-redundant protein sequences
● 5,500 non-redundant structures
● ~260,000 homology models
[Figure: example protein sequence fragments]

5 Proteins: known sequences and 3D structures
● ~10% unknown
● 5,500 non-redundant structures
● Homology models
● 3D structures of ~16,000 carefully selected proteins

6 Protein Function
Protein function has many definitions:
● Biochemical function: the biochemical role of the protein, e.g. serine protease
● Biological function: the role of the protein in the cell/organism, e.g. digestion, blood clotting, fertilisation

7 Function through homology
● Surface comparison
● Sequence similarity
● Motif searches
● Active site templates
● Structural similarity
● HTH motifs

8 Template Methodology
Use 3D templates to describe the active site of the enzyme: analogous to 1-D sequence motifs such as PROSITE, but in 3-D (Wallace et al 1997).
● A template defines a functional site
● Search a new structure for a functional site
● Search a database of structures for similar clusters
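
The idea of searching a structure for a residue cluster that matches a 3D template can be sketched as follows. This is only an illustration under stated assumptions, not the method of Wallace et al: each residue is reduced to a single hypothetical point, and similarity is measured with a distance RMSD (comparing inter-residue distances), which avoids superposition but cannot tell mirror images apart. The template, coordinates and the `search` helper are all invented for the example.

```python
from itertools import combinations, permutations
from math import dist, sqrt


def pair_distances(points):
    """All inter-point distances, in a fixed pair order."""
    return [dist(a, b) for a, b in combinations(points, 2)]


def distance_rmsd(template_xyz, candidate_xyz):
    """RMS difference between corresponding inter-residue distances."""
    t = pair_distances(template_xyz)
    c = pair_distances(candidate_xyz)
    return sqrt(sum((x - y) ** 2 for x, y in zip(t, c)) / len(t))


def search(template, structure, cutoff=1.0):
    """Return sorted (dRMSD, residue indices) for type-compatible matches."""
    wanted = [name for name, _ in template]
    template_xyz = [xyz for _, xyz in template]
    hits = []
    for idx in permutations(range(len(structure)), len(template)):
        if [structure[i][0] for i in idx] != wanted:
            continue  # residue types must match the template in order
        d = distance_rmsd(template_xyz, [structure[i][1] for i in idx])
        if d <= cutoff:
            hits.append((d, idx))
    return sorted(hits)


# Hypothetical Ser-His-Asp template and toy structure (one point per residue).
template = [("SER", (0, 0, 0)), ("HIS", (3, 0, 0)), ("ASP", (3, 4, 0))]
structure = [
    ("SER", (10, 10, 10)),
    ("HIS", (13, 10, 10)),
    ("ASP", (13, 14, 10)),
    ("SER", (40, 40, 40)),  # a serine far from the site
]
hits = search(template, structure)
print(hits)
```

The brute-force loop over residue permutations is fine for a toy case; a real search over whole structures would need indexing or pruning.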

9 SiteSeer’s “reverse” templates
[Figure: query structure broken into 3-residue templates (123, 456, 879, …)]

10 Problems with template methods
● Too many hits (hundreds, thousands or even tens of thousands)
● Use of rmsd rarely discriminates true from false positives
● Local distortion in structure may give a large rmsd
● Top hit rarely the correct hit, even in “obvious” cases

11 An example
● PDB code: 1hsk
● UDP-N-acetylenolpyruvoylglucosamine reductase (MURB), E.C.1.1.1.158
● Contains the 3D template that characterises this enzyme class
● Sequence identity to template’s representative structure (1mbb) is 28%
[Figure: template residues Ser, Arg, Glu]

12 Enzyme active site templates: hits for 1hsk
Hit   E.C. number      Rmsd    Enzyme
1.    E.C.1.3.99.2     0.76Å   Acyl-CoA dehydrogenase
2.    E.C.4.2.1.20     0.76Å   Tryptophan synthase α-subunit
3.    E.C.3.2.1.73     1.19Å   Glycosyl hydrolases, family 17
4.    E.C.3.2.1.73     1.21Å   Glycosyl hydrolases, family 16
5.    E.C.4.1.2.13     1.25Å   Fructose-bisphosphate aldolase (class I)
…     …                …       …
102.  E.C.1.1.1.158    2.19Å   UDP-N-acetylmuramate dehydrogenase
…     …                …       …
386.  …                3.94Å   …
[Figure: Arg, Glu, Ser match to template, rmsd=2.19Å]

13 Comparison of template environments: match to template
[Figure: template structure 1mbb and query structure 1hsk; residues Arg, Glu, Ser]

14 Comparison of template environments: match to template
[Figure: template structure 1mbb and query structure 1hsk; residues Arg, Glu, Ser]

15 Comparison of template environments: identical residues in neighbourhood
[Figure: template structure 1mbb and query structure 1hsk]

16 Comparison of template environments: similar residues in neighbourhood
[Figure: template structure 1mbb and query structure 1hsk; residues Arg, Glu, Ser]

17 Results for 1hsk
Hit  E.C. number     Rmsd   Score   Enzyme
1.   E.C.1.1.1.158   2.08   209.1   UDP-N-acetylmuramate dehydrogenase
2.   E.C.3.2.1.14    2.13   146.0   Chitinase A (chitodextrinase, 1,4-beta-poly-N-acetylglucosaminidase, poly-beta-glucosaminidase)
3.   E.C.3.2.1.17    1.92   142.4   Turkey lysozyme
4.   E.C.3.2.1.17    1.89   138.7   Hen lysozyme
5.   E.C.3.5.1.26    1.47   132.3   Aspartylglucosylaminidase
6.   E.C.3.2.1.3     1.54   131.1   Glucan 1,4-alpha-glucosidase
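
To see why ranking by score rather than rmsd matters here, the six hits listed on this slide can simply be re-sorted both ways. The sketch below only re-orders the values shown above; how the score itself is computed is not given in the transcript, so nothing about the scoring function is assumed.

```python
# (E.C. number, rmsd, score, enzyme) for the six hits listed for 1hsk.
hits = [
    ("1.1.1.158", 2.08, 209.1, "UDP-N-acetylmuramate dehydrogenase"),
    ("3.2.1.14", 2.13, 146.0, "Chitinase A"),
    ("3.2.1.17", 1.92, 142.4, "Turkey lysozyme"),
    ("3.2.1.17", 1.89, 138.7, "Hen lysozyme"),
    ("3.5.1.26", 1.47, 132.3, "Aspartylglucosylaminidase"),
    ("3.2.1.3", 1.54, 131.1, "Glucan 1,4-alpha-glucosidase"),
]

by_rmsd = sorted(hits, key=lambda h: h[1])                 # lower rmsd first
by_score = sorted(hits, key=lambda h: h[2], reverse=True)  # higher score first

true_ec = "1.1.1.158"
rank_by_rmsd = [h[0] for h in by_rmsd].index(true_ec) + 1
rank_by_score = [h[0] for h in by_score].index(true_ec) + 1
print(rank_by_rmsd, rank_by_score)
```

Among these six hits the correct enzyme class sits near the bottom when sorted by rmsd but comes first when sorted by score.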

18 ProFunc – function from 3D structure
Methods combined to suggest a function:
● Functional sequence motifs, e.g. Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]
● HTH-motifs
● Electrostatics
● Surface comparison
● Nests
● … etc
● Homologous structures of known function
● Homologous sequences of known function
● Template based methods
● Binding site identification and analysis
● Residue conservation analysis

19 Large scale analysis
● Created an edited version of the target database from the PDB: only those with status “In PDB”
● Extract all PDB codes for each Structural Genomics group
● Extract ‘prior’ knowledge (Header, Title, Jrnl, etc.)
● Find any associated GOA annotation
● Classify each structure by whether function is “known”, “unknown” or “limited info”
● Run ProFunc in a batch process on all codes (~560)
● Extract summary results from each analysis
● Compare to prior knowledge and estimate success

20 Number of deposits to the TargetDB by Structural Genomics group (Total of 577 unique entries) March 2004

21 PDB Blast
● Run query sequences against the PDB using BLAST
● Filter out those matches released AFTER the query sequence
● Any hits are ignored in subsequent analyses
● Still get significant matches. Why?
● Target selection criteria
● Released within months of SG target
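
The release-date filter described on this slide could look something like the sketch below. The record layout, identifiers and dates are invented for illustration, and whether a hit released on the same day as the query should be kept is an assumption (here it is kept).

```python
from datetime import date


def hits_available_before(query_release, hits):
    """Drop BLAST hits whose PDB entry was released after the query."""
    return [h for h in hits if h["released"] <= query_release]


# Hypothetical query release date and hits; identifiers and dates are invented.
query_release = date(2003, 6, 1)
hits = [
    {"pdb": "hitA", "released": date(2001, 3, 15), "evalue": 1e-40},
    {"pdb": "hitB", "released": date(2003, 9, 2), "evalue": 1e-60},
    {"pdb": "hitC", "released": date(2003, 5, 20), "evalue": 1e-12},
]
kept = hits_available_before(query_release, hits)
print([h["pdb"] for h in kept])
```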

22 InterPro Scan
● InterPro scan on proteins of known function
● Cannot “backdate” the InterPro database
● Essentially picking up itself

23 Function of query structure “known”

24 Limited Functional Info

25 Unknown Function

26 The Good, the Not So Good and the Ugly
Three examples show the varying levels of information that can be retrieved from structures:
1. New functional assignment
2. Possible function identified
3. Function remains unknown

27 The Good: BioH structure (MCSG)
● One very strong hit: Ser-His-Asp catalytic triad of the lipases with rmsd=0.28Å (template cut-off is 1.2Å)
● Experimentally confirmed by hydrolase assays
● Novel carboxylesterase acting on short acyl chain substrates
● Function discovered

28 The Not So Good: APC1040 (MCSG)
● Assigned as a probable glutaminase
● Most methods suggest β-lactamase activity
● No match to Prosite patterns:
  Class A: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]
  APC1040: 70 F-T-M-Q-S-I-S-K-V-I-S-F-I-A-A-C 85
● Function being assayed
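
A quick way to see where the APC1040 segment breaks the Class A pattern is to expand the pattern position by position and compare. The sketch below handles only the simple PROSITE elements used on this slide (fixed repeats, `x`, bracketed sets and `{}` exclusions), and it assumes that residues 70 to 85 are aligned one-to-one with the 16 pattern positions, as the slide layout suggests. The helper names are invented for the example.

```python
import re


def expand(pattern):
    """Expand a fixed-length PROSITE pattern into one regex per position."""
    positions = []
    for element in pattern.replace(" ", "").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core, count = m.group(1), int(m.group(2) or 1)
        if core == "x":
            core = "."                      # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} = any residue except these
        positions.extend([core] * count)
    return positions


def mismatches(pattern, segment, start=1):
    """List (residue number, residue, pattern element) for each violation."""
    return [
        (start + i, aa, elem)
        for i, (aa, elem) in enumerate(zip(segment, expand(pattern)))
        if not re.fullmatch(elem, aa)
    ]


class_a = "[FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]"
segment = "FTMQSISKVISFIAAC"  # APC1040 residues 70-85, from the slide

positions = expand(class_a)
whole_match = re.fullmatch("".join(positions), segment)
bad = mismatches(class_a, segment, start=70)
print(whole_match, bad)
```

Under that alignment the segment fails at two positions, residues 75 and 82, where an isoleucine falls outside the allowed sets [TV] and [AGLM]; the serine and lysine at the fixed S and K positions are both present.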

29 The Ugly: MT0777 (MCSG)
● Hypothetical protein from Methanobacterium thermoautotrophicum
● Fold associated with many functions (Rossmann fold)
● No sequence motifs
● Residue conservation is poor
● Template methods fail
● Function unknown

30 Future Work
● Improvements to scoring system and additional templates
● Further utilisation of SOAP services as they become available (e.g. KEGG API service)
● Possible adaptation for use as part of a larger workflow or in LIMS systems (Taverna and MyGrid)
● More truly predictive analyses being developed (e.g. electrostatics, ligand prediction, catalytic residue prediction)

31 Detection of DNA-binding proteins (with HTH motif) using structural motifs and electrostatics (Hugh Shanahan)
● Combine electrostatics with HTH structural templates.
● Can detect HTH DNA-binding proteins only.
● 1/3 of DNA-binding protein families have an HTH motif.
● Use a linear predictor as discriminant.
● True positive rate (~80%) is comparable with more complicated methods.
● Very low (< 0.01%) false positive rate.

32 Ligand Prediction
Active site and ligand description/fingerprinting methods: can active site geometry, shape, physical-chemical properties etc. be used to predict the preferred ligand class?
● Spherical Harmonics
● Hybrid Ellipsoids

33 Spherical Harmonics (Richard Morris)
● The computation of Legendre polynomials of high order requires a robust integration scheme
● Spherical t-designs

34 Hybrid Ellipsoids (Rafael Najmanovich)
● Every shape can be modelled by a set of hybrid ellipsoids
● The parameters describe location and a, b, c of the ellipsoid and a smear factor
● Similar parameters mean similar active sites and ligands

35 Predicting Catalytic Residues (Alex Gutteridge)
Aims:
● To predict the location of the active site in an enzyme structure.
● To predict the catalytic residues of an enzyme.
How?
● Train a neural network to identify catalytic residues.
● Cluster high scoring residues to find the active site.
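
The second step, clustering high scoring residues, might be sketched as a simple single-linkage grouping by distance. This is an illustration only: the transcript does not say which clustering method was used, and the per-residue scores, coordinates, labels and both cut-offs below are hypothetical stand-ins for the neural network output.

```python
from math import dist


def cluster_residues(residues, score_cutoff=0.5, distance_cutoff=8.0):
    """Group high scoring residues that lie within distance_cutoff of each other.

    residues: list of (label, score, (x, y, z)).
    Returns clusters (sorted label lists), largest cluster first.
    """
    selected = [r for r in residues if r[1] >= score_cutoff]
    unvisited = set(range(len(selected)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        while stack:
            i = stack.pop()
            near = {
                j for j in unvisited
                if dist(selected[i][2], selected[j][2]) <= distance_cutoff
            }
            unvisited -= near      # each residue joins exactly one cluster
            stack.extend(near)
            members.extend(near)
        clusters.append(sorted(selected[k][0] for k in members))
    return sorted(clusters, key=len, reverse=True)


# Hypothetical scores and coordinates (one point per residue).
residues = [
    ("SER1", 0.9, (0, 0, 0)),
    ("HIS2", 0.8, (5, 0, 0)),
    ("ASP3", 0.7, (5, 5, 0)),
    ("LYS4", 0.9, (40, 0, 0)),  # high score but isolated
    ("GLY5", 0.1, (1, 1, 1)),   # close to the site but low score
]
clusters = cluster_residues(residues)
print(clusters)
```

The largest cluster is then taken as the candidate active site; isolated high scoring residues end up in clusters of their own.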

36 Workflows and Taverna (Tom Oinn)
● Most procedures used now follow a workflow-type scheme
● Taverna allows users to pick elements from services to create their own workflows for automation of complex sets of procedures
● Removes the need to write complex scripts
● Beta 9 release available at: http://taverna.sourceforge.net/

37 Acknowledgements
● Janet Thornton
● Christine Orengo
● Roman Laskowski – ProFunc
● Richard Morris – InterPro search, Spherical Harmonics
● Gail Bartlett, Craig Porter – Enzyme Templates
● Alex Gutteridge – Catalytic Residue Prediction
● Sue Jones – HTH motifs
● Hugh Shanahan – DNA binding, Electrostatics
● Jonathan Barker – JESS
● Hannes Ponstingl – PITA
● Rafael Najmanovich – Hybrid Ellipsoids
● Martin Senger, Siamak Sobhany – SOAP
● Tom Oinn – Taverna
● Annabel Todd and Russell Marsden – UCL
● MCSG consortium for lots of structures, plus many more at EBI and UCL
Work was supported by NIH grant (GM 62414) and by the US DoE under contract (W-31-109-Eng-38)

