The RCSB Protein Data Bank Teaching an Old Dog New Tricks

1 The RCSB Protein Data Bank Teaching an Old Dog New Tricks
Philip E. Bourne

2 A Tribute From the guardian of a resource (institution) to all those men and women who make biology possible – may we never take you for granted Biocurator Perspectives

3 Agenda The old dog New tricks
Thinking differently about proteins Virtual Communities Internal (wwPDB) External What will the resource look like in 2-5 years?

4 History of the Old Dog
1970s: Community discussions about how to establish an archive of protein structures; Cold Spring Harbor meeting in protein crystallography; PDB established at Brookhaven (October 1971; 7 structures)
1980s: Number of structures increases as technology improves; community discussions about requiring depositions; IUCr guidelines established; number of structures deposited increases
1990s: Ontology defined; structural genomics begins; PDB moves to RCSB
2000s: wwPDB formed

6 Unchanging Core Mission
Create and maintain a well-curated database of macromolecular structure data derived using experimental methods that is… Always accessible to a diverse user community worldwide Developed in collaboration with that community that will… Facilitate and support scientific research and education

7 Challenges - Scientific
More complex structures – molecular machines, complexes New methods (e.g. EM) Lack of a vocabulary to provide reductionism in complex structures Partially solved problems in analyzing structures – structure alignments, domain definitions, functional site determination and characterization, pathway relationships, interaction partners Integrating microscopic and macroscopic views Disease relationships

8 Growth and Complexity
[Figure: number of released entries by year]

9 Data Integration Primary References Derived References Some Actions
Human Proteome & Homology Models Function Coverage Target Selection CATH Domains/ Families Source Organism Browser CATH Browser SCOP Browser PFAM Display Structure SCOP PFAM Source Organism SWISS-PROT/ GenBank IDs Pubmed Enzyme Commission NCBI Taxonomy Abstract Search Enzyme Browser Reactome Gene Ontology OMIM/ Disease Genomes (NCBI Gene) Structural Genomics Targets Disease Browser Target Search Genome Browser SNPs Mapped to Structure Find Structures by SP ID GO Browsers Find Structures by GO ID NAR 2005, 33: D233-D237

10 Challenges - Technical
Sheer numbers Efficient visualization Improved annotation Demands from a more diverse user base Centralization versus decentralization Web V2

11 Diverse User Community (180,000 individuals per month) and Diversifying Further
Structural biologists Computational biologists Experimental biologists Educators Students Lay public

12 Agenda The old dog New tricks
Thinking differently about proteins Virtual Communities Internal (wwPDB) External What will the resource look like in 2-5 years?

13 New Tricks – Protein Representation
The conventional view of a protein (left) has had a remarkable impact on our understanding of living systems, but is it time for a new view? It is not how one protein sees another after all.

14 Limitations of a Cartesian Viewpoint
A local viewpoint – does not capture the global properties of the protein Limited to a single scale descriptor Limits comparative analysis

15 Protein Kinase A – Open Book View

16 Superfamily Members – The Same But Different

17 Alignment Violates the Triangle Inequality
Many of the features in the distance matrix may be due to “distortions” induced by the failure to satisfy the triangle inequality (TI). Discrimination based on rmsd is poor, as illustrated by the breakdown of the inequality. Figure: protein kinase-like superfamily. Left: rmsd distance matrix. Right: number of violations of the triangle inequality for each pair of proteins.
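
The per-pair violation count described on this slide can be sketched in a few lines of Python. This is only an illustration, not the code behind the figure: the function name `triangle_violations`, the tolerance `tol`, and the toy `rmsd` matrix are all assumptions made for the example.

```python
from itertools import combinations


def triangle_violations(dist, tol=1e-9):
    """For each pair (i, j), count third points k with d(i,j) > d(i,k) + d(k,j)."""
    n = len(dist)
    counts = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        for k in range(n):
            if k in (i, j):
                continue
            # The direct distance exceeds the detour through k: a violation.
            if dist[i][j] > dist[i][k] + dist[k][j] + tol:
                counts[i][j] += 1
                counts[j][i] += 1
    return counts


# Hypothetical pairwise values standing in for rmsd after pairwise alignment.
rmsd = [
    [0.0, 1.0, 3.5],
    [1.0, 0.0, 1.5],
    [3.5, 1.5, 0.0],
]
violations = triangle_violations(rmsd)
```

In this toy matrix only the pair (0, 2) is violated, because 3.5 exceeds the detour 1.0 + 1.5 through structure 1.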

18 An Alternative Approach: Multipolar Representation
Roots in spherical harmonics Parameter space and boundary conditions can be a variety of properties Order of the multipoles defines the granularity of the descriptors Bottom line – interpreted as shape descriptors Gramada & Bourne 2006 BMC Bioinformatics 7:242

19 Results – Protein Kinase Like Superfamily Alignment Scheeff & Bourne 2005 PLoS Comp. Biol., 1(5) e49
Clear distinction between families. Some clustering is seen within the TPKs that resembles the various groups, even though there is little shape discrimination at this level.

20 Possibilities – Structure Based Phylogenetic Analysis
Scheeff & Bourne Multipoles

21 New Tricks – Protein Motion
Structures exist in a spectrum from order to disorder Ordered Structures Disordered Structures

22 Obtaining Protein Dynamic Information Protein Structures Treated as a 3-D Elastic Network
Bahar, I., A.R. Atilgan, and B. Erman. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design, (3): p

23 Gaussian Network Model
Each Cα is a node in the network. Each node undergoes Gaussian-distributed fluctuations influenced by neighboring interactions within a given cutoff distance (7 Å). Decompose protein fluctuation into a summation of different modes.
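
As a rough illustration of the network construction, the sketch below builds the connectivity (Kirchhoff) matrix for a set of Cα coordinates with the 7 Å cutoff from the slide. The function name `kirchhoff_matrix` and the three-residue `toy_chain` are assumptions for the example, and the mode decomposition itself is not shown.

```python
import math


def kirchhoff_matrix(ca_coords, cutoff=7.0):
    """Connectivity matrix: -1 for contacting node pairs, contact count on the diagonal."""
    n = len(ca_coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Two nodes interact when their C-alpha atoms lie within the cutoff.
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Hypothetical straight chain with 3.8 Angstrom spacing between consecutive C-alphas.
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
gamma = kirchhoff_matrix(toy_chain)
```

In the usual GNM formulation, the modes come from an eigendecomposition of this matrix, with the zero eigenvalue discarded. That step needs a linear algebra library and is left out here.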

24 Functional Flexibility Score
Utilize correlated movements to help define regional flexibility with functional importance. Functional Flexibility Score, for each residue: find the maximum and minimum correlation, then use them to scale the normalized fluctuation to determine functional importance. Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
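
The slide does not give the scoring formula, so the sketch below only illustrates the general idea and should not be read as the published method. It assumes, purely for illustration, that the scaling factor is half the spread between a residue's maximum and minimum correlation with other residues, and that fluctuations are normalized by the largest value. The function name `flexibility_scores` and the toy inputs are likewise hypothetical.

```python
def flexibility_scores(fluct, corr):
    """Scale each residue's normalized fluctuation by its correlation spread.

    ASSUMPTION: the (max - min) / 2 scaling is an illustrative choice,
    not the formula from the cited paper.
    """
    peak = max(fluct)
    scores = []
    for i, row in enumerate(corr):
        # Ignore the self-correlation on the diagonal.
        others = [c for j, c in enumerate(row) if j != i]
        spread = (max(others) - min(others)) / 2.0
        scores.append((fluct[i] / peak) * spread)
    return scores


# Hypothetical per-residue fluctuations and pairwise motion correlations.
fluct = [1.0, 2.0, 4.0]
corr = [
    [1.0, 0.5, -0.5],
    [0.5, 1.0, 0.0],
    [-0.5, 0.0, 1.0],
]
scores = flexibility_scores(fluct, corr)
```

Under these assumptions a residue scores highly only when it both fluctuates strongly and moves in a correlated and anti-correlated way with different parts of the structure.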

25 Identifying FFRs in HIV Protease
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

26 Other Examples BPTI and Calmodulin
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

27 Side Note: Gaussian Network Model vs Molecular Dynamics
GNM is relatively coarse-grained; GNM is fast to compute compared with MD; looks over larger time scales; suitable for high throughput

28 An Active Research Program Around the Resource is Good for the Resource

29 Agenda The old dog New tricks
Thinking differently about proteins Virtual Communities Internal (wwPDB) External What will the resource look like in 2-5 years?

30 Single worldwide archive of macromolecular structural data
Ensures that the PDB remains a single & uniform archive publicly available to the worldwide community 3 founding members: RCSB PDB, PDBj, MSD-EBI Virtual Communities - Internal

31 wwPDB Activities
Collaborative projects Remediation (taxonomy, ligands, literature) Single data processing system

32 Agenda The old dog New tricks
Thinking differently about proteins Virtual Communities Internal (wwPDB) External (modeling, other….) What will the resource look like in 2-5 years?

33 Virtual Communities - External
Consider the PDB a gathering point through which a virtual and real community interacts with each other around a common interest

34 Virtual Communities - External
Real Traveling art exhibit for lay audiences NJ Science Olympiad Science Expo Virtual Website Tutorials/Feedback Molecule of the Month PDB-in-a-CAVE

35 Virtual Communities - Modelers
Recommendations of Workshop: PDB depositions should be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules. A central, publicly available archive (or technical equivalent thereof) or portal should be established for models. It was unanimously agreed that methods for assessing model quality are essential. Structure 2006, to be published

36 Agenda The old dog New tricks
Thinking differently about proteins Virtual Communities Internal (wwPDB) External What will the resource look like in 2-5 years?

37 What Will the Resource Look Like in the Next 2-5 Years?
Upwards of 75,000 structures Consensus (and different) views at the micro and macro scale – domains, SNPs, gene structure, cell localization, pathways, interactions, post-translational modification… Community annotation cf Wikipedia Distributed subsets - External Reference Files (XML) MyPDB PDB-in-a-box Specialized visualization tools (mbt.sdsc.edu)

38 Is a database really different than a biological journal?
The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
Now assigning DOIs to structures
PLoS Comp Biol (3) e34

39 Acknowledgements
The RCSB PDB; NIH, NSF, DOE; Apostol Gramada (Multipole Analysis); Jenny Gu (Protein Motions)

40 A Protein is More than the Union of its Parts
Breaking the protein into parts changes the object of the comparison. This is interpreted in many cases to imply that the rmsd measure is inadequate. The reality is that it is the aligning of structure that breaks the triangle inequality and not the measure per se. The reason for failure is that we effectively compare different objects than we say we do. Transitivity is not guaranteed. From Røgen & Fain (2003), PNAS 100:

41 An Alternative Approach: Multipolar Representation Roots in Spherical Harmonics
Spatial distribution of a scalar quantity Parameterization + boundary conditions Charge distribution (i.e. structure) Scalar potential Justifies use of multipoles as a distribution of charge also geometry of spatial distribution of atoms Multipole expresses distortions in the spherical distribution Gramada & Bourne 2006 BMC Bioinformatics 7:242

42 An Alternative Approach: Multipolar Representation
“Out” Multipoles For a given rank l, they form a 2l+1 dimensional vector under 3D rotations Vector algebra applies => metric properties Gramada & Bourne 2006 BMC Bioinformatics 7:242
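
To make the "vector algebra => metric properties" step concrete, here is a sketch using the standard textbook form of spherical multipole moments. The normalization and weighting used in the cited paper may differ. The per-atom weights w_i, the labels A, B, C for three structures, and the assumption that the structures share a common reference frame are all introduced for illustration only.

```latex
% Rank-l moments of N points with weights w_i at spherical coordinates (r_i, \theta_i, \varphi_i)
q_{lm} = \sum_{i=1}^{N} w_i \, r_i^{\,l} \, Y_{lm}^{*}(\theta_i, \varphi_i),
\qquad m = -l, \dots, l

% For fixed l the moments form a vector with 2l+1 components
Q_l = \left( q_{l,-l}, \dots, q_{l,l} \right) \in \mathbb{C}^{2l+1}

% Euclidean distance between the rank-l descriptors of structures A and B
d_l(A, B) = \left\lVert Q_l^{A} - Q_l^{B} \right\rVert
          = \sqrt{ \sum_{m=-l}^{l} \left| q_{lm}^{A} - q_{lm}^{B} \right|^{2} }

% Because d_l is induced by a vector norm, the triangle inequality holds automatically
d_l(A, C) \le d_l(A, B) + d_l(B, C)
```

Under these assumptions the distance is taken between whole-structure descriptors rather than between aligned subsets of atoms, so the triangle inequality is guaranteed by construction.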

43 An Alternative Approach: Multipolar Representation
The multipoles can be interpreted as shape descriptors. In principle, from the entire series of multipoles one can reconstruct the scalar field and therefore the density, i.e. the entire set of Cartesian coordinates, i.e. the structure with a geometric level of detail. The partitioning of the multipole series according to various representations of the rotational group allows for a multi-scale description of the structure. Gramada & Bourne 2006 BMC Bioinformatics 7:242

