Presentation is loading. Please wait.

Presentation is loading. Please wait.

Haruki Nakamura Institute for Protein Research, Lab. Protein Informatics, Osaka University PRAGMA 11 Workshop, Osaka Univ., on October.

Similar presentations


Presentation on theme: "Haruki Nakamura Institute for Protein Research, Lab. Protein Informatics, Osaka University PRAGMA 11 Workshop, Osaka Univ., on October."— Presentation transcript:

1 Haruki Nakamura Institute for Protein Research, Lab. Protein Informatics, Osaka University PRAGMA 11 Workshop, Osaka Univ., on October 16, 2006 Integrated Web Service for Database Queries and Computations Grid Web Services for Analog Queries to Protein 3D Structures

2 BioGrid at Osaka Started from 2002 Goals:Grid Technology Development for: Biotechnology (drug discovery) and Medical Sciences. Leader: Shinji Shimojo (CMC, Osaka Univ.) Government Support (MEXT): 5years H. Nakamura K. Yamaguchi T. Takada H. Matsuda T. Akiyama S. Shimojo S. Date Integration of Data Grid and Computing Grid to access Integrated and Standardized Databases with High-performance Computers, from anywhere and with low cost.

3 Less Conceputal (Formal) Highly Conceptual Fine Scale Conceptual Level Large Literature 3D- Cordinates Function Sequence Molecular Interaction Network Atom Molecule Organelle Cell Tissue Organ Organism Mass Analysis Medical Drug Data Sugar Chain Swiss-Prot MEDLINE KEGG OMIM Gene Ontology PDB InterPro, Pfam MDDR DIP Gene Expression Chemical Structure PubChem, Ligand, ChEBI DDBJ/EMBL/ GenBank H-Inv, FANTOM GEO Databases in Life Sciences nearly 1,000 DBs

4 BioDataGrid by Hideo Matsuda at Osaka Univ.

5 BioGrid at Osaka Started from 2002 Goals:Grid Technology Development for: Biotechnology (drug discovery) and Medical Sciences. Leader: Shinji Shimojo (CMC, Osaka Univ.) Government Support (MEXT): 5years H. Nakamura K. Yamaguchi T. Takada H. Matsuda T. Akiyama S. Shimojo S. Date Integration of Data Grid and Computing Grid to access Integrated and Standardized Databases with High-performance Computers, from anywhere and with low cost.

6 BioPfuga : Biosimulation Platform United on Grid Architecture A platform where individual application programs at the different levels are united to execute a hybrid computation. In particular, BioPfuga is a platform for biosimulation. Nakamura et al. (2004) New Generation Computing, 22,

7 Construction of BioPfuga Propose and Design Communication between program components by XML description (BMSML: BioMolecular Simulation-ML), and Creation of the APIs to handle it. Divide programs into a set of many components Different program modules are then implemented as the service programs based on the OGSA (Open Grid Service Architecture) mechanism.

8

9 Construction of BioPfuga Propose and Design Communication between program components by XML description (BMSML: BioMolecular Simulation-ML), and Creation of the APIs to handle it. Divide programs into a set of many components Different program modules are then implemented as the service programs based on the OGSA (Open Grid Service Architecture) mechanism.

10 Quantum mechanics (QM) and Molecular mechanics (MM) simulation coupled on BioPfuga Coupled simulation on Grids QM area Active site of an enzyme Ligand MM area H total = H QM (x,x) + H QM/MM (x,y) + H MM (y,y) Hamiltonian of total system Calculate in QM area Calculate in MM area (Yonezawa et al. at SC2004)

11 developed by NEC Quantum Chemistry Group Rapid computation for huge molecular systems (20,000 basis for RHF, 1,000 basis for CASSCF and MP2), with high parallel performance Simulation of Electronic structure (1) AMOSS Ab initio Molecular Orbital Simulation for Supercomputer developed by Yoshifumi Fukunishi at JBIRC-AIST and Haruki Nakamura at Institute for Protein Research, Osaka Univ. Simulation of Protein-solvent structure prestoX-basic Protein Engineering Simulator eXtended –basic versin- Simulation of Electronic structure (2) GSO-X generalized spin density function theory (DFT) developed by Shusuke Yamanaka & Kizashi Yamaguchi, at Osaka Univ. 856 atoms, 8672 AOs PKC -C1B

12 A hybrid-QM/MM with BioPfuga on the Grid Client controller BMSML Site-C QM(DFT ) GSO-X Service GT3 Osaka MPI 50CPU Site-B AMOSS Service GT3 AMOSS(HF) QM(HF) MPI 30CPU Viewer portal Site-A MM(MD) prestoX Service GT3 prestoX (MPI) Osaka BMSML 8CPU 8-boards (Special purpose computing board: max 50GFLOPS)

13 Configuration of a computing system on a Grid environment with several different program modules. Portal/ Client Program Service QM(HF) Service QM (DFT) Service MM User Communicate using SOAP Distributed computing System Personal user SOAP

14 BioGrid at Osaka Started from 2002 Goals:Grid Technology Development for: Biotechnology (drug discovery) and Medical Sciences. Leader: Shinji Shimojo (CMC, Osaka Univ.) Government Support (MEXT): 5years H. Nakamura K. Yamaguchi T. Takada H. Matsuda T. Akiyama S. Shimojo S. Date Integration of Data Grid and Computing Grid to access Integrated and Standardized Databases with High-performance Computers, from anywhere and with low cost. Our final goal is Integration of Data Grid and Computing Grid.

15 Portal/ Client Program Service A Service C Service B User GT Communicate using SOAP Distributed Computing System Personal user SOAP site-Asite-Bsite-C GRID computing services H. Nakamura, BioGrid2005 (March 9, 2005) Integration of computing systems and Database query systems on a Grid

16 Portal/ Client Program Service A Service C Service B User GT Communicate using SOAP Distributed Computing and DB System Personal user SOAP site-Asite-Bsite-C GRID computing servicesDatabase query services H. Nakamura, BioGrid2005 (March 9, 2005) Integration of computing systems and Database query systems on a Grid

17 Figure 1. A Schematic for the Proposed Portal: The portal is envisioned to be a modular web services architecture achieved by using an implementation of the Simple Object Access Protocol (SOAP that allows for seamless data exchange between the portal and all registered contributors. SOAP is a light weight protocol for exchange of information in a decentralized, distributed environment. It is an XML-based protocol, which defines a framework for representing remote procedure calls and responses. The XML wrappers basically map the individual contributing metadata format into an XML format that the data portal understands (Drinkwater et al., 2004). These services will be platform and language independent allowing other services (other portals or clients) to communicate with the data portal (see Proposed Portal for Multiple Databases for Protein Structures Berman, H. et al. (2006) Structure, 14, Theoretical Model DB 1 Theoretical Model DB 2 Theoretical Model DB 3

18 PDB (Protein Data Bank): Protein Tertiary Structure Database Atom species and coordinates, amino acid residues, any other experimental results with raw data, experimental methods, and the conditions. X-ray crystallography, NMR, and Electron microscope experiments. Protein Tertiary Structure

19 Kim Henrick, Helen Berman, Haruki Nakamura

20 International collaboration in wwPDB (Berman, Henrick & Nakamura (2003) Nat. Struct. Biol. 10, 980) Rutgers Univ. PDBj EBI RCSB 1) Curation, data processing, and registration are made by all the members, collaborating with each other. 2) We have a single data archive, which is looked after by one archive keeper (RCSB). 3) Data format and new descriptions will be discussed among the members. 4) Members are encouraged to develop their own browsers, viewers, and other APIs and services.

21 Protein Data Bank Japan At Institute for Protein Research, Osaka Univ. since 2001 assisted from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST).

22 Processed data numbers at PDBj We process more than 30 % deposited data of the entire world, mainly from Asian and Oceania regions year Yearly wwPDB registration number Total wwPDB registration number Yearly PDBj registration number Total registration number Yearly registration number

23 Maintain Format Standards PDB (conventional and flat) PDB Exchange (mmCIF) –Mechanism for extension based on new demands PDBML –Derived from mmCIF –All entries converted to XML –Automatic translation from mmCIF data files and dictionaries –3-styles of translation released (Westbrook, Ito, Nakamura, Henrick, Berman (2005) Bioinformatics, 21, )

24 PDBML: canonical XML description of PDB data, developed by the wwPDB. (Westbrook et al. (2005) Bioinformatics, 21, ) ftp://ftp.pdbj.org/ No validation errors for more than 39,000 PDB file description.

25 ATOM N THR A THR A N 1 ATOM 1 A A 1 1 ?. THR THR N N N Full-tag description Separated file for coordinates Examples for the atom coordinates

26 Applications of PDBML Browser at PDBj with the native XML DB Database extension and annotation SOAP (Simple Object Access Protocol) services at PDBj Molecular graphics viewer (jV), which directly parses PDBML, displaying the information written in XML (Kinoshita & Nakamura (2004) Bioinformatics, 20, , Westbrook et al (2005) Bioinformatics, 21, )

27

28

29 Wild-card "*" can be used: *2as, 12*, *

30 "and" and wild-card are available.

31

32

33

34

35 /datablock[struct_confCategory/struct_conf Search proteins, which have one or more helices, which are longer than 10.

36

37 xPSSS new facility Both XQuery and XPath are available.

38 Search for all entries and helix of length with helix of length greater than or equal to 10 residues:

39

40

41 Queries by XQuery and XPath at our new xPSSS

42 Offers interactive molecular visualization with RasMol type commands (source code free). Can be used both as Stand- alone and as applet with Java. Any polygons defined by XML can be displayed and manipulated. Can parse PDBML files and display the molecules. PDBjViewer or jV

43 xPSSS Database System PDBM L PDBMLplus Web server XSLT processor downloader Loader Archive (RCSB- PDB /MSD-EBI /PDBj) Native XML-DB PDBMLplus PDBMLplusF downloa d (FTP) FTP server Internet DDBJ SwisProt/UniProt PIR/GenBank/KEGG/GDB/ ProTherm/EzCatDB EBI/CSA /CATRE S Function/ Source Informatio n Get/Input Tools CATRES Data Annotation Data AddInformation Filtering & Recostructing PDBMLplus PDBMLplusF xPSSS Manual input from literatures PDF files for the primary citations

44 Protein Molecular Surface Database, eF-site (Kinoshita & Nakamura Protein Dynamics Database, ProMode (Wako & Endo) Alignment of Structural Homologues, ASH (Toh) Encyclopedia of Protein Structures, eProtS (Ito & Nakamura Sequence Navigator & Structure Navigator (Standley Development of Secondary Databases: data mining

45 Superfolds of Proteins

46 Identify Protein Function from Sequence similarity Fold similarity

47 Compare Protein folds/architectures

48 DNA-binding domain of Bovine Papilloma Virus-1 E2 2BOP RNA-binding domain of the U1A spliceosomal protein 1URN Distance Matrix

49 Map of the entire protein folds Hou et al. (2003) Proc. Natl. Acad. Sci. USA, 100, SCOP domains Dali (distance matrix alignment)

50 1.Rapidly generates structure neighbors for any PDB ID 2.Both an HTTP and SOAP interface are available 3.Alignments are displayed in a readable format 4.3D coordinates can be viewed or downloaded Structure Navigator Structural similarity is defined by the Number of Equivalent Residues (NER 1 ) 1 Standley D, Toh H, Nakamura H. Proteins 2004;57: Standley, D.M. et al. (2005)

51 Structure Navigator Real-Time Server Blue: Query Red: Temp1 Standley, D.M. et al. (2005)

52 Real-Time Query Strategy Query Clustered Templates (5,097 representatives) Nearest Cluster Standley, D.M. (2006)

53 Structure Navigator-RT could reply the query quickly, but... When we open the Web service of Structure Navigator-RT and accept many requests, our computer resource may not be enough. We want to use large computing resource somewhere else. We need a simple and flexible tool for this purpose. Grid Web Service

54 Grid Web Service using Opal SDSC Technical Report TR , Opal is a toolkit for wrapping application programs as Web services, providing scheduling, Grid security, and data management.

55 CCGrid 2006 Grid Web Service using Opal

56 Operation Providers for WSRF Operation Providers for Notification GetRPProvider SetRPProvider QueryRPProvider SubscribeProvider NotificationConsumer Provider Globus Toolkit 4 (GT4) Application Service (WSRF) Opal OP Opal Opal OP Toolkit Opal Operation Provider by W.W. Li & K. Ichikawa (under the PRIUS project) Opal is deployed as one of Operation Providers of Globus Toolkit 4 (GT4): Opal Operation Provider (Opal-OP)

57 Computing Server (PC-cluster 16 nodes) Head Node (1 PC node): Tomcat Opal Operation Provider Service Pre-WS GRAM GT4 OpenPBS (launch pbs_server, pbs_sched, pbs_mon) Run the Application program Job scheduling 15 Worker nodes: openPBS (launch pbs_mon) Run the Application program Portal Server Apache Tomcat (jsp -Input page/servlet -Opal-OP client program) Library of GT4 Library of Operation Provider Library of Operation Provider Service End user System Osaka Univ. Osaka Univ. (Toyonaka) Submit Job Request Query Launch

58 Query: 3D fold (1crn) Structure Navigator-RT with Opal-OP on GRID

59

60

61

62

63

64

65 Blue: Query Structure Red: Second nearest cluster to the Query Query structure

66 Conclusion We implemented the Opal Operation Provider to run Structure Navigator-RT as a Grid Web Service, using the local scheduler, openPBS, to run our application program on the PC cluster at the Cyber-Media Center, Osaka University, through the network inside Osaka University.

67 Surface similarity Identify Protein Function from Sequence similarity Fold similarity

68 Kinoshita, Furui, Nakamura (2002) J. Struct. Funct. Genomics 2, Kinoshita, Nakamura (2004) Bioinformatics 20, eF-site database for Ligand recognition sites /eF-site/

69 Folylpolyglutamate synthetase query: MJ0226 free form search result against eF-site/ActiveSite + eF-site/mono- nucleotide (1684 entries) Pyruvate kinase Similarity was found here. Function identification of hypothetical proteins Kinoshita & Nakamura (2003) Protein Science 12, Kinoshita & Nakamura (2005) Protein Science,

70 eF-seek: Query service for search of similar molecular surfaces ( -version) ef-site.hgc.jp /eF-seek/ Kinoshita, K. & Nakamura, H. (2006)

71 Application of Grid to Web Service of PDBj Query for the analog structural data: Protein folds and molecular surfaces Blue: Query Red: Temp1 New Structure eF-site Database Quick response by using Grid computing Looking for the similar folds Structure Navigator Real-Time server Looking for the similar surfaces (eF-seek) Fold Database Query for molecular surface Query for fold Result

72 Portal/ Client Program Service A Service C Service B User GT XMLDB Communicate using SOAP Distributed Computing and DB System Personal user site-Asite-Bsite-C GRID computing services with Opal-OP Database query services with XQuery/XPath Integration of computing systems and Database query systems at PDBj Analog Query Search Text Query Search

73 Reiko Yamashita, Daron M. Standley (BIRD-JST / Inst. Protein Res., Osaka Univ.) Kohei Ichikawa, Susumu Date, Shinji Shimojo (Cyber Media Center, Osaka Univ.) Wilfred W. Li, Peter Arzberger (UCSD) Hiroyuki Toh (Medical Inst. Bioregulation, Kyushu Univ) Kengo Kinoshita (Inst. Med. Science, Univ. Tokyo) Other PDBj members (BIRD-JST / Inst. Protein Res., Osaka Univ.) PRIUS Acknowledgements

74


Download ppt "Haruki Nakamura Institute for Protein Research, Lab. Protein Informatics, Osaka University PRAGMA 11 Workshop, Osaka Univ., on October."

Similar presentations


Ads by Google