The MoSGrid Portal – A workflow-enabled Grid Portal for Molecular Simulations
Sandra Gesing, Center for Bioinformatics, University of Tübingen
28.04.2010
Grant: 01IG09006

Outline
- Motivation
- MoSGrid (Molecular Simulation Grid)
- The MoSGrid portal
- Domain-specific workflows
- MSML (Molecular Simulation Markup Language)
- Future work

Motivation
- Numerous applications for molecular simulations and docking, e.g.:
  - Materials science
  - Structural biology
  - Drug design
- Sophisticated tools and algorithms support scientists
- High-performance computing facilities are available

Motivation (continued)
Drawbacks of using molecular simulations and docking:
- Limited usability of the tools
  - Complexity of the methods
  - Lack of graphical user interfaces
- Complexity of the infrastructures
- Many end users lack a computer science background
⇒ Need for self-explanatory and intuitive user interfaces
⇒ A portal for molecular simulations and docking

Portals
- Single point of entry
- Customizable views and tools
- Stored user preferences
- No software installation on the user's side
- No firewall issues

Unifying Diversity
(Figure: example nucleotide sequence; slide copied from Stuart Owen, "Workflows with Taverna")

MoSGrid
Molecular Simulation Grid (D-Grid project)
Goal: providing users with Grid access to molecular simulation and docking tools via a workflow-enabled portal
- Implementation of
  - High-performance computing
  - Workflows
  - Annotation of results
  - Data mining
- Use of the D-Grid infrastructure

MoSGrid Partners
- Universität zu Köln
- Eberhard-Karls-Universität Tübingen
- Universität Paderborn
- Konrad-Zuse-Zentrum für Informationstechnik Berlin
- Technische Universität Dresden
- Technische Universität Dortmund
- Bayer Technology Services GmbH, Leverkusen
- Origines GmbH, Martinsried
- GETLIG&TAR, Falkensee
- BioSolveIT, Sankt Augustin
- COSMOlogic GmbH & Co. KG, Leverkusen

MoSGrid in a Nutshell
(Figure: architecture overview)
- Portal: WS-PGRADE
- High-level middleware service level: gUSE (workflows)
- Grid resources: UNICORE 6
- Cloud file system: XtreemFS
- Further diagram labels: recipe, structure, result

Credential Management
- User management based on Liferay features
  - Community management
  - Organization management
- X.509 user certificates
- SAML (Security Assertion Markup Language)
  - Minimizes credential data transfers
  - Sets a maximum number of hops for trust delegation
  - Usable for single sign-on infrastructures (e.g., Shibboleth)
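A hop limit for trust delegation can be pictured as a bound on the length of a delegation chain that starts at the user. The following is a minimal Python sketch under that assumption; `Delegation` and `validate_chain` are hypothetical names, and the code neither uses a SAML library nor reproduces the MoSGrid implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """Hypothetical delegation step: `issuer` passes trust on to `receiver`."""
    issuer: str
    receiver: str


def validate_chain(user: str, chain: list[Delegation], max_hops: int) -> str:
    """Return the final holder of the delegated identity.

    Raises ValueError if the chain is longer than `max_hops` or if a step
    is issued by someone who does not currently hold the delegation.
    """
    if len(chain) > max_hops:
        raise ValueError(f"chain has {len(chain)} hops, limit is {max_hops}")
    holder = user
    for step in chain:
        if step.issuer != holder:
            raise ValueError(f"broken chain: {step.issuer} does not hold the delegation")
        holder = step.receiver
    return holder
```

For example, a chain user → portal → gUSE → UNICORE passes with `max_hops=3` but is rejected with `max_hops=2`, which is the property a hop limit is meant to enforce.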

Credential Management (figure)

WS-PGRADE (figures)

gUSE Architecture
- User interface: WS-PGRADE
- High-level middleware service layer: gUSE (grid User Support Environment)
- Grid resources middleware layer: UNICORE 6

gUSE Submitter Interface
Jobs (JOB1, JOB2, JOB3, JOB4, …, JOBn) are handled by a GridService that implements the following actions:
- actionJobSubmit
- actionJobAbort
- actionJobOutput
- actionJobStatus
- actionJobResource
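The submitter interface works like a plug-in contract: gUSE can call the same five actions whatever middleware sits behind them. The sketch below renders that idea in Python; only the action names come from the slide, while the arguments, return values, and the toy `LocalSubmitter` class are assumptions, and the real gUSE interface may differ.

```python
from abc import ABC, abstractmethod


class GridSubmitter(ABC):
    """Hypothetical Python rendering of the gUSE submitter actions.

    Action names follow the slide; signatures are illustrative only.
    """

    @abstractmethod
    def actionJobSubmit(self, job_id: str) -> None: ...

    @abstractmethod
    def actionJobAbort(self, job_id: str) -> None: ...

    @abstractmethod
    def actionJobOutput(self, job_id: str) -> str: ...

    @abstractmethod
    def actionJobStatus(self, job_id: str) -> str: ...

    @abstractmethod
    def actionJobResource(self, job_id: str) -> str: ...


class LocalSubmitter(GridSubmitter):
    """Toy in-memory plug-in; each real middleware would supply its own subclass."""

    def __init__(self) -> None:
        self.status: dict[str, str] = {}

    def actionJobSubmit(self, job_id: str) -> None:
        self.status[job_id] = "RUNNING"

    def actionJobAbort(self, job_id: str) -> None:
        self.status[job_id] = "ABORTED"

    def actionJobOutput(self, job_id: str) -> str:
        return f"output of {job_id}"

    def actionJobStatus(self, job_id: str) -> str:
        return self.status.get(job_id, "UNKNOWN")

    def actionJobResource(self, job_id: str) -> str:
        return "localhost"
```

The camelCase method names are kept to match the slide rather than Python naming conventions.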

gUSE Submitter for UNICORE
On actionJobSubmit, each job (JOB1 … JOBn) is processed against the UNICORE 6 resources in five steps:
1. Security
2. Registry
3. Submit job
4. Upload data (to the job's Uspace)
5. Start job
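The numbered steps suggest that a UNICORE job is created first, its input data are then staged into the job's Uspace, and only afterwards is the job started. A hedged sketch of that ordering follows; `FakeUnicoreClient` and its methods are hypothetical stand-ins that merely record calls and are not the UNICORE client API.

```python
class FakeUnicoreClient:
    """Hypothetical stand-in for a UNICORE 6 client; it only records calls."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def authenticate(self, credential: str) -> None:
        self.log.append("security")

    def lookup_site(self, registry_url: str) -> str:
        self.log.append("registry")
        return "site-1"

    def create_job(self, site: str, job_id: str) -> str:
        self.log.append("submit")
        return f"{site}/{job_id}/uspace"  # location of the job's working space

    def upload(self, uspace: str, files: list[str]) -> None:
        self.log.append(f"upload:{len(files)}")

    def start(self, job_id: str) -> None:
        self.log.append("start")


def submit_via_unicore(client: FakeUnicoreClient, credential: str,
                       registry_url: str, job_id: str, files: list[str]) -> str:
    """Run steps 1-5 from the slide in order and return the Uspace location."""
    client.authenticate(credential)           # 1 - security
    site = client.lookup_site(registry_url)   # 2 - registry
    uspace = client.create_job(site, job_id)  # 3 - submit job (created, not started)
    client.upload(uspace, files)              # 4 - upload data into the Uspace
    client.start(job_id)                      # 5 - start job
    return uspace
```

Separating job creation from job start is what allows the input files to be uploaded into a job-specific working space before any computation begins.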

ASM (Application Specific Module)
Library for managing WS-PGRADE workflows:
- Listing of users and workflows in the local repository
- Import of workflows into the user space
- Upload/download of input and output files
- Setting the parameters of a job in a workflow
- Submission of workflows
- Monitoring of workflows
- Deletion of workflows
- Usable in portlets and Java tools
⇒ Implicit use of the gUSE submitter
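To make the listed ASM operations concrete, here is a toy in-memory model of the workflow lifecycle: list, import, set job parameters, submit, monitor, delete. It is a hypothetical sketch; `ToyASM`, its method names, and the status strings are assumptions and do not reflect the actual ASM library API.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    name: str
    jobs: dict[str, dict[str, str]] = field(default_factory=dict)  # job -> parameters
    status: str = "IMPORTED"


class ToyASM:
    """Hypothetical in-memory model of the ASM operations (not the real API)."""

    def __init__(self, repository: dict[str, list[str]]) -> None:
        self.repository = repository  # workflow name -> job names
        self.user_space: dict[str, Workflow] = {}

    def list_workflows(self) -> list[str]:
        return sorted(self.repository)

    def import_workflow(self, name: str) -> Workflow:
        wf = Workflow(name, {job: {} for job in self.repository[name]})
        self.user_space[name] = wf
        return wf

    def set_parameter(self, wf_name: str, job: str, key: str, value: str) -> None:
        self.user_space[wf_name].jobs[job][key] = value

    def submit(self, wf_name: str) -> None:
        # In the portal, this is where the gUSE submitter would be used implicitly.
        self.user_space[wf_name].status = "SUBMITTED"

    def get_status(self, wf_name: str) -> str:
        return self.user_space[wf_name].status

    def delete(self, wf_name: str) -> None:
        del self.user_space[wf_name]
```

A portlet would call such a facade without needing to know which submitter or middleware ultimately runs the jobs.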

Distributed Data Management
- XtreemFS is an object-based grid and cloud file system
- Ability to minimize data transfer
- Low latency and local availability through replication
- Grid Security Infrastructure (GSI) support

Distributed Data Management: XtreemFS Integration
(Figure: data flow between WS-PGRADE, UNICORE, and XtreemFS)
- Components: WS-PGRADE portlet, UNICORE with GSI support, UNICORE TSI, XtreemFS on frontend nodes and compute nodes
- UNICORE mediates data transfers

Domain: Molecular Dynamics
- Study and simulation of molecular motion
- Provide a molecular dynamics service on multiple levels:
  - Direct upload of job descriptions
  - Workflows and standard recipes for repetitive tasks
  - Analysis of relevant properties

Equilibration of Proteins
- Protein structures from databases (e.g., the Protein Data Bank, PDB) do not necessarily represent a near-native conformation/configuration
- Minimization and equilibration are indispensable prerequisites for all kinds of production runs
- Eases the work of experienced users
- Lowers the hurdle for novice users

Use Case: Gromacs_EQ
Inputs: structure (pdb/gro), topology (top/itp), EM.mdp, FULL.mdp
1. pdb2gmx → structure (pdb)
2. editconf → box (pdb)
3. genbox → solvated structure (pdb)
4. grompp (solvated structure, adjusted topology (top/itp), EM.mdp) → topol.tpr, mdout.mdp
5. mdrun (energy minimization) → ener.edr, traj.trr, traj.xtc, md.log, state.cpt, SYSTEM_EM.pdb
6. grompp (SYSTEM_EM.pdb, FULL.mdp) → topol.tpr, mdout.mdp
7. mdrun (equilibration) → ener.edr, traj.trr, traj.xtc, md.log, state.cpt, SYSTEM_EQ.pdb
8. g_energy and xmgrace (after each mdrun stage) → Analysis.jpg
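The Gromacs_EQ use case is a directed acyclic graph of tool invocations, which is what a workflow engine schedules. A minimal sketch using Python's standard `graphlib` module follows; the step names (e.g. `grompp_em`, `g_energy_eq`) are hypothetical labels reconstructed from the diagram, not job identifiers from the actual workflow.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose outputs it consumes
# (hypothetical labels reconstructed from the Gromacs_EQ diagram).
GROMACS_EQ = {
    "pdb2gmx": set(),
    "editconf": {"pdb2gmx"},
    "genbox": {"editconf"},
    "grompp_em": {"genbox"},
    "mdrun_em": {"grompp_em"},
    "g_energy_em": {"mdrun_em"},
    "grompp_eq": {"mdrun_em"},
    "mdrun_eq": {"grompp_eq"},
    "g_energy_eq": {"mdrun_eq"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(graph).static_order())
```

Because `g_energy_em` and `grompp_eq` both depend only on `mdrun_em`, a workflow engine may run the analysis of the minimization in parallel with the preparation of the equilibration.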

MD Portlet (figure)

Domain: Quantum Chemistry
- Study and simulation of the electronic behavior of molecules in relation to their chemical reactivity
- Survey in the MoSGrid community:
  - First implementation for Gaussian
  - Then support for Turbomole, GAMESS-US, and further relevant QC applications

Domain: Quantum Chemistry
Gaussian jobs:
- A single input file defines the molecular geometry and the task
- Results:
  - Unstructured output
  - Platform-dependent checkpoint file
- Integrated multi-step job option, which is not usable for generalized workflows

Domain: Quantum Chemistry
First prototype: workflow controlled by the portlet, in three phases:
1. Pre-processing
2. Job execution
3. Post-processing

Domain: Quantum Chemistry
- Assisted job creation: guiding GUI with the most common options available
- Pre-created job description: upload of a Gaussian job description file
- Monitoring of jobs
- Post-processing and presentation of results
- Workflows

Domain: Quantum Chemistry
Pre-processing:
- Portlet (GUI) supports common options
- Automatic generation of the job description
- Submission of the job
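Automatic job-description generation amounts to filling the standard Gaussian input layout from GUI options: Link 0 lines such as `%chk`, a route line starting with `#`, a blank line, a title, a blank line, charge and multiplicity, Cartesian coordinates, and a terminating blank line. The sketch below assumes that layout; the function name, the defaults (`job.chk`, `2GB`), and the example route are hypothetical choices, not the portlet's actual options.

```python
def gaussian_input(title: str, route: str, charge: int, multiplicity: int,
                   atoms: list[tuple[str, float, float, float]],
                   chk: str = "job.chk", mem: str = "2GB") -> str:
    """Build a Gaussian job description (hypothetical helper).

    atoms: list of (element, x, y, z) tuples with Cartesian coordinates in Angstrom.
    """
    lines = [
        f"%chk={chk}",        # Link 0: checkpoint file
        f"%mem={mem}",        # Link 0: memory
        f"#P {route}",        # route section: method, basis set, job type
        "",
        title,
        "",
        f"{charge} {multiplicity}",
    ]
    lines += [f"{el:<2} {x:12.6f} {y:12.6f} {z:12.6f}" for el, x, y, z in atoms]
    lines.append("")  # blank line terminating the molecule specification
    return "\n".join(lines) + "\n"
```

For a water optimization, `gaussian_input("water opt", "B3LYP/6-31G(d) Opt", 0, 1, atoms)` yields a complete input file that can be submitted as a single job.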

Domain: Quantum Chemistry
Post-processing:
- Parsing of the result file by Python scripts executed by the portlet
- Extraction of relevant information about molecular properties
- Data saved in CSV format and accessible to the user
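As an illustration of such a post-processing script, the sketch below extracts SCF energies from a Gaussian log and writes them as CSV. It assumes the usual `SCF Done:  E(<method>) =  <energy>  A.U. ...` line format; the regular expression, column names, and function names are hypothetical and not taken from the MoSGrid scripts.

```python
import csv
import io
import re

# Assumed log line format, e.g.
#  " SCF Done:  E(RB3LYP) =  -76.4089   A.U. after   10 cycles"
SCF_LINE = re.compile(r"SCF Done:\s+E\((?P<method>[^)]+)\)\s+=\s+(?P<energy>-?\d+\.\d+)")


def scf_energies(log_text: str) -> list[tuple[str, float]]:
    """Collect (method, energy in Hartree) for every SCF Done line."""
    return [(m["method"], float(m["energy"])) for m in SCF_LINE.finditer(log_text)]


def to_csv(rows: list[tuple[str, float]]) -> str:
    """Serialize the energies as CSV with one row per SCF cycle block."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["step", "method", "energy_hartree"])
    for step, (method, energy) in enumerate(rows, start=1):
        writer.writerow([step, method, energy])
    return buf.getvalue()
```

During a geometry optimization the log contains one such line per optimization step, so the CSV directly tracks the energy along the optimization.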

Domain: Docking
CADDSuite (Computer-aided Drug Design) (figure)

Domain: Docking
Galaxy available for local resources in Tübingen

MolDB
- Stores molecules in binary format, which allows fast export
- Automatically creates and stores canonical SMILES, fingerprints, and functional-group counts for imported molecules
- Automatically saves and restores docking/rescoring results
- The database can be filtered by all stored molecule properties before exporting molecules
- Current import/export speed: ~100 compounds/s
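MolDB's binary storage format is not described here, so the sketch below uses an in-memory SQLite table as a stand-in to show the "filter by stored properties, then export" pattern. The column names, the hydroxyl count as an example functional-group count, and the assumption that lower docking scores are better are all hypothetical.

```python
import sqlite3


def build_db(molecules: list[tuple[str, str, int, float]]) -> sqlite3.Connection:
    """molecules: (name, canonical_smiles, n_hydroxyl, docking_score) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE molecules (name TEXT PRIMARY KEY, smiles TEXT, "
        "n_hydroxyl INTEGER, docking_score REAL)"
    )
    con.executemany("INSERT INTO molecules VALUES (?, ?, ?, ?)", molecules)
    return con


def export_filtered(con: sqlite3.Connection, max_score: float,
                    min_hydroxyl: int) -> list[str]:
    """Filter on stored properties, then export as 'SMILES name' lines.

    Assumes lower docking scores are better, so results are sorted ascending.
    """
    rows = con.execute(
        "SELECT name, smiles FROM molecules "
        "WHERE docking_score <= ? AND n_hydroxyl >= ? ORDER BY docking_score",
        (max_score, min_hydroxyl),
    )
    return [f"{smiles} {name}" for name, smiles in rows]
```

Filtering inside the database before export avoids writing out molecules that would be discarded anyway, which matters when exporting large compound libraries.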

MSML
Molecular Simulation Markup Language
- Based on CML (Chemical Markup Language)
- Common interpretation by humans and computers
- Follows the minimum information principle
- Description: XSL transformation, used for validation purposes
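Since MSML builds on CML, a molecule can be represented with the CML `molecule`, `atomArray`, and `atom` elements. The sketch below writes such a fragment with Python's standard `xml.etree.ElementTree` and runs a toy completeness check; MSML's own elements and its XSL-based validation are not reproduced, and `molecule_to_cml` and `missing_element_types` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # serialize CML as the default namespace


def molecule_to_cml(mol_id: str,
                    atoms: list[tuple[str, str, float, float, float]]) -> str:
    """atoms: (atom_id, element, x, y, z) tuples with coordinates in Angstrom."""
    mol = ET.Element(f"{{{CML_NS}}}molecule", id=mol_id)
    array = ET.SubElement(mol, f"{{{CML_NS}}}atomArray")
    for atom_id, element, x, y, z in atoms:
        ET.SubElement(array, f"{{{CML_NS}}}atom", id=atom_id, elementType=element,
                      x3=str(x), y3=str(y), z3=str(z))
    return ET.tostring(mol, encoding="unicode")


def missing_element_types(cml_text: str) -> list[str]:
    """Toy validation step: ids of atoms that lack an elementType attribute."""
    root = ET.fromstring(cml_text)
    return [a.get("id") for a in root.iter(f"{{{CML_NS}}}atom")
            if not a.get("elementType")]
```

A check of this kind only illustrates the idea of machine-verifiable minimum information; a real MSML validator would cover far more than element types.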

Future Work
- WS-PGRADE: integration of the UNICORE IDB to offer drop-down boxes of available tools
- MD and QC portlets: adaptation to the gUSE workflow engine via the ASM library
- CADDSuite: export of workflows from Galaxy to WS-PGRADE
- MSML: further development

Involved Projects
- SHIWA (SHaring Interoperable Workflows for Large Scale Scientific Simulations on Available DCIs)
  - EU project, duration: 01.07.2010 – 30.06.2012
  - Tübingen participates via Galaxy workflow export
- CompChem Virtual Organization
  - EGEE project
  - Available resources

Future Projects
- SCI-BUS (SCIentific gateway Based User Support)
  - EU project, duration: 01.10.2011 – 30.09.2014
  - Pan-European resources
  - Tübingen participates by extending the MoSGrid portal with an interactive molecule editor and a semantic search

Acknowledgements
Oliver Kohlbacher, Ákos Balaskó, Georg Birkenheuer, Sebastian Breuers, Richard Grunzke, Sonja Herres-Pawlis, Valentina Huber, Miklos Kozlovszky, Jens Krüger, István Márton, Patrick Schäfer, Bernd Schuller, Johannes Schuster, Anna Szikszay Fabri, Klaus-Dieter Warzecha, Martin Wewior


