Presentation is loading. Please wait.

Presentation is loading. Please wait.

EGEE-III INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Ab initio electronic structure calculations.

Similar presentations


Presentation on theme: "EGEE-III INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Ab initio electronic structure calculations."— Presentation transcript:

1 EGEE-III INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Ab initio electronic structure calculations on the Grid M. Sterzel – ACC “Cyfronet” AGH COST Training School on Molecular and Material Science Grid Applications Trieste, 30 March 2010

2 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 1 Outlook Aim of this talk –To demonstrate benefits of Grid computing in chemistry Parts of the talk topics –Work on the Grid vs. cluster – comparison –Chemical Software packages available on the Grid  Parallel execution –Typical Chemical Computations on the Grid  Geometry optimizations  Numerical Frequencies  Chemical reactions –In silico Lab –Final Remarks

3 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 What is a/the Grid ? A Grid is NOT –The Next generation Internet –A new Operating System –Just  (a) a way to exploit unused cycles  (b) a new model of parallel computing  (c) a new model of P2P networking Definition (Vaidy Sunderam) –A paradigm/infrastructure that enables the sharing, selection, & aggregation of geographically distributed resources (computers, software, data(bases), people) [share (virtualized) resources] –... depending on availability, capability, cost –... for solving large-scale problems/applications –... within virtual organizations [multiple administrative domains] Grid Checklist (Ian Foster): A Grid is a system that –Coordinates resources that are not subject to centralized control –Uses standard, open, general purpose protocols and interfaces –Delivers nontrivial qualities of service 2

4 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 3 Authorization On a cluster - password okrucyusz:~ sterzel$ ssh ymsterze@ui Last login: Wed Mar 5 18:40:24 2008 from ha2rtr.agh.edu.pl [ymsterze@ui ymsterze]$ On the grid - certificate (Grid identity card) –User has to be a member of Virtual Organization (at least one) –Virtual Organization determines the resources user can use –To access grid resources user needs to obtain proxy: [ymsterze@ui ymsterze]$ voms-proxy-init -voms gaussian Your identity: /C=PL/O=GRID/O=Cyfronet/CN=Mariusz Sterzel Enter GRID pass phrase: Creating temporary proxy.............................. Done Contacting voms.cyf-kr.edu.pl:15001[/C=PL/O=GRID/O=Cyfronet/ CN=voms.cyf-kr.edu.pl] "gaussian"...................Done Creating proxy........................................ Done Your proxy is valid until Thu Mar 6 07:20:50 2008

5 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 EGEE 4Mariusz Sterzel CGW'08 Kraków, 13 October 2008 4 EGEE Archeology Astronomy Astrophysics Civil Protection Comp. Chemistry Earth Sciences Finance Fusion Geophysics High Energy Physics Life Sciences Multimedia Material Sciences … >250 sites 48 countries >150,000 CPUs >50 PetaBytes >15,000 users >150 VOs >150,000 jobs/day

6 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 5 Job file Grid JDL file Executable = "/bin/bash"; Arguments = $VO_GAUSSIAN_SW_DIR/g03/gaussian.x water.com"; JobType = "MPICH"; NodeNumber = 4; StdOutput = "water.out"; StdError = "water.err"; InputSandbox = {"water.com"}; OutputSandbox= {"water.out", "water.log", "water.err" }; PBS file #!/bin/bash #PBS -l ncpus=4 #PBS -q long #PBS -o water.out #PBS -e water.err #PBS -N my_job_name #PBS -M my@email #PBS -m e export g03root=/somewhere. $g03root/g03/bsd/g03.profile $g03root/g03 water.com

7 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 6 Job management PBS – qsub – submit job to a queue – qdel – delete job from a queue – qstat – show job status in a queue EGEE Grid – glite-wms-job-submit – submit job to the Grid – glite-wms-job-delete – remove job from the Grid – glite-wms-job-status – show status of the job on the Grid – glite-wms-job-output – retrieve job files from the Grid … just a few new commands…

8 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 7 Freely available packages on EGEE Grid: Commercial packages on EGEE Grid: –Gaussian –Turbomole –Wien2k Chemical software –GAMESS –DALTON –CPMD –Newton X –DL_POLY –NAMD –RWAVEP –GROMACS –Autodock –Tinker –Solvate –PIC-DMSC –MCGBgrid –QMC –ABCtraj –VENUS –CRBS –LM –COLUMBUS –DINX –Abinit

9 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 8 Gaussian VO Why Gausian? –Large number of computational methods implemented –One of the first ab initio codes –The most popular among communities –User friendly –Available for many platforms along with GUI Gaussian VO –Invented and operated by ACC CYFRONET –All license issues confirmed with Gaussian Inc, –Open for every EGEE user –Any computing centre with site Gaussian license may support it (4 supporting centres, another 3 in the line) –30+ users since the start in September 2006 –VO manager – Mariusz Sterzel ( m.sterzel@cyfronet.pl ) –Enabled for parallel execution up to 8 processors

10 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 9 Turbomole – an alternative Advantages: –Probably the fastest B3LYP implementation –Analytical gradients for excited states at DFT and CC2 levels –Variety of fitting approaches speeding up calculations –Very well scalability during parallel execution –Extremely fast and very well parallelised (ri)CC2 and (ri)MP2 Disadvantages: –Limited number of DFT functionals (only “good” ones available) –Lack of parallel version of analytical second derivatives –Lack of parallel version of TDDFT –Only NMR chemical shifts implemented, no spin-spin couplings

11 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 10 Gaussian VO – participation As a user: Register at: https://voms.cyf-kr.edu.pl:8443/voms/gaussian Wait for VOMS admin acceptance voms-proxy-init --voms gaussian and you are ready to use the program... As a participating centre: Just sent an e-mail concerning participation to VO manager After confirmation of the license status at your centre with Gaussian Inc, detailed information concerning set-up will be sent back to you More details at: http://egee.grid.cyfronet.pl/Applications/gaussian-vo/

12 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 11 Grid and parallel execution Past: –Serial jobs only –Job of MPICH type always enforced execution of mpirun Present – mpirun no longer enforced –Instead a wrapper script mpistart can be executed which will automatically set up environment for required MPI flavour –No possibility to request desired # of processors on a WN … Unfortunately not all sites are set up… Current work –Support for OpenMP jobs

13 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 12 Chemical software “Old codes” – mostly written in FORTRAN Serial – parallel execution added later (with exceptions) Different parallelization models used Low scalability in many cases Only selected computational methods parallelized … all that makes parallel grid ports of chemical software even more complicated

14 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 13 Selected cases Gaussian –Parallelization via OpenMP or Linda –OpenMP – SMP machines or multiprocessor/core clusters up to # of processors/cores on WN –Linda – allows the parallel execution between nodes. Requires equal # of processors for each WN –For Linda additional expenses required (commercial package), available only for specific platforms Turbomole – Uses MPI – currently HPMPI – No specific requirements GAMESS – Uses sockets but MPI execution possible (a little slower but more convenient for execution on a grid) ADF – Uses MPI (MPICH, OpenMPI, HPMPI, …) – One of the best parallelized QC codes // to my knowledge ;-)

15 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 14 Grid solution Gaussian –Parallel execution via OpenMP on a single WN up to # of processors/cores available on that worker node –Necessarily queue system set-up requires a Site admin help –Torque set-up:  Modification of /var/spool/pbs/torque.cfg to: SUBMITFILTER /var/spool/pbs/submit_filter.pl –Other settings -- typical  Job has to be of MPICH type  # of processors controlled via NodeNumber variable  Gaussian %Nproc route is automatically set-up by script executing Gaussian –Execution with 8 processors per job possible.

16 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 15 Sample script Executable = “/bin/sh”; Arguments = “$GAUSSIAN_SW_DIR/gaussian.run myfile.com”; JobType = “MPICH”; NodeNumber = 8; InputSandbox = {“myfile.com”}; StdOut = “myfile.out”; StdErr = “myfile.err”; OutputSandbox = {“myfile.log”, “myfile.chk”, “myfile.out”, “myfile.err”}; Requirements = other.GlueCEUniqueID==“ce.cyf-kr.edu.pl”

17 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 16 Grid Solution cont. Turbomole –No special set up except shared directory needed, # of processors automatically discovered by Turbomole scripts NAMD –Similar to Turbomole. If the NAMD executing script was set-up properly during installation the necessarily “node file” is created every time program is executed GAMESS –Depends on Grid port –In case of MPI no additional input needed –DDI case – may require WN reconfiguration especially if large DDI memory is requested by a job

18 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 PL-Grid – Polish Grid Infradtructure In addition to above: –CPMD –Cfour –ACES III –Amber –OOMMF –ADF –Amber –Cadence –Fluent –... any aplication needed by users... 17

19 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 18 Execution benchmarks Scheduling time –MPI jobs  4 proc./job -- usually less than hour  8 proc./job -- waiting time even 3-4 hour –OpenMP jobs  Job waiting time much longer, heavily depends on site overload 4 proc./job -- from less than hour up to 6 hours 8 proc./job -- in some cases job waiting time exceeds 12 hours Parallel job execution can be inefficient in case of short (less than 24 h) jobs

20 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 19 Grid application in chemistry Tasks to which Grid can be applied directly: –Conformational analysis –Numerical frequency computations –Zero Point Vibrational Averaging –Property computations for series of geometries from Molecular Dynamics simulation –Determination of chemical reaction paths –Determination of potential energy surfaces (PES) … all kind of “brute force” tasks, or tasks which operate on huge data sets Other tasks –Computations need to to be planned in order to maximize benefits from the grid computing

21 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 20 An example Geometry optimization: –Steep potential  Few steps needed –Flat potential  Many steps needed  Energy and gradient convergence have to be increased to high values Start Stop

22 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 21 PCP complex

23 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 22 PCP cont.

24 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 23 PCP cont.

25 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 24 PCP cont.

26 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 25 PCP cont.

27 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 26 PCP cont.

28 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 27 PCP cont.

29 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 28 Plan Computations of the whole molecule are not possible System needs to be modeled –For this we need:  Chlorophyll A and peridinin ground and excited states geometries and normal modes  The geometry of whole complex  A little programming (tetramer model, energy transfer model)

30 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 29 Plan Computations of the whole molecule are not possible System needs to be modeled –For this we need:  Chlorophyll A and peridinin ground and excited states geometries and normal modes  The geometry of whole complex  A little programming (tetramer model, energy transfer model) … a lot of luck

31 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 30 First step – Peridinin

32 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 31 First step – Peridinin

33 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 32 Peridinin geometry A long peridinin chain makes usual gradient based optimization inefficient Instead we propose following scheme: –MD for peridinin (force field level) –Geometry preoptimization for series of MD snapshots (semi empirical or ~ RHF/STO-3G level) –“Final” geometry optimization for few lowest energy MD snapshots (CASSCF/PT2 level) –Verification of the minima by frequency calculations –Excited state geometry optimization with ground state as a starting point –Again, minima verification via vibrational analysis –Verification of obtained data by comparison with experiment

34 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 33 H—CN CN—H Chemical reactions... a process that results interconversion of molecules

35 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 34 Chemical reactions Points of interest: –Structure of substrates and products –Structure of active complex at TS –Reaction path

36 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 35 Chemical reactions Points of interest: –Structure of substrates and products –Structure of active complex at TS –Reaction path(s) Reaction path substrate(s) product(s) Transition State (TS) Energy

37 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 36 Chemical reactions Energy substrate(s) product(s) A+B C D Intermediate product TS1 TS2 E(TS1) > E(TS2) Reaction path

38 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 37 Chemical reactions Energy Possible reaction paths; More intermediate products Reaction path

39 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 38 Chemical reactions Main problem – TS determination

40 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 39 Chemical reactions Main problem – TS determination Artur Michalak

41 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 40 Chemical reactions Main problem – TS determination

42 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 41 Chemical reactions Main problem – TS determination E ‘reaction coordinate’

43 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 42 Chemical reactions Main problem – TS determination ‘reaction coordinate’ E

44 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 43 Chemical reactions TS verification – vibrational analysis – one imaginary frequency! i1203 cm -1

45 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 44 Sample calculations N 2 O braking on oxide surfaces – possible mechanisms Electron transfer 6. (O - )--X + --(O - ) X + O 2 1. N 2 O + X 2. N 2 O (s) + X  X + --N 2 O - 4. X + --N 2 O -  X + --(O - ) + N 2 5. (O - )--X + --(O - ) e - transfer Filip Zasada

46 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 45 Sample calculations N 2 O braking on oxide surfaces – possible mechanisms An Oxygen transfer 1. N 2 O (g)  N 2 O (s) 2. N 2 O (s) + X  X--O--N 2 4. O-X-O 5. O--O-X6. X + O 2 4. X-O

47 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 46 Sample calculations Vacuum Oxide slab Adsorbed N 2 O Slab construction Oxide slab Matrix generation

48 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 47 Sample calculations Reaction paths:

49 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 48 Sample calculations Reaction paths:

50 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 49 Sample calculations Computational details: –Gaussian 03 D.01 –BP86 functional –Basis set of double-ζ quality Timings: –CPU time for SP calculation – approx 7 hours –15 paths – 10-50 energy points on each path –In total about 250 energy points calculated

51 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 50 Conformational searchers To determine lowest energy structure (1R,2R)-1,2-bis-(N’-cyklohexy-1’,4’,5’,8’- naphthalenetetracarboxydiimide)cyklohexane Possible issues –For short time jobs it is convenient to group several geometries in to one job to minimize average job scheduling time

52 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 51 Conformational searches

53 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 52 Harmonic frequencies Numerical frequency computations for lycopene –96 atoms, 2·3·96+1=577 independent computation steps –Software: GAMESS version June 2005 –VOCE VO resources used –Methodology: B3LYP, cc-pVDZ basis set –Computations done on approx. 100 processors (ia64 and i386) –CPU time for single computation:  Intel Xeon 2.8Gh - 14h 39’  Intel Itanium 2 1.3Gh - 12h 8’ –Total time:  single CPU - 330 days (estimated)  EGEE Grid - 3 days

54 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 53 In silico Lab Computational chemists use first principles chemistry packages available on the Grid (Gaussian, GAMESS, TURBOMOLE) Grid is still mainly available through command line interfaces (voms-proxy- init, glite-wms-job-submit, glite-wms- job-status) Adoption to the Grid environment by non-experts is difficult Lets build a web-based problem solving environment facilitating the use of Grid … but not only that

55 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 State-of-the-art WebMO –Desktop and Web access –Supports main computational chemistry applications –Does not support Grid or local queues infrastructures ECCE – Extensible Computational Chemistry Environment –Only desktop access –Supports main computational chemistry applications –Supports many queue management systems (PBS, LSF, NQE, etc.) –Does not use Grid infrastructures CCG – Computational Chemistry Grid –A virtual organization which is a part of Teragrid –GridChem – Java client which can be run as a WebStart application P-GRADE – Framework for Development and Execution of Parallel Applications –A Grid-oriented solution 54

56 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 Description of the solution - requirements Supports main chemistry packages –Gaussian –GAMESS –TURBOMOLE –... Is accessible through a web interface –All Grid-related operations should be embedded (proxy generation, job submission and status monitoring, LFC catalog operation) –Persists information about executed jobs between web sessions Enables user-centric processing rather than grid-centric –It should not be yet another Grid job submission tool –Supports inter-application geometry passing –Provides automated report summaries –Covers Grid complexity Supports annotations through user free-text tagging 55

57 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 Description of the solution – architecture (1/2) Uses gLite for Grid job management –A set of Grid APIs is used (LFC, WMS, VOMS) Job submission and monitoring implemented as a separate layer for better error handling –GridSpace platform used –Each application backed by a separate script Application model realized by using SINT (Semantic Integration Tool) –Basic concepts such as geometry, basic and detailed reports, input parameters, annotations are modeled Interactive graphical user interfaces are used –GWT (Google Web Toolkit) is used as the user front-end technology 56

58 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 57 Description of the solution – architecture (2/2)

59 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 58 Demo Switching to demo...

60 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 59 References T. Gubala, M. Bubak, P.M.A. Sloot: Semantic Integration of Collaborative Research Environments, In: M. Cannataro (Ed.) Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, 2009, IGI Global Marian Bubak et al.,: Virtual Laboratory for Collaborative Applications, In: M. Cannataro (Ed.) Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, 2009, IGI Global

61 Enabling Grids for E-sciencE EGEE-III INFSO-RI-031688 60 Summary Access to the Grid is easy, does not differ to much from queue system usage EGEE Grid offers variety of software packages for chemical computations. A parallel execution can be made as simple for the user as a serial execution is now It is always possible to find solution for parallel execution even if computational platform does not directly support certain parallelization model With a little of planning every computational chemistry task may benefit from the Grid platform


Download ppt "EGEE-III INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Ab initio electronic structure calculations."

Similar presentations


Ads by Google