e-Science Technologies in the Simulation of Complex Materials. L. Blanshard, R. Tyer, K. Kleese; S. A. French, D. S. Coombes, C. R. A. Catlow; B. Butchart, W. Emmerich (CS); H. Nowell, S. L. Price (Chem). eMaterials.
Combinatorial Computational Catalysis: explore which sites are involved in catalysis; catalysts are used in diverse industries including petroleum, chemical, polymers, agrochemicals, and environmental. Polymorphism: prediction of polymorphs; a drug substance may exist as two or more crystalline phases in which the molecules are packed differently.
Polymorph Prediction: Different crystal structures of a molecule are called polymorphs. Polymorphs may have considerably different properties (e.g. bioavailability, solubility, morphology). Polymorph prediction is of great importance to the pharmaceutical industry, where the discovery of a new polymorph during production or storage of a drug may be disastrous. Drug molecules are often flexible, and this makes the polymorph prediction process more challenging.
Polymorph Prediction Workflow:
For flexible molecules: conformational optimisation gives n feasible rigid molecular probes representing energetically plausible conformers.
MOLPAK: generation of ~6000 densely packed crystal structures using a rigid molecular probe.
DMAREL: lattice energy optimisation.
Data: unit cell volume, density, lattice energy.
A restricted number of structures is selected.
Crystal structures and properties are stored in a database.
Morphology.
The search is run n times, where n = number of conformers.
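The loop structure of this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `generate_packings` and `minimise` are hypothetical stand-ins that return random numbers and do not call MOLPAK or DMAREL, and the `keep` cut-off is an invented parameter for the "restricted number of structures" step.

```python
import random
from dataclasses import dataclass


@dataclass
class CrystalStructure:
    conformer: int
    volume: float          # unit cell volume per molecule (arbitrary toy value)
    density: float         # toy value
    lattice_energy: float  # toy value, kJ/mol


def generate_packings(conformer, n_structures, rng):
    """Hypothetical stand-in for the MOLPAK step: random trial packings."""
    return [
        CrystalStructure(conformer, rng.uniform(250, 300),
                         rng.uniform(1.3, 1.6), rng.uniform(-120, -90))
        for _ in range(n_structures)
    ]


def minimise(structure):
    """Hypothetical stand-in for the DMAREL step: lowers the toy energy."""
    return CrystalStructure(structure.conformer, structure.volume,
                            structure.density, structure.lattice_energy - 5.0)


def predict(n_conformers, n_structures=6000, keep=50, seed=0):
    """Run the search once per conformer and collect the selected structures."""
    rng = random.Random(seed)
    database = []  # a plain list stands in for the database
    for conformer in range(n_conformers):
        packed = generate_packings(conformer, n_structures, rng)
        minimised = sorted((minimise(s) for s in packed),
                           key=lambda s: s.lattice_energy)
        database.extend(minimised[:keep])  # restricted number of structures
    return database
```

Because each conformer is searched independently, the cost grows linearly with n, which is why the slides go on to distribute the work.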
Blind Test 2004. The Challenge: predict the crystal structure of 2-methyl-4,5-dinitro-phenyl-acetamide. There is a wide range of conformers within the plausible energy range; 8 conformers were chosen and used in subsequent searches. [Figures: molecular flexibility indicated with arrows; potential energy surface scan about the CCNC torsion angle.]
Blind Test 2004: Minima in the Lattice Energy for Different Conformations. [Figure: lattice energy + intramolecular energy (kJ mol⁻¹) against volume / Z (Å³ molecule⁻¹), plotted by conformer.]
Blind Test 2004: Minima in the Lattice Energy for Different Conformations. [Figure: lattice energy + intramolecular energy (kJ mol⁻¹) against volume / Z (Å³ molecule⁻¹), plotted by conformer, annotated "Best 10 kJ mol⁻¹".] It is necessary to consider properties of the best crystal structures, such as growth rates, to decide which are more likely to be observed.
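Reading "Best 10 kJ mol⁻¹" as the set of minima lying within 10 kJ mol⁻¹ of the lowest one (an interpretation of the plot label, not something the slide states), the selection could be sketched as below. The function name and the example energies are illustrative only.

```python
def within_energy_window(energies, window=10.0):
    """Return indices of structures within `window` (kJ/mol) of the lowest energy."""
    lowest = min(energies)
    return [i for i, energy in enumerate(energies) if energy - lowest <= window]


# Made-up total energies (lattice + intramolecular), kJ/mol.
example = [-110.2, -104.9, -99.0, -108.7]
selected = within_energy_window(example)  # indices 0, 1 and 3
```

The structures that survive this cut are the ones for which further properties, such as growth rates, would then be compared.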
Results: The observed crystal structure (revealed upon completion of the blind test) contains a higher-energy conformer than those considered! [Figure: observed and predicted structures.] When just the observed conformer is used as the rigid probe in the search, the observed structure is found as the global minimum in lattice energy.
Summary: High-energy gas-phase conformers may be stabilised by packing within a lattice in the solid state. As many conformers as possible need to be considered to maximise the chance of predicting crystal structures correctly and exploring the range of structures that are energetically feasible as polymorphs. A fast, distributed e-Science application is being developed to enable routine crystal structure prediction for large numbers of conformers; this is essential to develop computational methods of predicting possible polymorphs of pharmaceutical molecules.
Predicting Morphologies: The shape, or morphology, of a crystal plays an important role in the manufacturing process. Considerable problems arise if the morphology changes because of impurities, a change of solvent, or scale-up for high-volume manufacture. An understanding of the factors influencing crystal morphology will help us to understand how the crystallisation process can be controlled through, for example, the use of solvents or additives. BFDH theory: based on geometrical factors. AE (attachment energy) model: based on energetic factors.
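As an illustration of the geometrical BFDH idea, the sketch below takes the centre-to-face distance of a face as inversely proportional to its interplanar spacing d_hkl, computed from the unit cell parameters. It is a simplified sketch: it ignores the corrections for systematic absences that a full BFDH treatment applies, and the function names are illustrative rather than taken from GDIS or any other package.

```python
import math


def d_spacing(hkl, a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Interplanar spacing d_hkl (same unit as a, b, c; angles in degrees)."""
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    # Cartesian lattice vectors: a along x, b in the xy plane.
    va = (a, 0.0, 0.0)
    vb = (b * math.cos(ga), b * math.sin(ga), 0.0)
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    vc = (cx, cy, cz)

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    volume = dot(va, cross(vb, vc))
    # Reciprocal lattice vectors a*, b*, c* (crystallographic convention, no 2*pi).
    recip = [tuple(x / volume for x in cross(u, v))
             for u, v in ((vb, vc), (vc, va), (va, vb))]
    g = tuple(sum(h * r[i] for h, r in zip(hkl, recip)) for i in range(3))
    return 1.0 / math.sqrt(dot(g, g))


def bfdh_relative_distances(faces, cell):
    """Centre-to-face distances proportional to 1/d_hkl, scaled so the smallest is 1."""
    raw = {hkl: 1.0 / d_spacing(hkl, *cell) for hkl in faces}
    smallest = min(raw.values())
    return {hkl: value / smallest for hkl, value in raw.items()}
```

Faces with the largest d_hkl get the smallest distance and so appear as the largest faces in the predicted shape.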
Scheme for Morphology Calculations:
Minimised structure (from DMAREL minimised structure).
Choose faces to study, ~15-20 (BFDH calculation in GDIS).
For each face calculate AE (calculate valid shifts; converge regions, excluding polar).
Draw morphology for each crystal's set of faces (Wulff plot).
Calculate relative volume growth rates (new property).
Morphologies: The calculated morphology can be visualised using a Wulff plot, in which the ratios of the surface-normal distances of all planes from the centre of the crystal are determined by the interplanar spacings, the attachment energies or the surface energies. [Figure: observed and predicted morphology of form 1 of paracetamol.]
Growth Volume: A new property, growth volume, is obtained by numerical integration to find the volume within the Wulff shape; it gives an indication of whether one face dominates. Pyridine: form 1, Z=4. Many low-energy structures; the newly observed form 2 is predicted to grow fast. This prompted an experimental search for more polymorphs.
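The slide does not say which integration scheme is used. One possible sketch, assuming the Wulff shape is the intersection of half-spaces n·x ≤ d for each face (unit normal n, centre-to-face distance d), integrates over directions: V = ∫ r(ω)³/3 dΩ = (4π/3)·⟨r³⟩, where r(ω) is the distance from the centre to the surface along direction ω. The function name and Monte Carlo sampling are choices made here for illustration.

```python
import math
import random


def wulff_volume(faces, n_samples=50000, seed=0):
    """Estimate the volume enclosed by the planes n.x = d.

    faces: list of ((nx, ny, nz), distance) with unit normals.
    Uses V = (4*pi/3) * mean(r**3) over random directions, where r is the
    distance to the nearest bounding plane along each direction.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Uniform random direction from a normalised Gaussian triple.
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        ux, uy, uz = x / norm, y / norm, z / norm
        r = math.inf  # stays infinite if no face bounds this direction
        for (nx, ny, nz), dist in faces:
            proj = nx * ux + ny * uy + nz * uz
            if proj > 1e-12:
                r = min(r, dist / proj)
        total += r ** 3
    return 4.0 * math.pi / 3.0 * total / n_samples


# A cube with faces at distance 1 from the centre has volume 8.
cube = [((1, 0, 0), 1.0), ((-1, 0, 0), 1.0), ((0, 1, 0), 1.0),
        ((0, -1, 0), 1.0), ((0, 0, 1), 1.0), ((0, 0, -1), 1.0)]
estimate = wulff_volume(cube)
```

With distances taken from attachment energies, a structure whose shape is dominated by one slow-growing face encloses a comparatively small volume, which is one way to read the "one face dominates" remark.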
e-Science Issues to Address: simulations take too long to run; data are distributed across many sites and systems; no catalogue system; output in legacy text files, different for each program; few tools to access, manage and transfer data; workflow management is manual; licensing within a distributed environment.
Fortran Web Services. 1. Expose the Fortran binary as a distributed Web Service. Define an XML interface to the computation using WSDL (Web Services Description Language). To get the binary to talk in XML, either change the Fortran code so that input and output use XML, or use parsers and XSLT conversion documents to map from fixed-format input/output files to and from XML. [Diagram: XML, Fortran input, Fortran binary, Fortran output, XML, with XSL FO conversion and a WSDL interface.]
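The second option, mapping fixed-format records to and from XML without touching the Fortran source, can be sketched as follows. The column layout in `FIELDS` and the `atoms`/`atom` element names are hypothetical, chosen only for the example; the project used parsers and XSLT documents rather than this Python code.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-column layout: (field name, start column, end column).
FIELDS = [("label", 0, 4), ("x", 4, 14), ("y", 14, 24), ("z", 24, 34)]


def fixed_to_xml(text):
    """Convert fixed-format atom records into a small XML document."""
    root = ET.Element("atoms")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        atom = ET.SubElement(root, "atom")
        for name, start, end in FIELDS:
            atom.set(name, line[start:end].strip())
    return ET.tostring(root, encoding="unicode")


def xml_to_fixed(xml_text):
    """Convert the XML back into the fixed-format records a Fortran code would read."""
    root = ET.fromstring(xml_text)
    lines = []
    for atom in root.iter("atom"):
        lines.append("{:<4s}{:>10.5f}{:>10.5f}{:>10.5f}".format(
            atom.get("label"),
            float(atom.get("x")), float(atom.get("y")), float(atom.get("z"))))
    return "\n".join(lines)
```

Keeping the conversion outside the binary means the legacy code can be wrapped without recompilation, at the cost of maintaining one mapping per file format.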
Distributed Workflow. 2. Orchestrate the Web Services with a workflow service: a BPEL (Business Process Execution Language) script coordinates the WS-wrapped Fortran binaries. The workflow service is itself exposed to the outside world as a web service.
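BPEL itself is an XML language, so the following is not BPEL. It is a rough Python analogue of the fan-out and fan-in pattern such a script would express: dispatch one search per conformer to a wrapped service, then gather the results. `run_search` is a hypothetical stand-in for a web service call and returns made-up values.

```python
from concurrent.futures import ThreadPoolExecutor


def run_search(conformer_id):
    """Hypothetical stand-in for invoking a WS-wrapped Fortran binary."""
    return {"conformer": conformer_id, "lowest_energy": -100.0 - conformer_id}


def orchestrate(conformer_ids, max_workers=4):
    """Dispatch one search per conformer in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_search, conformer_ids))
    best = min(results, key=lambda r: r["lowest_energy"])
    return best, results
```

Threads suit this pattern because the orchestrator mostly waits on remote calls; the heavy computation happens on the service side.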
Data Representation: Fortran programs use lots of different formats to represent the same thing. [Figure: example molecule CH4.]
Data Representation: Since we provide new WSDL interfaces for each application, we have a perfect opportunity to employ a standard representation for chemical structures. The XML standard in chemistry is CML (Chemical Markup Language). Reference: P. Murray-Rust, "Development of chemical markup language (CML) as a system for handling complex chemical content", New Journal of Chemistry, 2001, 25, 618-634.
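For a sense of what a CML document looks like, the sketch below builds a methane molecule using the attribute-style CML conventions (`atomArray`, `atom` with `elementType` and `x3`/`y3`/`z3`, `bondArray`, `bond` with `atomRefs2`). It is an illustration only: the coordinates are idealised (C-H about 1.09 Å) and the project's actual schema for its Web Service port types is not shown in the slides.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # serialise CML as the default namespace


def tag(name):
    """Qualify an element name with the CML namespace."""
    return "{%s}%s" % (CML_NS, name)


def methane_cml():
    """Build a CML molecule for CH4 with idealised tetrahedral coordinates (Å)."""
    h = 0.629  # 0.629 * sqrt(3) is roughly a 1.09 Å C-H bond
    atoms = [("a1", "C", (0.0, 0.0, 0.0)),
             ("a2", "H", (h, h, h)),
             ("a3", "H", (h, -h, -h)),
             ("a4", "H", (-h, h, -h)),
             ("a5", "H", (-h, -h, h))]
    molecule = ET.Element(tag("molecule"), {"id": "methane"})
    atom_array = ET.SubElement(molecule, tag("atomArray"))
    for atom_id, element, (x, y, z) in atoms:
        ET.SubElement(atom_array, tag("atom"), {
            "id": atom_id, "elementType": element,
            "x3": "%.3f" % x, "y3": "%.3f" % y, "z3": "%.3f" % z})
    bond_array = ET.SubElement(molecule, tag("bondArray"))
    for atom_id, _, _ in atoms[1:]:
        ET.SubElement(bond_array, tag("bond"),
                      {"atomRefs2": "a1 " + atom_id, "order": "1"})
    return ET.tostring(molecule, encoding="unicode")
```

A single representation like this can replace the several program-specific formats that otherwise describe the same molecule.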
Integration with Existing Infrastructure: Prototype has been successfully deployed. [Diagram: (BPEL) workflow.]
Integration with Existing Infrastructure: Existing grid infrastructure does not integrate easily with web services. Policy on compute clusters is enforced by the Sun Grid Engine batch system, and other users of the clusters submit jobs via this control software. Building a WSDL binding over the Sun Grid Engine protocol is difficult. A smooth transition from existing infrastructure to WS is riskier than thought. [Diagram: (BPEL) workflow and Sun Grid Engine.]
Data Management at CCLRC: file storage at CCLRC; distributed file access via Storage Resource Broker (SDSC); catalogue of files using metadata in a relational database; web interface to metadata and files via Data Portal; metadata editor through browser.
Storage Resource Broker: Store data files from simulations in the Storage Resource Broker.
Data Portal: Search for studies in material sciences and download associated data using the CCLRC Data Portal.
Ongoing and Future Work: upload files to SRB as part of the workflow; generate metadata; upload extracted data from files.
Acid Sites in Zeolites: Determine the extra-framework cation position within the zeolite framework. Explore which proton sites are involved in catalysis and then characterise the active sites. Produce a database with structural models and associated vibrational modes for different Si/Al ratios. Improve understanding of the role of the Si/Al ratio in zeolite chemistry.
MC/EM: A combined Monte Carlo (MC) and energy minimisation (EM) approach has been developed to model zeolitic materials with low and medium Si/Al ratios. First, Al is inserted into a siliceous unit cell, followed by a charge-compensating cation. The zeolite Mordenite, which has a one-dimensional channel system, has been studied with a simulation cell containing two unit cells: 296 atoms, with 96 Si centres (referred to as T sites).
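The sampling side of such an approach might be sketched as below: pick T sites for Al at random, pick a cation position for each, minimise, and rank by energy. Everything beyond the 96 T sites is an assumption made for illustration. In particular `toy_minimised_energy` returns an arbitrary number in place of a real energy minimisation, and `n_cation_sites` is an invented count of candidate cation positions.

```python
import random

N_T_SITES = 96  # T sites in the two-unit-cell Mordenite simulation cell


def toy_minimised_energy(al_sites, cation_sites):
    """Placeholder for an energy minimisation; the value has no physical meaning."""
    return 0.1 * sum(abs(a - 4 * c) % 7 for a, c in zip(al_sites, cation_sites))


def sample_configurations(n_configs, n_al, n_cation_sites=24,
                          energy=toy_minimised_energy, seed=0):
    """Randomly place Al and charge-compensating cations, then rank by energy."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_configs):
        al_sites = rng.sample(range(N_T_SITES), n_al)  # distinct T sites
        cation_sites = [rng.randrange(n_cation_sites) for _ in al_sites]
        results.append((energy(al_sites, cation_sites), al_sites, cation_sites))
    results.sort(key=lambda r: r[0])  # lowest energy first
    return results
```

Each configuration is independent of the others, so the calculations can be farmed out to many machines at once.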
100 Configurations: It can be seen that there are two distinct regions, -12079 eV to -12076 eV and -12075 eV to -12073 eV, but there is no obvious correlation between total energy and cell volume. [Figure: results for 100 configurations.]
10000 Configurations: However, when 10,000 structures are considered, it is clear that the most stable structures correspond to cation placements that do not cause the cell to expand. This requires that the cations sit in the large channel. [Figure: results for 10,000 configurations.]
What Next: Once the lowest-energy positions of Al have been confirmed, the cation is exchanged for a proton and the structure is energy-minimised again. This method will allow us to construct realistic models of low and medium Si/Al zeolites. Such structures can be used for further simulations and aid the interpretation of experimental data.
Condor: Extensive use of Condor pools (UCL: 950 nodes in teaching pools). 48 CPU-years of previously unused compute resource have been utilised in this study. Close collaboration with the NERC e-minerals project has allowed access to this resource. 50,000 calculations have been performed, each with 488 particles per simulation box, which means a total of roughly 24 million particles (50,000 × 488 = 24,400,000) have been included in our simulations to date.
Achievements To Date:
1. First use of CML schema for defining Web Service port types.
2. Calculation of 50,000 configurations of zeolite Mordenite (roughly 24 million particles) to gain insight into structure when a realistic ratio of Al substitution is included in the model.
3. Successfully exposed Fortran codes as OGSI Web Services; prototype application deployed on 80 nodes. The prototype computational polymorph application is being ported to a larger production machine.
4. First use of BPEL standard for orchestrating web services in a Grid application.
5. Open Source BPEL implementation in development, enabling late binding and dynamic deployment of large computational processes.
6. Integration of OGSI and BPEL with Sun Grid Engine.
7. Development of Graphic User Interface for polymorph application; it connects to a relational database via an EJB interface.
8. Infrastructure for metadata and data management.
9. SRB and Data Portal are already being used to hold datasets and to transfer data between different scientists and computer applications.
10. Implementation of Condor pool at Ri.
Key Achievement: We are now doing science that was not possible before the advancements made within e-Science.