Looking for a (standard) Common Format for (Quantum) Computational Chemistry


1 Looking for a (standard) Common Format for (Quantum) Computational Chemistry
A WG activity within COST action 23 (WG D23/0006/01)
Elda Rossi, Andrew Emerson - CINECA
Gian Luigi Bendazzoli, Antonio Monari - Università di Bologna
Renzo Cimiraglia, Celestino Angeli, Stefano Borini - Università di Ferrara
Daniel Maynau, Stefano Evangelisti - IRSAMC - Toulouse
José Sanchez-Marin - Universitat de Valencia
Peter Szalay - Eötvös Loránd University
Rosa Caballol - Universitat Rovira i Virgili, Tarragona
This talk is about the activity recently carried out in the framework of “COST in Chemistry”, an EC-funded project. The final aim is to provide a “workflow” tool that allows researchers to collaborate by exchanging different programs. To this end, the first problem we faced, which is the core of this presentation, was that of defining a common format for Quantum Chemistry programs. An XML-based format is proposed, designed to describe a quantum mechanical system in a quite general way. This format is used for a repository where all data on the system under investigation are maintained. From the repository, data are retrieved and converted to the input stream of the specific program to be run. The conversion is done by a wrapper code, specifically designed for each program. Two possible ways to write the wrappers are discussed, using the Fortran and Python programming languages respectively.

2 Motivation for the work
To build a meta-system for supporting research collaboration in the field of “Localised Orbitals in post-SCF methods … Linear Scaling methods in a Multi-Reference context”

3 The scenario
- Different laboratories need to collaborate
- Different “home-made” codes need to be used together, since they give different views of the same problem
- General-purpose “basic” codes are needed to pre-compute data in a sort of pipeline
- Programs should remain on their original sites under the responsibility of their authors
- Different platforms
- Network connections (grid architecture)
- Workflow

4 The need for a Common Format
The first problem we faced: how can different codes (on different platforms) communicate?
We need a Common Format for (at least) Quantum Chemistry codes.

5 Preliminary steps
Looking around …
- CML has been available for a long time
- XML is used by Accelrys for internal files
- XML is used by ArgusLab for internal files
- None of them is completely suited to computational chemistry: they mainly cover structural chemistry, with no Quantum Chemistry properties
- XML seems the best technology, so we decided to try another XML-based format
- HDF5 looked nice for storing the large binary data typical of QC

6 Why XML
Pro:
- Standard
- Self referencing
- Extensible
- Some experience (CML) already exists
Con:
- (Verbose)
- Very few applications to “science”
- Seems difficult to use with Fortran
- Apparently no specific data types for quantum mechanical concepts (e.g. integrals, wavefunction, orbitals, coefficients, energy levels, etc.)

7 Which codes to be integrated: the meta-system building blocks
- CAS-DI (Multi-Reference Configuration Interaction)
- EPCISO (Spin-orbit Configuration Interaction)
- NEVPT (MR PT, P-Variational approaches)
- LOCNAT (Localized Multireference algorithm)
- FCI (Full Configuration Interaction)
- PROP (Property Calculation)
- COLUMBUS (General ab-initio electronic package)
- DALTON (General ab-initio package)
- MolPRO, MolCAS, Nwchem, Gaussian, …

8 How the engine should work
- Leaves the program unchanged
- One wrapper for each program: if a code is added, only one wrapper has to be written
[Diagram: Data Repository (XML/HDF), IN-wrapper, IN-files, Program, OUT-files, OUT-wrapper]
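The cycle on this slide (repository, IN-wrapper, unchanged program, OUT-wrapper, repository) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `WrappedCode`, `make_code` and `run_chain` are hypothetical, the repository is a plain dictionary rather than XML/HDF, and the "programs" are toy functions standing in for real executables that would normally be started as external processes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Repository = Dict[str, float]  # stand-in for the XML/HDF data repository


@dataclass
class WrappedCode:
    """One program plus its wrappers; the program itself is never modified."""
    name: str
    in_wrapper: Callable[[Repository], str]         # repository -> IN-file text
    program: Callable[[str], str]                   # IN-file text -> OUT-file text
    out_wrapper: Callable[[str, Repository], None]  # OUT-file text -> repository

    def run(self, repo: Repository) -> None:
        in_text = self.in_wrapper(repo)
        out_text = self.program(in_text)
        self.out_wrapper(out_text, repo)


def make_code(name: str, src: str, dst: str, func: Callable[[float], float]) -> WrappedCode:
    """Build a toy code that reads repo[src] and stores func(value) in repo[dst]."""
    def in_wrapper(repo: Repository) -> str:
        return f"value = {repo[src]}\n"

    def program(in_text: str) -> str:
        # The toy program only knows its own file format, not the repository.
        value = float(in_text.split("=")[1])
        return f"result = {func(value)}\n"

    def out_wrapper(out_text: str, repo: Repository) -> None:
        repo[dst] = float(out_text.split("=")[1])

    return WrappedCode(name, in_wrapper, program, out_wrapper)


def run_chain(codes: List[WrappedCode], repo: Repository) -> Repository:
    """Run the codes in order; they communicate only through the repository."""
    for code in codes:
        code.run(repo)
    return repo


chain = [
    make_code("code1", "x", "y", lambda v: 2 * v),
    make_code("code2", "y", "z", lambda v: v + 1),
]
repo = run_chain(chain, {"x": 1.5})
```

Adding a third code to the chain means writing its wrappers only; the existing codes and wrappers stay untouched, which is the point made on the slide.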

9 Repository, wrappers, workflow engine
[Diagram: workflow engine; input tool; data repository holding the data items x, y, u, v, w; code1, code2 and code3, each connected to the repository through wrappers and In/Out files]

10 QCML: an XML format for QC
In order to be as general as possible, we need to write down a hierarchical schema of Quantum Chemistry quantities. As a first approximation, three domains can be identified:
- FACTS (base facts): initial data for describing the physics of the system
- DERIVED (derived facts): quantities computed from FACTS using QC algorithms (energies, properties, integrals, coefficients, …)
- W-FLOW (parameters): which codes are in the pipeline, specific input data, …
A base fact is a fact that is a given in the world and is remembered (stored) in the system. A derived fact is created by an inference or a mathematical calculation from terms, facts, other derivations, or even action assertions.

11 FACT: molecule
<system title date program author>
  <molecule nElectrons charge spinMultiplicity spaceSymmetry>
    <symmetry groupName/>
    <geometry type unit numAtoms symmetryRef>
      <atom symbol isotope x3 y3 z3/>
    <basis name type numOrbitals>
      <atomBase angularMomMAX symbol>
        <angularMom value symbol numOrbitals>
          <orbital id numPrimitives>
            <exps/>
            <coeffs/>
Symmetry: group name and other symmetry data
Geometry: only Cartesian, full or symmetry-unique
Basis: by name or fully defined
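To make the element outline above concrete, here is a minimal sketch that builds such a document with Python's standard `xml.etree.ElementTree` module. The element and attribute names are taken from the slide; the helper name `build_system`, the attribute values, the choice of `bohr` as unit and the water-like coordinates are placeholders chosen for illustration, not data from the project.

```python
import xml.etree.ElementTree as ET


def build_system(title, n_electrons, atoms, basis_name,
                 charge=0, spin_multiplicity=1, group_name="C1"):
    """Build a <system> element holding one <molecule> (hypothetical helper)."""
    system = ET.Element("system", {"title": title})
    molecule = ET.SubElement(system, "molecule", {
        "nElectrons": str(n_electrons),
        "charge": str(charge),
        "spinMultiplicity": str(spin_multiplicity),
    })
    ET.SubElement(molecule, "symmetry", {"groupName": group_name})
    geometry = ET.SubElement(molecule, "geometry", {
        "type": "cartesian",
        "unit": "bohr",              # assumed unit label
        "numAtoms": str(len(atoms)),
    })
    for symbol, x, y, z in atoms:
        ET.SubElement(geometry, "atom", {
            "symbol": symbol, "x3": repr(x), "y3": repr(y), "z3": repr(z),
        })
    # Basis given "by name"; a fully defined basis would add <atomBase> children.
    ET.SubElement(molecule, "basis", {"name": basis_name})
    return system


# Placeholder geometry, for illustration only.
water = build_system(
    "water", 10,
    [("O", 0.0, 0.0, 0.0), ("H", 0.0, 1.43, 1.11), ("H", 0.0, -1.43, 1.11)],
    "vdz", group_name="C2v",
)
xml_text = ET.tostring(water, encoding="unicode")
```

`xml_text` now holds a single-line XML string that any wrapper can parse again with `ET.fromstring`.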

12 FACT: molecule/symmetry

13 FACT: molecule/basis
Name: vdz
Alias: molpro:vdz/molcas:Vdz/g03:V-dz/…
Type: spherical
definedFor: C+O+H+…
atomBase: H (max angularMom, numPrimitives, numAO)
  angularMom: s (numAO)
    orbital: 1st (numPrimitives)
      exponents:
      coefficients:
    orbital: 2nd …
  angularMom: p …
atomBase: O
atomBase: C
This archive could be organised in XML form or, better, use already available data banks (EMSL Basis Set Library).

14 DERIVED data: computedData
<system …>
  <computedData>
    <energy unit levelOfTheory quality value>
      <state spaceSymmetry spinMultiplicity excitationLevel/>
    <property unit levelOfTheory quality value>
      <state “bra” spaceSymmetry spinMultiplicity excitationLevel/>
      <state “ket” spaceSymmetry spinMultiplicity excitationLevel/>
      <operator order name/>
    <file address URL/>
A “schema” has been written for QCML.
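As an illustration of how an OUT-wrapper might record derived data, the sketch below appends an energy and a file reference under `<computedData>`. Element and attribute names follow the outline above; the helper names, the use of a single `address` attribute on `<file>`, the energy value and the file path are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def computed_data(system):
    """Return the <computedData> child of <system>, creating it if missing."""
    node = system.find("computedData")
    if node is None:
        node = ET.SubElement(system, "computedData")
    return node


def add_energy(system, value, unit, level_of_theory,
               space_symmetry, spin_multiplicity, excitation_level=0):
    """Append one <energy> element with the <state> it refers to."""
    energy = ET.SubElement(computed_data(system), "energy", {
        "unit": unit,
        "levelOfTheory": level_of_theory,
        "value": repr(value),
    })
    ET.SubElement(energy, "state", {
        "spaceSymmetry": space_symmetry,
        "spinMultiplicity": str(spin_multiplicity),
        "excitationLevel": str(excitation_level),
    })
    return energy


def add_file_reference(system, address):
    """Store a reference to a large binary file, not the data themselves."""
    return ET.SubElement(computed_data(system), "file", {"address": address})


system = ET.Element("system", {"title": "demo"})
add_energy(system, -1.0, "hartree", "FCI", "A1", 1)  # placeholder value, not a result
add_file_reference(system, "file:///scratch/demo/integrals.h5")  # hypothetical path
```

Both helpers go through `computed_data`, so repeated calls keep a single `<computedData>` block per system.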

15 DERIVED : computedData/file
Problem with large binary datasets: include the reference, not the actual data.
Two possible strategies:
1. Leave data in their native format and translate them only when needed; maintain different versions (formats) of the same data
2. Define a “standard” format for binary data and convert them anyway
The second was the solution of choice. HDF5 appears to be a good solution.

16 HDF Mission
To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.
- Format and software for scientific data
- Stores images, multidimensional arrays, tables, etc.
- Emphasis on storage and I/O efficiency
- Free and commercial software support
- Emphasis on standards
- Users from many engineering and scientific fields

17 Example HDF5 file
[Diagram: root group “/” containing a group “/foo” and objects of several types: a 3-D array, a 2-D array, a raster image with its palette, and a table]
Table:
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
Like HDF4, HDF5 has a grouping structure. The main difference is that every HDF5 file starts with a root group, whereas HDF4 doesn’t need any groups at all.

18 Example HDF5 file
[Diagram: root group “/” with the groups “/AO” and “/MO”, each containing “/mono” and “/bi”; “/MO” also contains “/coefficients”; labelled datasets: Overlap, Kinetic, Repulsion, Kinetic+Repulsion, Property; object types shown: table, 4-D array]
Table:
Orb | occ | energy
----|-----|-----
1 | 0 | 0.35
2 | 0.5 | 0.26
3 | 2. | 0.69
This shows that you can mix objects of different types according to your needs. Typically, there will be metadata stored with objects to indicate what type of object they are.

19 HDF Software
Layers, from top to bottom:
- General Applications: utilities and applications for manipulating, viewing, and analyzing data
- Application Programming Interfaces: high-level, object-specific APIs (HDF I/O library)
- Low-level Interface: low-level API for I/O to files, etc. (HDF I/O library)
- HDF file: file or other data source
It is useful to think about HDF software in terms of layers. At the bottom layer is the HDF5 file or other data source. Above that are two layers corresponding to the HDF library. First there is a low-level interface that concentrates on basic I/O: opening and closing files, reading and writing bytes, seeking, etc. HDF5 provides a public API at this level so that people can write their own drivers for reading and writing to places other than those already provided with the library. Those that are already provided include UNIX stdio and MPI-IO. Then comes the high-level, object-specific interface. This is the API that most people who develop HDF5 applications use. This is where you create a dataset or group, read and write datasets and subsets, etc. At the top are applications, or perhaps APIs used by applications. Examples of the latter are the HDF-EOS API that supports NASA’s EOSDIS datatypes, and the DSL API that supports the ASCI data models.

20 HDF file structure for QC
Root
  AO
    <i/j>
    <i/T/j>
    <i/Vnuc/j>
    <i/T/j>+<i/Vnuc/j>
    <ij/kl>
  MO
    <i/T/j>
    <i/V/j>
    coeff(i,j)
  Property
    <i/p/j>
Metadata shown on the slide:
- Norb, Name, QCML_ref
- Norb, Spin Polar.: α=β, α, β
- Orb Classif: Core, Active, Virtual
- Orb Energies
- Orb Symm
- [1-order]
+ format metadata (integer, binary, Endian-ism, …)

21 Workflow parameters
Future work (web services, bottom-up approach (top-down?))
- Each “code” must be divided into “elementary recipes” (catalog of recipes)
- The interface of each “recipe” must be described (IDL, XML, …), in addition to more dynamic information
- A “master of ceremonies” must exist, with the following tasks:
  - ORB-like “inter-client” communication
  - Job planning
  - User interface (grid abstraction)
  - …

22 QCML processing: wrappers
- One pair of wrappers for each code in the metasystem
- They should be written and maintained by the authors of the chemical codes
- XML processing can be used (DOM), but in what language?
  - Fortran: no easy and stable DOM available
  - Scripting languages (Perl/Python/Java): not known by chemists
- We tried both ways (Fortran and Python)

23 Fortran DOM: drawbacks
- The only problem is the Fortran binding: it doesn’t exist (at least last year …)
- DOM is OO and Fortran is not
- A C binding exists (Gdome2)
- Gdome2 was installed, with very hard work, on a mainframe platform (it was conceived for Linux)
- We are currently converting it to Fortran, adopting the DOM recommendations (simplified …)

24 Why Fortran
GOOD
- Users don't need to learn a new language
- Homogeneous environment
BAD
- Tricky: needs an external library (f77xml) built on top of gdome2
- Porting problems for gdome2/libxml2 may arise

25 F77xml library
- Still in development
- v0.4 is out (experimental, with limited features)
- v1.0 upcoming, API changed to be nearly DOM2 compliant
- Written in C on top of gdome2
- Designed for interfacing to F77 (also F90 soon)
- Reduced namespace pollution
Cons:
- F77 syntax is difficult (DOM2 + tricks)
- F90 syntax is simpler
- A pre-processor will convert F90 syntax to F77

26 F77xml library - V1.0 example
Gdome2 (C):
  GdomeNode* gdome_el_firstChild (GdomeElement *self, GdomeException *exc);
F90:
  Call f77xml_el_firstChild(nodeCode, elemCode, exc)
  First position: return value
  nodeCode, elemCode, exc mapped to INTEGER
F77:
  Func='el_firstChild'
  Call xp3t1(nodeCode,func,elemCode,exc)
  Multiplexer function:
    x:
    p3: 3 parameters (+ function name)
    t1: type 1 parameter schema (code/code/error)
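The two ideas on this slide, DOM objects exposed as INTEGER codes and one multiplexer routine per parameter schema, can be illustrated in Python. This is not the f77xml implementation: `HandleTable`, `TYPE1_FUNCTIONS` and the `el_lastChild` entry are hypothetical, the DOM used is Python's `xml.dom.minidom` rather than gdome2, and the results are returned as a tuple instead of through Fortran output arguments.

```python
from xml.dom import minidom


class HandleTable:
    """Maps DOM nodes to integer codes, as a Fortran caller would see them."""

    def __init__(self):
        self._nodes = [None]  # code 0 is reserved for "no node"

    def register(self, node):
        if node is None:
            return 0
        self._nodes.append(node)
        return len(self._nodes) - 1

    def lookup(self, code):
        return self._nodes[code]


HANDLES = HandleTable()

# "Type 1" schema: one node code in, one node code out, plus an error flag.
TYPE1_FUNCTIONS = {
    "el_firstChild": lambda node: node.firstChild,
    "el_lastChild": lambda node: node.lastChild,  # hypothetical extra entry
}


def xp3t1(func, elem_code):
    """Multiplexer for type 1 calls; returns (node_code, exc)."""
    try:
        node = TYPE1_FUNCTIONS[func](HANDLES.lookup(elem_code))
    except (KeyError, IndexError, AttributeError):
        return 0, 1  # unknown function name or invalid code
    return HANDLES.register(node), 0


doc = minidom.parseString("<molecule><geometry/><basis/></molecule>")
root_code = HANDLES.register(doc.documentElement)
child_code, exc = xp3t1("el_firstChild", root_code)
```

A single entry point per schema keeps the number of externally visible routine names small, which matches the "reduced namespace pollution" goal stated for the library.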

27 Why Python
GOOD
- Very easy object-oriented language
- Works well with strings
- Simple and efficient DOM interface for XML
- Present in almost all UNIX/Linux distributions
BAD
- Users do need to learn a new language
- Maybe less powerful than Perl
- Usually not used by chemists

28 Python Wrapper
At present, a prototype works with the molpro-fci chain:
- It takes information from the XML repository
- It writes the proper MOLPRO and FCI input
- It starts the two programs
With a different XML file, users only need to specify the file name and some simple parameters (orbital guess for FCI).
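A minimal sketch of what such an IN-wrapper does is given below: it reads the molecule from the XML repository and renders an input text. The XML follows the element names of the QCML outline; the function names, the sample values and, above all, the generated input layout are assumptions for illustration. The layout is a generic, program-neutral format, not actual MOLPRO or FCI syntax.

```python
import xml.etree.ElementTree as ET

# Small stand-in for the XML repository; values are placeholders.
REPOSITORY_XML = """<system title="demo">
  <molecule nElectrons="2" charge="0" spinMultiplicity="1">
    <geometry type="cartesian" unit="bohr" numAtoms="2">
      <atom symbol="H" x3="0.0" y3="0.0" z3="0.0"/>
      <atom symbol="H" x3="0.0" y3="0.0" z3="1.4"/>
    </geometry>
    <basis name="vdz"/>
  </molecule>
</system>"""


def read_molecule(xml_text):
    """Extract the data an input file needs from the <molecule> element."""
    molecule = ET.fromstring(xml_text).find("molecule")
    atoms = [
        (a.get("symbol"), float(a.get("x3")), float(a.get("y3")), float(a.get("z3")))
        for a in molecule.iter("atom")
    ]
    return {
        "charge": int(molecule.get("charge")),
        "multiplicity": int(molecule.get("spinMultiplicity")),
        "basis": molecule.find("basis").get("name"),
        "atoms": atoms,
    }


def render_input(mol):
    """Render a generic input deck (illustrative layout, not real MOLPRO syntax)."""
    lines = [
        "# generated by the IN-wrapper",
        f"basis {mol['basis']}",
        f"charge {mol['charge']}",
        f"multiplicity {mol['multiplicity']}",
        "geometry",
    ]
    lines += [f"  {s} {x:.6f} {y:.6f} {z:.6f}" for s, x, y, z in mol["atoms"]]
    lines.append("end")
    return "\n".join(lines) + "\n"


input_text = render_input(read_molecule(REPOSITORY_XML))
```

A wrapper for a real program would replace `render_input` with that program's own input syntax and would then start the program, for example through Python's `subprocess` module.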

29 Wrappers in the future
- We have to develop a script to write the initial XML file
- Every wrapper should be able to take information from the output and append it to the XML file
- The user interface could be a GUI built with Tkinter, a package integrated in Python

30 Python or not
- Python is very simple to learn and works very efficiently with XML
- Scripts written in Python (at least for prototypes) are quite clear, linear and easy to maintain or upgrade
- The possibility of a GUI could make our project much more user-friendly

31 What we have done …
- Single platform: IBM SP4
- Two code chains: MolPro to FCI, MolPro to CasDI
[Diagram, MolPro to FCI chain: start at the QCML repository; an IN-wrapper writes the MolPro IN-file; MolPro produces FCIDUMP; an OUT-wrapper feeds the HDF5 repository; IN-wrappers write the FCI IN-file and the bin file for FCI; FCI runs; stop]

32 In conclusion …
Two important hints on data:
- Use some XML dialect for describing simple structured data
- Use HDF5 for storing large arrays and binary data
Still needed:
- A good and easy API to XML and HDF
- How to manage the workflow
- How to manage the grid connection

33 XML processor
A set of rules and interfaces to interact with XML data from a user program.
There are two main API specifications from the W3C:
- DOM: Document Object Model
- SAX: Simple API for XML
To do something useful with XML, you must be able to programmatically access the data. A software module capable of reading XML documents and providing access to their content and structure is referred to as an XML processor or an XML API. While developers are free to implement their own XML APIs, it is in their best interests to leverage industry-accepted standard APIs. By accepting an industry-standard API, a developer can write code for a given API implementation that should be capable of running under any other compliant implementation of the same API without modifications. There are two main API specifications that have gained popularity among developers today and are striving to become industry standards: the Document Object Model (DOM) and the Simple API for XML (SAX).
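The difference between the two APIs is easiest to see on a small example. The sketch below extracts the same information, the atom symbols of a geometry, first with a DOM parser and then with a SAX handler, using only Python's standard library. The sample document and the function names are made up for illustration.

```python
import xml.sax
from xml.dom import minidom

DOC = "<geometry><atom symbol='O'/><atom symbol='H'/><atom symbol='H'/></geometry>"


def symbols_dom(text):
    """DOM: load the whole document as a tree, then navigate it."""
    dom = minidom.parseString(text)
    return [atom.getAttribute("symbol") for atom in dom.getElementsByTagName("atom")]


class AtomHandler(xml.sax.ContentHandler):
    """SAX: react to parsing events; no tree is kept in memory."""

    def __init__(self):
        super().__init__()
        self.symbols = []

    def startElement(self, name, attrs):
        if name == "atom":
            self.symbols.append(attrs.getValue("symbol"))


def symbols_sax(text):
    handler = AtomHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.symbols


dom_result = symbols_dom(DOC)
sax_result = symbols_sax(DOC)
```

DOM is convenient when a wrapper needs to read and update the document, while SAX suits a single pass over large files.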

34 Building the wrappers
Instead of … reinventing the wheel …
DOM (Document Object Model) defines a platform- and language-neutral interface to the structure of XML documents. This interface allows programs to dynamically access and update the document.
From the specification: DOM provides a standard set of objects for representing XML documents, a standard model of how these objects can be combined, and a standard interface for accessing and manipulating them.
