Looking for a (standard) Common Format for (Quantum) A WG activity within COST action 23 ( WG D23/0006/01) – Elda Rossi, Andrew Emerson – CINECA –Gian.

1 Looking for a (standard) Common Format for (Quantum) Computational Chemistry
A WG activity within COST action 23 (WG D23/0006/01)
- Elda Rossi, Andrew Emerson - CINECA
- Gian Luigi Bendazzoli, Antonio Monari - Università di Bologna
- Renzo Cimiraglia, Celestino Angeli, Stefano Borini - Università di Ferrara
- Daniel Maynau, Stefano Evangelisti - IRSAMC, Toulouse
- José Sanchez-Marin - Universitat de Valencia
- Peter Szalay - Eötvös Loránd University
- Rosa Caballol - Universitat Rovira i Virgili, Tarragona
Motivation, Vocabulary, Wrappers

2 Motivation for the work
To build a meta-system for supporting research collaboration in the field of:
- Localised Orbitals in post-SCF methods
- …
- Linear Scaling methods in a Multi-Reference context

3 The scenario
- Different laboratories need to collaborate
- Different home-made codes need to be used together, since they give different views of the same problem
- General-purpose basic codes are needed to pre-compute data in a sort of pipeline
- Programs should remain on their original sites under the responsibility of their authors
- Different platforms
- Network connections (grid architecture)
- Workflow

4 The need for a Common Format
The first problem we faced: how can different codes (on different platforms) communicate?
We need a Common Format for (at least) Quantum Chemistry codes.

5 Preliminary steps
Looking around …
- CML has been available for a long time
- XML is used by Accelrys for internal files
- XML is used by ArgusLab for internal files
None of them is completely suited to computational chemistry: they cover mainly structural chemistry, with no Quantum Chemistry properties.
XML seems the best technology, so we decided to try another XML-based format.
HDF5 looked promising for storing the large binary data typical of QC.

6 Why XML
Pro:
- Standard
- Self referencing
- Extensible
- Some experience (CML) already exists
Con:
- (Verbose)
- Very few applications to science
- Seems difficult to use with Fortran
- Apparently no specific data types for quantum mechanical concepts (e.g. integrals, wavefunction, orbitals, coefficients, energy levels, etc.)

7 Which codes to be integrated: the meta-system building blocks
- CAS-DI (Multi-Reference Configuration Interaction)
- EPCISO (Spin-orbit Configuration Interaction)
- NEVPT (MR PT, P-Variational approaches)
- LOCNAT (Localized Multireference algorithm)
- FCI (Full Configuration Interaction)
- PROP (Property Calculation)
- COLUMBUS (General ab-initio electronic package)
- DALTON (General ab-initio package)
- MolPRO, MolCAS, Nwchem, Gaussian, …

8 How the engine should work
[Diagram: Data Repository (XML/HDF) -> IN-wrapper -> IN-files -> Program -> OUT-files -> OUT-wrapper -> Data Repository (XML/HDF)]
- Leaves the program unchanged
- One wrapper for each program: if a code is added, only one wrapper needs to be written
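
A minimal Python sketch of the wrapper idea on this slide, under stated assumptions: the XML element names (`qcml`, `facts`, `derived`, `quantity`) and the `legacy_program` stand-in are hypothetical, invented here for illustration. The point it tries to show is that the program only ever sees its own native input and output, while the two wrappers do all the translation to and from the repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository content; the element names are illustrative only.
REPO_XML = """
<qcml>
  <facts>
    <molecule>
      <geometry>
        <atom symbol="H" x="0.0" y="0.0" z="0.0"/>
        <atom symbol="H" x="0.0" y="0.0" z="1.4"/>
      </geometry>
    </molecule>
  </facts>
  <derived/>
</qcml>
"""

def in_wrapper(repo):
    """Translate repository data into the program's native input text."""
    atoms = repo.findall("facts/molecule/geometry/atom")
    return "\n".join(
        f"{a.get('symbol')} {a.get('x')} {a.get('y')} {a.get('z')}" for a in atoms
    )

def legacy_program(native_input):
    """Stand-in for an unchanged code: native input in, native output out."""
    natoms = len(native_input.splitlines())
    return f"NATOMS = {natoms}\n"

def out_wrapper(native_output, repo):
    """Parse 'KEY = VALUE' lines and store them under <derived>."""
    derived = repo.find("derived")
    for line in native_output.splitlines():
        key, _, value = line.partition("=")
        ET.SubElement(derived, "quantity", name=key.strip()).text = value.strip()

def run_step(repo):
    """One engine step: repository -> IN-wrapper -> program -> OUT-wrapper."""
    out_wrapper(legacy_program(in_wrapper(repo)), repo)

repo = ET.fromstring(REPO_XML)
run_step(repo)
print(ET.tostring(repo.find("derived"), encoding="unicode"))
```

Adding another code to the meta-system would then mean writing one more `in_wrapper`/`out_wrapper` pair, without touching `legacy_program` or the other wrappers.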

9 [Diagram: an input tool and the data repository connected to code1, code2 and code3, each through an "In files" wrapper and an "Out files" wrapper]
1. Repository
2. Wrappers
3. Workflow engine

10 QCML: an XML format for QC
In order to be as general as possible, we need to write down a hierarchical schema of Quantum Chemistry quantities. As a first approximation, three domains can be identified:
- FACTS (base facts): initial data describing the physics of the system
- DERIVED (derived facts): quantities computed from FACTS using QC algorithms (energies, properties, integrals, coefficients, …)
- W-FLOW: which codes are in the pipeline, specific input data, parameters, …
A base fact is a fact that is a given in the world and is remembered (stored) in the system. A derived fact is created by an inference or a mathematical calculation from terms, facts, other derivations, or even action assertions.
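
To make the base/derived distinction concrete, here is a small sketch that stores a geometry as base facts and then computes one derived fact from it, the nuclear repulsion energy E = sum over pairs of Z_i Z_j / r_ij (atomic units). The tag and attribute names are assumptions made for this example; the actual QCML schema is not reproduced in the transcript.

```python
import math
import xml.etree.ElementTree as ET

# Three top-level domains; tag names are hypothetical.
qcml = ET.Element("qcml")
facts = ET.SubElement(qcml, "facts")
derived = ET.SubElement(qcml, "derived")
wflow = ET.SubElement(qcml, "wflow")

# Base facts: an H2 molecule, coordinates in bohr (illustrative values).
molecule = ET.SubElement(facts, "molecule")
for symbol, charge, (x, y, z) in [("H", 1, (0.0, 0.0, 0.0)),
                                  ("H", 1, (0.0, 0.0, 1.4))]:
    ET.SubElement(molecule, "atom", symbol=symbol, charge=str(charge),
                  x=str(x), y=str(y), z=str(z))

def nuclear_repulsion(mol):
    """Sum Z_i * Z_j / r_ij over all distinct atom pairs."""
    atoms = [(float(a.get("charge")),
              tuple(float(a.get(k)) for k in "xyz"))
             for a in mol.findall("atom")]
    energy = 0.0
    for i in range(len(atoms)):
        for j in range(i):
            (zi, ri), (zj, rj) = atoms[i], atoms[j]
            energy += zi * zj / math.dist(ri, rj)
    return energy

# Derived fact: computed from the base facts, then stored next to them.
e_nuc = ET.SubElement(derived, "energy", kind="nuclear_repulsion", units="hartree")
e_nuc.text = repr(nuclear_repulsion(molecule))

# Workflow domain: which code is in the pipeline (name taken from the slides).
ET.SubElement(wflow, "step", code="MolPro")
```

The derived value can always be recomputed from the facts, which is what separates the two domains; the workflow domain records how it was (or will be) produced.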

11 FACT: molecule
[XML example lost in extraction; only the fragment "groupName/>" survives]
- Symmetry: group name and other symmetry data
- Geometry: only Cartesian, full or symmetry-unique
- Basis: by name or fully defined

12 FACT: molecule/symmetry

13 FACT: molecule/basis
- Name: vdz
- Alias: molpro:vdz / molcas:Vdz / g03:V-dz / …
- Type: spherical
- definedFor: C+O+H+…
- atomBase: H (max angularMom, numPrimitives, numAO)
  - angularMom: s (numAO)
    - orbital: 1st (numPrimitives)
      - exponents:
      - coefficients:
    - orbital: 2nd …
  - angularMom: p …
- atomBase: O
- atomBase: C
This archive could be organised in XML form, or better, use already available data banks (EMSL Basis Set Library).
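
The nesting above (atomBase, angularMom, orbital) can be sketched as a small data structure from which the counts in parentheses follow. This is an illustration only: the class names are invented here, the exponents and coefficients are placeholders rather than a real basis set, and the reading of numPrimitives as "total primitives over all orbitals" is an assumption. For a spherical basis, each contracted orbital of angular momentum l contributes 2l+1 atomic orbitals.

```python
from dataclasses import dataclass

ANGULAR = {"s": 0, "p": 1, "d": 2, "f": 3}

@dataclass
class Orbital:
    exponents: list
    coefficients: list

    @property
    def num_primitives(self):
        return len(self.exponents)

@dataclass
class AngularMom:
    label: str        # "s", "p", ...
    orbitals: list    # contracted orbitals of this angular momentum

    @property
    def num_ao(self):
        # Spherical functions: 2l+1 components per contracted orbital.
        return (2 * ANGULAR[self.label] + 1) * len(self.orbitals)

@dataclass
class AtomBase:
    symbol: str
    shells: list

    @property
    def max_angular_mom(self):
        return max((s.label for s in self.shells), key=ANGULAR.get)

    @property
    def num_primitives(self):
        return sum(o.num_primitives for s in self.shells for o in s.orbitals)

    @property
    def num_ao(self):
        return sum(s.num_ao for s in self.shells)

# Placeholder numbers, NOT a real basis set.
hydrogen = AtomBase("H", [
    AngularMom("s", [Orbital([10.0, 1.0], [0.3, 0.7]), Orbital([0.1], [1.0])]),
    AngularMom("p", [Orbital([0.5], [1.0])]),
])
print(hydrogen.max_angular_mom, hydrogen.num_primitives, hydrogen.num_ao)
```

Storing only the exponents and coefficients and deriving the counts avoids the risk of the two getting out of step in the archive.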

14 DERIVED data: computedData
A schema has been written for QCML

15 DERIVED: computedData/file
Problem with large binary datasets: include the reference, not the actual data.
Two possible strategies:
1. Leave data in their native format and translate them only when needed; maintain different versions (formats) of the same data
2. Define a standard format for binary data and convert them anyway
The second was the solution of choice. HDF5 appears to be a good solution.
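
"Include the reference, not the actual data" might look like the following sketch: the XML side records where a large array lives (file name, dataset path, shape) and never carries the numbers themselves. The element name `file`, its attributes, the file name `h2o.h5` and the dataset path are all hypothetical choices for this example.

```python
import xml.etree.ElementTree as ET

def add_file_reference(derived, name, hdf5_file, dataset, shape):
    """Record a pointer to a binary dataset instead of embedding it."""
    ref = ET.SubElement(derived, "file", name=name, format="HDF5",
                        href=hdf5_file, dataset=dataset)
    ref.set("shape", "x".join(str(n) for n in shape))
    return ref

def resolve(ref):
    """Return (file name, dataset path, shape) for a reference element."""
    shape = tuple(int(n) for n in ref.get("shape").split("x"))
    return ref.get("href"), ref.get("dataset"), shape

derived = ET.Element("derived")
ref = add_file_reference(derived, "two_electron_integrals",
                         "h2o.h5", "/MO/bi", (24, 24, 24, 24))
print(ET.tostring(derived, encoding="unicode"))
print(resolve(ref))
```

A wrapper that needs the integrals would call something like `resolve` and then open the binary file itself, so the XML document stays small no matter how large the array is.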

16 HDF
Mission: to develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.
- Format and software for scientific data
- Stores images, multidimensional arrays, tables, etc.
- Emphasis on storage and I/O efficiency
- Free and commercial software support
- Emphasis on standards
- Users from many engineering and scientific fields

17 Example HDF5 file
[Diagram: the root group "/" and a group "/foo", containing raster images, a palette, a 3-D array, a 2-D array, and a table with columns lat, lon, temp]

18 Example HDF5 file
[Diagram: the root group "/" with groups /AO and /MO. Labels in the figure: a table with columns Orb, occ, energy; /coefficients; /mono and /bi; Kinetic, Overlap, Repulsion, Kinetic+Repulsion, Property; 4-D array]
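
A quick back-of-the-envelope calculation shows why the 4-D two-electron array is the part that calls for a binary format. The sketch assumes 8-byte reals and real orbitals, for which the integrals (ij|kl) have the usual 8-fold permutational symmetry; the function names are made up for this example.

```python
def dense_bytes(norb, word=8):
    """Storage for a full norb^4 array of 8-byte reals."""
    return norb ** 4 * word

def unique_bytes(norb, word=8):
    """Storage for the symmetry-unique integrals only.

    With i >= j there are norb*(norb+1)/2 index pairs (ij); with
    (ij) >= (kl) the number of unique (ij|kl) is pairs*(pairs+1)/2.
    """
    pairs = norb * (norb + 1) // 2
    return pairs * (pairs + 1) // 2 * word

for n in (10, 100):
    print(n, dense_bytes(n), unique_bytes(n))
# 10 80000 12320
# 100 800000000 102030200
```

Even with symmetry exploited, 100 orbitals already need about 100 MB, which would be impractical to write out as XML text.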

19 HDF Software
[Diagram: layered software stack, from top to bottom]
- General Applications: utilities and applications for manipulating, viewing, and analyzing data
- Application Programming Interfaces (HDF I/O library): high-level, object-specific APIs; low-level API for I/O to files, etc.
- Low-level Interface
- HDF file (file or other data source)

20 HDF file structure for QC
Root
- AO +
- MO +
- coeff(i,j)
- Property
- Name
- QCML_ref
- Norb
- Spin Polar.:
- Orb Classif: Core, Active, Virtual
- Orb Energies:
- Orb Symm:
- [1-order]
+ format metadata (integer, binary, endianness, …)

21 Workflow parameters
Future work (web services; bottom-up approach, or top-down?):
- Each code must be divided into elementary recipes (catalog of recipes)
- The interface of each recipe must be described (IDL, XML, …), in addition to more dynamic information
- A master of ceremonies must exist, with the following tasks:
  - ORB-like inter-client communication
  - Job planning
  - User interface (grid abstraction)
  - …

22 QCML processing: wrappers
- One pair of wrappers for each code in the meta-system
- They should be written and maintained by the authors of the chemical codes
- XML processing (DOM) can be used, but … in what language?
  - Fortran: no easy and stable DOM available
  - Scripting languages (Perl/Python/Java): not known by chemists
- We tried both ways (Fortran & Python)

23 Fortran DOM: drawbacks
The only problem is the Fortran binding:
- It doesn't exist (at least as of last year …)
- DOM is OO and Fortran is not
A C binding exists (Gdome2).
Gdome2 was installed, with very hard work, on a mainframe platform (it was conceived for Linux).
We are currently converting it to Fortran, adopting the DOM recommendations (simplified …).

24 Why Fortran
GOOD
- Users don't need to learn a new language
- Homogeneous environment
BAD
- Tricky: needs an external library (f77xml) built on top of gdome2
- Porting problems for gdome2/libxml2 may arise

25 F77xml library
- Still in development
  - v0.4 is out (experimental, with limited features)
  - v1.0 upcoming, API changed to be nearly DOM2 compliant
- Written in C on top of gdome2
- Designed for interfacing to F77 (also F90 soon)
- Reduced namespace pollution
- Cons: F77 syntax is difficult (DOM2 + tricks)
- F90 syntax is simpler
- A pre-processor will convert F90 syntax to F77

26 F77xml library - V1.0 example
Gdome2 (C):
    GdomeNode* gdome_el_firstChild (GdomeElement *self, GdomeException *exc);
F90:
    Call f77xml_el_firstChild(nodeCode, elemCode, exc)
    - First position: return value
    - nodeCode, elemCode, exc mapped to INTEGER
F77:
    Func='el_firstChild'
    Call xp3t1(nodeCode,func,elemCode,exc)
Multiplexer function: x
- p3: 3 parameters (+ function name)
- t1: type 1 parameter schema (code/code/error)

27 Why Python
GOOD
- Very easy object-oriented language
- Works well with strings
- Simple and efficient DOM interface for XML
- Present in almost all UNIX/Linux distributions
BAD
- Users do need to learn a new language
- Maybe less powerful than Perl
- Usually not used by chemists

28 Python Wrapper
At present, a prototype works with the MolPro-FCI chain:
- It takes information from the XML repository
- Writes the proper MOLPRO and FCI input
- Starts the two programs
With a different XML file, users need only specify the file name and some simple parameters (orbital guess for FCI).
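
The three steps listed above could be organised roughly as follows. This is a sketch, not the prototype itself: the XML layout, the `key = value` input lines (which are placeholders and not real MOLPRO or FCI input syntax), the file names and the executable names `molpro` and `fci` are all assumptions. The launcher is passed in as a parameter so the example can run without either program installed.

```python
import subprocess
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical repository document for the two-code chain.
CHAIN_XML = """
<qcml>
  <facts>
    <molecule>
      <basis>vdz</basis>
      <geometry>
        <atom symbol="H" x="0.0" y="0.0" z="0.0"/>
        <atom symbol="H" x="0.0" y="0.0" z="1.4"/>
      </geometry>
    </molecule>
  </facts>
  <wflow>
    <fci orbitalGuess="1"/>
  </wflow>
</qcml>
"""

def write_inputs(xml_text, workdir):
    """Read the repository and write one input file per program."""
    root = ET.fromstring(xml_text)
    basis = root.findtext("facts/molecule/basis")
    atoms = root.findall("facts/molecule/geometry/atom")
    guess = root.find("wflow/fci").get("orbitalGuess")
    first = [f"basis = {basis}"] + [
        f"atom = {a.get('symbol')} {a.get('x')} {a.get('y')} {a.get('z')}"
        for a in atoms
    ]
    paths = {"molpro": Path(workdir) / "molpro.in",
             "fci": Path(workdir) / "fci.in"}
    paths["molpro"].write_text("\n".join(first) + "\n")
    paths["fci"].write_text(f"orbital_guess = {guess}\n")
    return paths

def run_chain(paths, commands, runner=subprocess.run):
    """Start the two programs in order, each on its own input file."""
    for name in ("molpro", "fci"):
        runner([commands[name], str(paths[name])], check=True)

# Demonstration with a fake runner that only records what would be started.
calls = []
with tempfile.TemporaryDirectory() as workdir:
    paths = write_inputs(CHAIN_XML, workdir)
    molpro_text = paths["molpro"].read_text()
    run_chain(paths, {"molpro": "molpro", "fci": "fci"},
              runner=lambda cmd, check: calls.append(cmd[0]))
print(calls)
```

Running the chain on another system would then only require a different XML file and, as the slide says, a few simple parameters such as the orbital guess.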

29 Wrappers in the future
- We have to develop a script to write the initial XML file
- Every wrapper should be able to take information from the output and append it to the XML file
- The user interface could be a GUI built with Tkinter, a package integrated in Python

30 Python or not
- Python is very simple to learn and works very efficiently with XML
- Scripts written in Python (at least for prototypes) are quite clear, linear and easy to maintain or upgrade
- The possibility of a GUI could make our project much more user-friendly

31 What we have done …
- Single platform: IBM SP4
- Two code chains:
  - MolPro to FCI
  - MolPro to CasDI
[Diagram: the MolPro-to-FCI chain, from "Start here" to "Stop here". Labels in the figure: QCML Repository, HDF5 Repository, IN-wrapper, MolPro IN-file, MolPro, FCIDUMP, OUT-wrapper, Bin file for FCI, IN-wrapper, FCI IN-file, FCI]

32 In conclusion …
Two important hints on data:
1. Use some XML dialect for describing simple structured data
2. Use HDF5 for storing large arrays and binary data
- Need for a good and easy API to XML & HDF
- How to manage the workflow
- How to manage the grid connection

33 XML processor
A set of rules and interfaces for interacting with XML data from a user program.
There are two main API specifications:
- DOM: Document Object Model (a W3C recommendation)
- SAX: Simple API for XML (a de facto standard developed on the XML-DEV mailing list, not a W3C specification)
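
For contrast with the tree-based DOM model, here is a short sketch of the event-based SAX style using Python's standard `xml.sax` module. The document and the `atom`/`symbol` names are invented for the example. The parser calls the handler as it streams through the input, and no in-memory tree is built.

```python
import xml.sax

class AtomCollector(xml.sax.ContentHandler):
    """Collect the symbol of every <atom> element seen during parsing."""

    def __init__(self):
        super().__init__()
        self.symbols = []

    def startElement(self, name, attrs):
        # Called once for each opening tag, in document order.
        if name == "atom":
            self.symbols.append(attrs.get("symbol"))

DOCUMENT = b"""<geometry>
  <atom symbol="O"/>
  <atom symbol="H"/>
  <atom symbol="H"/>
</geometry>"""

handler = AtomCollector()
xml.sax.parseString(DOCUMENT, handler)
print(handler.symbols)
```

SAX suits read-only passes over large files; a wrapper that has to modify the repository document is usually easier to write against DOM.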

34 Building the wrappers
Instead of … reinventing the wheel …
DOM (Document Object Model) defines a platform- and language-neutral interface to the structure of XML documents. This interface allows programs to dynamically access and update the document.
From the specification: DOM provides a standard set of objects for representing XML documents, a standard model of how these objects can be combined, and a standard interface for accessing and manipulating them.
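
"Access and update" can be illustrated with Python's standard `xml.dom.minidom`, which follows the DOM interface names closely. The document layout is hypothetical, and the energy value is a placeholder, not a computed result.

```python
from xml.dom.minidom import parseString

doc = parseString(
    '<qcml><facts><molecule name="water"/></facts><derived/></qcml>'
)

# Access: locate an element and read one of its attributes.
molecule = doc.getElementsByTagName("molecule")[0]
name = molecule.getAttribute("name")

# Update: create a new element with a text node and attach it under <derived>.
derived = doc.getElementsByTagName("derived")[0]
energy = doc.createElement("energy")
energy.setAttribute("units", "hartree")
energy.appendChild(doc.createTextNode("-1.0"))  # placeholder value
derived.appendChild(energy)

print(name)
print(derived.toxml())
```

The same method names (`getElementsByTagName`, `createElement`, `appendChild`) appear in DOM bindings for other languages, which is the language neutrality the specification refers to.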
