The ARCS Data Analysis Software. Michael Aivazis, California Institute of Technology.


1 The ARCS Data Analysis Software
Michael Aivazis, California Institute of Technology

2 Fractals in software
“Drip programming”
– may generate aesthetically interesting flow charts
– but it is not a desirable practice
Advanced technology may actually complicate matters
– complex data structures
– objects
– user interfaces
– multiple platforms
– distributed computing
– high performance computing
– security
– …
– the Grid
Pollock’s “Autumn Rhythm” … or Michael’s framework?

3 Software Roadmap

4 Data reductions
– Account for incident flux
– Remove background
– Convert from time to energy
– Correct for detector efficiency
– Bin into rings of constant scattering angle
– Convert from angle to momentum
– Subtract multi-phonon and multiple scattering
– Correct for absorption
[Diagram labels: C++, Python]

5 From TOF to energy
[Data flow diagram. Modules: Read HDF file, Subtract background, Rebin, Sq. rt, Write HDF file. Data labels: filename, raw counts, Spect. Info, times, data, errors, errors 2, errs, energies, counts in energy. Parameters: num_e, e_min, e_max, e_i, t_min, t_max]

6 Data flow for TOF to energy conversion
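The slides do not show the conversion itself, so the following is only a minimal sketch of a time-of-flight to energy-transfer conversion for a direct-geometry instrument, followed by rebinning into energy bins. The parameter names e_i, e_min, e_max and num_e are taken from the diagram labels; the function names, the flight-path arguments and the Poisson error model are assumptions made for illustration, not the ARCS implementation.

```python
import math

# Kinetic energy of a neutron: E [meV] = MEV_PER_SPEED_SQ * v^2, with v in m/s.
# The constant is half the neutron mass expressed in meV s^2 / m^2.
MEV_PER_SPEED_SQ = 5.22704e-6


def energy_from_speed(v):
    """Neutron kinetic energy in meV for a speed in m/s."""
    return MEV_PER_SPEED_SQ * v * v


def speed_from_energy(e_mev):
    """Neutron speed in m/s for a kinetic energy in meV."""
    return math.sqrt(e_mev / MEV_PER_SPEED_SQ)


def tof_to_energy_transfer(t, e_i, l_source_sample, l_sample_detector):
    """Energy transfer (meV) for a neutron detected at total flight time t (s).

    Assumes the neutron travels the source-to-sample distance (m) with the
    incident energy e_i and the sample-to-detector distance (m) with its
    final energy. Both distances are hypothetical inputs.
    """
    t_incident = l_source_sample / speed_from_energy(e_i)
    t_final = t - t_incident
    if t_final <= 0:
        raise ValueError("arrival time is earlier than the sample time")
    e_final = energy_from_speed(l_sample_detector / t_final)
    return e_i - e_final


def rebin_to_energy(times, counts, e_i, l_source_sample, l_sample_detector,
                    e_min, e_max, num_e):
    """Accumulate counts per time channel into num_e equal-width energy bins.

    Squared errors are accumulated as the counts themselves (Poisson
    assumption) and the square root is taken at the end.
    """
    width = (e_max - e_min) / num_e
    binned = [0.0] * num_e
    errors_sq = [0.0] * num_e
    for t, c in zip(times, counts):
        e = tof_to_energy_transfer(t, e_i, l_source_sample, l_sample_detector)
        index = math.floor((e - e_min) / width)
        if 0 <= index < num_e:  # events outside [e_min, e_max) are dropped
            binned[index] += c
            errors_sq[index] += c
    errors = [math.sqrt(v) for v in errors_sq]
    return binned, errors
```

An elastically scattered neutron, which keeps its incident speed over both flight paths, should come out with an energy transfer of approximately zero.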

7 Design directions
Integrate analysis modules using scripting
– Python
Data flow paradigm
– Well understood
– Easy to implement and document
Meta-data in XML
– fully reproducible description of the data analysis pipeline
– tag and archive data
– record the version number of each module used in the analysis
Enable distributed computing
– XMLRPC, SOAP, …
File formats: NeXus + XML meta-data
– Reuse, reuse, reuse
– Augment, contribute
– HDF5!
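As an illustration of the XML meta-data idea, here is a small sketch that records each module of a pipeline with its version number and parameters. The element and attribute names (analysis, input, pipeline, module, parameter) are a made-up schema for this example only; the slides do not specify one.

```python
import xml.etree.ElementTree as ET


def describe_pipeline(input_file, steps):
    """Return an XML description of an analysis run.

    steps is a list of (module_name, version, parameters) tuples, in the
    order the modules were applied. The schema is hypothetical.
    """
    root = ET.Element("analysis")
    ET.SubElement(root, "input", {"file": input_file})
    pipeline = ET.SubElement(root, "pipeline")
    for name, version, parameters in steps:
        module = ET.SubElement(
            pipeline, "module", {"name": name, "version": version}
        )
        for key, value in parameters.items():
            ET.SubElement(
                module, "parameter", {"name": key, "value": str(value)}
            )
    return ET.tostring(root, encoding="unicode")


# Hypothetical file name, module names and version strings.
example_xml = describe_pipeline(
    "raw_counts.h5",
    [
        ("subtract_background", "1.0", {}),
        ("rebin", "1.2", {"num_e": 100, "e_min": -50.0, "e_max": 50.0}),
    ],
)
```

Archiving a description like this next to the reduced data is one way to make the analysis reproducible: the same modules, versions and parameters can be replayed later.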

8 Flexibility through the use of scripting
Scripting enables us to
– Organize the large number of parameters
– Allow the analysis environment to discover new capabilities without the need for recompilation or relinking
The Python interpreter
– The interpreter
  modern object-oriented language
  robust, portable, mature, well supported, well documented
  easily extensible
  rapid application development
– Support for parallel programming
  trivial embedding of the interpreter in an MPI-compliant manner
  a Python interpreter on each compute node
  MPI is fully integrated: bindings + OO layer
– No measurable impact on either performance or scalability

9 Writing Python bindings
Given a “low level” routine, such as

    double arcs::add(double a, double b);

and a wrapper

    PyObject * arcs_add(PyObject *, PyObject * args)
    {
        double a, b;
        // "dd": convert two Python numbers into C doubles
        int ok = PyArg_ParseTuple(args, "dd", &a, &b);
        if (!ok) {
            return 0; // PyArg_ParseTuple has already set the exception
        }
        double result = arcs::add(a, b);
        return Py_BuildValue("d", result);
    }

one can place the result of the routine in a Python variable

    c = arcs.add(2, 2)

The general case is not much more complicated than this

10 Pyre Architecture
The integration framework is a set of co-operating abstract services
[Diagram labels: framework, service, infrastructure, library, package, component, bindings, engine, abstract class, specialization; language layers: FORTRAN/C/C++, python]

11 Pyre services
journal
– flexible control over the generation and delivery of simulation diagnostics from the compute nodes to the workstation
monitor
– a distributed service for low bandwidth, on-the-fly visualizations
– currently used mostly for status monitoring and debugging
timer
weaver
– a general source code generation facility
– support for many languages
  FORTRAN, C, C++, Python, HTML, XML
  from makefiles to optimized C++ sources
– automatic web page creation for CGI scripts
– supports user authentication
  passwords, soon user SSL certificates
blade
– a toolkit-independent UI generator

12 Distributed services
[Diagram columns: Workstation, Services, Compute nodes. Labels: analysis, journal, monitor, component1, component2]

13 IRIS Explorer

14 Visual Programming Environment
Data flow paradigm appears natural
– usability problems are focused on knowledge of what is possible
– used by many commercial and open source tools
Improvements
– decouple UI from diagram logic
– interface
  use OpenGL!
  collaborative
  interesting and relevant research
– diagram logic
  thin, reusable component
  scripting
  multi-layered control
– development can use existing solutions as a guide of what not to do
– many modules already available in pyre
– enable distributed programming
Target for prototype: early 2004

15 XMLRPC: Enabling distributed computing
[Diagram labels: Client, Remote Server, Database Server, Beowulf Cluster]
An open standard for remote procedure calls
Allows us to perform the computation
– where the data lives
– independently of the local computing capacity
Security is an issue
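To make the remote procedure call idea concrete, here is a minimal sketch using the XML-RPC modules from the Python standard library. The subtract_background function and the demo wiring are hypothetical stand-ins, not ARCS services, and the server binds to the loopback address only because, as the slide notes, security is an issue.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def subtract_background(counts, background):
    """Hypothetical analysis step executed on the server side."""
    return [c - b for c, b in zip(counts, background)]


def start_server():
    """Start an XML-RPC server on a free loopback port in a daemon thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(subtract_background)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    """Call the remote function from a client and return its result."""
    server = start_server()
    host, port = server.server_address
    try:
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            return proxy.subtract_background([10, 12, 9], [1, 2, 1])
    finally:
        server.shutdown()
        server.server_close()
```

The client only sends the function name and arguments; the computation runs wherever the server, and therefore the data, lives.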

16 Prototype User Interface
Application capabilities
– depend on the remote server
– exported to the client
Boxes represent
– data sources
– computational modules
Wires represent
– data flows
– control
Boxes have input and output ports where wires can be attached

17 Data Analysis Execution
User hits “Run”
Applet interprets the wiring diagram as XMLRPC commands
Server receives the commands, assembles a Python script, and data processing commences
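How a wiring diagram might be turned into an executable sequence can be sketched as follows. This is an assumption-laden illustration, not the prototype's code: boxes are plain Python callables with a single output, wires name the input port they feed, and the execution order comes from a topological sort (graphlib, Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_diagram(boxes, wires, sources):
    """Execute a data-flow diagram and return every box's output.

    boxes:   name -> callable taking its input ports as keyword arguments
    wires:   list of (source_box, destination_box, destination_port)
    sources: name -> constant value, for boxes that are data sources
    """
    # Map each box to the set of boxes it depends on.
    dependencies = {name: set() for name in list(boxes) + list(sources)}
    for src, dst, _port in wires:
        dependencies[dst].add(src)

    results = dict(sources)
    # static_order() yields each box only after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        if name in results:
            continue
        inputs = {port: results[src] for src, dst, port in wires if dst == name}
        results[name] = boxes[name](**inputs)
    return results


# Hypothetical diagram: two data sources feeding a two-module pipeline.
example_results = run_diagram(
    boxes={
        "subtract": lambda counts, background: [
            c - b for c, b in zip(counts, background)
        ],
        "total": lambda data: sum(data),
    },
    wires=[
        ("raw", "subtract", "counts"),
        ("bg", "subtract", "background"),
        ("subtract", "total", "data"),
    ],
    sources={"raw": [10, 12, 9], "bg": [1, 2, 1]},
)
```

A wire that closes a loop makes the sort raise graphlib.CycleError, which is a reasonable place to reject an invalid diagram before anything runs.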

18 User interface prototypes - I

19 User interface prototypes - II

20 User interface prototypes - III

21 MATLAB
If you must…
– Fully accessible from Python
– Support involves converting the result of data analysis into MATLAB native arrays

22 Software engineering practices
Version control
– Provides a record of the evolution of the software
– CVS: well supported, open source
Configuration management
– Uniform, portable build procedure
– Automatic, regular builds of the entire software base
– config: a system based on make
– merlin: a Python-based replacement under development
Regression testing
– Test cases that
  exercise expected behavior
  exercise fixes for known bugs
Bug tracking
– Organize the “to do” list, the feature requests … and the known defects
– Gnats: well supported, open source
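The two kinds of regression test named above can be sketched with the standard unittest module. The rebin routine and the "dropped trailing channels" defect are invented for this example; they only show the shape of such tests.

```python
import unittest


def rebin(counts, factor):
    """Hypothetical routine: sum adjacent groups of `factor` channels."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


class RebinRegression(unittest.TestCase):
    def test_expected_behavior(self):
        # Exercises the documented behavior on a simple input.
        self.assertEqual(rebin([1, 2, 3, 4], 2), [3, 7])

    def test_fix_for_trailing_channels(self):
        # Guards a fix for an imagined defect in which channels left over
        # after the last full group were silently dropped.
        self.assertEqual(rebin([1, 2, 3, 4, 5], 2), [3, 7, 5])


def run_suite():
    """Run the regression tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RebinRegression)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Running such a suite as part of the automatic, regular builds is what turns a fixed bug into a permanently checked one.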

23 Design directions
Integrate analysis modules using scripting
– Python
Data flow paradigm
– Well understood
– Easy to implement and document
Meta-data in XML
– fully reproducible description of the data analysis pipeline
– tag and archive data
– record the version number of each module used in the analysis
Enable distributed computing
– XMLRPC, SOAP, …
File formats: NeXus + XML meta-data
– Reuse, reuse, reuse
– Augment, contribute
– HDF5!

