
The ARCS Data Analysis Software Michael Aivazis California Institute of Technology.




1 The ARCS Data Analysis Software Michael Aivazis California Institute of Technology

2 Fractals in software
“Drip programming”
– may generate aesthetically interesting flow charts
– but it is not a desirable practice
Advanced technology may actually complicate matters
– complex data structures
– objects
– user interfaces
– multiple platforms
– distributed computing
– high performance computing
– security
– …
– the Grid
Pollock’s “Autumn Rhythm” … or Michael’s framework?

3 Software Roadmap

4 Design directions
Integrate analysis modules using scripting
– Python
Data flow paradigm
– Well understood
– Easy to implement and document
Meta-data in XML
– fully reproducible description of the data analysis pipeline
– tag and archive data
– record the version number of each module used in the analysis
Enable distributed computing
– XMLRPC, SOAP, …
File formats: NeXus + XML meta-data
– Reuse, reuse, reuse
– Augment, contribute
– HDF5!
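The idea of recording module versions in XML meta-data can be sketched as follows. This is only an illustration using Python's standard xml.etree.ElementTree module; the element names (analysis, module, parameter), the run identifier, and the module names and versions are all hypothetical and are not the ARCS schema.

```python
import xml.etree.ElementTree as ET


def describe_pipeline(run_id, modules):
    """Build an XML record of an analysis run.

    `modules` is a list of (name, version, parameters) tuples. Every element
    and attribute name here is an assumption made for illustration.
    """
    root = ET.Element("analysis", attrib={"run": run_id})
    for name, version, params in modules:
        node = ET.SubElement(root, "module", attrib={"name": name, "version": version})
        for key, value in sorted(params.items()):
            ET.SubElement(node, "parameter", attrib={"name": key, "value": str(value)})
    return root


pipeline = describe_pipeline(
    "example-run",
    [
        ("subtract_background", "0.1", {"scale": 1.0}),
        ("rebin", "0.2", {"num_e": 100, "e_min": -50.0, "e_max": 50.0}),
    ],
)
xml_text = ET.tostring(pipeline, encoding="unicode")
print(xml_text)

# Reading the record back recovers which version of each module was used.
versions = {m.get("name"): m.get("version") for m in ET.fromstring(xml_text).iter("module")}
```

Archiving a record like this next to the data file is one way the analysis could later be reproduced with the same module versions and parameters.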

5 Data rebinning
Account for incident flux
Remove background
Convert from time to energy
Correct for detector efficiency
Bin into rings of constant scattering angle
Convert from angle to momentum
Subtract multi-phonon and multiple scattering
Correct for absorption
[Diagram labels: C++, Python]

6 From TOF to energy
[Data flow diagram. Boxes: Read HDF file, Subtract background, Rebin, Sq. rt errs, Write HDF file. Wire labels: filename, raw counts, times, Spect. Info, data, errors², energies, counts in energy, errors. Rebin parameters: num_e, e_min, e_max, e_i, t_min, t_max]
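A minimal sketch of the time-to-energy step, assuming a single flight path of known length and the non-relativistic relation E = (1/2) m v², with v = distance / time. The function names and the distance_m argument are hypothetical; the parameters num_e, e_min and e_max are borrowed from the diagram labels. This computes the neutron kinetic energy only. Energy transfer relative to the incident energy e_i, and any proper splitting of a time channel across several energy bins, are deliberately left out.

```python
import math

NEUTRON_MASS_KG = 1.67492749804e-27  # CODATA neutron mass
JOULE_PER_MEV = 1.602176634e-22      # one milli-electronvolt in joules


def tof_to_energy_mev(distance_m, time_s):
    """Kinetic energy in meV of a neutron covering distance_m in time_s."""
    velocity = distance_m / time_s
    return 0.5 * NEUTRON_MASS_KG * velocity * velocity / JOULE_PER_MEV


def rebin_to_energy(times_s, counts, distance_m, num_e, e_min, e_max):
    """Accumulate counts recorded per time channel into num_e energy bins.

    Simplification: each time channel is assigned whole to the bin that
    contains its energy; counts outside [e_min, e_max) are dropped.
    """
    width = (e_max - e_min) / num_e
    histogram = [0.0] * num_e
    for time_s, count in zip(times_s, counts):
        energy = tof_to_energy_mev(distance_m, time_s)
        index = int(math.floor((energy - e_min) / width))
        if 0 <= index < num_e:
            histogram[index] += count
    return histogram


# Three time channels over a 1 m path, binned into five 10 meV wide bins.
times = [1.0 / 1000.0, 1.0 / 2000.0, 1.0 / 3000.0]
histogram = rebin_to_energy(times, [10.0, 20.0, 30.0], 1.0, 5, 0.0, 50.0)
print(histogram)
```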

7 Flexibility through the use of scripting
Scripting enables us to
– Organize the large number of parameters
– Allow the analysis environment to discover new capabilities without the need for recompilation or relinking
The python interpreter
– The interpreter
    modern object oriented language
    robust, portable, mature, well supported, well documented
    easily extensible
    rapid application development
– Support for parallel programming
    trivial embedding of the interpreter in an MPI compliant manner
    a python interpreter on each compute node
    MPI is fully integrated: bindings + OO layer
– No measurable impact on either performance or scalability
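One way an environment might discover capabilities at run time, without recompiling or relinking, is to import modules by name. The sketch below uses only the standard importlib module; the load_capability helper and the choice of math and statistics as stand-in "analysis modules" are assumptions for illustration, not part of the ARCS software.

```python
import importlib


def load_capability(module_name, function_name):
    """Import a module by name at run time and return one of its functions."""
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# A configuration file could name components as plain strings like these.
requested = [("math", "sqrt"), ("statistics", "mean")]
capabilities = {f"{m}.{f}": load_capability(m, f) for m, f in requested}

root = capabilities["math.sqrt"](16.0)
average = capabilities["statistics.mean"]([1.0, 2.0, 3.0])
print(root, average)
```

Adding a new capability then amounts to placing a new module on the import path and naming it in the configuration.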

8 Writing python bindings
Given a "low level" routine, such as

    double arcs::add(double a, double b);

and a wrapper

    PyObject * arcs_add(PyObject *, PyObject * args)
    {
        double a, b;
        int ok = PyArg_ParseTuple(args, "dd", &a, &b);
        if (!ok) {
            return 0;
        }
        double result = arcs::add(a, b);
        return Py_BuildValue("d", result);
    }

one can place the result of the routine in a python variable

    c = arcs.add(2, 2)

The general case is not much more complicated than this

9 Distributed services
[Diagram. Columns: Workstation, Services, Compute nodes. Labels: analysis, journal, monitor, component1, component2]

10 IRIS Explorer

11 Visual Programming Environment
Data flow paradigm appears natural
– usability problems are focused on knowledge of what is possible
– used by many commercial and open source tools
Improvements
– decouple UI from diagram logic
– interface
    use OpenGL!
    collaborative
    interesting and relevant research
– diagram logic
    thin, reusable component
    scripting
    multi-layered control
– development can use existing solutions as a guide of what not to do
– many modules already available in pyre
– enable distributed programming
Target for prototype: early 2004

12 XMLRPC: Enabling distributed computing
[Diagram labels: Client, Remote Server, Database Server, Beowulf Cluster]
An open standard for remote procedure calls
Allows us to perform the computation
– where the data lives
– independently of the local computing capacity
Security is an issue
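A minimal sketch of an XMLRPC round trip, using Python's standard xmlrpc.server and xmlrpc.client modules. The total_counts function is a hypothetical stand-in for a server-side analysis step. Both ends run in one process on the loopback interface purely so the example is self-contained; in the setting described above the server would sit next to the data. No authentication or encryption is shown, which is exactly the security concern the slide raises.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def total_counts(counts):
    """Hypothetical server-side analysis step: sum a list of detector counts."""
    return sum(counts)


# Port 0 asks the operating system for any free port on the loopback interface.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(total_counts, "total_counts")
port = server.server_address[1]

# Serve requests in a background thread so the client can run in this process.
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

try:
    with ServerProxy(f"http://127.0.0.1:{port}") as client:
        # The computation happens on the server; only the result comes back.
        result = client.total_counts([3, 5, 8])
finally:
    server.shutdown()
    server.server_close()

print(result)
```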

13 Prototype User Interface
Application capabilities
– depend on the remote server
– exported to the client
Boxes represent
– data sources
– computational modules
Wires represent
– data flows
– control
Boxes have input and output ports where wires can be attached

14 Data Analysis Execution
User hits “Run”
Applet interprets wiring diagram as XMLRPC commands
Server receives commands, arranges Python script, and data processing commences.
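How a wiring diagram of boxes and wires might be turned into an execution order can be sketched with a topological sort. The example assumes Python 3.9 or later for the standard graphlib module; the box names, the three toy functions, and the run helper are hypothetical and say nothing about how the actual server arranges its script.

```python
from graphlib import TopologicalSorter


# Boxes: hypothetical data sources and computational modules.
def read_counts():
    return [4.0, 9.0, 16.0]


def subtract_background(counts):
    return [c - 1.0 for c in counts]


def total(counts):
    return sum(counts)


boxes = {"read": read_counts, "subtract": subtract_background, "total": total}

# Wires: each box maps to the boxes whose output ports feed its input ports.
wires = {"read": [], "subtract": ["read"], "total": ["subtract"]}


def run(boxes, wires):
    """Execute every box after the boxes it depends on."""
    results = {}
    for name in TopologicalSorter(wires).static_order():
        inputs = [results[source] for source in wires[name]]
        results[name] = boxes[name](*inputs)
    return results


results = run(boxes, wires)
print(results["total"])
```

A cycle in the wiring would make static_order raise graphlib.CycleError, which is a convenient point to reject an invalid diagram.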

15 User interface prototypes - I

16 User interface prototypes - II

17 User interface prototypes - III

18 MATLAB
If you must…
Fully accessible from Python
Support involves converting result of data analysis into MATLAB native arrays

19 Software engineering practices
Version control
– Provides a record of the evolution of the software
– CVS: well supported, open source
Configuration management
– Uniform, portable build procedure
– Automatic, regular builds of the entire software base
– config: a system based on make
– merlin: a python-based replacement under development
Regression testing
– Test cases that
    Exercise expected behavior
    Exercise fixes for known bugs
Bug tracking
– Organize the “to do” list, the feature requests … and the known defects
– Gnats: well supported, open source
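The two kinds of regression test named above, one exercising expected behavior and one guarding a fix for a known bug, might look like this with Python's standard unittest module. The rebin routine and the zero-factor defect are invented for the example and are not taken from the ARCS code base.

```python
import unittest


def rebin(counts, factor):
    """Hypothetical routine under test: sum `factor` adjacent channels."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


class RebinRegression(unittest.TestCase):
    def test_expected_behavior(self):
        self.assertEqual(rebin([1, 2, 3, 4], 2), [3, 7])

    def test_total_counts_conserved(self):
        counts = [5, 1, 4, 2, 7]
        self.assertEqual(sum(rebin(counts, 2)), sum(counts))

    def test_fix_for_known_bug(self):
        # Guards the fix for an imagined defect: a non-positive factor
        # must be rejected with a clear error.
        with self.assertRaises(ValueError):
            rebin([1, 2, 3], 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RebinRegression)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running a suite like this as part of the automatic, regular builds would catch a reintroduced defect soon after it lands.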

20 SNS - Caltech Interactions
Coordination of software projects at Caltech and SNS?
Scope and scope management.
Expectations of users for software support by SNS and instrument scientists.
Consistency of GUI for SNS instruments? Single web portal?
Who maintains the code? Standards for maintainable code with "open source coalition"?
Issues with distributed computing. Lab policies, security, graphics, user permissions.
Issues with releasing software to run on users' machines
Status of storage and archiving of raw data by SNS?
Institutional arrangements with the ORNL supercomputing center?
On-line control of a neutron spectrometer. Technical and policy issues.

