The ARCS Data Analysis Software
Michael Aivazis, California Institute of Technology

2 Fractals in software
"Drip programming"
– may generate aesthetically interesting flow charts
– but it is not a desirable practice
Advanced technology may actually complicate matters
– complex data structures
– objects
– user interfaces
– multiple platforms
– distributed computing
– high-performance computing
– security
– …
– the Grid
[Image: Pollock's "Autumn Rhythm" … or Michael's framework?]

3 Software Roadmap

4 Design directions
Integrate analysis modules using scripting
– Python
Data flow paradigm
– well understood
– easy to implement and document
Metadata in XML
– fully reproducible description of the data analysis pipeline
– tag and archive data
– record the version number of each module used in the analysis
Enable distributed computing
– XMLRPC, SOAP, …
File formats: NeXus + XML metadata
– reuse, reuse, reuse
– augment, contribute
– HDF5!
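As an illustration of the "Metadata in XML" direction, the sketch below uses Python's standard xml.etree.ElementTree to record each module of a pipeline together with its version number and parameters. The tag and attribute names (pipeline, module, parameter) and the module names, versions and file name are hypothetical; the actual ARCS metadata schema is not described here.

```python
import xml.etree.ElementTree as ET


def describe_pipeline(steps):
    """Return an XML string recording each analysis module used.

    `steps` is a list of (module name, version, parameter dict) tuples.
    The tag and attribute names below are illustrative, not a fixed schema.
    """
    root = ET.Element("pipeline")
    for name, version, params in steps:
        module = ET.SubElement(root, "module", name=name, version=version)
        for key, value in params.items():
            # parameters are stored as strings so the record is plain XML
            ET.SubElement(module, "parameter", name=key, value=str(value))
    return ET.tostring(root, encoding="unicode")


# hypothetical module names, versions and file name
record = describe_pipeline([
    ("read_hdf", "1.0", {"filename": "run1234.nxs"}),
    ("subtract_background", "0.3", {}),
    ("tof_to_energy", "0.7", {"e_min": -10.0, "e_max": 50.0, "num_e": 60}),
])
print(record)
```

Because the record is ordinary XML, it can be archived next to the reduced data and parsed back later to reproduce the analysis with the same module versions.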

5 Account for incident flux
Remove background
Convert from time to energy
Correct for detector efficiency
Bin into rings of constant scattering angle
Convert from angle to momentum
Subtract multi-phonon and multiple scattering
Correct for absorption
Data rebinning (C++, Python)
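The "Convert from time to energy" step rests on the neutron kinetic energy E = ½ m_n (L/t)², where L is a flight path and t the time taken to cover it. The sketch below assumes a direct-geometry setup in which the incident energy e_i is known, so the sample-to-detector flight time is the measured time of flight minus the moderator-to-sample time. The function names and the distances in the example are illustrative assumptions, not ARCS parameters.

```python
import math

NEUTRON_MASS = 1.67492749804e-27  # kg
JOULES_PER_MEV = 1.602176634e-22  # 1 meV in joules


def energy_from_speed(speed):
    """Kinetic energy in meV of a neutron moving at `speed` m/s: E = m v^2 / 2."""
    return 0.5 * NEUTRON_MASS * speed ** 2 / JOULES_PER_MEV


def speed_from_energy(energy_mev):
    """Inverse of energy_from_speed: speed in m/s for an energy in meV."""
    return math.sqrt(2.0 * energy_mev * JOULES_PER_MEV / NEUTRON_MASS)


def energy_transfer(tof, e_i, l_ms, l_sd):
    """Energy transfer e_i - e_f in meV for one detected neutron.

    tof  : time of flight from moderator to detector (s)
    e_i  : incident energy (meV), assumed known (direct geometry)
    l_ms : moderator-to-sample distance (m)
    l_sd : sample-to-detector distance (m)
    """
    t_incident = l_ms / speed_from_energy(e_i)  # moderator -> sample
    v_final = l_sd / (tof - t_incident)         # sample -> detector
    return e_i - energy_from_speed(v_final)


print(energy_from_speed(1000.0))  # about 5.227 meV
# illustrative distances; an elastically scattered neutron gives ~0 transfer
v_i = speed_from_energy(100.0)
print(energy_transfer((13.6 + 3.0) / v_i, 100.0, 13.6, 3.0))
```

A positive result means the neutron lost energy to the sample; neutrons arriving later than the elastic time of flight give positive transfers.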

6 [Data flow diagram, "From TOF to energy": modules Read HDF file, Subtract background, Rebin, Sq. rt errs and Write HDF file; wires carry filename, raw counts, times, Spect. Info, errors, errors², energies and counts in energy; parameters num_e, e_min, e_max, e_i, t_min, t_max]
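The diagram suggests that errors are squared before rebinning and square-rooted afterwards, i.e. independent uncertainties are combined in quadrature. A minimal pure-Python sketch of that idea follows. It assumes simple histogramming in which each input channel falls into exactly one of num_e equal-width bins between e_min and e_max (parameter names taken from the diagram); a production rebinner that splits channels across bin boundaries would be more involved.

```python
import math


def rebin(energies, counts, errors, e_min, e_max, num_e):
    """Histogram per-channel counts into num_e equal-width energy bins.

    Errors are squared, summed bin by bin, then square-rooted, i.e.
    independent uncertainties are combined in quadrature.
    """
    width = (e_max - e_min) / num_e
    binned = [0.0] * num_e
    errors_sq = [0.0] * num_e
    for e, c, s in zip(energies, counts, errors):
        if not (e_min <= e < e_max):
            continue  # channel lies outside the requested energy window
        i = min(int((e - e_min) / width), num_e - 1)  # guard against round-off
        binned[i] += c
        errors_sq[i] += s * s
    return binned, [math.sqrt(v) for v in errors_sq]


binned, errs = rebin([0.5, 1.5, 1.7, 3.2, 5.0], [1, 4, 9, 16, 25],
                     [1, 2, 3, 4, 5], e_min=0.0, e_max=4.0, num_e=4)
print(binned)  # [1.0, 13.0, 0.0, 16.0]
print(errs)
```

The channel at 5.0 lies outside the window and is dropped; the two channels at 1.5 and 1.7 share a bin, so their errors 2 and 3 combine to √13.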

7 Flexibility through the use of scripting
Scripting enables us to
– organize the large number of parameters
– allow the analysis environment to discover new capabilities without the need for recompilation or relinking
The Python interpreter
– the interpreter: a modern object-oriented language; robust, portable, mature, well supported, well documented; easily extensible; rapid application development
– support for parallel programming: trivial embedding of the interpreter in an MPI-compliant manner; a Python interpreter on each compute node; MPI is fully integrated (bindings + OO layer)
– no measurable impact on either performance or scalability
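Discovering new capabilities without recompiling or relinking amounts to importing code by name at run time. A minimal sketch with the standard importlib module follows, assuming the configuration names each component as a (module, attribute) pair; the configuration layout and the use of math.sqrt as a stand-in component are illustrative only.

```python
import importlib


def load_component(module_name, attribute):
    """Import `module_name` at run time and return its `attribute`."""
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# hypothetical configuration: components are named by strings, so adding an
# entry here requires no recompilation or relinking of the analysis code
config = {"transform": ("math", "sqrt")}

transform = load_component(*config["transform"])
print(transform(16.0))  # 4.0
```

Installing a new module on the Python path and naming it in the configuration is then enough to make it available to the analysis environment.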

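8 Writing Python bindings
Given a "low level" routine, such as arcs::add below, and a wrapper for it:

```cpp
#include <Python.h>

// the "low level" routine
namespace arcs {
    double add(double a, double b);
}

// the wrapper that exposes it to Python
static PyObject * arcs_add(PyObject *, PyObject * args)
{
    double a, b;
    // unpack two Python numbers as C doubles; on failure an exception is set
    if (!PyArg_ParseTuple(args, "dd", &a, &b)) {
        return 0;
    }
    double result = arcs::add(a, b);
    // box the C double as a Python float
    return Py_BuildValue("d", result);
}
```

one can call the routine from Python and place its result in a Python variable (once the wrapper is registered in the extension module's method table):

```python
c = arcs.add(2, 2)
```

The general case is not much more complicated than this.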

9 Distributed services
[Diagram: Workstation, Services, Compute nodes; labels: analysis, journal, monitor, component1, component2]

10 IRIS Explorer

11 Visual Programming Environment
Data flow paradigm appears natural
– usability problems are focused on knowledge of what is possible
– used by many commercial and open-source tools
Improvements
– decouple the UI from the diagram logic
– interface: use OpenGL!; collaborative; interesting and relevant research
– diagram logic: thin, reusable component; scripting; multi-layered control
– development: can use existing solutions as a guide of what not to do; many modules already available in pyre; enable distributed programming
Target for prototype: early 2004
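Keeping the diagram logic as a thin, reusable component means it can run with no UI attached. The sketch below is an assumption about how such a component might look, not the pyre implementation: it stores boxes and their upstream connections and executes them in dependency order using graphlib.TopologicalSorter (Python 3.9 or later). A wiring cycle raises graphlib.CycleError.

```python
from graphlib import TopologicalSorter


class Diagram:
    """UI-independent diagram logic: boxes and the wires feeding them."""

    def __init__(self):
        self.boxes = {}   # box name -> callable
        self.inputs = {}  # box name -> ordered list of upstream box names

    def add_box(self, name, func, inputs=()):
        self.boxes[name] = func
        self.inputs[name] = list(inputs)

    def run(self):
        """Execute every box after its upstream boxes; return all results."""
        for name, sources in self.inputs.items():
            missing = [s for s in sources if s not in self.boxes]
            if missing:
                raise ValueError(f"box {name!r} is wired to unknown {missing}")
        results = {}
        # static_order() yields each box after all of its predecessors
        for name in TopologicalSorter(self.inputs).static_order():
            args = [results[src] for src in self.inputs[name]]
            results[name] = self.boxes[name](*args)
        return results


diagram = Diagram()
diagram.add_box("read", lambda: [3, 5, 8])
diagram.add_box("background", lambda: 1)
diagram.add_box("subtract", lambda data, bg: [x - bg for x in data],
                inputs=["read", "background"])
diagram.add_box("total", sum, inputs=["subtract"])
print(diagram.run()["total"])  # 13
```

A graphical front end, a script, or a remote client could all build the same Diagram object, which is the point of separating it from the UI.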

12 XMLRPC: Enabling distributed computing
[Diagram: Client, Remote Server, Database Server, Beowulf Cluster]
An open standard for remote procedure calls
Allows us to perform the computation
– where the data lives
– independently of the local computing capacity
Security is an issue
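A minimal XML-RPC round trip with Python's standard xmlrpc.server and xmlrpc.client modules follows: the server exposes an analysis routine and the client invokes it remotely, so the work happens on the server (in the intended setup, where the data lives). The function subtract_background is a hypothetical stand-in, and the server binds to an ephemeral local port only for demonstration. Plain XML-RPC over HTTP provides no authentication or encryption, which is one reason security is an issue.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def subtract_background(counts, background):
    """Hypothetical analysis routine that runs on the server side."""
    return [c - background for c in counts]


# bind to an ephemeral port on the local machine for this demonstration
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(subtract_background)
port = server.server_address[1]
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

# the client only needs the URL; the computation happens on the server
client = ServerProxy(f"http://127.0.0.1:{port}/")
result = client.subtract_background([10, 12, 15], 2)
print(result)  # [8, 10, 13]

server.shutdown()
server.server_close()
```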

13 Prototype User Interface
Application capabilities
– depend on the remote server
– are exported to the client
Boxes represent
– data sources
– computational modules
Wires represent
– data flows
– control
Boxes have input and output ports where wires can be attached
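The box-and-wire model can be captured in a small data structure. In this sketch, whose class names and port names are illustrative assumptions, a Box lists its input and output ports, and a Wire refuses to connect ports that the boxes do not expose.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """A data source or computational module with named ports."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass
class Wire:
    """Connects an output port of one box to an input port of another."""
    source: Box
    source_port: str
    target: Box
    target_port: str

    def __post_init__(self):
        # wires may only be attached to ports the boxes actually expose
        if self.source_port not in self.source.outputs:
            raise ValueError(f"{self.source.name} has no output {self.source_port!r}")
        if self.target_port not in self.target.inputs:
            raise ValueError(f"{self.target.name} has no input {self.target_port!r}")


reader = Box("Read HDF file", inputs=["filename"], outputs=["raw counts", "times"])
tof = Box("From TOF to energy", inputs=["raw counts", "times"],
          outputs=["counts in energy"])
wires = [Wire(reader, "raw counts", tof, "raw counts"),
         Wire(reader, "times", tof, "times")]
print(len(wires))  # 2
```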

14 Data Analysis Execution
User hits "Run"
Applet interprets the wiring diagram as XMLRPC commands
Server receives the commands, arranges a Python script, and data processing commences.
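One way a server might "arrange a Python script" from the commands it receives is plain code generation. The sketch below assumes each command is a (result variable, module name, argument variables) triple and that only modules on a whitelist may be called; the command format, the whitelist and the stub functions are hypothetical. Validating names before generating code matters because the commands arrive over the network.

```python
import keyword

# hypothetical whitelist of modules a client may request
ALLOWED_MODULES = {"read_hdf", "subtract_background", "tof_to_energy",
                   "rebin", "write_hdf"}


def arrange_script(commands):
    """Turn (result, module, args) commands into Python source, one call per line."""
    lines = []
    for result, module, args in commands:
        if module not in ALLOWED_MODULES:
            raise ValueError(f"unknown module {module!r}")
        for name in (result, *args):
            # only plain variable names may appear in the generated code
            if not name.isidentifier() or keyword.iskeyword(name):
                raise ValueError(f"invalid variable name {name!r}")
        lines.append(f"{result} = {module}({', '.join(args)})")
    return "\n".join(lines)


script = arrange_script([
    ("raw", "read_hdf", ["filename"]),
    ("clean", "subtract_background", ["raw"]),
])
print(script)

# run the script against stub implementations of the modules
namespace = {
    "filename": "run1234.nxs",
    "read_hdf": lambda f: [5, 7, 9],
    "subtract_background": lambda data: [x - 1 for x in data],
}
exec(script, namespace)
print(namespace["clean"])  # [4, 6, 8]
```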

15 User interface prototypes - I

16 User interface prototypes - II

17 User interface prototypes - III

18 MATLAB
If you must…
– fully accessible from Python
– support involves converting the results of the data analysis into MATLAB native arrays

19 Software engineering practices
Version control
– provides a record of the evolution of the software
– CVS: well supported, open source
Configuration management
– uniform, portable build procedure
– automatic, regular builds of the entire software base
– config: a system based on make
– merlin: a Python-based replacement under development
Regression testing
– test cases that exercise expected behavior and exercise fixes for known bugs
Bug tracking
– organize the "to do" list, the feature requests … and the known defects
– Gnats: well supported, open source
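A minimal regression-testing sketch with Python's standard unittest module: one test exercises expected behavior, and another pins down the fix for a known bug so that it cannot silently return. The routine bin_index and the upper-edge bug it guards against are hypothetical examples.

```python
import unittest


def bin_index(e, e_min, e_max, num_e):
    """Hypothetical routine under test: index of the energy bin containing e.

    Values equal to e_max are placed in the last bin (the "fixed bug").
    """
    width = (e_max - e_min) / num_e
    return min(int((e - e_min) / width), num_e - 1)


class BinIndexTest(unittest.TestCase):
    def test_expected_behavior(self):
        # interior values land in the obvious bins
        self.assertEqual(bin_index(0.5, 0.0, 4.0, 4), 0)
        self.assertEqual(bin_index(2.5, 0.0, 4.0, 4), 2)

    def test_known_bug_upper_edge(self):
        # regression test for a (hypothetical) bug: e == e_max used to
        # yield index num_e, one past the end of the histogram
        self.assertEqual(bin_index(4.0, 0.0, 4.0, 4), 3)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(BinIndexTest)
outcome = unittest.TextTestRunner(verbosity=2).run(suite)
```

Run as part of the automatic, regular builds, such a suite flags any change that breaks expected behavior or reintroduces a fixed defect.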

20 SNS - Caltech Interactions
– Coordination of software projects at Caltech and SNS?
– Scope and scope management.
– Expectations of users for software support by SNS and instrument scientists.
– Consistency of GUI for SNS instruments? Single web portal?
– Who maintains the code? Standards for maintainable code with "open source coalition"?
– Issues with distributed computing: lab policies, security, graphics, user permissions.
– Issues with releasing software to run on users' machines.
– Status of storage and archiving of raw data by SNS?
– Institutional arrangements with the ORNL supercomputing center?
– On-line control of a neutron spectrometer: technical and policy issues.