Ten Years of Software Sustainability at The Infrared Processing and Analysis Center G. Bruce Berriman and John Good NASA Exoplanet Science Institute, Infrared.

1 Ten Years of Software Sustainability at The Infrared Processing and Analysis Center
G. Bruce Berriman and John Good, NASA Exoplanet Science Institute, Infrared Processing and Analysis Center, Caltech, USA
Ewa Deelman, Information Sciences Institute, University of Southern California, USA
Anastasia Alexov, Astronomical Institute Anton Pannekoek, Amsterdam, Netherlands
Presentation at AHM 2010, Cardiff, September 2010.

2 The Role of IPAC in Astronomy
http://www.ipac.caltech.edu
Long-term archive
• Curation of data
• Dissemination to the community

3 Size and Usage Have Grown
• Archives contain data from 30 missions and projects
• Space based, ground based and knowledge based
• 85 million queries
• 3 TB/month downloaded
Archives Built on a Common Hardware And Software Architecture

4 A Common Software Architecture
Application is usually a CGI program
• Each component is a module with a standard interface that communicates with other components and fulfills one general function
• Modules are stand-alone portable ANSI-C tools
• Components plugged together & controlled by an executive library
• Executive starts components as child services and parses return values
• Applications are generally simple web forms or Web services that search for data
• The “smarts” are on the server side; optimize complex queries on large data sets
• Component based architecture which enables strong re-use and adaptation
• Optimized for astronomical spatial searches and complex, general queries regardless of wavelength and type of mission
• All services are integrated into the Infrared Science Information System (ISIS)
• Components are generic; minimize dependencies on third-party software or environments
• Avoid shared memories or system calls
• All database queries are performed in one module
• 300 KLOC
• New projects automatically inherit functionality
• Supports efficient development and controls maintenance costs
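
The executive pattern on this slide (start a component as a child, then parse what it returns) can be sketched roughly as follows. This is a minimal illustration, not IPAC's executive library: the `key=value` return format, the function names, and the stand-in component (a Python one-liner instead of a compiled ANSI-C tool) are all assumptions made so the sketch runs anywhere.

```python
import subprocess
import sys

# Hypothetical component: a stand-alone program that does one job and reports
# its result on stdout as "key=value" pairs. A Python one-liner stands in for
# a compiled tool so that the sketch is runnable without anything installed.
COMPONENT_SOURCE = (
    "import sys\n"
    "ra, dec, radius = map(float, sys.argv[1:4])\n"
    "print(f'stat=OK count=3 ra={ra} dec={dec} radius={radius}')\n"
)


def parse_return(text):
    """Parse 'key=value key=value ...' into a dict (assumed return format)."""
    return dict(pair.split("=", 1) for pair in text.split())


def run_component(args):
    """Start one component as a child process and parse what it returns."""
    proc = subprocess.run(
        [sys.executable, "-c", COMPONENT_SOURCE, *map(str, args)],
        capture_output=True,
        text=True,
        check=False,
    )
    if proc.returncode != 0:
        # The component failed; hand the executive a uniform error record.
        return {"stat": "ERROR", "msg": proc.stderr.strip()}
    return parse_return(proc.stdout)


result = run_component([266.4, -29.0, 0.5])
print(result["stat"], result["count"])
```

Because the executive only sees a command line going in and a parsed record coming out, a component can be replaced or reused in another application without touching its callers.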

5 Engage Your Users!
• Concerted program of user engagement to attract new users and build a user community
Method:
• User surveys
• End User Group (drawn from the community)
• Exhibits and demos
• Coffee pot conversations
• Advertise in newsletters
Number of end-users has increased to 18,000
12% of peer-reviewed papers cited IPAC archives or data
• Actively seek feedback, e.g.
• Watch users as they try services; see where they get stuck
• User surveys ask respondents to write down their views rather than answer questions

6 Listen to the advice you don’t want to hear

7 Speed Is King In An Archive
• Image data sets becoming very large: Spitzer Space Telescope will deliver over 100 million images, with varying footprints on the sky.
• Searches for spatially extended images are slow: a scan of Spitzer images can take 2,000 s
• … results pages are becoming more complex.
• What matters more: fast access or interactivity? Speed won hands down.

8 R-tree Indexing
• Uses hierarchically nested minimum bounding boxes
• Performance scales as log(N)
• Performance gain of x1000 over table scan
• Memory-mapped files
• Parallelization / cluster processing
• REST-based web services
Memory-mapped file: a segment of virtual memory is assigned a byte-for-byte correlation with part of a file.
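
The idea of hierarchically nested minimum bounding boxes can be illustrated with a small bulk-loaded tree. This is a simplified sketch, not the IPAC implementation: the fan-out, the packing rule (sort by box centre), the flat treatment of sky coordinates, and the random 1-degree footprints are all assumptions. A search descends only into nodes whose bounding box overlaps the query, which is what lets a well-packed tree avoid a full table scan.

```python
import random

FANOUT = 4  # children per node; an arbitrary choice for this sketch


def union(boxes):
    """Minimum bounding box (xmin, ymin, xmax, ymax) enclosing all boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def overlaps(a, b):
    """True if two boxes intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build(items):
    """Bulk-load a tree from (box, payload) pairs.

    Each node is (box, children, payload); leaves have children=None.
    """
    level = [(box, None, payload) for box, payload in items]
    if not level:
        return None
    while len(level) > 1:
        # Sort by box centre so neighbours in the list are close in x.
        level.sort(key=lambda n: (n[0][0] + n[0][2], n[0][1] + n[0][3]))
        groups = [level[i:i + FANOUT] for i in range(0, len(level), FANOUT)]
        level = [(union([n[0] for n in g]), g, None) for g in groups]
    return level[0]


def search(node, query):
    """Return payloads of leaves whose box overlaps the query box."""
    if node is None or not overlaps(node[0], query):
        return []  # prune: nothing under this node can match
    _, children, payload = node
    if children is None:
        return [payload]
    hits = []
    for child in children:
        hits.extend(search(child, query))
    return hits


# Made-up image footprints: 1000 boxes, one degree on a side.
random.seed(0)
footprints = []
for image_id in range(1000):
    x, y = random.uniform(0, 360), random.uniform(-90, 90)
    footprints.append(((x, y, x + 1.0, y + 1.0), image_id))

tree = build(footprints)
query = (100.0, 10.0, 105.0, 15.0)
found = sorted(search(tree, query))
print(len(found), "footprints overlap the query box")
```

The result matches a brute-force scan of all footprints; the difference is that whole subtrees are skipped as soon as their enclosing box misses the query.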

9 Modernization of Scanpi
• Written in 1983, Scanpi co-adds scans from the far-infrared IRAS survey. 15 papers per year on average by 2007.
• Sensitivity gain of x5 over survey data products
• Improve spatial resolution of extended or confused sources
• User panel strongly recommended modernization because of its value in supporting interpretation of data from current IR missions Spitzer and Herschel.
• But it was coughing up blood and was a classic legacy program
• Written in F66, it had become a patchwork of scripts and bug fixes and was a maintenance nightmare.
• Dependent modules for data compression etc. no longer supported.
• Stranded on Solaris 2.8
• Developer retiring

10 Scanpi Workflow
[Workflow diagram. Steps: Input: Source info; Get scans; Co-register scans; Co-add all scans; Background fitting; Source fitting; Output: Results and files on Web. Re-usable components: plotting, background, table manipulation, bulk download, coordinate transformation]
• Rewritten from ground up in C
• Developed as a workflow application that gives visibility into the processing steps
• Calls existing components, reducing code base to 21 KLOC cf. 102 KLOC
• 1.25 FTE development cf. 0.5 FTE for maintenance
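
A workflow that "gives visibility into the processing steps" can be sketched as a list of small steps whose intermediate products are all kept. Everything below is illustrative: the step order is read off the diagram labels, the scan data are made up, and the background and source fits are reduced to a median and a peak pick. It is not the Scanpi algorithm, which was rewritten in C.

```python
from statistics import median


def get_scans(source):
    """Stand-in for scan retrieval: (offset, samples) pairs of made-up data."""
    return [
        (0, [1.0, 1.0, 5.0, 1.0, 1.0]),
        (1, [2.0, 6.0, 2.0, 2.0, 2.0]),
        (-1, [3.0, 3.0, 3.0, 7.0, 3.0]),
    ]


def co_register(scans):
    """Shift each scan onto a common position grid using its known offset."""
    return [{i + off: v for i, v in enumerate(samples)} for off, samples in scans]


def co_add(registered):
    """Average the scans at every position that all of them cover."""
    common = set.intersection(*(set(r) for r in registered))
    n = len(registered)
    return {p: sum(r[p] for r in registered) / n for p in sorted(common)}


def fit_background(profile):
    """Subtract a constant background, estimated here as the median."""
    level = median(profile.values())
    return {p: v - level for p, v in profile.items()}


def fit_source(profile):
    """Report the position and height of the brightest sample."""
    position = max(profile, key=profile.get)
    return {"position": position, "flux": profile[position]}


STEPS = [get_scans, co_register, co_add, fit_background, fit_source]


def run_workflow(source):
    """Run the steps in order, keeping every intermediate product."""
    products = {}
    data = source
    for step in STEPS:
        data = step(data)
        products[step.__name__] = data
    return products


products = run_workflow({"name": "example source"})
print(products["fit_source"])  # {'position': 2, 'flux': 4.0}
```

Keeping `products` keyed by step name means any stage (for example the co-added profile before background subtraction) can be plotted or downloaded without rerunning the workflow.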

11 The Montage Image Mosaic Engine
Montage (http://montage.ipac.caltech.edu) creates science-grade image mosaics from multiple input images. Broadband simulates and compares seismograms from earthquake simulation codes. Epigenome maps short DNA segments collected using high-throughput gene sequencing machines to a reference genome.
[Diagram: Montage Workflow. Steps: Input; Reprojection; Background Rectification; Co-addition; Output]
• Creates science-grade image mosaics
• Scalable, modular design
• ANSI-C code (300 MB) runs on all common *nix platforms: desktops, clusters, grids and supercomputers.
• Processes 40 million 2MASS pixels in 32 min on 128 nodes of 1.2 GHz Linux cluster
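
The last two stages of the workflow named on this slide, background rectification and co-addition, can be illustrated on a toy case. The sketch assumes two already reprojected one-dimensional strips whose backgrounds differ by a constant; the function names and data are hypothetical, and matching a single constant offset is a deliberate simplification rather than a description of how Montage models backgrounds.

```python
def overlap_offset(a, b):
    """Mean difference (a - b) over the positions two images share."""
    shared = sorted(set(a) & set(b))
    return sum(a[p] - b[p] for p in shared) / len(shared)


def rectify(a, b):
    """Split the measured offset evenly so both images meet at one level."""
    half = overlap_offset(a, b) / 2.0
    fixed_a = {p: v - half for p, v in a.items()}
    fixed_b = {p: v + half for p, v in b.items()}
    return fixed_a, fixed_b


def co_add(images):
    """Average every position over the images that cover it."""
    mosaic = {}
    for p in sorted(set().union(*images)):
        values = [img[p] for img in images if p in img]
        mosaic[p] = sum(values) / len(values)
    return mosaic


# Two made-up strips of the same sky; the right-hand strip carries a
# background that is 2.0 higher than the left-hand one.
sky = {0: 1.0, 1: 1.0, 2: 4.0, 3: 1.0, 4: 1.0, 5: 1.0}
left = {p: sky[p] for p in range(0, 4)}
right = {p: sky[p] + 2.0 for p in range(2, 6)}

left_fixed, right_fixed = rectify(left, right)
mosaic = co_add([left_fixed, right_fixed])
print(mosaic)
```

Co-adding `left` and `right` directly would leave a step in the mosaic where the overlap ends; after rectification the mosaic follows the sky signal with one uniform background level.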

12 How Is It Used?
• Science Analysis
• Support Production of Data Sets, Data Products and Preview Products
• Incorporate into Workflows and Pipelines
• Spitzer Space Telescope teams
• Quality Assurance of data products
• 5,000 downloads by bona-fide astronomers
• Users now contributing to the project
• Scripts for generating mosaics
• Python front ends
• MPI version
[Figure: Contributed Script (Dr. Inseok Song)]

13 Development of Cyber Infrastructure
• Task scheduling in distributed environments (performance focused)
• Designing job schedulers for the grid
• Designing fault tolerance techniques for job schedulers
• Exploring issues of data provenance in scientific workflows
• Exploring applicability of scientific applications running on Clouds
• Developing high-performance workflow restructuring techniques
• Developing application performance frameworks
• Developing workflow orchestration techniques
[Figure: Cost of running workflows on Amazon EC2 cloud]

14 Best Practices for Software Sustainability
• Design for sustainability, extensibility, re-use and portability
• Build an engaged user community that encourages users to contribute to sustainability
• Be careful about new technologies: do a cost-benefit analysis before adopting them
• Use rigorous software engineering practices to ensure well-organized and well-documented code.
• Control and manage your interfaces.
• Make source code and test and validation data available

