Pipeline Basics
Jared Crossley, NRAO



What is a data pipeline?
- One or more programs that perform a task with reduced user interaction.
- May be developed as an extension of a more general and more interactive software system.

Why use it?
- Saves time
  - Especially with large (repetitive) data sets
  - Interactive data reduction may take a lot of time (even for an expert)
- Consistency
- Increased accessibility of a data reduction system
  - You don’t have to be an “expert” to use a pipeline.
- A good learning tool, given good documentation

Building a Pipeline: Start simple
- Build a pipeline in layers.
- The lowest level of the pipeline should still be interactive.
- For example:
  - Level 1: allow the user to specify the input parameters needed by the following tasks.
  - Level 2: find the best default parameter values for most data sets.
  - Given these default values, most data can be processed with little interaction.
- Focus on a subset of input data.
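The layering described above might look like the following minimal Python sketch. All names here (`Params`, `flag_threshold`, `image_size`, the default numbers) are hypothetical and chosen only for illustration; they are not taken from any real pipeline. Level 1 requires every parameter explicitly, and level 2 wraps it with defaults that can still be overridden.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Params:
    """Hypothetical input parameters for the processing tasks."""
    flag_threshold: float
    image_size: int


# Level 2: defaults assumed to suit most data sets (illustrative numbers only).
DEFAULTS = Params(flag_threshold=5.0, image_size=512)


def process(data, params):
    """Level 1: the caller specifies every parameter explicitly."""
    kept = [x for x in data if abs(x) <= params.flag_threshold]
    return {"kept": kept, "image_size": params.image_size}


def process_with_defaults(data, **overrides):
    """Level 2: start from the defaults and override only what is needed."""
    return process(data, replace(DEFAULTS, **overrides))


result_default = process_with_defaults([1.0, 7.5, -2.0])
result_expert = process_with_defaults([1.0, 7.5, -2.0], flag_threshold=10.0)
print(result_default["kept"])
print(result_expert["kept"])
```

A novice can call `process_with_defaults` with no extra arguments, while an expert can override any single control without giving up the other defaults.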

Building a Pipeline: continued
- The pipeline will evolve with time.
- Parameter dependencies will reveal themselves.
- Data processing algorithms will become apparent to the user. Once an algorithm is well defined, add it to the pipeline.
- Acquire metadata when possible. This can be used to initialize parameters.
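As a rough sketch of initializing parameters from metadata: the metadata keys (`frequency_ghz`, `duration_s`), the parameter names, and both rules below are invented for illustration and are not recommendations for real data.

```python
def init_params_from_metadata(metadata, defaults):
    """Fill in parameters from metadata where available, else keep defaults."""
    params = dict(defaults)
    # Hypothetical rule: derive the image cell size from the observing frequency.
    if "frequency_ghz" in metadata:
        params["cell_arcsec"] = round(10.0 / metadata["frequency_ghz"], 3)
    # Hypothetical rule: long observations get a longer averaging interval.
    if metadata.get("duration_s", 0) > 3600:
        params["average_s"] = 30
    return params


defaults = {"cell_arcsec": 1.0, "average_s": 10}
params = init_params_from_metadata(
    {"frequency_ghz": 8.0, "duration_s": 7200}, defaults
)
print(params)
```

When a metadata field is missing, the corresponding default is left untouched, so the pipeline can still run on poorly described data.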

Areas of concern
1. How much control should the user be given?
   - Depends on the target audience. Experts want more control than novices.
   - A compromise is to offer many controls, with most of them pre-set to good initial values.

Areas of concern
2. How many output diagnostics should the pipeline produce?
   - Varies by processing goal and user preference.
   - If possible, include a pipeline parameter that determines the amount of diagnostics.
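One possible shape for such a parameter is sketched below. The three levels and the contents of each diagnostic are assumptions made for this example only.

```python
def run_pipeline(data, diagnostics=1):
    """Return the primary product plus as many diagnostics as requested.

    diagnostics: 0 = none, 1 = summary, 2 = summary plus detail
    (hypothetical levels).
    """
    output = {"product": sum(data) / len(data), "diagnostics": {}}
    if diagnostics >= 1:
        output["diagnostics"]["summary"] = {"n_points": len(data)}
    if diagnostics >= 2:
        output["diagnostics"]["detail"] = {"min": min(data), "max": max(data)}
    return output


quiet = run_pipeline([1.0, 2.0, 3.0], diagnostics=0)
full = run_pipeline([1.0, 2.0, 3.0], diagnostics=2)
print(quiet["diagnostics"])
print(full["diagnostics"])
```

The primary product is identical at every level; only the amount of supporting output changes.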

More on Output
- In addition to the primary output product, consider outputting calibrated data and log files.
  - This allows advanced users to build upon what the pipeline has done.
  - And this allows for quick “upgrades” to data products.

Validating Output
- This job is necessarily interactive.
- However, a pipeline can simplify the process by…
  - Providing an easy way to view output, including diagnostics
  - And an easy way to delete (or flag) unacceptable output.
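A minimal sketch of the bookkeeping behind "view, then flag" could look like this. The record layout and names (`OutputRecord`, `flag`, `accepted`, the file names) are hypothetical; the decision itself still comes from a person inspecting the output.

```python
from dataclasses import dataclass, field


@dataclass
class OutputRecord:
    """One pipeline output together with its diagnostics (hypothetical layout)."""
    name: str
    diagnostics: dict = field(default_factory=dict)
    flagged: bool = False  # set by a human reviewer, never by the pipeline


def flag(records, name):
    """Mark the named output as unacceptable."""
    for record in records:
        if record.name == name:
            record.flagged = True


def accepted(records):
    """Return only the outputs that the reviewer has not flagged."""
    return [r for r in records if not r.flagged]


records = [
    OutputRecord("image_a.fits", {"rms": 0.1}),
    OutputRecord("image_b.fits", {"rms": 9.9}),
]
flag(records, "image_b.fits")  # the reviewer rejects the noisy image
print([r.name for r in accepted(records)])
```

Flagging rather than deleting keeps the rejected output available in case the reviewer's decision needs to be revisited.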

The VLA (AIPS) Pipeline

Description
- The pipeline is a script (AIPS run file) that automates
  - Editing,
  - Calibration,
  - And Imaging of VLA continuum data. May also process spectral line data.
- Emulates an AIPS task
  - Takes input parameters
  - Outputs images and calibration plots
- Suggested default parameters contained in AIPS memo.

Demo: VLA (AIPS) Pipeline
- To use the AIPS pipeline: load data into AIPS; split out different frequencies.

Demo: VLA (AIPS) Pipeline
- Set the VLARUN input parameters.
- Parameter groups labeled on the slide: flagging control; pause during calibration; diagnostic plots; imaging control; self-cal (fragile).

Demo: VLA (AIPS) Pipeline
- Image output by pipeline (axes and wedge added)

Demo of VLA Pipeline System (Imaging the VLA Archive)

Description
- The VLA Pipeline System is an extension of the AIPS pipeline.
- Includes
  1. Data acquisition, and preparation for processing
  2. Data processing (AIPS pipeline)
  3. Image finalization, and export
  4. Archiving
  5. Easy interactive data validation
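The way the automated stages chain together can be sketched generically in Python. This is not the VLA Pipeline System's code: the stage functions are placeholders with invented bodies, and the sketch only illustrates running stages in order while recording what ran. Interactive validation is left outside the chain because a person performs it.

```python
def acquire_and_prepare(state):
    state["raw"] = [3, 1, 2]  # placeholder for downloaded data
    return state


def process(state):
    state["image"] = sorted(state["raw"])  # placeholder for the processing step
    return state


def finalize_and_export(state):
    state["exported"] = list(state["image"])
    return state


def archive(state):
    state["archived"] = True
    return state


STAGES = [acquire_and_prepare, process, finalize_and_export, archive]


def run(stages, state=None):
    """Run each stage in order, passing the shared state along."""
    state = {} if state is None else state
    log = []
    for stage in stages:
        state = stage(state)
        log.append(stage.__name__)
    state["log"] = log
    return state


result = run(STAGES)
print(result["log"])
```

Keeping each stage a separate function makes it possible to rerun or replace one stage without touching the others.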

Demo: VLA Pipeline
- At a high level of pipeline automation, initial user interaction takes place only on the command line.
- The user can query the raw data archive via a Perl script.

Demo: VLA Pipeline
- Next, select data files for download and filling.
- Steps labeled on the slide: select files; download.

Demo: VLA Pipeline
- A Unix shell script waits to be called by cron.
- Steps labeled on the slide: start AIPS; execute AIPS pipeline.
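The slides use a Unix shell script for this step. The sketch below shows the same "run once when cron calls, then exit" pattern in Python instead. The queue directory, the `.pending` and `.done` suffixes, the lock file, and the crontab line in the comment are all assumptions for illustration, not details of the actual system.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical crontab entry that would call a script like this every 10 minutes:
#   */10 * * * * python3 /path/to/run_once.py


def run_once(queue_dir, process):
    """Process every pending file once, then return; cron supplies the schedule."""
    queue_dir = Path(queue_dir)
    lock = queue_dir / ".lock"
    try:
        # O_EXCL makes creation fail if an earlier run still holds the lock.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return []
    try:
        os.close(fd)
        done = []
        for path in sorted(queue_dir.glob("*.pending")):
            process(path)  # e.g. start the processing software on this file
            path.rename(path.with_suffix(".done"))
            done.append(path.name)
        return done
    finally:
        lock.unlink()


# Demonstration in a temporary directory with a do-nothing processing step.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "obs1.pending").write_text("raw data")
    first = run_once(tmp, process=lambda p: None)
    second = run_once(tmp, process=lambda p: None)
print(first, second)
```

The lock file keeps two overlapping cron invocations from processing the same data, and renaming each finished file means a later invocation finds nothing left to do.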

Demo: VLA Pipeline
- After processing, the output is archived via scripts invoked by cron.
- The data is now available online.
- The final step is image validation…

Demo: VLA Pipeline
- A web-based tool supports interactive image validation.

Demo: VLA Pipeline
- Images and diagnostics can be viewed together and flagged for removal.

For more info
- About AIPS Pipeline (VLARUN):
  - AIPS Memo 112, by L. Sjouwerman.
  - VLARUN “online” documentation. From the AIPS prompt type: explain VLARUN
- About Pipeline System and NVAS:
  - See the NVAS web page.
  - For data acquisition scripts, see J. Crossley’s web page.
- About pipeline basics:
  - See notes on J. Crossley’s web page.