Pipeline Basics. Jared Crossley, NRAO.

Presentation transcript:

1 Pipeline Basics. Jared Crossley, NRAO

2 What is a data pipeline?
  - One or more programs that perform a task with reduced user interaction.
  - May be developed as an extension of a more general and more interactive software system.

3 Why use it?
  - Saves time
      - Especially with large (repetitive) data sets
      - Interactive data reduction may take a lot of time (even for an expert)
  - Consistency
  - Increased accessibility of a data reduction system
      - You don’t have to be an “expert” to use a pipeline.
      - A good learning tool, when well documented

4 Building a Pipeline: Start simple
  - Build a pipeline in layers.
  - The lowest level of the pipeline should still be interactive.
  - For example:
      - Level 1: allow the user to specify the input parameters needed by the tasks that follow.
      - Level 2: find the best default parameter values for most data sets.
      - Given these default values, most data can be processed with little interaction.
  - Focus on a subset of input data.
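The layering described on this slide can be sketched roughly as follows. This is a minimal illustration, not the VLA pipeline itself: the parameter names (`flag_threshold`, `image_size`, `self_cal`), their default values, and the sigma-clipping task are all hypothetical stand-ins.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineParams:
    """Hypothetical input parameters; names and defaults are illustrative only."""
    flag_threshold: float = 5.0  # drop samples further than this many sigma from the mean
    image_size: int = 512        # output image width in pixels (unused in this sketch)
    self_cal: bool = False       # optional step, off by default (unused in this sketch)


def edit(data, params):
    """Level 1: an interactive-style task driven entirely by explicit parameters."""
    if not data:
        return []
    mean = sum(data) / len(data)
    sigma = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
    if sigma == 0:
        return list(data)
    return [x for x in data if abs(x - mean) <= params.flag_threshold * sigma]


def run_pipeline(data, params=None, **overrides):
    """Level 2: start from the defaults and apply only what the user overrides."""
    params = replace(params or PipelineParams(), **overrides)
    return edit(data, params), params
```

Calling `run_pipeline(data)` uses the defaults throughout, while `run_pipeline(data, flag_threshold=2.0)` changes a single control and leaves the rest untouched.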

5 Building a Pipeline: continued
  - The pipeline will evolve with time
      - Parameter dependencies will reveal themselves
      - Data processing algorithms will become apparent to the user. When an algorithm is well defined, add it to the pipeline.
  - Acquire metadata when possible. This can be used to initialize parameters.
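As a rough sketch of using metadata to initialize parameters: the metadata keys (`n_channels`, `integration_time_s`), the parameter names, and the rules that connect them below are assumptions made for illustration, not values taken from the AIPS pipeline.

```python
# Hypothetical defaults; the real pipeline's parameters are not given in the slides.
DEFAULTS = {"mode": "continuum", "solution_interval_s": 60.0}


def init_params(metadata, defaults=DEFAULTS):
    """Start from the defaults and adjust them using whatever metadata is available."""
    params = dict(defaults)

    # Illustrative rule: more than one channel switches to spectral-line mode.
    n_chan = metadata.get("n_channels")
    if n_chan is not None and n_chan > 1:
        params["mode"] = "spectral_line"

    # Illustrative rule: never solve on an interval shorter than one integration.
    t_int = metadata.get("integration_time_s")
    if t_int is not None:
        params["solution_interval_s"] = max(params["solution_interval_s"], t_int)

    return params
```

Missing metadata simply leaves the corresponding default in place, so the pipeline still runs when a header is incomplete.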

6 Areas of concern
  1. How much control should the user be given?
      - Depends on the target audience. Experts want more control than novices.
      - A compromise is to offer many controls, with most of them preset to good initial values.

7 Areas of concern
  2. How many output diagnostics should the pipeline produce?
      - Varies by processing goal and user preference.
      - If possible, include a pipeline parameter that determines the amount of diagnostics.
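One way such a diagnostics parameter might look is a single integer level that selects progressively more output. The level numbers and product names here are hypothetical.

```python
# Hypothetical mapping from a diagnostics level to the products it enables.
DIAGNOSTIC_LEVELS = {
    0: [],
    1: ["summary"],
    2: ["summary", "calibration_plots"],
    3: ["summary", "calibration_plots", "per_step_logs"],
}


def select_diagnostics(level):
    """Return the diagnostics to produce, clamping the level to the valid range."""
    level = max(0, min(level, max(DIAGNOSTIC_LEVELS)))
    return list(DIAGNOSTIC_LEVELS[level])
```

Clamping out-of-range values, rather than raising an error, keeps a mistyped level from stopping an otherwise unattended run.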

8 More on Output
  - In addition to the primary output product, consider outputting calibrated data and log files.
      - This allows advanced users to build upon what the pipeline has done.
      - It also allows for quick “upgrades” to data products.
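A minimal sketch of this idea, assuming JSON files stand in for the real image and calibrated-data formats (the file names and the `imager` callback are hypothetical): the calibrated data and log are written next to the primary product, so a later "upgrade" can redo only the imaging step.

```python
import json
from pathlib import Path


def save_products(out_dir, image, calibrated, log_lines):
    """Write the primary product plus calibrated data and a log file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "image.json").write_text(json.dumps(image))
    (out / "calibrated.json").write_text(json.dumps(calibrated))
    (out / "pipeline.log").write_text("\n".join(log_lines) + "\n")
    return sorted(p.name for p in out.iterdir())


def reimage(out_dir, imager):
    """Upgrade the data product from saved calibrated data, skipping calibration."""
    calibrated = json.loads((Path(out_dir) / "calibrated.json").read_text())
    return imager(calibrated)
```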

9 Validating Output
  - This job is necessarily interactive.
  - However, a pipeline can simplify the process by…
      - Providing an easy way to view output, including diagnostics
      - And an easy way to delete (or flag) unacceptable output.
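The bookkeeping behind such a validation step could be sketched as below. The `Product` record and its fields are hypothetical; the point is only that each output carries its diagnostics with it and can be flagged, with a reason, rather than silently deleted.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical record for one pipeline output awaiting validation."""
    name: str
    diagnostics: dict
    flagged: bool = False
    reason: str = ""


def flag(products, name, reason):
    """Mark one product as unacceptable and record why."""
    for product in products:
        if product.name == name:
            product.flagged = True
            product.reason = reason
            return product
    raise KeyError(name)


def accepted(products):
    """Names of the products that passed validation."""
    return [p.name for p in products if not p.flagged]
```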

10 The VLA (AIPS) Pipeline

11 Description
  - The pipeline is a script (AIPS run file) that automates
      - Editing,
      - Calibration,
      - And imaging of VLA continuum data. May also process spectral line data.
  - Emulates an AIPS task
      - Takes input parameters
      - Outputs images and calibration plots
  - Suggested default parameters are given in an AIPS memo.

12 Demo: VLA (AIPS) Pipeline
  - To use the AIPS pipeline: load data into AIPS; split out different frequencies.

13 Demo: VLA (AIPS) Pipeline
  - Set the VLARUN input parameters.
  - Labeled parameter groups: Flagging control; Pause during calibration; Diagnostic plots; Imaging control; Self-cal (fragile)

14 Demo: VLA (AIPS) Pipeline
  - Image output by pipeline (axes and wedge added)

15 Demo of VLA Pipeline System: (Imaging the VLA Archive)

16 Description
  - The VLA Pipeline System is an extension of the AIPS pipeline.
  - Includes
      1. Data acquisition, and preparation for processing
      2. Data processing (AIPS pipeline)
      3. Image finalization, and export
      4. Archiving
      5. Easy interactive data validation

17 Demo: VLA Pipeline
  - At a high level of pipeline automation, initial user interaction takes place only on the command line.
  - The user can query the raw data archive via a Perl script.

18 Demo: VLA Pipeline
  - Next, select data files for download and filling.
  - Labeled steps: Select files; Download

19 Demo: VLA Pipeline
  - A Unix shell script waits to be called by cron.
  - Labeled steps: Start AIPS; Execute AIPS Pipeline
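The slide's script is a Unix shell script whose contents are not shown. As a loose illustration of the same pattern in Python, a cron-invoked job might claim the next queued data set like this; the queue directory layout and the `.ready`/`.running` suffixes are assumptions.

```python
from pathlib import Path


def claim_next(queue_dir):
    """Claim the oldest-named '*.ready' entry by renaming it to '*.running'.

    Intended to be called once per cron invocation. Renaming is used so that
    two overlapping runs are unlikely to pick up the same entry.
    """
    for ready in sorted(Path(queue_dir).glob("*.ready")):
        running = ready.with_suffix(".running")
        try:
            ready.rename(running)
        except FileNotFoundError:
            continue  # another run claimed this entry first
        return running
    return None  # nothing queued; exit and wait for the next cron call
```

The caller would then start the processing step for the returned path and do nothing when `None` comes back.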

20 Demo: VLA Pipeline
  - After processing, the output is archived via scripts invoked by cron.
  - The data is now available online.
  - The final step is image validation…

21 Demo: VLA Pipeline
  - A web-based tool allows the output to be validated.

22 Demo: VLA Pipeline
  - Images and diagnostics can be viewed together and flagged for removal.

23 For more info
  - About AIPS Pipeline (VLARUN):
      - AIPS Memo 112, by L. Sjouwerman. http://www.aips.nrao.edu/aipsmemo.html
      - VLARUN “online” documentation. From the AIPS prompt, type: explain VLARUN
  - About Pipeline System and NVAS:
      - See the NVAS web page. http://www.aoc.nrao.edu/~vlbacald
      - For data acquisition scripts, see J. Crossley’s web page. http://www.aoc.nrao.edu/~jcrossle/
  - About pipeline basics:
      - See notes on J. Crossley’s web page.

