Pipeline Introduction Sequential steps of –Plugin calls –Script calls –Cluster jobs Purpose –Codifies the process of creating the data set –Reduces human.

Pipeline Introduction Sequential steps of –Plugin calls –Script calls –Cluster jobs Purpose –Codifies the process of creating the data set –Reduces human resources –Reduces human error and omissions

Two Pipeline Types Resources pipeline –Downloads resources from external sources –Loads resources into database –Example: NRDB files Analysis pipeline –Extract data from database –Run analysis programs on data On main or cluster server –Put value added data back into database

Resource Pipeline Invoked by: –loadresources xmlfile propfile Take a tour of a resources XML file

Resources Repository Destination of downloads Houses files in a file system Serves as a cache for files Has API to access files by name and version If you request an existing file by name and version, repository returns it without downloading –But the wget arguments must match (these are remembered by the repository) Particularly useful if multiple projects want to synchronize their data input

Analysis Pipeline Take a tour of the analysis pipeline file Take a tour of the Steps.pm file Take a tour of the property file (there’s also one for the resource pipeline

Pipeline Directory Structure The directory which houses all the information for the pipeline including: –Input data –Logs –Result data –Pipeline control information: Which steps have been completed Property files to control cluster Structured for easy comprehension Take a tour of the directory structure

Analysis Pipeline API GUS::Pipeline::Manager.pm –Declares properties –Prevents steps from rerun –Calls plugins –Executes commands –Eases communication with cluster GUS::Pipeline:: MakeTaskDirs.pm –Helps make directories expected by distribjob on the cluster GUS::Pipeline:: TaskRunAndValidate.pm –Helps run a series of tasks on the cluster

DJob Manages the distribution of tasks across a compute cluster Handles the case of a very large number of inputs which are processed independently and uniformly For example, blasting a set of EST against a genome Now available for clusters using PBS cluster scheduler http://core.pcbi.upenn.edu/tools/liniactools.html

Pipeline Introduction Sequential steps of –Plugin calls –Script calls –Cluster jobs Purpose –Codifies the process of creating the data set –Reduces human.

Similar presentations

Presentation on theme: "Pipeline Introduction Sequential steps of –Plugin calls –Script calls –Cluster jobs Purpose –Codifies the process of creating the data set –Reduces human."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Pipeline Introduction Sequential steps of –Plugin calls –Script calls –Cluster jobs Purpose –Codifies the process of creating the data set –Reduces human.

Similar presentations

Presentation on theme: "Pipeline Introduction Sequential steps of –Plugin calls –Script calls –Cluster jobs Purpose –Codifies the process of creating the data set –Reduces human."— Presentation transcript:

Similar presentations

About project

Feedback