Workflow management: motivation and vision


1 Workflow management: motivation and vision
Ela Hunt

2 Plan
Overview of existing workflows
Gains to be achieved via workflows
Methodological assumptions: how to support and construct workflows with less effort and more effectively
Ela Hunt, SyBIT

3 Three areas of workflow use:
Deep sequencing
High content screening
Proteomics
Future: workflows combining those three methodologies, possibly including metabolomics, NMR, etc.

4 Deep sequencing
Management of reads (images) coming off the microscopy devices
Processing of images into sequence files
Alignment to a genome, or genome assembly from short reads
Annotation with data from external sources
Candidate gene/drug target identification

5 Deep sequencing workflow: status (Lausanne) and possible extensions
[Diagram. Numbered labels: 1a. Web, sample metadata capture; 1b. Illumina sequencing; 2. File server; 4. Submit analysis pipeline; 6. DAS server; 7. Microbe Browser; 8. Association Viewer. Other labels: Sequence analysis, Sequence data, Perl, Meta-data, Web browse.]

6 Deep sequencing workflow status
Lausanne: alignment via Eland (Emmanuel Beaudoing, Sylvain Pradervand)
Basel: under construction (Manuel Kohler)
Zurich, FGCZ: under construction (Remy Bruggmann)

7 Proteomics workflows
MS spectra
Mapping to proteins (merging output from various analysis programs)
Annotation with additional data
ETHZ: Perl scripts and KNIME (Andreas Quandt)
Lausanne, Geneva, Basel (?)

8 ETHZ proteomics example (drawn in KNIME by Andreas Quandt)

9 Screening workflows
Microscopy, image transfer, compression
Matlab scripts (light intensity adjustment, feature recognition, etc.) leading to the identification of features
Writing feature counts to a DB or to files
Stats and chart generation, sometimes including a user interface showing images (also for training): KNIME, R, Matlab, etc.

10 Screening workflows
Lausanne: Petr Strnad's workflows in KNIME, Matlab, MySQL
iBRAIN, developed by Berend Snijder: an end-to-end solution with a GUI (shell script, XML, XSLT, HTML)
ImageJ in S. Maerkl's lab in Lausanne, needing more automation and a DB
HCDC (PostgreSQL, Matlab, KNIME)

11 Lausanne workflow fragment
Read available plates
Loop for every plate…
…read cell data for the plate in the loop
Calculate the number of centrosomes for 7 different thresholds
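The fragment above can be sketched in plain Python. This is only an illustration of the loop structure, not the KNIME workflow itself: the in-memory `PLATES` table, the function names, the per-cell spot intensities and the seven threshold values are all assumptions made for the example, since the slides do not give them.

```python
from typing import Dict, List

# Hypothetical stand-in for the plate store: plate id -> cells, where each
# cell is a list of candidate-spot intensities (assumed data layout).
PLATES: Dict[str, List[List[float]]] = {
    "plate_A": [[0.95, 0.85, 0.25], [0.75, 0.15]],
    "plate_B": [[0.95, 0.65, 0.55, 0.35]],
}

# Seven illustrative thresholds; the real values are not given in the slides.
THRESHOLDS: List[float] = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]


def read_available_plates() -> List[str]:
    """Step 1: list the plates that can be processed."""
    return sorted(PLATES)


def read_cell_data(plate_id: str) -> List[List[float]]:
    """Step 2 (inside the loop): read the cell data for one plate."""
    return PLATES[plate_id]


def count_centrosomes(cells: List[List[float]], threshold: float) -> int:
    """Count spots whose intensity reaches the threshold, over all cells."""
    return sum(1 for cell in cells for spot in cell if spot >= threshold)


def run_fragment() -> Dict[str, Dict[float, int]]:
    """Loop over every plate and count centrosomes at each threshold."""
    results: Dict[str, Dict[float, int]] = {}
    for plate_id in read_available_plates():
        cells = read_cell_data(plate_id)
        results[plate_id] = {t: count_centrosomes(cells, t) for t in THRESHOLDS}
    return results
```

Calling `run_fragment()` returns one dictionary per plate, keyed by threshold, which is the kind of table a downstream statistics or charting step could consume.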

12 iBRAIN overview
Purpose: plates, wells, images => compress images, classify cells into types, count cells of various types, graph
Submit project via drag-and-drop of a file
Monitor progress on cluster via HTML pages
Technology: bash, Matlab, cluster, XML, HTML; web pages are generated from a bash script; paths and file names are embedded

13 iBRAIN use cases

14 OUR GOALS: addressing technical challenges
Maintainability (extensibility) of the entire workflow
Portability
Automation (end-to-end execution)
Cost savings via code base sharing
Various architectures (storage, clusters)
Multiple logins (security, ease of administration)
Privacy
Most of those can be solved via extending KNIME (next talk)

15 Extending KNIME: see workflows wiki page

16 What is KNIME?
A Java workflow management system
Integrates Python, R, Perl, Java snippets, JDBC
GUI: can be used by a bioinformatician
Also server and cluster products (Sun Grid Engine)
Used at several locations (below: P. Strnad's workflow at Lausanne)

17 KNIME Analysis (from P. Strnad)
50% of cells have 2 centrosomes
Percentage of cells below threshold
Usually exclude 10% of cells with low GFP-Centrin signal
GFP-Centrin expression threshold
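The filtering step named on this slide (excluding the 10% of cells with the lowest GFP-Centrin signal, then looking at the share of cells with 2 centrosomes) might look roughly as follows outside KNIME. The `Cell` tuple layout and both function names are assumptions for illustration; the slide does not show how the analysis is actually implemented.

```python
from typing import List, Tuple

# Assumed record layout: (GFP-Centrin signal, centrosome count) per cell.
Cell = Tuple[float, int]


def exclude_low_signal(cells: List[Cell], fraction: float = 0.10) -> List[Cell]:
    """Drop the given fraction of cells with the lowest GFP-Centrin signal."""
    n_drop = int(len(cells) * fraction)
    ranked = sorted(cells, key=lambda cell: cell[0])
    return ranked[n_drop:]


def fraction_with_count(cells: List[Cell], count: int = 2) -> float:
    """Share of cells that have exactly `count` centrosomes."""
    if not cells:
        return 0.0
    return sum(1 for _, n in cells if n == count) / len(cells)
```

A typical use would be `fraction_with_count(exclude_low_signal(cells))`, which gives the proportion of two-centrosome cells among those that pass the signal cut.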

18 KNIME Analysis
Cell count
Centrosome number

19 Image Regions Viewer

20 Image Regions Viewer

21 Goals of KNIME extension
Maintainability (extensibility)
Portability
Automation (end-to-end execution)
Cost savings via code base sharing
Various architectures (storage, clusters)
Doing away with multiple logins or no logins (security, ease of administration, privacy)

22 Security
One username/password per user, one login that carries out the whole workflow
Will include cluster/DB logins
KNIME needs the concepts of user/session, login, and accounting of who did what
Allows for workflow tracking, scientific repeatability, accounting

23 Distributed data and computation
Data Mover as a KNIME node: expose input parameters, input and output (KNIME abstracts over those and calls them ports)
Usage of clusters (LSF and others, as needed), probably involving the spawning of several Java workflows distributed over a cluster, with status reporting as jobs are being processed

24 Language additions
Wrapping for Matlab
Improved wrapping of Perl
Better facilities for R embedding (viewports)
CP2 embedding
Sequence: Eland, MAQ, Bowtie, BWA
Proteomics: Mascot, X!Tandem, OMSSA, SpectraST

25 GUI additions
Job submission GUI
Job monitoring GUI (to show errors in a manner appropriate for a biological user)
Workflow sharing GUI (choose workflow, associate with data)
GUI embedding facility for Java GUIs (the current implementation is too fiddly)

26 Workflow portability
A reconfiguration tool, based on the XML workflow description format supported by KNIME, written in XPath or XQuery (GUI?):
  select all data paths and change them
  select all software paths and change them
  select db/login/cluster user data and update it
  check the updated values by testing all new parameters, and report
  for two identical workflow instances, report the config differences
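A minimal sketch of two of the reconfiguration steps listed above (rewriting all data paths, and reporting config differences between two workflow instances), using the limited XPath support in Python's standard `xml.etree.ElementTree`. The XML layout in `SAMPLE`, the key names and both function names are hypothetical and deliberately simplified; they are not the actual KNIME workflow description format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Simplified, hypothetical workflow description (NOT the real KNIME schema).
SAMPLE = """<workflow>
  <node name="reader">
    <entry key="data_path" value="/old/data/plates"/>
    <entry key="software_path" value="/old/bin/matlab"/>
  </node>
  <node name="db_writer">
    <entry key="data_path" value="/old/data/results"/>
    <entry key="db_user" value="alice"/>
  </node>
</workflow>"""


def rewrite_entries(root: ET.Element, key: str,
                    old_prefix: str, new_prefix: str) -> int:
    """Select every entry with the given key and swap its path prefix."""
    changed = 0
    for entry in root.findall(f".//entry[@key='{key}']"):
        value = entry.get("value", "")
        if value.startswith(old_prefix):
            entry.set("value", new_prefix + value[len(old_prefix):])
            changed += 1
    return changed


Key = Tuple[Optional[str], Optional[str]]


def config_diff(root_a: ET.Element,
                root_b: ET.Element) -> Dict[Key, Tuple[Optional[str], Optional[str]]]:
    """Report (node, key) settings whose values differ between two instances."""
    def flatten(root: ET.Element) -> Dict[Key, Optional[str]]:
        return {(node.get("name"), entry.get("key")): entry.get("value")
                for node in root.findall("node")
                for entry in node.findall("entry")}

    a, b = flatten(root_a), flatten(root_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}
```

For example, parsing `SAMPLE` twice with `ET.fromstring`, calling `rewrite_entries(copy, "data_path", "/old/data", "/new/data")` on one copy, and then `config_diff(original, copy)` lists exactly the two data paths that changed. Testing the new values (for instance, checking that each new path exists) would be a separate step.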

27 Better workflow management
An open repository of workflow nodes, shared by all KNIME user groups (two parts: mature and beta)
Saving of graphing parameters, so that an entire workflow can be automated
Adding a workflow start node with iteration over directories
Data flow efficiency: data exchange between nodes via hierarchical structures (XML?) and tables (for Perl?)

28 Image handling
Image type improvements (this type is under development and may not be mature yet)
Image storage in openBIS (various levels of resolution, by well, plate, etc.), with associated indexes, so that stats at various levels can be generated easily

29 openBIS/B-Fabric connectivity
Access to raw data from KNIME
Image indexing, so that KNIME can effectively query features
Analysis results storage
Dumping of workflow run parameters/outcomes to DB (maybe picking up a workflow from DB)

30 SQL handling
Better table merging (merging data from several tables, supported by a query definition), as this is currently cumbersome

31 Summary
KNIME is used in Zurich and Lausanne, but does not provide end-to-end processing
A list of new requirements was gathered from workflow users
An outline grant has been submitted to KTI
Your input is needed!

