Designing, Executing and Sharing Workflows with Taverna 2.4 Different Service Types Katy Wolstencroft Helen Hulme myGrid University of Manchester.


1 Designing, Executing and Sharing Workflows with Taverna 2.4 Different Service Types Katy Wolstencroft Helen Hulme myGrid University of Manchester

2
• Load the ‘EMBL-EBI ClustalW-Soap’ workflow from myExperiment (workflow ID: 1768).
• This workflow is asynchronous: when you submit data to the ‘runInterproScan’ service, it returns a job ID and places your job in a queue (this is very useful if your job will take a long time!).
• The ‘Status’ nested workflow queries your job ID to find out whether the job is complete.
• When it is complete, the ‘get_results’ service retrieves the results. We used a similar workflow in the looping exercise. Many services from the EBI use the same pattern of execution, so we can use this workflow as a template for others.
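
The submit, poll, retrieve pattern on this slide can be sketched outside Taverna as follows. This is an illustration only: `FakeJobService` is a hypothetical in-memory stand-in, and its method names (`run`, `get_status`, `get_result`) and status strings are assumptions, not the actual operations of the EBI service.

```python
import itertools
import time


class FakeJobService:
    """Hypothetical in-memory stand-in for an asynchronous job service."""

    def __init__(self, polls_until_done=3):
        self._ids = itertools.count(1)
        self._remaining = {}  # job ID -> status checks left before the job finishes
        self._inputs = {}     # job ID -> submitted data
        self._polls_until_done = polls_until_done

    def run(self, data):
        """Queue a job and return its ID immediately, without waiting."""
        job_id = f"job-{next(self._ids)}"
        self._remaining[job_id] = self._polls_until_done
        self._inputs[job_id] = data
        return job_id

    def get_status(self, job_id):
        """Report RUNNING until the job has been polled enough times."""
        if self._remaining[job_id] > 0:
            self._remaining[job_id] -= 1
            return "RUNNING"
        return "FINISHED"

    def get_result(self, job_id):
        """Return a placeholder result for a finished job."""
        return f"result for {self._inputs[job_id]}"


def run_and_wait(service, data, poll_interval=0.0, max_polls=100):
    """Submit data, poll the job status until it finishes, then fetch the result."""
    job_id = service.run(data)
    for _ in range(max_polls):
        if service.get_status(job_id) == "FINISHED":
            return service.get_result(job_id)
        time.sleep(poll_interval)
    raise TimeoutError(f"{job_id} did not finish after {max_polls} status checks")
```

In the Taverna workflow, the loop inside `run_and_wait` corresponds to the ‘Status’ nested workflow, and the final call corresponds to ‘get_results’.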

3 Create a new asynchronous workflow using the NCBI BLAST service provided by the EBI (details in the BioCatalogue: http://www.biocatalogue.org/services/1930). You will first have to import the new WSDL into Taverna using the .wsdl link in the BioCatalogue.
• Use the documentation in the BioCatalogue and associated links to find out what parameters and inputs you need to provide.
• Use the structure of the Interproscan workflow to work out how to connect the services.
• Save your workflow.

4 BioMart enables the retrieval of large amounts of genomic data, e.g. from Ensembl and Sanger, as well as UniProt and MSD datasets. In Taverna, we have a special service type for configuring BioMart queries.
• Open the workflow ‘BiomartAndEmbossDisease.xml’ from myExperiment.
• Run the workflow.

5 This workflow finds all genes on chromosome 22 that are implicated in known diseases and have homologous genes in rat and mouse (using Ensembl). For each gene, it performs a multiple alignment of the sequences using the EMBOSS tool 'emma' (a wrapper around ClustalW). It then returns PNG images of the multiple alignments, along with three columns containing the human, rat and mouse gene IDs used in each case.

6
• Click on the ‘hsapiens_gene_ensembl’ service in the diagram. It is automatically selected in the workflow explorer.
• Click on ‘Details’ at the top of the workflow explorer and select ‘configure’. The BioMart configuration window will appear.

7
• Select ‘Filters’ and then ‘Region’, and change the chromosome from 22 to 21. The workflow will now retrieve all disease genes from chromosome 21 with rat and mouse homologues.
• Run the workflow and look at the results.
• See how some of the other options were configured by finding them in the other pull-down lists (Gene, Multi-species comparison etc).
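
Behind the configuration window, a BioMart query is essentially a dataset name plus a set of filters and attributes. The sketch below builds such a query as BioMart-style XML to show where the chromosome setting lives. It only constructs the text and does not contact any server. The filter name `chromosome_name` and the attribute name `ensembl_gene_id` are assumptions about the Ensembl dataset, and the helper `build_biomart_query` is hypothetical, not part of Taverna.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (construction only, nothing is sent)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which records are returned (e.g. one chromosome).
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes choose which columns come back.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed filter and attribute names; changing "21" back to "22" mirrors the slide.
query_xml = build_biomart_query(
    "hsapiens_gene_ensembl",
    {"chromosome_name": "21"},
    ["ensembl_gene_id"],
)
```

Changing the chromosome in the ‘Region’ filter corresponds to changing the single `value` in the filter element; the rest of the query stays the same.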

8 Find out which Gene Ontology terms are associated with the genes in your region by adding a new BioMart query processor.
• Select another copy of ‘hsapiens_gene_ensembl’ from the services panel (Hint: you could search for hsapiens) and drag it into your workflow, or copy the service already in your workflow.
• The configuration window will pop up automatically.

9
• Configure the new service. In ‘filters’, select ‘gene’ and the ‘id list limit’ tick-box next to ‘ensembl gene IDs’. This will enable you to connect it to the existing workflow.
• Configure the output (by selecting attributes) and select ‘External’.
• Find the GO section and select ‘GO Term Accession’ and ‘GO Term Definition’.

10
• Connect the input of the new service to the ‘hsapiens_gene_ensembl’ service via the ‘ensembl_gene_id’.
• Create 2 new workflow outputs, ‘GOID’ and ‘Description’. Connect the outputs of the BioMart processor to them.
• Save the workflow.
• Re-run the workflow and view which GO terms are associated with your chromosomal region.
• NOTE: Having 2 outputs for related terms like this is inefficient and hard to read. Either add a shim to improve the output format, or change the output settings in the BioMart window.
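
The shim mentioned in the note could simply pair each GO ID with its description so that both appear in one output. The sketch below shows that logic in Python for illustration. In Taverna itself a shim would normally be a small script service inside the workflow (for example a Beanshell script); the function name `merge_go_outputs` is hypothetical.

```python
def merge_go_outputs(go_ids, descriptions):
    """Pair two parallel lists into tab-separated lines, one GO term per line."""
    if len(go_ids) != len(descriptions):
        raise ValueError("GOID and Description lists must have the same length")
    return "\n".join(
        f"{go_id}\t{description}"
        for go_id, description in zip(go_ids, descriptions)
    )
```

The result is a single two-column text value that can be sent to one workflow output instead of two.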

11 Soaplab services look like ordinary WSDL services, but they are also asynchronous. The structure of Soaplab services, however, hides the status-checking functions from users. Soaplab was originally designed as a new way of converting Perl scripts and command-line tools to web services. The EMBOSS tool suite from the EBI has been wrapped using Soaplab. We will explore some of these services.

12 Reload the BLAST workflow you made in the asynchronous services exercise (if you didn’t save it, you can search for NCBI-Blast on myExperiment). We will extend this workflow to provide protein motif information about the proteins in the BLAST analysis.
• In the services panel, find the EMBOSS Soaplab services and find the ‘protein_motifs’ section.
• Find out which of these services enable searching of the PROSITE and PRINTS databases by looking them up in the BioCatalogue.
• Import both services into the workflow.

13
• Connect these services to the workflow so that you can find PRINTS and PROSITE matches in the query sequence returned from ‘Get Protein Fasta’. You will see that Soaplab services have many input parameters. Hint: The only mandatory fields are the sequence input and a choice of output file.
• Use the EMBOSS documentation to work out what the other parameters are for.

14
• The new services should take the output of GetProteinFasta as an input.
• Run the workflow. Now you have BLAST results and protein domain/motif matches.

