How did we get here?

Working group-led iteration and discussion (Jan-June)
- Componentization
- Reusability
- Identification of potential GUI representations for work products

Summer Supercomputing Institute Meeting (July)
- Refinement of workflow
- Identification of entry and exit points
- Iteration on GUI representations
- Cyberinfrastructure-oriented design

Implementation decisions
- Technology/language
- Work allocation
Expression Analysis

BioConductor limma
- Retrieve data
- Specify experiment design
- Normalize (gcRMA)
- Linear model fit
- Bayesian correction
- Hypothesis testing
- Emit results

NCBI GEO
iPlant Data Storage API

- Limma is a standard module for expression analysis
- Limma incorporates translation and integration code to handle most common array platforms
- Limma writes verbose but consistent delimited results
- People know how to use BioC/Limma and can do so on their desktop systems
- The entry point is a user uploading an expression file into the iPlant Data API
VAPrototype
- Retrieve data via /data API
- Iterate over experiments
- Perform category enrichment
- Consolidate results
- Return as JSON data structure

http://medea/iplant/js/application.js
1. Invoke VAPrototype via iPlant /jobs API
2. Poll for service to complete
3. Fetch results as JSON
4. Render to dynamic table
5. Interpret user interactions

Lecong.cgi
- Accept gene list
- Accept control list
- Accept parameters
- Run analysis using call to R
- Return JSON data structure

Diagram components: iPlant Jobs API; iPlant Data API; R/Bioconductor/HyperGO
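The invoke, poll, and fetch steps listed for application.js follow a common submit-and-poll pattern. The sketch below illustrates that pattern in Python rather than the JavaScript of the original client. The `submit`, `status`, and `fetch` callables stand in for calls to the /jobs API, and the state names `FINISHED` and `FAILED` are assumptions, since the slides do not give the API's actual request or response format.

```python
import json
import time
from typing import Any, Callable


def run_job(submit: Callable[[dict], str],
            status: Callable[[str], str],
            fetch: Callable[[str], str],
            params: dict,
            poll_interval: float = 1.0,
            max_polls: int = 60) -> Any:
    """Submit a job, poll until it completes, and return the parsed JSON result.

    The three callables are hypothetical wrappers around the /jobs API;
    the state strings checked below are assumed, not taken from the slides.
    """
    job_id = submit(params)                  # step 1: invoke the service
    for _ in range(max_polls):               # step 2: poll for completion
        state = status(job_id)
        if state == "FINISHED":
            return json.loads(fetch(job_id))  # step 3: fetch results as JSON
        if state == "FAILED":
            raise RuntimeError(f"job {job_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing the three operations in as callables keeps the polling logic independent of the transport, so it can be exercised with in-memory fakes before a real service exists.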
http://medea/iplant/js/application.js
1. Interpret user interactions
   1. Sorting
   2. Downloading
   3. Invoke Network Analysis service via iPlant /jobs API
   4. Poll /jobs for completion
   5. Fetch results (GraphML)
   6. Render in Cytoscape Web

BuildNetwork
- Accept gene list
- Accept parameters (species, etc.)
- Accept algorithm name (GeneMania)
- Invoke GeneMania plugin (Java) to predict network
- Convert all gene names to AGI codes
- Convert domain-specific report to GraphML

Diagram components: iPlant Jobs API; iPlant Genome Service API; GeneMania
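The last BuildNetwork step, converting a report to GraphML, can be illustrated with a minimal sketch. The input format here (a list of source, target, weight tuples) is an assumption, since the slides do not describe the GeneMania report; the gene identifiers in the usage note are placeholders in AGI style.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges):
    """Build a GraphML string from (source, target, weight) tuples.

    The tuple layout is a hypothetical stand-in for the tool's report.
    """
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare the edge attribute that carries the interaction weight.
    ET.SubElement(root, "key", {
        "id": "weight", "for": "edge",
        "attr.name": "weight", "attr.type": "double",
    })
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "undirected"})
    # Emit each distinct gene once as a node, in sorted order.
    for gene in sorted({g for s, t, _ in edges for g in (s, t)}):
        ET.SubElement(graph, "node", {"id": gene})
    for i, (source, target, weight) in enumerate(edges):
        edge = ET.SubElement(graph, "edge",
                             {"id": f"e{i}", "source": source, "target": target})
        ET.SubElement(edge, "data", {"key": "weight"}).text = str(weight)
    return ET.tostring(root, encoding="unicode")
```

For example, `edges_to_graphml([("AT1G01010", "AT1G01020", 0.8)])` yields a document with two nodes and one weighted edge.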
What’s next

- VAPrototype won’t see any explicit additional development, since it is a proof of principle
- We need to focus on delivering robust versions of the functions that are mocked up
- It serves as a reference implementation for a third-party DE
- It also illuminates specific data integration needs
- We may use it as a testing ground for new ideas in GUI, service coordination, and API design
- It will be ported to use the full implementation of the iPlant API and used as an example for potential developers
  - Web application portion: 1 day
  - Web services: 1 week
Genome Services

Why is this needed? This is G2P, not genomics!
- Support multiple genomes in UHTS services
- Support germplasms and natural accessions
- Pave the way to supporting user genomes
- Make best use of existing resources
- Sane, authority-led approach to data integration
Current Ideas

- Return a structured list of taxonomic identifiers (genus, species, version, germplasm/accession) supported by iPlant
- Given a genus, species, version, and germplasm/accession identifier:
  - Return a URI pointing to a multiple-FASTA file containing the genome sequence
  - Return a URI pointing to a GFF3 version of the genome annotation
  - Return a URI pointing to a GTF version of the genome annotation
  - Return a URI pointing to the dummy expr files needed by Cufflinks for RNAseq
  - Be able to actually return the files referenced by these URIs for download
- Given the taxonomic identifier plus a name or synonym of a gene:
  - Return an authoritative name for said gene
- Given the taxonomic identifier plus a microarray platform name plus a probe identifier:
  - Return the canonical gene name mapped to that microarray probe

Diagram components: iPlant Genome Services API; clade-specific data authorities; NCBI and EBI; local knowledge; mirroring relationships
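The lookups listed above can be sketched as a small in-memory registry. This is only an illustration of the proposed queries, not the service's design: the class and method names, the example URIs, the version label, and the gene, platform, and probe identifiers are all made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    """Taxonomic identifier: genus, species, version, germplasm/accession."""
    genus: str
    species: str
    version: str
    accession: str


class GenomeRegistry:
    """Hypothetical in-memory stand-in for the proposed lookups."""

    def __init__(self):
        self._resources = {}  # Taxon -> {kind: URI}
        self._genes = {}      # (Taxon, lowercased name or synonym) -> authoritative name
        self._probes = {}     # (Taxon, platform, probe id) -> canonical gene name

    def add_genome(self, taxon, **uris):
        self._resources[taxon] = dict(uris)

    def add_gene(self, taxon, authoritative, synonyms=()):
        # Index the authoritative name and every synonym, case-insensitively.
        for name in (authoritative, *synonyms):
            self._genes[(taxon, name.lower())] = authoritative

    def add_probe(self, taxon, platform, probe_id, gene):
        self._probes[(taxon, platform, probe_id)] = gene

    def list_taxa(self):
        return list(self._resources)

    def resource_uri(self, taxon, kind):
        return self._resources[taxon][kind]

    def resolve_gene(self, taxon, name):
        return self._genes[(taxon, name.lower())]

    def gene_for_probe(self, taxon, platform, probe_id):
        return self._probes[(taxon, platform, probe_id)]


# Illustrative data only: every value below is a placeholder.
col0 = Taxon("Arabidopsis", "thaliana", "v1", "Col-0")
registry = GenomeRegistry()
registry.add_genome(
    col0,
    fasta="https://example.org/genomes/ath/v1/genome.fa",
    gff3="https://example.org/genomes/ath/v1/annotation.gff3",
    gtf="https://example.org/genomes/ath/v1/annotation.gtf",
)
registry.add_gene(col0, "GENE0001", synonyms=("abc1",))
registry.add_probe(col0, "example-array", "probe_0001", "GENE0001")
```

Keying every lookup on the full taxonomic identifier reflects the slide's point that gene names and probe mappings are only meaningful relative to a specific genome version and accession.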
Genome Services

Diagram components:
- iPlant Genome Services API
- Clade-specific data authorities
- NCBI and EBI
- Local Knowledge
- Mirroring relationships
- Direct relationships
- Indirect relationships (CoGE)
- Taxonomic Name Resolution Service (TNRS)
- Discovery Environment
- TAIR, Gramene, Phytozome, etc.
The iPlant API

The iPlant API will support the following use cases:
1. I have a command-line tool that performs a specific type of bioinformatics analysis and I want to make it available to others.
2. I have a web service that performs a specific type of bioinformatics analysis and I want to make it available to others.
3. I have a web site that people can use to perform analyses and I want to make it available to others.
4. I want to write a web application that chains multiple types of tools together.
5. I want to use a workflow manager like Taverna or Kepler to orchestrate a set of analytical steps.
Core Services
- Eventing
- I/O
- Data Transforms
- App Discovery
- Job Mgmt.
- User Profile Mgmt.
- Authentication
- User/Project Auditing
- Mashups (Orchestration)
I/O Services

Getting raw data into and out of the iPlant CI and moving data around internally
- /io: upload files and stage URIs (http, https, ftp, sftp, gsiftp, jdbc, amazon s3, irods)
- /io/list: list iPlant files
- /io/ : download, delete file
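As a rough illustration of the staging rule above, a client could check a URI's scheme against the supported list before submitting it to /io. The function name is hypothetical, and mapping "amazon s3" to the `s3` scheme is an assumption; the slides do not say how the service itself validates URIs.

```python
from urllib.parse import urlparse

# Schemes taken from the slide; "s3" for "amazon s3" is an assumed spelling.
SUPPORTED_SCHEMES = {"http", "https", "ftp", "sftp", "gsiftp", "jdbc", "s3", "irods"}


def can_stage(uri: str) -> bool:
    """Return True if the URI's scheme is one the I/O service lists as stageable."""
    return urlparse(uri).scheme.lower() in SUPPORTED_SCHEMES
```

Under these assumptions, `can_stage("gsiftp://host.example.org/data/reads.fastq")` is true, while a bare local path or a `file://` URI is rejected.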
Job Management Services

Submitting and managing jobs to run supported applications, as well as querying for historical information about jobs
- /job: submit a job
- /job/history: job history
- /job/ : kill an active job or get information about a job
- /job/ /input/list: get a listing of the input files associated with a specific job
- /job/ /input/ : retrieve a specific input file in the format it was in when the job ran
- /job/ /output/list: get a listing of the output files associated with a specific job
- /job/ /output/ : retrieve a specific output file associated with the job
Application Discovery Services

Application discovery and management (different from semantic web service discovery)
- /apps: add a new application to the iPlant CI
- /apps/list: list all supported applications
- /apps/search: search for a specific application
- /apps/type/list: list all supported application types
- /apps/type/ : list all supported applications of a specific type
- /apps/name/ : list all supported applications matching a given name