Jiro Sumitomo, James M. Hogan, Felicity Newell, Paul Roe Microsoft QUT eResearch Centre


1 Jiro Sumitomo, James M. Hogan, Felicity Newell, Paul Roe Microsoft QUT eResearch Centre j.hogan@qut.edu.au

2 Smart BioTools
Bioinformatics
  • Tools, Data, and linking them together
  • Exploration vs. Routine Workflow
Mashups and BioMashups
  • Some basics and some canonical examples
  • Biomashups and their limitations
Predictin’ the future

3 Smart BioTools
Abundance of tools and data sources
  • Traditional standalone applications
  • Interactive web sites
  • (More recently) web service hooks
Usually purpose-specific tools
  • Link together to solve complex problems

4 Smart BioTools
The workflow trade-off:
  • Sophistication vs development effort
  • Keep it simple, and keep the scientist involved
  • Make it complex & make the scientist a client
Bench scientists usually aren’t software engineers
  • But they can chain operations together if they have the right primitives and the right glue

5 Smart BioTools
The manual data management system
  • Also known as cut-and-paste from Excel
  • Cannot scale, but it presents no barriers…
Robust Workflow Systems: Taverna, Kepler et al.
  • Essential for high-end instrumentation; well-engineered, support for provenance
  • But significant set-up, familiarisation…

6 Smart BioTools
Scripting in Perl, Python et al.
  • Significant programming skills needed
  • Useful for well-defined processes, but exploratory work is time-consuming
  • Accessing remote data and linking web services is beyond most scientists
  • [A niche for biomashups?]

7 Smart BioTools
Mashups are web-based applications for the combination of data sources and services
Earliest mashups used JavaScript to link exposed service and data APIs, and to wrap existing tools
  • Same issues as Perl scripting, with the additional need to organise hosting
  • Little incentive to standardise or share

8 Smart BioTools
Development environments, hosting and publication
  • Common interface structure
  • Building a community?
Scripting for scientists?
  • Overcoming the programming barrier
  • Depends on the libraries, primitive ops
  • And there is (usually) JavaScript under the hood

9 Smart BioTools

10 Mashups are limited by data exchange
Good at passing an index to the data
  • Think latitude & longitude
Bad at passing massive data sets around
[Figure: client mashup architecture, e.g. Virtual Earth, e.g. Facebook. Labels: Client web browser, Mashup Server, Third Party Services, Mashup]

11 Smart BioTools
Middle ground between cut-and-paste and full workflow management systems
  • Corresponds best to Perl scripting
  • Ideal when user intervention is needed
  • May be seen as a prototype for Workflow
  • Helps to mask complex data access and search tools which frustrate experts and drive students to exasperation…

12 Smart BioTools
  • Perform a blastx on the sequence.
  • Obtain the best hit/hits by inspection of the blast output page.
  • Retrieve Genbank record of the best hit by clicking on the link in the output page.
  • Determine the known regions by inspection, in this case an ANF_receptor.
  • Perform an Entrez search on this region.

13 Smart BioTools
  • Perform a blastx on the sequence. (NCBI Blast block)
  • Obtain the best hit/hits by inspection of the blast output page. (NCBI Blast result parser block)
  • Retrieve Genbank record of the best hit by clicking on the link in the output page. (RDF Block, pointing to Bio2Rdf)
  • Determine the known regions by inspection, in this case an ANF_receptor. (The mashup parses the RDF document instead - Bio2Rdf Block)
  • Perform an Entrez search on this region. (NCBI Entrez block)
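
The five steps on this slide can be read as a chain of blocks, each taking the previous block's output. The following is a minimal offline sketch of that idea only, not the Popfly blocks themselves: every function name is hypothetical, no NCBI or Bio2Rdf service is called, and the hit identifiers and E-values are made-up placeholders.

```python
from typing import Callable, Dict, List

Data = Dict[str, object]
Block = Callable[[Data], Data]


def blast_block(data: Data) -> Data:
    # Stand-in for the NCBI Blast block: a real block would submit
    # data["sequence"] to blastx. The hits below are made-up placeholders.
    hits = [("placeholder_hit_A", 1e-50), ("placeholder_hit_B", 1e-10)]
    return {**data, "blast_hits": hits}


def blast_parser_block(data: Data) -> Data:
    # Replaces "inspect the output page": keep the hit with the lowest E-value.
    best_id, _ = min(data["blast_hits"], key=lambda hit: hit[1])
    return {**data, "best_hit": best_id}


def record_block(data: Data) -> Data:
    # Stand-in for the RDF block pointing to Bio2Rdf: returns a fake record.
    record = {"id": data["best_hit"], "regions": ["ANF_receptor"]}
    return {**data, "record": record}


def region_block(data: Data) -> Data:
    # Replaces "determine the known regions by inspection".
    return {**data, "region": data["record"]["regions"][0]}


def entrez_block(data: Data) -> Data:
    # Stand-in for the NCBI Entrez block: only builds the query term.
    return {**data, "entrez_query": data["region"]}


def run_chain(blocks: List[Block], data: Data) -> Data:
    # Feed each block's output into the next block.
    for block in blocks:
        data = block(data)
    return data


CHAIN = [blast_block, blast_parser_block, record_block, region_block, entrez_block]
result = run_chain(CHAIN, {"sequence": "placeholder sequence"})
print(result["best_hit"], "->", result["entrez_query"])
```

Because each block only reads and adds dictionary keys, a scientist could swap one stand-in for a real service call without touching the rest of the chain.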

14 Smart BioTools
Protein Characteristics
  • Name, sequence
  • Journal articles, cross-reference
Protein Prediction
  • Molecular weight, isoelectric point
  • Secondary structure, post-translational mods
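
Of the predicted properties listed, molecular weight is the simplest to compute locally. The sketch below assumes the usual approach of summing average residue masses and adding one water molecule. It is not the prediction service used in the mashup, and the function name `molecular_weight` is illustrative.

```python
# Average masses (in Da) of the 20 standard amino acid residues.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Approximate average molecular weight of an unmodified peptide chain."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[residue] for residue in seq) + WATER_MASS


print(round(molecular_weight("GA"), 2))
```

Post-translational modifications and non-standard residues are ignored here, which is one reason a dedicated prediction service remains the better choice for real work.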

15 Smart BioTools Data & Services

16 Smart BioTools
Mashups Architecture
13 Custom Blocks
  1) Input and Output
  2) Processing: protein characteristics
  3) Processing: protein prediction
[Diagram labels: Input, Protein Characteristics, Protein Prediction, Combine, Output]
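
One plausible reading of this architecture is that a single input feeds two processing branches whose results are merged before output. The sketch below assumes that reading. The block names mirror the diagram labels, but the bodies are simple local stand-ins (sequence length and residue counts) rather than the actual custom blocks, and `EXAMPLE_ID` is a placeholder, not a real Uniprot ID.

```python
from collections import Counter
from typing import Dict


def input_block(protein_id: str, sequence: str) -> Dict[str, str]:
    # Input: normalise what the user typed in.
    return {"id": protein_id, "sequence": sequence.upper()}


def characteristics_block(protein: Dict[str, str]) -> Dict[str, object]:
    # Stand-in for the "protein characteristics" lookups (name, sequence, ...).
    return {"id": protein["id"], "length": len(protein["sequence"])}


def prediction_block(protein: Dict[str, str]) -> Dict[str, object]:
    # Stand-in for the web-based "protein prediction" services.
    return {"composition": dict(Counter(protein["sequence"]))}


def combine_block(*parts: Dict[str, object]) -> Dict[str, object]:
    # Combine: merge the branch results into one report.
    merged: Dict[str, object] = {}
    for part in parts:
        merged.update(part)
    return merged


def output_block(report: Dict[str, object]) -> str:
    # Output: render the report as plain text, one field per line.
    return "\n".join(f"{key}: {value}" for key, value in report.items())


protein = input_block("EXAMPLE_ID", "gaga")
report = combine_block(characteristics_block(protein), prediction_block(protein))
print(output_block(report))
```

The two processing branches never see each other's output, so either can be replaced or extended independently.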

17 Smart BioTools Given its Uniprot ID, how much can we find out about a particular protein?

18 Smart BioTools Given its sequence, what properties can we readily obtain from web-based prediction services?

19 Smart BioTools
Frameworks can and will support
  • Ad hoc exploratory bioinformatics
  • Index-based routine computation
  • Building (enclave) communities
Varying levels of success in allowing
  • Scientist (& student) driven mashups
  • Sharing and re-use of components

20 Smart BioTools
It will be a long time before mashup frameworks:
  • Are used to process data from high-throughput sequencing machines
  • Process large scale collections
  • Beat Taverna & Kepler at provenance

21 Smart BioTools

22 Building a general BioMashups community
  • Cross-over between frameworks
  • Seeding the community with ‘re-usable’ components and reaching critical mass
  • The myExperiment BioMashups group
Bringing BioMashups to the curriculum
  • The new undergraduate biology

23 Smart BioTools
MQUTeR Bio & BioMashups
  • http://www.mquter.qut.edu.au/bio/
  • http://www.mquter.qut.edu.au/bio/biomashups.aspx
myExperiment BioMashups Group
  • http://www.myexperiment.org/groups/99
Protein Mashups
  • http://www.mquter.qut.edu.au/bio/ProteinMashupsb[1].wmv
  • http://www.popfly.com/users/fsn/Protein%20Biomashups%20Summary%20page

24 Smart BioTools

25

