
1 Photo taken by http://flickr.com/people/mfsarwar/ Interoperability With BioMoby 1.0 It’s Better Than Sharing Your Toothbrush!

2 A brief history of BioMoby
- Sept 2001: Model Organism Bring Your own Database Interface Conference (MOBY-DIC)
- May 21, 2002: Genome Canada Platform Award
- May 25, 2002: API Version 0.1 deployed, including object ontology serialization into XML
- July 18, 2002: First Moby client (Gbrowse Moby)
- June 9, 2003: API Version 0.5 deployed
- 2006: Genome Canada Platform Award
- 2007: Version 1.0 API submitted for publication

3 MOBY-DIC Chapter VII: 7th Model Organism Bring Your-own Database Interface Conference, Vancouver, BC, June 2007.

4 The Core Ahab’s

5 Wendy Richard Mylah Martin Eddie

6 Andreas Paul Ivan Mark’s Screen…

7 The BioMoby Plan
- Create an ontology of bioinformatics data-types
- Define a serialization of this ontology (data syntax)
- Create an open API over this ontology
- Define Web Service inputs and outputs in terms of the ontology
- Register Services in an ontology-aware Registry
- Machines can find an appropriate service
- Machines can execute that service unattended
- Ontology is community-extensible

8 Overview of BioMoby Transactions
[Diagram labels: Gene names; MOBY Central; MOBY hosts & services; Sequence; Alignment; Sequence; Express.; Protein; Alleles; …; Align; Phylogeny; Primers]

9 Overview of BioMoby Transactions
- Object ontology: What is a sequence? A sequence is a ___ that has these features ___
- Discovery of services that consume things LIKE sequences!
[Diagram labels: MOBY Central; Sequence; Align; Phylogeny; Primers]

26 This is SCUFL – Simple Conceptual Unified Flow Language It is a complete record of everything you just did, and it can be saved for use in the Taverna workflow application that we will look at later…

28 Pipeline discovery “on the fly”
- No explicit coordination between providers
- Dynamic discovery of ~appropriate Services
- Automated execution of services

29 Some BioMoby statistics

30 Moby: Breadth
- Namespaces (data types): 418
- Objects (data syntaxes): >561
- Service Types (analytical categories): 112
- Providers: ~50 active
- Service Instances: ~1200 currently “alive”
  - In main Moby Central server in Canada
  - Others in “boutique” Moby registries serving specialized communities worldwide

31 Moby: Clients
- Gbrowse_moby (M Wilkinson)
- PlaNet Locus_View (H Schoof, R Ernst)
- Blue-Jay (P Gordon)
- Taverna (T Oinn, M Senger, E Kawas)
- MOWserv (INB, Spain)
- Remora (S Carrere, J Gouzy, INRA)
- MOBYLE (B Néron, P Tufféry, C Letondal, Pasteur Inst.)
- SeaHawk (P Gordon)

32 BioMoby in detail MOBY Data typing system: Semantic Type MOBY Data typing system: Syntactic Type Moby Registry Queries

34 Moby Namespaces
- A “Namespace” is a category of identifiers
  - NCBI has gi numbers (gi Namespace)
  - GO Terms have accession numbers (GO Namespace)
- Namespaces indicate data’s semantic type.
  - GO:0003476 → a Gene Ontology Term
  - gi|163483 → a GenBank record
- Though we are using the word “Namespace” correctly, it causes confusion!
  - “Namespace” in XML is tightly associated with an XML document and/or its syntax
  - In Moby, we are ONLY talking about data entities, NOT THEIR SYNTAX

35 BioMoby in detail MOBY Data typing system: Semantic Type MOBY Data typing system: Syntactic Type Moby Registry Queries

37 The MOBY Object Ontology
- Syntactic types are defined by a GO-like ontology
  - Class name at each node
  - Edges define the relationships between Classes
  - GO used as a model because of its familiarity in the community
- Edges define one of three relationships
  - ISA: inheritance relationship; all properties of the parent are present in the child
  - HASA: container relationship of ‘exactly 1’
  - HAS: container relationship with ‘1 or more’

38 The Simplest Moby Data-Type: Object
The combination of a namespace and an identifier within that namespace uniquely identifies a data entity, not its location(s) or its representation.
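
To make the "namespace plus identifier" idea concrete, here is a minimal Python sketch. The class name `MobyObject` and the namespace label `NCBI_gi` are assumptions made for illustration; they are not taken from the BioMoby API. The point is only that identity rests on the (namespace, id) pair and on nothing else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MobyObject:
    """Hypothetical stand-in for the base Object: an identity, no payload."""
    namespace: str  # semantic type of the identifier, e.g. "GO"
    id: str         # identifier within that namespace


go_term = MobyObject(namespace="GO", id="0003476")
same_term = MobyObject(namespace="GO", id="0003476")
genbank_record = MobyObject(namespace="NCBI_gi", id="163483")  # label assumed

# Two references to the same (namespace, id) pair denote the same entity,
# regardless of where the record lives or how it is serialized.
print(go_term == same_term)       # True
print(go_term == genbank_record)  # False
```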

39 Moby Primitives
[Diagram: Integer, String, Float and DateTime, each joined to Object by an ISA edge; example value shown: 38]

40 A Derived Data-Type
[Diagram: Object, Integer, String and Virtual Sequence, joined by ISA and HASA edges; example value shown: 38]
Callout: describes the semantic relationship between the Integer and the Virtual Sequence

41 A Derived Data-Type
[Diagram: as slide 40, with Generic Sequence added via ISA and HASA edges; example values shown: 38 and ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC]

42 A Derived Data-Type
[Diagram: as slide 41, with DNA Sequence added via an ISA edge; example values shown: 38 and ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC]
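
The ISA/HASA/HAS edges described on the preceding slides can be sketched as a small Python data structure. This is an illustration under stated assumptions, not BioMoby code: the class `OntologyClass`, the member names `Length` and `SequenceString`, and the exact wiring of the sequence classes are one reading of the diagrams.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple


@dataclass(eq=False)
class OntologyClass:
    """One node of a GO-like object ontology (illustrative only)."""
    name: str
    isa: Optional["OntologyClass"] = None                            # inheritance
    hasa: Dict[str, "OntologyClass"] = field(default_factory=dict)   # exactly 1
    has: Dict[str, "OntologyClass"] = field(default_factory=dict)    # 1 or more

    def lineage(self) -> Iterator["OntologyClass"]:
        """Yield this class, then each ancestor up to the root."""
        node = self
        while node is not None:
            yield node
            node = node.isa

    def is_a(self, other: "OntologyClass") -> bool:
        return any(node is other for node in self.lineage())

    def members(self) -> Dict[str, Tuple[str, str]]:
        """All contained members, inherited ones included (root first)."""
        merged = {}
        for node in reversed(list(self.lineage())):
            for article, cls in node.hasa.items():
                merged[article] = (cls.name, "HASA")
            for article, cls in node.has.items():
                merged[article] = (cls.name, "HAS")
        return merged


# Assumed wiring, following the diagrams on the slides.
obj = OntologyClass("Object")
integer = OntologyClass("Integer", isa=obj)
string = OntologyClass("String", isa=obj)
virtual_seq = OntologyClass("VirtualSequence", isa=obj, hasa={"Length": integer})
generic_seq = OntologyClass("GenericSequence", isa=virtual_seq,
                            hasa={"SequenceString": string})
dna_seq = OntologyClass("DNASequence", isa=generic_seq)

print(dna_seq.is_a(virtual_seq))
print(dna_seq.members())
```

Because ISA means every property of the parent is present in the child, `dna_seq.members()` reports both the inherited length and the sequence string even though `DNASequence` declares neither itself.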

43 Legacy file formats
TBLASTN 2.0.4 [Feb-24-1998]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
Query= gi|1401126 (504 letters)
Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
  336,723 sequences; 677,679,054 total letters
Searching...done
                                                                      Score    E
Sequences producing significant alignments:                           (bits)  Value
gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA...   1009    0.0
emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t...   58      4e-07
emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein                  53      1e-05
Containing “String” allows ontological classes to represent legacy data types

44 Binaries: pictures, movies
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
BAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMQ8wDQYDVQQKEwZUaGF3dGUx
HTAbBgNVBAsTFENlcnRpZmljYXRlIFNlcnZpY2VzMSgwJgYDVQQDEx9QZXJzb25hbCBGcmVl
bWFpbCBSU0EgMjAwMC44LjMwMB4XDTAyMDkxNTIxMDkwMVoXDTAzMDkxNTIxMDkwMVowQjEf
MB0GA1UEAxMWVGhhd3RlIEZyZWVtYWlsIE1lbWJlcjEfMB0GCSqGSIb3DQEJARYQamprM0Bt
- Text-base64 is a Class that contains String
- Binaries are base64 encoded and passed in classes that inherit from text-base64
- base64_encoded_jpeg ISA text/base64 ISA text/plain HASA String
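
As a rough illustration of the base64 step, the following Python sketch encodes a few bytes into text that could be carried in a String member and then decodes them again. The helper names are hypothetical, and the four input bytes are simply the start of a typical JPEG header used as sample data.

```python
import base64


def encode_binary(data: bytes) -> str:
    """Turn raw bytes into ASCII text suitable for a String member."""
    return base64.b64encode(data).decode("ascii")


def decode_binary(text: str) -> bytes:
    """Recover the original bytes from the base64 text."""
    return base64.b64decode(text)


jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # sample bytes only
encoded = encode_binary(jpeg_header)

print(encoded)                                # /9j/4A==
print(decode_binary(encoded) == jpeg_header)  # True
```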

45 Extending legacy datatypes
With legacy data-types defined, we can extend them as we see fit:
- annotated_jpeg ISA base64_encoded_jpeg
- annotated_jpeg HASA 2D_Coordinate_set
- annotated_jpeg HASA Description
Example instance:
3554 663
This is the phenotype of a ufo-1 mutant under long daylength, 16°C
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV

46 The same object…
3554 663
This is the phenotype of a ufo-1 mutant under long daylength, 16°C
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
annotated_jpeg ISA base64_encoded_jpeg, HASA 2D_Coordinate_set, HASA Description

47 The same object…
3554 663
This is the phenotype of a ufo-1 mutant under long daylength, 16°C
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1U
annotated_jpeg ISA base64_encoded_jpeg, HASA 2D_Coordinate_set, HASA Description

48 Cross reference types
- Simple
  - A MOBY Object
- Rich
  - Takes the form: ...Textual Description...
  - …Incidentally, this avoids the problem of reification that is experienced in RDF...

49 XML Schema?
- The Object Ontology allows new data-types WITHOUT new flatfile formats, and without having to understand e.g. XML Schema
- Minimize future heterogeneity
- Improve interoperability without requiring schema-to-schema mapping

50 XML Schema?
- Object Ontology terms have semantically rich names, but this is primarily for human intuition
  - DNA Sequence
  - Annotated_GIF
- Object Ontology does not define the meaning of an object to the machine
  - No machine-readable semantics
- It does define the representation
  - SYNTAX
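
Since the ontology defines representation, a class definition is enough to produce an XML instance without writing a schema by hand. The Python sketch below shows the idea for a DNA sequence; the element and attribute names (`DNASequence`, `articleName`, `Length`, `SequenceString`) are simplified assumptions and should not be read as the exact BioMoby wire format.

```python
import xml.etree.ElementTree as ET


def serialize_dna_sequence(namespace: str, identifier: str, sequence: str) -> str:
    """Build a nested XML element mirroring the class's HASA members."""
    root = ET.Element("DNASequence", {"namespace": namespace, "id": identifier})

    # HASA Integer: the sequence length (member name assumed)
    length = ET.SubElement(root, "Integer", {"articleName": "Length"})
    length.text = str(len(sequence))

    # HASA String: the residues themselves (member name assumed)
    residues = ET.SubElement(root, "String", {"articleName": "SequenceString"})
    residues.text = sequence

    return ET.tostring(root, encoding="unicode")


xml_text = serialize_dna_sequence("NCBI_gi", "163483", "ATGATGATAGATAGAGGG")
print(xml_text)
```

A client that knows the ontology can walk the same ISA/HASA structure in reverse to parse such a document, which is what removes the need for schema-to-schema mapping.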

51 A portion of the MOBY-S Object Ontology …community-built!

52 BioMoby in detail MOBY Data typing system: Semantic Type MOBY Data typing system: Syntactic Type Moby Registry Queries

53 A Moby Central Query
Give me:
- Services that consume THIS data-type in THIS syntax…
- …do SOMETHING LIKE THIS to it…
- …and provide me THAT data-type in response

54 Example
Find me services that
- consume FASTA sequence data,
- do a BLAST with it,
- and provide me lists of GenBank GI numbers in return.
Query can be any or all of the above criteria
- Also limit by service provider and service description keyword

55 Remember!! Moby Registry Query INPUT TYPE | TRANSFORMATION TYPE | OUTPUT TYPE
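
The three-part registry query can be sketched in Python as a filter over registered services, with the ontologies used to match "things LIKE" the requested type. Everything here is hypothetical: the service names, the type names, and the ISA links between service types are invented for the example, and the real Moby Central matching rules may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# child -> parent links (assumed for illustration)
OBJECT_ISA: Dict[str, str] = {
    "DNASequence": "GenericSequence",
    "GenericSequence": "VirtualSequence",
    "VirtualSequence": "Object",
    "BlastReport": "Object",
}
SERVICE_ISA: Dict[str, str] = {
    "NCBI_Blast": "Blast",
    "WU_Blast": "Blast",
    "Blast": "Analysis",
    "Analysis": "Service",
}


def lineage(term: str, isa: Dict[str, str]) -> List[str]:
    """Return the term followed by all of its ancestors."""
    chain = [term]
    while chain[-1] in isa:
        chain.append(isa[chain[-1]])
    return chain


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str
    service_type: str
    output_type: str


def find_services(registry: List[Service],
                  input_type: Optional[str] = None,
                  service_type: Optional[str] = None,
                  output_type: Optional[str] = None) -> List[Service]:
    """Any criterion left as None is not applied."""
    hits = []
    for svc in registry:
        # The service may accept my data's type or any ancestor of it.
        if input_type and svc.input_type not in lineage(input_type, OBJECT_ISA):
            continue
        # The service's type must be the requested type or a descendant.
        if service_type and service_type not in lineage(svc.service_type, SERVICE_ISA):
            continue
        # The service's output must be the requested type or a descendant.
        if output_type and output_type not in lineage(svc.output_type, OBJECT_ISA):
            continue
        hits.append(svc)
    return hits


registry = [
    Service("blast_dna", "DNASequence", "NCBI_Blast", "BlastReport"),
    Service("blast_any", "GenericSequence", "WU_Blast", "BlastReport"),
    Service("describe", "Object", "Analysis", "Object"),
]

for svc in find_services(registry, input_type="DNASequence", service_type="Blast"):
    print(svc.name)
```

Holding a `DNASequence` finds both BLAST services, because a service that accepts `GenericSequence` can also consume its descendants; holding only a `GenericSequence` would exclude `blast_dna`.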

56 A weakness of MOBY Service discovery is horribly flawed due to insufficiently rich semantics…

57 Chickens go in; Pies come out! The problem with Moby

58 What sort o’ pies?

59 Apple! The problem with Moby

60 The MOBY-S Service Ontology
- A simple ISA hierarchy…
  - too simple!
- Primitive types include:
  - Analysis
  - Parsing
  - Registration
  - Retrieval
  - Resolution
  - Conversion
  - Rendering

61 A slice of the Service Ontology
[Diagram nodes: Service; Analysis; Alignment; Blast; NCBI_Blast; WU_Blast; Parsing; Parse_NCBI_Blast; Parse_WU_Blast]
“The Exploding Bicycle” - A. Rector, U Manchester

62 Summary so far
- BioMoby uses ontologies to describe both data types and data syntaxes
  - This is where the interoperability comes from
  - These are used to match consumers with providers during service discovery
- BioMoby uses a simple ontology to describe bioinformatics operations
  - This ontology is only marginally useful

63 Seahawk Highlight data in your browser and drag/drop it into Moby What could be easier than that?! Paul MK Gordon and Christoph W Sensen BMC Bioinformatics 2007, 8:208

64 BMC Bioinformatics, in press Seahawk: A New Moby Client for Biologists Drag ‘n’ drop, highlight existing data for use with MOBY Services Paul Gordon & Christoph Sensen

65 Seahawk looks like a browser

66 How do I load data?

68 How do I load data?
- Use the “open” button:
  - Text file (e.g. FASTA sequences)
  - HTML page (e.g. NCBI Entrez Web page)
  - RTF document (e.g. conference abstract)
  - MOBY XML document
- Drag ‘n’ Drop
  - Web links and desktop files
  - Highlighted text from open documents or Web pages

69 Under the Hood (Beneath the Bonnet?)
- Data has to be converted into Moby XML format to be used by Moby
- Moby data has to be converted back to human-readable text for presentation to the biologist

70 MOB Rules

71 DEM Rules...

72 Again: How do I load data?

73 How do I Find Services?
- Right-click → MOB rules are invoked
- Resulting Moby XML is used for service search

74 How do I run a service? Click it! If necessary, a service’s extra parameters can be set Control+click submits using default params

75 How do I run a service? If required inputs are missing, the missing ones must be dragged into place. Unrecognized data will be rejected

76 How do I collate data? Seahawk clipboard lets you build collections of objects Seahawk “knows” the type of collection and will suggest appropriate Moby services

77 Seahawk Summary
- Seahawk integrates Moby Web Service discovery and execution into the biologist’s day-to-day “Web Surfing” activity
- It uses Regular Expressions and XSLT to move normal web or hard-drive-file data into and out of BioMoby
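
The regular-expression half of that conversion can be pictured with a short Python sketch: each rule pairs a pattern with the namespace it implies, and highlighted text is scanned for matches. The rule table and the namespace label `NCBI_gi` are assumptions for illustration; this is not the actual MOB rule syntax used by Seahawk.

```python
import re
from typing import List, Tuple

# (namespace, pattern) pairs; a hypothetical stand-in for MOB rules
RULES = [
    ("GO", re.compile(r"\bGO:(\d{7})\b")),
    ("NCBI_gi", re.compile(r"\bgi\|(\d+)")),
]


def recognize(text: str) -> List[Tuple[str, str]]:
    """Return (namespace, id) pairs for every identifier found in the text."""
    found = []
    for namespace, pattern in RULES:
        for match in pattern.finditer(text):
            found.append((namespace, match.group(1)))
    return found


highlighted = "The hit gi|163483 is annotated with GO:0003476."
print(recognize(highlighted))
```

Each recognized pair is enough to build a base Moby Object, which can then be used for the service search described on the earlier slide; text that matches no rule produces nothing.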

78 …hey… wait a minute…!!
- If SeaHawk can automatically convert “raw” data into Moby data and back again…
- …and if the majority of tools in the world use “raw” data…
Q: Why can’t we automatically Mobyfy the existing tools that we all know and love?
A: Because there is no way for Seahawk to know the INTENT of each field in a Web FORM! That is only knowable to a human...
…but what if it could figure it out…!!

79 Watch and learn! Spying on biologist’s behaviour to automate Mobyfication of existing tools (coming soon to SeaHawk!!...)

80 Watch and Learn
The biologist opens up their favorite tool inside of SeaHawk… and SeaHawk spies on them as they use it
[Diagram labels: Seahawk; Proxied Web page; Drag ‘n’ drop; Seahawk AJAX prompting]

81 The Process
1. Spy on the biologist
2. Interpret their use of the Web tool
3. Auto-generate a series of MOB/DEM rules
4. Auto-generate a Moby Service that utilizes these MOB/DEM rules to interact with that Web tool
5. Auto-register that “proxy” Moby Service in Moby Central
6. That legacy Web tool now becomes available to everyone through BioMoby

82 Why doesn’t Moby Use RDF/OWL?

83 Timeline of Moby/W3C Activities
[Timeline figure spanning 2000 to 2006, showing:]
- RDF Candidate Spec
- RDF Schema Candidate Spec
- W3C launches Semantic Web (SW) Activity Group
- BioMoby project established
- BioMoby XML finalized
- BioMoby Stable 0.85 API published (>400 services)
- RDF/OWL formal W3C Recommendations
- BioMoby Stable 1.0 API published
- Extensive SW toolbuilding…

84 Moby 2.0 Getting it right, the second time!

85 What BioMoby Already Does Sequence Data BLAST SERVER Blast Hit

86 What BioMoby Already Does
[Diagram: Sequence Data, givesBlastResult, Blast Hit]
Not “Biologically” Meaningful

87 What BioMoby Already Does
[Diagram: Sequence Data, hasHomologyTo, Blast Hit]
…looks a lot like…
[Diagram: URI, hasHomologyTo, URI]
which is effectively just an RDF triple.

88 Now think in reverse…

89 (in case you forgot…) Moby Registry Query INPUT TYPE | TRANSFORMATION TYPE | OUTPUT TYPE

90 Moby 2.0
What does Sequence Data have homology to?
[Diagram: hasHomologyTo maps to BLAST SERVICE; send data (Sequence Data); receive Blast Hit]

91 Query
FIND SERVICES THAT
Consume Sequence Data | Provide hasHomologyTo Property | Attached to other Sequence Data

92 SPARQL
- A Semantic Web query language
- Queries “look like” graphs
- Find “X” with predicate “Y” attached to “Z”

93 Moby 2.0 extends the SPARQL query language
- SPARQL queries contain concepts and the relationships between them (subject, predicate, object)
- We simply map RDF predicates onto Moby services capable of generating that relationship
- Registry query: “What Moby service consumes [subject] and generates the [predicate] relationship type?”
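
The predicate-to-service mapping can be illustrated with a small Python sketch. It assumes a lookup table keyed on (subject type, predicate) and uses a stand-in function in place of a real BLAST service; the function, the table, and the returned identifiers are all made up for the example and do not reflect the Moby 2.0 implementation.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


def fake_blast(sequence_id: str) -> List[str]:
    """Stand-in for a remote BLAST service; the hit ids are invented."""
    return [f"{sequence_id}-hit1", f"{sequence_id}-hit2"]


# "What service consumes [subject type] and generates [predicate]?"
PREDICATE_SERVICES: Dict[Tuple[str, str], Callable[[str], List[str]]] = {
    ("SequenceData", "hasHomologyTo"): fake_blast,
}


def generate_triples(subject_type: str, subject_id: str, predicate: str) -> List[Triple]:
    """Resolve an unanswered (subject, predicate, ?) pattern by running a service."""
    service = PREDICATE_SERVICES.get((subject_type, predicate))
    if service is None:
        return []  # no registered service can generate this relationship
    return [(subject_id, predicate, obj) for obj in service(subject_id)]


for triple in generate_triples("SequenceData", "seq42", "hasHomologyTo"):
    print(triple)
```

The design point is that the triples need not exist before the query is posed: the query engine discovers a service able to produce the requested relationship and executes it on demand.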

94 But wait, there’s more!

95 Exploit knowledge in OWL ontologies to enhance query
[Diagram: Evaluate Query Expression]
- Subject, Predicate → look up and execute Moby service: consumes proteins and generates functional annotation info
- Subject, Predicate → look up and execute Moby service: consumes STK or proteins and looks up inhibitor molecules

96 Exploit knowledge in OWL ontologies to enhance query This SPARQL query could be posed on a database of RAW, UNANNOTATED Protein sequences, and be answered by Moby 2.0 (a.k.a. CardioSHARE)

97 Credits
- Genome Canada/Genome Alberta
- myGrid (Carole Goble in particular)
- Spanish National Institute for Bioinformatics (INB) through Fundación Genoma España
- Generation Challenge Programme (GCP) of the Consultative Group for International Agricultural Research (CGIAR)
- Heart and Stroke Foundation of BC and Yukon (CardioSHARE)
- Microsoft Research (CardioSHARE)

