(Bio)Web Services at the INB BioMOBY. Instituto Nacional de Bioinformática.


1 (Bio)Web Services at the INB BioMOBY

2 Instituto Nacional de Bioinformática

3 INB Mission: “To generate and apply bioinformatics solutions to needs detected in the development and implementation of genomics- and proteomics-focused projects.” Aims: to support Bioinformatics and Computational Biology development in Spain; to collaborate with and provide scientific and technical support to national genomics and proteomics projects; to contribute to the creation and establishment of local Bioinformatics groups with research and service components by training bioinformaticians; to train bioinformaticians for genomics and proteomics research groups; to develop pure Bioinformatics projects related to the Institute's activities; to support companies active in this sector in Spain; to internationalize all its activities.

4 INB Structure. A “virtual” institute

5 Web Services

6 Making some sense of this (Source: myGrid): 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt 12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt 12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg 12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga 12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc 12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa 12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

7 Or this…

8 Current practices. Bioinformatics Integration: State of the Art. [Diagram labels: Description; Discovery; Remote Programmatic Access; Service Consumers; Service Providers; Service (Application, DB).] A Web page is the de facto standard. Discovery: word of mouth, Web directories, Google, paper publications. Description: word of mouth, Web documentation / examples / tutorials / courses, paper publications. Data transfer & message format: cut & paste, plus data reformatting. Automation: CGI & bespoke (ad hoc) code; APIs (normally only in big Bioinformatics projects/institutes). Do you have data? Do you have tools? Publish a web page.

9 What is wrong with Web apps? User side: How do I find out where services are provided? Once I discover a service, how do I use it (input/output data types)? How do I take the output of one service and send it to another service? How do I use the service from within a program instead of through a form on a web page? Developer side: Most developer time is spent on presentation and input/output management. Error handling is mandatory. Different projects use different formats, rules of use, etc.

10 Web services can solve the problem: a central repository; well-known input/output formats; no need for user interfaces.

11 Web services model. “A Web service is an interface that describes a collection of operations that are network accessible through standardized XML messaging.”* [Diagram: the Service Provider publishes its Service Description to the Service Registry (WSDL, UDDI); the Service Requestor finds Service Descriptions in the registry (WSDL, UDDI) and binds to the Service.] *Web Services Conceptual Architecture, Heather Kreger, IBM Software Group, 2001.

12 However… Web services alone don’t help the situation much, since a bioinformatics service that consumes a “string” might be expecting a FASTA sequence, or a keyword…? Bioinformatics has many different ‘strings’! In Bioinformatics, Web Service registries merely catalogue the chaos! Semantics, rather than structure, is necessary.

13 Our audience. Information is distributed: MOST data never makes it off the scientist's hard drive. This data should be added to the global scientific archive. Biologists, by and large, are willing and able, but… The Web was embraced enthusiastically by biologists; in fact, most wet labs run a website! Unfortunately, this only adds to the chaos… The interoperability solution must be simple enough for a Biologist with a little computer knowledge to implement on their own.

14 BioMOBY From MOBY-DIC (Model Organisms, Bring Your own Database Interface Conference) http://www.biomoby.org

15 BioMOBY: Scope and Definition. http://www.biomoby.org OBJECTIVES: Study how to address the interoperability problems actually faced today by bioinformatics users of web-accessible resources, and which factors promote the adoption of new approaches. How should increasing potential for interoperability be balanced against the likelihood of widespread adoption? I.e., focus on minimizing the barriers to entry into the system, or insist on a set of constraints that will guarantee the usefulness of the system's components? MOBY is a project to develop a web services architecture for bioinformatics: 1. Common Syntax; 2. Common Semantics; 3. Dynamic Discovery. BioMOBY is an international research project involving biological data hosts, biological data service providers, and coders, whose aim is to explore various methodologies for biological data representation, distribution, and discovery.

16 The MOBY plan: Define data types commonly used in bioinformatics. Organize these into an Ontology. Ontologically define web service inputs and outputs. Register the inputs and outputs in a “yellow pages”. Machines can find an appropriate service. Machines can execute that service unattended. But users can still understand the data types.

17 Define: Semantics. For a piece of data, its “semantics” are its intention, its meaning, its raison d’être, its context, and its relationship to other data.

18 MOBY Semantic Typing: Namespaces. Any identifiable piece of data is an “entity”. Identifiers fall into particular “Namespaces”: NCBI has gi numbers (gi Namespace); GO Terms have accession numbers (GO Namespace). Namespaces indicate the data’s semantic type: GO:0003476 → a Gene Ontology Term; gi|163483 → a GenBank record. However, we cannot tell if it is a protein, RNA, or DNA sequence. Namespace + ID precisely specifies a data “entity”. The Namespace is assumed to be sufficiently descriptive of the data’s semantic type that a service provider can define their interface in terms of Namespaces.
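
The Namespace + ID idea can be sketched in a few lines of Python. This is only an illustration and not part of any BioMOBY API: `EntityRef` and `parse_entity` are hypothetical names, and the assumption that `:` or `|` separates the namespace from the ID comes solely from the two examples on this slide.

```python
from typing import NamedTuple


class EntityRef(NamedTuple):
    """A reference to a data entity: a namespace plus an ID within it."""
    namespace: str
    id: str


def parse_entity(identifier: str) -> EntityRef:
    """Split an identifier such as 'GO:0003476' or 'gi|163483'.

    Assumption: the first ':' or '|' separates namespace from ID.
    """
    for separator in (":", "|"):
        if separator in identifier:
            namespace, _, local_id = identifier.partition(separator)
            return EntityRef(namespace, local_id)
    raise ValueError(f"no namespace separator found in {identifier!r}")


go_term = parse_entity("GO:0003476")
genbank_record = parse_entity("gi|163483")
```

Note that the reference alone says nothing about the syntactic type of the data (protein, RNA, or DNA), which is why the slides go on to introduce a separate Object Ontology.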

19 Define: Syntax. For a piece of data, its “syntax” is its representation, its form, its structure, and its language (of representation).

20 MOBY Syntactic Typing: The Object Ontology. Syntactic types are defined by a GO-like ontology, with a type (“Class”) name at each node. Edges define the relationships between Classes. GO is used as a model because of its comprehension & familiarity. Edges define one of three relationships. ISA: inheritance relationship; all properties of the parent are present in the child. HASA: container relationship of ‘exactly 1’. HAS: container relationship with ‘1 or more’.

21 Define: Ontology A systematic representation of the entities that exist in a domain of discourse, and the relationships between them. Child Father Female Male Mother hasParent hasGender partnerOf

22 A portion of the MOBY-S Object Ontology …community-built!

23 What’s an “Object”? The smallest unit of information that can be passed by MOBY. It consists simply of a Namespace and an ID. Thus an Object is nothing more than a “reference” to a data entity. Ex.: one such reference refers to the 3D structure of a Herpes Virus I Thymidine kinase, whereas another refers to its sequence.

24 The Object Ontology: A small slice. ISA relationships do not necessarily add complexity to objects; sometimes they are just semantics. Inheritance makes service discovery easier.

25 MOBY objects. A MOBY triple includes a namespace, an ID, and a class. A simple MOBY object is just a pointer to data to be retrieved from somewhere. An object may contain data in addition to the namespace and ID: the object's data. The object's data may include XML markup. A complete MOBY object: 975 ATGATGCGGCTAGTGATGCTGTCGGCGGCATGATTAGG... articleNames add human-readable semantics to subclasses.
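
The markup of the "complete MOBY object" did not survive in this transcript, so the following Python sketch only illustrates the idea of a namespace/ID/class triple carrying nested data. The element and attribute names (`namespace`, `id`, `articleName`), the `build_object` helper, and the choice of `gi`/`163483` as the reference are assumptions made for illustration; they are not claimed to match the exact BioMOBY wire format. The values 975 and the sequence prefix are the ones visible on the slide (the sequence is truncated there and is left truncated here).

```python
import xml.etree.ElementTree as ET


def build_object(cls, namespace, obj_id, article_name=None, text=None):
    """Create one object element: class name as tag, namespace/ID as attributes."""
    attributes = {"namespace": namespace, "id": obj_id}
    if article_name is not None:
        attributes["articleName"] = article_name  # human-readable role of a member
    element = ET.Element(cls, attributes)
    element.text = text
    return element


# Hypothetical DNA sequence object with two embedded members.
sequence = build_object("DNA_Sequence", "gi", "163483")
sequence.append(build_object("Integer", "", "", "Length", "975"))
sequence.append(
    build_object("String", "", "", "Sequence",
                 "ATGATGCGGCTAGTGATGCTGTCGGCGGCATGATTAGG")
)

xml_text = ET.tostring(sequence, encoding="unicode")
```

Stripping the two child elements leaves only the namespace and ID, which corresponds to the "simple object as a pointer" case described above.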

26 ISA relationship: inheritance. Classes become more specialized as you move along the ISA relationship hierarchy: DNA_Sequence ISA Nucleotide_Sequence ISA Generic_Sequence ISA Virtual_Sequence ISA Object. Classes do not become more complex as a result of ISA relationships alone.

27 HASA & HAS relationships. HASA and HAS relationships make Classes more complex by embedding Classes within Classes: Virtual_Sequence ISA Object; Virtual_Sequence HASA Length (Integer); Generic_Sequence ISA Virtual_Sequence; Generic_Sequence HASA Sequence (String); Annotated_GIF ISA Image (base_64_GIF); Annotated_GIF HAS Description (String).
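
A minimal Python sketch of how ISA and HASA edges combine, using only the class names given on these slides. The dictionaries and the helper names `lineage`, `is_a`, and `members` are hypothetical; the real ontology lives in MOBY Central, not in local tables.

```python
# ISA edges: child class -> parent class (taken from the slides).
ISA = {
    "Virtual_Sequence": "Object",
    "Generic_Sequence": "Virtual_Sequence",
    "Nucleotide_Sequence": "Generic_Sequence",
    "DNA_Sequence": "Nucleotide_Sequence",
}

# HASA edges: class -> list of (articleName, member class), exactly one each.
HASA = {
    "Virtual_Sequence": [("Length", "Integer")],
    "Generic_Sequence": [("Sequence", "String")],
}


def lineage(cls):
    """Return the class followed by all its ancestors, ending at the root."""
    chain = [cls]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain


def is_a(cls, target):
    """True if cls is target or inherits from it through ISA edges."""
    return target in lineage(cls)


def members(cls):
    """Collect HASA members, inherited ones first."""
    collected = []
    for ancestor in reversed(lineage(cls)):
        collected.extend(HASA.get(ancestor, []))
    return collected
```

Here `members("DNA_Sequence")` yields the Length and Sequence members inherited from Virtual_Sequence and Generic_Sequence, even though DNA_Sequence itself adds nothing: ISA alone specializes the meaning without changing the structure. A service registered for Generic_Sequence could therefore accept a DNA_Sequence, which is one way inheritance can ease discovery.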

28 Legacy file formats. Classic bioinformatics “strings” are just embedded into XML. Binaries are base64 encoded. Example: TBLASTN 2.0.4 [Feb-24-1998] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= gi|1401126 (504 letters) Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences 336,723 sequences; 677,679,054 total letters Searching...done Score E Sequences producing significant alignments: (bits) Value gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA... 1009 0.0 emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t... 58 4e-07 emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein 53 1e-05 gb|U12856|ATU12856 Arabidopsis thaliana Col-0 abscisic acid inse... 53 1e-05

29 Extending legacy data types. With legacy data types defined, we can extend them as we see fit: annotated_jpeg ISA base64_encoded_jpeg; annotated_jpeg HASA 2D_Coordinate_set; annotated_jpeg HASA Description. Example: MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV 3554 663 This is the phenotype of a ufo-1 mutant under long daylength, 16 °C
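
The base64 step can be illustrated with Python's standard library. This is a sketch under stated assumptions: the `content` child element and the helper names `wrap_binary` and `unwrap_binary` are invented for the example, and the bytes are a stand-in rather than a real JPEG. Only the class name `annotated_jpeg`, the `Description` member, and the description text come from the slide.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(cls, payload, description):
    """Embed binary data in XML as base64 text, plus a Description member."""
    element = ET.Element(cls)
    content = ET.SubElement(element, "content")  # hypothetical element name
    content.text = base64.b64encode(payload).decode("ascii")
    text = ET.SubElement(element, "Description")
    text.text = description
    return element


def unwrap_binary(element):
    """Recover the original bytes from the base64 text."""
    return base64.b64decode(element.find("content").text)


# Stand-in bytes (JPEG magic number followed by filler), not a real image.
fake_jpeg = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"not a real image"
annotated = wrap_binary(
    "annotated_jpeg",
    fake_jpeg,
    "This is the phenotype of a ufo-1 mutant under long daylength",
)
serialized = ET.tostring(annotated, encoding="unicode")
```

Base64 output uses only ASCII letters, digits, `+`, `/`, and `=`, so it can sit inside an XML element without escaping.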

30 The Object Ontology: Defines an XML Schema! The position of an ontology node precisely defines the syntax by which that node will be represented. End-users can define new data types without having to write XML Schema! This was an important aim of the project. A machine can “understand” the structure of any incoming message by querying its ontological type!

31 The Service Ontology. A simple ISA hierarchy. Primitive types include: Analysis, Parsing, Registration, Retrieval, Resolution, Conversion.

32 Goals achieved. A common data-type ontology ensures full interoperability of services. The present ontology, built freely, has low redundancy and covers most bioinformatics entities. Currently, the services offered are small, very specific modules that are easy to interconnect into complex workflows. This has emerged naturally rather than being imposed by the standard!

33 Web services and workflows. Common XML-based input/output formats allow several services to be chained to build a logical workflow. Workflows are stored (in XML, of course) and can be run several times. Workflows can include web services from several providers.
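
Chaining by matching output type to input type can be sketched as follows. The `Service` class and `run_workflow` function are hypothetical, and the three services are local stubs that only borrow names from the INB catalogue; their type signatures are inferred from those names and are an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Service:
    name: str
    input_type: str
    output_type: str
    run: Callable[[str], str]


def run_workflow(services, data, data_type):
    """Run services in order, checking each input type before calling it."""
    for service in services:
        if service.input_type != data_type:
            raise TypeError(
                f"{service.name} expects {service.input_type}, got {data_type}"
            )
        data = service.run(data)
        data_type = service.output_type
    return data, data_type


# Stub implementations standing in for remote services.
workflow = [
    Service("getAASfromUniprot", "Uniprot ID", "AAS", lambda x: f"seq({x})"),
    Service("runPSIBlastfromAAS", "AAS", "BLASTText", lambda x: f"blast({x})"),
    Service("runPMUTHSfromBlastText", "BLASTText", "PMUTText", lambda x: f"pmut({x})"),
]

result, result_type = run_workflow(workflow, "P12931", "Uniprot ID")
```

Because every step declares its types, an incompatible chain fails before the mismatched service is called, instead of producing output that the next tool cannot read.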

34 [Workflow diagram of services and their input/output types. Data types: Uniprot ID, PDB ID, String, AAS (AminoAcidSeq), PDBText, BLASTText, PMUTText, FSOLVText, FeatureAAS, PropertySeq, Typed Image, PDB Enriched. Services: getAASfromUniprot, getAASfromPDBId, getPDBFilefromPDBId, parseAASfromPDBText, StringtoAAS, runPSIBlastfromAAS, runPMUTHSfromBlastText, parseFeatureSeqfromPMUTText, parsePropfromPMUTText, runFSOLVFromPDBText, parseFeatureSeqfromFSOLVText, parsePropfromFSOLVText, plotFeatureAAS, showPMUTonStruc, showFSOLVonStruc.]

35 Gene detection by homology. Input: Protein Id and DNA genomic sequence. Steps: build a BLAST database from the DNA sequence; run the BLAST search; run GeneWise to detect gene structure.

36 Web Services at INB-BSC. BioMoby Web services offered (146): Database retrieval (34), Sequence comparison and alignment (46), Phylogeny (10), Sequence analysis (26), Structure analysis (9), Data handling and conversion (21). Applications covered: Blast (24), Fasta (6), Clustal (2), Tcoffee (3), Hmmer (4), Phylip (10), Dali (1), Procheck (1), EMBOSS (25). http://inb.bsc.es/webservices.php BioMOBY Central: 254 Object Types, 430 Services.

37 Implementation

38 XML? XML stands for EXtensible Markup Language. XML is a markup language much like HTML. XML was designed to describe data. XML tags are not predefined: you must define your own tags. XML uses a Document Type Definition (DTD) or an XML Schema to describe the data. XML can be parsed easily in most programming languages (e.g., the Perl XML::LibXML module).

39 XML example (from Amazon)

40 XML Bio example. FASTA sequence SRC_HUMAN (P12931) Proto-oncogene tyrosine-protein kinase Src (EC 2.7.1.112) (p60-Src) (c-Src) (pp60c-src GSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAA FAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQ IVNNTEGDWWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENP RGTFLVRESETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLV AYYSKHADGLCHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTW NGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGS LLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVAD FGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPY PGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTS TEPQYQPGENL

41 XML Prosite entry TYR_PHOSPHO_SITE PATTERN PS00007 APR-1990 Tyrosine kinase phosphorylation site [RK]-x(2,3)-[DE]-x(2,3)-Y /TAXO-RANGE=??E?V; /SITE=5,phosphorylation; /SKIP- FLAG=TRUE; /VERSION=1; PDOC00007

42 XML / SOAP / WSDL. XML is the basic language for transmitting data between (bio)web services. Additional layers between the communication and data layers are necessary. Classical Web transaction: Data layer, HTTP layer, TCP/IP layer. (Bio)Web service transaction: Data layer, “Bio” layer, SOAP layer (the XML layers), HTTP layer, TCP/IP layer.

43 XML / SOAP / WSDL. SOAP (Simple Object Access Protocol): a simple XML-based protocol that lets applications exchange information. It can use HTTP (mainly) or SMTP as the underlying communication protocol, so it is platform and language independent. WSDL (Web Services Description Language): an XML-based language for describing Web services and how to access them. It allows a client to retrieve all the information needed to call a Web service through automatic SOAP requests.
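
To make the SOAP layer concrete, here is a minimal Python sketch that builds a SOAP 1.1 envelope with the standard library. The envelope namespace URI is the standard SOAP 1.1 one; the `build_envelope` helper, the operation name `runService`, and the payload are hypothetical, and no request is actually sent.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("SOAP-ENV", SOAP_ENV)  # use a readable prefix on output


def build_envelope(operation, payload):
    """Wrap a payload string in Envelope/Body/<operation> and serialize it."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, operation)
    call.text = payload  # special characters are escaped on serialization
    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("runService", "<data>example</data>")
```

In a real exchange this text would be sent as the body of an HTTP POST; a SOAP toolkit normally generates it from the service's WSDL so that callers never write the envelope by hand.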

44 Structure of BioMOBY transaction http://schemas.xmlsoap.org/soap/envelope/

45 BioMOBY answer http://www.w3.org/1999/XMLSchema-instance <![CDATA[ Id: "ASA" SWRIISSIEQ KEESRGNEDH VKCIQEYRSK IESELSNICD GILKLLDSCL IPSASAGDSK.... ]]> ** SOAP library used: libsoap 1.0.1
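
Most of the answer's markup was lost in this transcript, but the surviving CDATA section shows that the biological payload travels as text inside the SOAP body. The Python sketch below shows how a client might dig that text out. The response string is a hand-made stand-in: the `result` element name and the `extract_payload` helper are assumptions, and only the first three sequence blocks from the slide are reproduced.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical, simplified response; a real BioMOBY answer carries more markup.
response = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    "<SOAP-ENV:Body>"
    '<result><![CDATA[ Id: "ASA" SWRIISSIEQ KEESRGNEDH VKCIQEYRSK ]]></result>'
    "</SOAP-ENV:Body>"
    "</SOAP-ENV:Envelope>"
)


def extract_payload(soap_text):
    """Return the text of the first element inside the SOAP Body."""
    root = ET.fromstring(soap_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("no content in SOAP Body")
    # The parser delivers CDATA content as ordinary element text.
    return (body[0].text or "").strip()


payload = extract_payload(response)
```

CDATA lets the service return text containing quotes or angle brackets without escaping each character; the parser hands it back unchanged.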

46 The three components of MOBY. MOBY-Central: knows about all existing MOBY services; ask it for services by type of input, output, or keyword; returns info on how to connect to a service. Service: accepts MOBY requests; runs a program on the service provider's computer. Client: locates a service through MOBY-Central; connects to the service and sends input data; waits for the result; finds the result buried within XML markup.

47 MOBY transactions. Registration phase (MOBY Service and MOBY Central): Register Service, OK. Query phase (MOBY Client and MOBY Central): Data Object Type, Service Types, Available Services, Selected Service, Service Def, Request WSDL. Transaction phase (MOBY Client and MOBY Service): Input Data Object (DATA), Output Data Object (DATA).
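
The three phases can be mimicked with an in-memory registry. This Python sketch is a toy model, not the MOBY Central API: the `MobyCentral` class, its `register` and `find` methods, and the stub service are all hypothetical, and a local function stands in for the remote endpoint that a real client would reach through WSDL and SOAP.

```python
class MobyCentral:
    """Toy registry: stores service descriptions and answers type queries."""

    def __init__(self):
        self._services = []

    def register(self, name, input_type, output_type, endpoint):
        """Registration phase: a provider announces a service."""
        self._services.append({
            "name": name,
            "input_type": input_type,
            "output_type": output_type,
            "endpoint": endpoint,
        })
        return "OK"

    def find(self, input_type=None, output_type=None):
        """Query phase: list services matching the requested types."""
        return [
            service for service in self._services
            if (input_type is None or service["input_type"] == input_type)
            and (output_type is None or service["output_type"] == output_type)
        ]


def get_aas_from_uniprot(uniprot_id):
    """Stub service: a real one would look the sequence up in a database."""
    return f"<sequence for {uniprot_id}>"


central = MobyCentral()
status = central.register("getAASfromUniprot", "Uniprot ID", "AAS",
                          get_aas_from_uniprot)

# Query phase, then transaction phase against the selected service.
available = central.find(input_type="Uniprot ID")
selected = available[0]
output = selected["endpoint"]("P12931")
```

The client needs to know only the type of data it holds; the registry tells it which services accept that type and how to reach them.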

48 Moby Service. [Diagram: the BioMOBY API sits between external users and the service provider's application. Incoming: connection to external users, SOAP server, SOAP packet, MOBY object extraction, MOBY Object, extraction of biological data, input data, Application. Outgoing: output data, building the MOBY object, building the SOAP packet.]

49 How to use MOBY services: clients. Programmatic access: MOBY API (Perl, Java, Python…). Web access: GBrowse browser, INB. Clients: Bluejay, Eclipse/Haystack, Talisman/Taverna (myGrid). [Audience labels: Expert Bioinformaticians, Developers; Biologists, Genomic Projects; Bioinformaticians, Expert Biologists, Genomic Projects.]

50 Programmatic access. Native MOBY APIs in Perl, Java, or Python. Developed at INB-BSC: MOBYLite API, which runs on top of the Perl MOBY API; MOBY datatypes are translated into Perl classes and services into Perl functions; the API is built automatically from the MOBY catalogue. CommLineMOBY: a Perl API to run in-house services without the need for the SOAP layer.

51 MobyLite API. Workflow: Uniprot ID → getAASequence → runBLAST → BLAST Report.

    use strict;
    use warnings;
    use INB_Ontology;   # Package containing data types
    use inb_bsc_es;     # Package containing services from inb.bsc.es

    my $id = $ARGV[0] or die "Usage: $0 <Uniprot ID>\n";
    my $uniprotId = Object->new($id, 'Uniprot');

    # Obtaining sequence from Uniprot; an AminoAcidSequence object is created.
    my $AASeq = inb_bsc_es::getAminoAcidSequence( input => $uniprotId )->{sequence};

    # Running Blastp with error handling. A Blast_text object is created.
    my $BlastRep;
    eval {
        $BlastRep = inb_bsc_es::runNCBIBlastp( sequence => $AASeq )->{blast_report};
    };
    if ($@) {
        print "Error: Blast execution failed: $@";
    } else {
        print $BlastRep->content;   # a standard Blast report if no error
    }

52 MOBY Clients. Gbrowse_moby (M Wilkinson): browser-style client. Ahab & Ishmael (B Good, M Wilkinson): “BLAST” & Semantic Web style clients. PlaNet Locus_View (H Schoof, R Ernst): aggregator-style client. Blue-Jay (P Gordon) and Rat Genome Database prototype (S Twigger): menu-style clients. MOBY Graphs (M Senger): auto-workflow discovery tool. Taverna (T Oinn, M Senger, E Kawas) and MOWserv (INB, Spain): workflow builder/publisher/execution clients; enhanced support for MOBY currently being built. Eclipse plugins… etc…

53

54 MOWServ. INB Client

