Improving the Reuse of Scientific Workflows and their By-products

Presentation transcript:

1 Improving the Reuse of Scientific Workflows and their By-products Xiaorong Xiang National Evolutionary Synthesis Center (NESCent) Duke University, University of North Carolina - Chapel Hill, and North Carolina State University Gregory Madey Department of Computer Science and Engineering University of Notre Dame 2007 IEEE International Conference on Web Services (ICWS 2007) Salt Lake City, Utah, July 2007 Supported in part by the Indiana Center for Insect Genomics (ICIG) & the Indiana 21st Century Fund

2 Collaborators: Xiaorong Xiang & Jeanne Romero-Severson

3 Outline: two parts
Production system (MoGServ) for bioinformatics workflows
- Bioinformatics application
- Productivity improvement
Prototype system exploring ideas for end-user composition
- Workflow reuse
- Knowledge management/discovery

4 Bioinformatics today
Rapidly accumulating data: DNA sequences, contigs, expression data, annotations, etc.
Non-standard, independently developed, heterogeneous data sources
Data sharing and security
Productivity problem!
[Figure from the article “Genome Sequencing vs. Moore’s Law: Cyber Challenges for the Next Decade” by Folker Meyer, CTWatch Quarterly, August 2006, volume 2, number 3]

5 SOA in Bioinformatics
More community efforts needed to provide more shared and reliable services
More demonstration projects needed => best practices, measured utility, feedback to middleware projects, etc.
Recent exposure of data & analysis tools as services
- Large public databases and bioinformatics tools
Middleware projects
- Provide infrastructure to compose, manage, execute, and connect the distributed services

6 Mother of Green (MoG) project
Biological science
- In collaboration with Prof. Jeanne Romero-Severson, Biological Sciences, University of Notre Dame
- Study the deep phylogeny of plastids
Computer science
- Provide an environment to support scientists’ investigations
- A case study of using SOA for data and application integration
- A prototype for future research in the service-oriented architecture domain

7 Mother of Green
Malaria causes [...] million deaths every year
3,000 children under age five die of malaria every day
Plasmodium falciparum (a protozoan parasite) causes human malaria
Drug resistance is a worldwide problem
Targeted drug design through phylogenomics
[Figure: P. falciparum]

8 Mother of Green
P. falciparum has three genomes
- Nuclear, mitochondrial, plastid
Animals and insects have only two
Target the third genome
No harm to animals
New antimalarial drug
High risk, high tech, high payoff
J. Romero-Severson, Department of Biological Sciences
Greg Madey & Xiaorong Xiang, Department of Computer Science & Engineering

9 Mother of Green
Plastids are the third genome
Intracellular organelles
Terrestrial plants, algae, apicomplexans
Functions in plants and algae
- Photosynthesis
- Oxidation of water
- Reduction of NADP
- Synthesis of ATP
- Fatty acid biosynthesis
- Aromatic amino acid biosynthesis
Functions in apicomplexans?
[Figure labels: chloroplast in plant cell; plastid in Toxoplasma sp.; apicoplast in P. falciparum; plastid]

10 Mother of Green The apicoplast appears to code for <30 proteins. Repair, replication and transcription proteins Why is the apicoplast essential?

11 Mother of Green Phylogenomics
Find the ancestors of the apicoplast
Identify genes in the ancestors
Determine gene function
Look for these genes in the P. falciparum nucleus
Then study regulatory mechanisms in candidate genes

12 Phylogenomics of plastids Very old lineage (> 2.5 billion years) Cyanobacterial ancestor Three main plastid lineages Glaucophytes Group of freshwater algae Chloroplast resembles intact cyanobacteria Chlorophytes Green plant lineage Chloroplast genome reduced Many chloroplast genes now in nuclear genome Rhodophytes Red algal lineage Chloroplast genome bigger than in green plants Oomycetes Apicomplexans

13 Phylogenomics of plastids
One cyanobacterial ancestor? Many?
Lineages are not linear
[Figure panels: one plastid origin; multiple plastid origins]

14 The process of endosymbiosis. Horizontal gene transfer (arrows) from the plastid to the nucleus. The nucleomorph is a remnant of the original endosymbiont nucleus.
[Figure labels: cyanobacteria; primitive eukaryote; endosymbiont plastid; nucleus; second eukaryote; secondary endosymbionts; nucleomorph; secondary nonphotosynthetic endosymbiont; plastid disappears]

15 Tertiary endosymbiosis. Horizontal gene transfer.
[Figure labels: secondary endosymbiont; third eukaryote; tertiary endosymbionts; tertiary nonphotosynthetic endosymbiont; plastid disappears; P. falciparum]

16 The information gathering problem
Rapid accumulation of raw sequence information
- ~100 sequenced chloroplast genomes
- ~57 sequenced cyanobacterial genomes
- Rate of accumulation is increasing
Information accumulates faster than analyses finish
Information in forms not readily accessible
Solution
- Semi-automated web services
- “Smart” web services
- Semantic web

17 A typical in-silico investigation: data-driven research
A: Query complete genome sequences for a given taxon
B: Query protein-coding genes for each genome sequence
C: Eliminate vector sequences
D: Sequence alignment
E: Phylogenetic analysis

18 Time-consuming manual web-based operations
Data collection: copy & paste!
Analysis tool usage: copy & paste!
Experiment data recording: copy & paste!
Repetitive experiments for scientific discovery: copy & paste!
Repeat as new data becomes available: copy & paste!

19 MoGServ system architecture
MoGServ interface
- Web interface
- Application interface
MoGServ middle layer
- Data access storage
- Data and analysis services
- Service and workflow registry
- Indexing and querying metadata
- Service and workflow enactment
Acting in two roles: service requester and service provider

20 MoGServ System Architecture
[Diagram labels: Web Interface; Applications; Application Server; Data Access Services; Data Analysis Services; Job Manager; Job Launcher; Service/Workflow Registry; Metadata Search; Local Data Storage; Workflow/SOAP Engines; Services; NCBI; DDBJ; EMBL; Others; Data/Services Providers; MoGServ Middle Layer; Services Access Client]

21 Data storage and access services
Local database
- Integrating data from multiple data sources according to scientists’ interests
- Supporting repetitive investigations against several subsets of sequences
- Avoiding network traffic and service failures when retrieving data on the fly from public data sources
Accessing the data in the local database through services

22 Service and workflow registry
A table-based description with necessary properties
- Text description
- Service location
- Input/output
- Provider
- Version
- Algorithm
- Invocation method
Not intended for supporting service discovery or composition
To answer end-users’ questions about their results
- Provenance: “Which algorithm was used to generate the data and what is the source of the input data?”
A repository of services and workflows used by local application developers
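As a rough illustration of the table-based registry described on this slide, the sketch below models one registry row and the provenance question it is meant to answer. The class names, field names, and the sample row are hypothetical; the actual MoGServ schema is not given in the slides.

```python
from dataclasses import dataclass


@dataclass
class RegistryEntry:
    """One row of a table-based registry (hypothetical schema, not MoGServ's)."""
    name: str
    description: str
    location: str
    inputs: list
    outputs: list
    provider: str
    version: str
    algorithm: str
    invocation: str


class Registry:
    """In-memory stand-in for the registry table, keyed by service name."""

    def __init__(self):
        self._rows = {}

    def register(self, entry):
        self._rows[entry.name] = entry

    def lookup(self, name):
        return self._rows[name]

    def provenance(self, name):
        # Answers "which algorithm was used and what were the inputs?"
        e = self.lookup(name)
        return (f"algorithm={e.algorithm}; version={e.version}; "
                f"provider={e.provider}; inputs={', '.join(e.inputs)}")


registry = Registry()
registry.register(RegistryEntry(
    name="ClustalW",
    description="Multiple sequence alignment",
    location="axis/services/ClustalW?wsdl",
    inputs=["setid", "sequencetype"],
    outputs=["filen"],
    provider="EBI",        # illustrative value
    version="unknown",     # placeholder, not stated in the slides
    algorithm="clustalw",
    invocation="SOAP",     # illustrative value
))
print(registry.provenance("ClustalW"))
```

A flat record like this fits the stated goal: it is enough to answer provenance questions, without the richer semantics that discovery or composition would need.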

23 Indexing and querying metadata
Metadata
- Service and workflow descriptions
- Descriptions of sequence data, in order to track the origin of the data
- Experimental data: output, input, and intermediate data
Indexing and querying with keywords
- Lucene
- Implemented as services
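The slides name Lucene for keyword indexing but show no code. The sketch below is not Lucene; it is a minimal plain-Python inverted index that illustrates the general idea of indexing metadata text and querying it by keyword. The class name, document ids, and sample texts are hypothetical.

```python
import re
from collections import defaultdict


class KeywordIndex:
    """Tiny inverted index: token -> set of document ids (a stand-in, not Lucene)."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Lowercase and split on anything that is not a letter, digit, or underscore.
        return re.findall(r"[a-z0-9_]+", text.lower())

    def add(self, doc_id, text):
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every keyword in the query."""
        terms = self._tokens(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = KeywordIndex()
index.add("service:ClustalW", "ClustalW multiple sequence alignment service")
index.add("data:set42", "ATP gene sequence set")
print(index.search("sequence alignment"))
```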

24 Service and workflow enactment
[Diagram labels: Input: parameters, task name, timer; Service/Workflow Registry; Job Manager: find the service/workflow definition using the task name, form a job description; Output: job ID; Job Launcher; instances of workflow/service engines; job information]
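The enactment steps named in the diagram (look up the definition by task name, form a job description, return a job ID, hand the job to an engine) can be sketched as follows. All class and field names are hypothetical, the registry is reduced to a dictionary, and treating the timer as a delay in seconds is a guess, since the slide does not say what the timer controls.

```python
import itertools
from dataclasses import dataclass


@dataclass
class JobDescription:
    job_id: str
    task_name: str
    definition: str      # e.g. "service:ClustalW" (hypothetical "kind:name" format)
    parameters: dict
    timer_seconds: int   # assumed meaning: delay before launch


class JobManager:
    """Finds a definition by task name, forms a job description, returns a job ID."""

    def __init__(self, registry):
        self._registry = registry            # task name -> definition string
        self._counter = itertools.count(1)
        self.jobs = {}

    def submit(self, task_name, parameters, timer_seconds=0):
        if task_name not in self._registry:
            raise KeyError(f"no service/workflow registered for task {task_name!r}")
        job = JobDescription(
            job_id=f"job-{next(self._counter)}",
            task_name=task_name,
            definition=self._registry[task_name],
            parameters=dict(parameters),
            timer_seconds=timer_seconds,
        )
        self.jobs[job.job_id] = job
        return job.job_id


class JobLauncher:
    """Dispatches a job to the engine registered for its definition kind."""

    def __init__(self, engines):
        self._engines = engines              # "service" / "workflow" -> callable

    def launch(self, job):
        kind = job.definition.split(":", 1)[0]
        return self._engines[kind](job)


manager = JobManager({"align": "service:ClustalW"})
job_id = manager.submit("align", {"setid": "s1"})
launcher = JobLauncher({"service": lambda job: f"ran {job.definition}"})
print(job_id, launcher.launch(manager.jobs[job_id]))
```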

25 Implementation
Development and deployment
- J2EE, JSP, XSLT
- Tomcat / Axis 1.2
Database
- PostgreSQL 8.1
Index and search of metadata
- Apache Lucene library
Service implementation
- Java2WSDL
- Wrap command-line applications with the JLaunch library
Workflow
- Taverna workbench, part of the myGrid project
- Freefluo workflow engine

26 Data and services
Services, workflows
- Data collection from remote databases
- Query local database
- Data analysis tools: blast, clustalw
- Data format conversion: readseq
- Management of data sets and jobs
- Download and upload
Data
- Complete genome sequences
- ATP gene sequences
- Sequence sets
- Saved jobs

27 Taverna workbench

28 A workflow created using the Taverna workbench tool

29 Improvement opportunities
Use existing domain ontologies in the bioinformatics community to describe services, workflows, and data
Integrate semantic web technology to support end-users’ workflow creation based on their knowledge of the scientific domain
Support users with limited knowledge of scientific processes
Record various workflow representations
Facilitate the discovery and reuse of prior workflows
- Knowledge management
- Knowledge discovery

30 Service composition and workflows
Service composition
- Ad hoc
- Semi-automated: semantic annotation + reasoning
- Automated: semantic annotation + planning
Scientific workflows
- Workflows composed based on service-oriented architecture for assisting scientists in accessing and analyzing data

31 Current workflow management systems
Existing workflow management systems and bioinformatics middleware
- Taverna, Kepler, Triana, Pegasus
- Design, execute, monitor, re-run
Support ad hoc, semi-automated, and automated service discovery and composition from scratch

32 Our approach
Reuse the verified knowledge and workflows in the community
- Increase the correctness of composed workflows over time
- Provide more accurate guidelines for users
A four-level hierarchical workflow structure
An enhanced workflow system

33 Three user-defined workflows from different views
Question: “Are gene genealogies for ATP subunits α, β, and γ different?”
Workflow A, defined by a less experienced user using the functional definition of services: Retrieving, Aligning
Workflow B, defined by an intermediate user with executable services: queryGene, clustalW
Workflow C, defined by an expert user with two extra executable services to ensure the accurate output of the biological process: queryGene, setIds, setFilter, clustalW

34 Enhanced workflow system
[Diagram labels: User; Service Annotator; Ontology; OWL DL reasoner; Create abstract workflow using ontology; Abstract workflow; Annotate services using ontology; Semantics-enabled service registry; Semantics-enabled service discovery; Service matchmaking; Workflow composer (software agent/experienced users); Find appropriate service; Concrete workflow; Workflow execution engine; Data provenance management: collect and manage information about data origination; Knowledge base management; Knowledge discovery; MoGServ]

35 Our hierarchical workflow structure
Abstract workflow (Task A, Task B)
- Encode and convert the high-level definition to a low-level executable workflow
Concrete workflow (Service A, Service B, Service C, Service D)
- Replace individual services with their optimal alternatives
Optimal workflow (Service A, Service B, Service C’, Service D)
- Invoke a workflow with specific input data and record the data provenance and the performance of services and workflows
Workflow instance (input, Service A, Service B, Service C’, Service D, output)
[Figure also labels the Pegasus workflow structure]
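The four levels on this slide can be read as three successive transformations. The sketch below is one possible reading, not the authors' implementation: the function names, the task-to-service mapping, the quality scores, and the service name clustalW_fast are all hypothetical.

```python
def to_concrete(abstract, task_to_service):
    """Abstract -> concrete: map each functional task to an executable service."""
    return [task_to_service[task] for task in abstract]


def to_optimal(concrete, alternatives, quality):
    """Concrete -> optimal: swap each service for its best-scoring alternative."""
    optimal = []
    for service in concrete:
        candidates = [service] + alternatives.get(service, [])
        # max() keeps the original service when scores tie.
        optimal.append(max(candidates, key=lambda c: quality.get(c, 0.0)))
    return optimal


def run_instance(workflow, services, data):
    """Optimal -> instance: run with real input and record provenance per step."""
    provenance = []
    for name in workflow:
        output = services[name](data)
        provenance.append({"service": name, "input": data, "output": output})
        data = output
    return data, provenance


abstract = ["retrieving", "aligning"]
concrete = to_concrete(abstract, {"retrieving": "queryGene", "aligning": "clustalW"})
optimal = to_optimal(
    concrete,
    alternatives={"clustalW": ["clustalW_fast"]},               # hypothetical alternative
    quality={"queryGene": 0.5, "clustalW": 0.6, "clustalW_fast": 0.9},
)
services = {
    "queryGene": lambda q: q + "->genes",           # toy stand-ins for real services
    "clustalW_fast": lambda s: s + "->alignment",
}
result, provenance = run_instance(optimal, services, "ATP")
print(optimal, result, len(provenance))
```

Keeping the levels separate means an abstract workflow can be reused even when the services behind it change.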

36 Reusable knowledge
Connectivity
- Helps to convert from abstract workflow to concrete workflow
Alternative services
- Helps to convert from concrete workflow to optimal workflow
Quality profile of services
- Helps discover optimal workflows
Mapping of abstract workflow and concrete workflow
- Helps to choose reusable workflows

37 Connectivity identification (match detection)
Parameter notation: name(data type, semantic type)

Service: QueryLocal
Operation: createSet
performTask: mygrid:retrieving
inputPara: Settype(String, mog:gene), Queryterm(String, null)
outputPara: Setid(string, mog:geneset)
useResource: MoG

Service: ClustalW
Operation: runClustalWdf
performTask: mygrid:aligning
inputPara: Setid(String, mog:set), Sequencetype(String, mog:sequence)
outputPara: filen(string, mygrid:sequence_alignment_report)
useResource: EBI

Service: FormatConversion
Operation: convert
performTask: mygrid:translating
inputPara: filen(String, mygrid:sequence_alignment_report)
outputPara: Out(String, mygrid:nexus_paup_format)
useResource: MoG

Matching rule: operation ij → operation mn if there exists a parameter k that is an output parameter of operation ij, and there exists a parameter o that is an input parameter of operation mn, such that data type(parameter o) = data type(parameter k) and semantic type(parameter o) = semantic type(parameter k)
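The matching rule can be sketched in a few lines using the three annotated operations from this slide. This is a minimal reading of the rule, with two assumptions that the slide does not state: data types are compared case-insensitively (the annotations mix "string" and "String"), and semantic types are compared by strict equality with no ontological subsumption. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Param:
    name: str
    data_type: str
    semantic_type: Optional[str]   # None when the parameter has no annotation


@dataclass(frozen=True)
class Operation:
    service: str
    name: str
    inputs: Tuple[Param, ...]
    outputs: Tuple[Param, ...]


def matches(src, dst):
    """Return (output name, input name) pairs where src can feed dst."""
    pairs = []
    for k in src.outputs:
        for o in dst.inputs:
            # Assumption: case-insensitive data types, strict semantic equality.
            same_data = k.data_type.lower() == o.data_type.lower()
            same_semantic = (k.semantic_type is not None
                             and k.semantic_type == o.semantic_type)
            if same_data and same_semantic:
                pairs.append((k.name, o.name))
    return pairs


query_local = Operation(
    "QueryLocal", "createSet",
    inputs=(Param("Settype", "String", "mog:gene"), Param("Queryterm", "String", None)),
    outputs=(Param("Setid", "string", "mog:geneset"),),
)
clustalw = Operation(
    "ClustalW", "runClustalWdf",
    inputs=(Param("Setid", "String", "mog:set"), Param("Sequencetype", "String", "mog:sequence")),
    outputs=(Param("filen", "string", "mygrid:sequence_alignment_report"),),
)
convert = Operation(
    "FormatConversion", "convert",
    inputs=(Param("filen", "String", "mygrid:sequence_alignment_report"),),
    outputs=(Param("Out", "String", "mygrid:nexus_paup_format"),),
)

print(matches(clustalw, convert))      # ClustalW output feeds FormatConversion
print(matches(query_local, clustalw))  # empty under strict equality
```

Under strict equality, QueryLocal does not match ClustalW, because mog:geneset and mog:set are different terms. Linking them would presumably require a reasoner that knows how the two concepts relate, which this sketch does not attempt.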

38 Need for verified service connectivity
The mismatching problem: match detection output versus real match
- TP (detected: yes, real match: yes): accurate annotation
- FP (detected: yes, real match: no): inaccurate annotation, lack of semantic annotation, inaccurate reasoning
- FN (detected: no, real match: yes): inaccurate annotation, lack of semantic annotation, inaccurate reasoning
- TN (detected: no, real match: no): accurate annotation
Examples of mismatches (X) that need a mediator, adaptor, or shim
- GenBankService (out: GenBank record) X Blastp (in: protein sequence)
- DDBJ-XML (out: sequence data record, self-defined format) X NCBI blast (in: sequence data record, fasta format)
Some mismatches can be detected automatically; others may be detected by experts at design time or after a run

39 Connectivity graph implementation
Registration process
- Automatically identify the connectivity: identifyConnect(single service, RDF repository)
- Store the connectivity in the knowledge base: connect(service a, operation ai, parameter c, service b, operation bi, parameter d)
Workflow translation / service composition process
- Refine, update, decompose the workflow
- Search at the syntactic level: search for a path between two nodes, search for the next available service, automatic composition based on input and output
Finding the connectivity between services is converted to finding a path between two nodes in a graph
Implementation: Dijkstra’s shortest path algorithm
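Since the slide reduces service connectivity to path finding and names Dijkstra's algorithm, a minimal sketch of that search follows. The graph contents, the "Service.operation" node naming, and the unit edge weights are assumptions made for illustration; the slides do not say how arcs are weighted.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a dict-of-dicts graph; returns node list or None."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Nodes are operations; an arc means a stored connection (illustrative data).
graph = {
    "QueryLocal.createSet": {"ClustalW.runClustalWdf": 1},
    "ClustalW.runClustalWdf": {"FormatConversion.convert": 1},
    "FormatConversion.convert": {},
}
print(shortest_path(graph, "QueryLocal.createSet", "FormatConversion.convert"))
```

With unit weights a breadth-first search would give the same paths; Dijkstra becomes useful if arcs later carry costs such as service quality or run time.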

40 Ontological modules used for semantic description of data, services & workflows
Generic service description ontology (myGrid/Feta model)
Service domain ontology (myGrid)
MoGServ application domain ontology (MoGServ)
[Diagram labels: Data; Services; Workflows; MoGServ application; Software components for annotation; RDF store]

41 MoGServ application domain ontology
- To better track the data origination
- To support the automation of workflow creation
- To better share the data on the web in the future
Example concepts and properties defined in MoGServ (property: domain, range)
- invokedby: Job, User
- isParentOf: Set
- isInstanceOf: Job, Service
- hasSetName: Set, XML:String
Ontological modules (columns: number of concepts, number of object properties, number of datatype properties; the digits are run together in this transcript)
- MoGServ: 1297
- myGrid: 4198
- myGrid/Feta model

42 Sample service/workflow annotation
Question: Which service has an operation that accepts nucleotide_sequence as a parameter?
Answer: Uri: …/alignment:blastn_ncbi, OperationName: Run
Displayed by RDF-Gravity

43 Implementation of annotation and query components for data, services & workflows
Sesame library
- Supports files, RDBMS, SeRQL
[Diagram labels: Sesame RDF store; annotation templates (data); annotation templates (service); annotation components; query templates; query components; result]
SeRQL query template:
Select Y, W, X from {Y} mg:hasOperation {W} mg:inputParameter {X} rdf:type {mog:set} using namespace rdf =, mg =, mog =
Result:
Service: axis/services/ClustalW?wsdl
Operation: runClustalWdf
inputParameter: setid

44 Experiment
Used 418 concepts from the domain ontology for semantic types; defined 10 concepts for data types
Randomly generated service annotations, each with 1 input and 1 output
Intel Pentium mobile, 1.5 GHz
Table columns (values not preserved in this transcript): number of services; number of matched pairs; load RDF repository (milliseconds); average time of match detection per single service (milliseconds)
Connectivity graph for 1000 services
- Number of nodes: 724
- Number of arcs: 587
- Average path search time: less than 1 millisecond
- Connectivity graph load time: 220 milliseconds
- Length 0 = 724, length 1 = 587, length 2 = 448, length 3 = 281, length 4 = 114, length 5 = 71, length 6 = 28, length 7 = 16, length 8 = 4, length 9 = 2
Conclusion: feasible solution

45 Reuse of workflows
Reuse of abstract workflows
Reuse of concrete workflows
Compare the structural similarity of two workflows
Implementation: SUBDUE algorithm
- SUBDUE has a graph match utility that is part of its data mining system
- A given workflow is converted to a graph and fed to the SUBDUE match algorithm
Abstract example, graph view: nodes input, output, task, task, query_term, retrieving, aligning, multiple_alignment_report; edges hasInput, hasOutput, hasNext, performTask, hasParameter
Abstract example, SUBDUE input format:
v 1 input
v 2 output
v 3 task
v 4 task
v 5 query_term
v 6 retrieving
v 7 aligning
v 8 multiple_aligning_report
e 3 4 hasNext
e 3 1 hasInput
e 4 2 hasOutput
e 3 6 performTask
e 4 7 performTask
e 1 5 hasParameter
e 2 8 hasParameter
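The conversion step on this slide (workflow to graph to SUBDUE input) can be sketched as a small serializer that emits the "v id label" and "e source target label" lines shown in the example. The function name and the vertex keys are hypothetical, and only the line format visible on the slide is reproduced; other SUBDUE file features are not covered.

```python
def to_subdue(vertices, edges):
    """Serialize a labelled graph into 'v'/'e' lines as shown on the slide.

    vertices: list of (key, label); keys are needed because labels repeat
              (the example has two 'task' vertices).
    edges:    list of (source key, target key, label).
    """
    ids = {}
    lines = []
    for number, (key, label) in enumerate(vertices, start=1):
        ids[key] = number
        lines.append(f"v {number} {label}")
    for source, target, label in edges:
        lines.append(f"e {ids[source]} {ids[target]} {label}")
    return "\n".join(lines)


vertices = [
    ("in", "input"), ("out", "output"), ("t1", "task"), ("t2", "task"),
    ("q", "query_term"), ("r", "retrieving"), ("a", "aligning"),
    ("rep", "multiple_aligning_report"),
]
edges = [
    ("t1", "t2", "hasNext"), ("t1", "in", "hasInput"), ("t2", "out", "hasOutput"),
    ("t1", "r", "performTask"), ("t2", "a", "performTask"),
    ("in", "q", "hasParameter"), ("out", "rep", "hasParameter"),
]
print(to_subdue(vertices, edges))
```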

46 Conclusion
Pro
- Increases the correctness of the formed workflows over time: avoids incorrect or inaccurate semantic annotations, takes advantage of verified knowledge, avoids the ontological reasoning process
- Better support for semi-automated and automated service composition over time: provides more accurate guidelines to users over time
Con
- The connectivity graph can be big: it grows with the number of parameters and the number of services
- Searching for the connectivity of a service when it is registered in the system may take a relatively long time: more complex matching rules, larger number of parameters
- May not have high accuracy at the beginning

47 Future work
Integrate GridSAM into MoGServ for execution and monitoring
Integrate Grid computing technology for resource allocation
Refine the MoGServ application domain ontology
Create an interface for end-user workflow creation
Create an interface for individual workspaces
Evaluate the scalability and accuracy of the connectivity graph approach and the graph matching approach with a large number of real workflows and services

48 Thank you Questions?