Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Yolanda Gil Information Sciences InstituteJanuary 10, 2010 Recommendations for caBIG to Support Semantic Workflows Yolanda Gil, PhD.

Similar presentations


Presentation on theme: "1 Yolanda Gil Information Sciences InstituteJanuary 10, 2010 Recommendations for caBIG to Support Semantic Workflows Yolanda Gil, PhD."— Presentation transcript:

1 1 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Recommendations for caBIG to Support Semantic Workflows Yolanda Gil, PhD Information Sciences Institute and Department of Computer Science University of Southern California gil@isi.edu http://www.isi.edu/~gil

2 2 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Outline Brief background on semantic workflows Semantic workflow representations in Wings Recommendations Workflow representations Provenance Workflow versioning Adaptive workflows

3 3 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Benefits of Semantic Workflows [Gil JSP-09] Execution management: Automation of workflow execution Managing distributed computation Managing large data sets Security and access control Provenance recording Low-cost high fidelity reproducibility Semantics and reasoning: Workflow retrieval and discovery Automation of workflow generation Systematic exploration of design space Validation of workflows Automated generation of metadata Guarantees of data pedigree “Conceptual” reproducibility

4 4 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Semantic Workflows in Wings [Kim et al CCPEJ 08; Gil et al IEEE eScience 09; Gil et al K-CAP 09; Kim et al IUI 06; Gil et al IEEE IS 2010] Workflows augmented with semantic constraints Each workflow constituent has a variable associated with it –Nodes, links, workflow components, datasets –Workflow variables can represent collections of data as well as classes of software components Constraints are used to restrict variables, and include: –Metadata properties of datasets –Constraints across workflow variables Incorporate function of workflow components: how data is transformed Reasoning about semantic constraints in a workflow Algorithms for semantic enrichment of workflow templates Algorithms for matching queries against workflow catalogs Algorithms for generating workflows from high-level user requests Algorithms for generating metadata of new data products Algorithms for assisting users w/creation of valid workflow templates

5 5 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Semantic Workflows in WINGS Workflow templates Dataflow diagram Each constituent (node, link, component, dataset) has a corresponding variable Semantic properties Constraints on workflow variables (TestData dcdom:isDiscrete false) (TrainingData dcdom:isDiscrete false)

6 6 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Semantic Constraints as Metadata Properties Constraints on reusable template (shown below) Constraints on current user request (shown above) [modelerInput_not_equal_to_classifierInput: (:modelerInput wflow:hasDataBinding ?ds1) (:classifierInput wflow:hasDataBinding ?ds2) equal(?ds1, ?ds2) (?t rdf:type wflow:WorkflowTemplate) > (?t wflow:isInvalid "true"^^xsd:boolean)]

7 7 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Outline Brief background on semantic workflows Semantic workflow representations in Wings Recommendations Workflow representations Provenance Workflow versioning Adaptive workflows

8 8 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Workflow Representations State-of-the-art: Some standards for semantic workflow representations have been proposed by the community OWL-S, WSMO, SWSL, SWSF, … None has widespread use None has been adopted by a standards body So far not clear application-driven requirements Recommendation: caBIG should support the development of semantic workflow representation standards Provide requirements based on expected functionality to be delivered in caBIG Participate in and lead on standardization efforts Recommendation: caBIG should develop (iteratively) applications that demonstrate the use of semantic workflow representations

9 9 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Outline Brief background on semantic workflows Semantic workflow representations in Wings Recommendations Workflow representations Provenance Workflow versioning Adaptive workflows

10 10 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Provenance State-of-the-art: Some provenance representations have been proposed for provenance recording Semantic: Provenance vocabulary (RDF-based) Non-semantic: Open Provenance Model (XML-based) None has widespread use None has been adopted by a standards body Growing interest in provenance in the Semantic Web community: many requirements coming from eScience, eGovernment, Linked data, etc. New W3C Incubator Group on Provenance is developing use cases, requirements, state-of-the-art, and recommendations for follow-on standards efforts Recommendation: caBIG should support the development of provenance representation standards Provide requirements based on caBIG requirements Participate in and lead on standardization efforts Recommendation: caBIG should develop (iteratively) applications that demonstrate the use of provenance representations

11 11 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 W3C Provenance Group (Y. Gil, chair): 1) Why Provenance Provenance: Sources of information + entities + processes involved in producing an artifact Some uses of provenance: Making trust judgments when information sources are diverse and of varying quality (the Web!) Providing justifications for conclusions Establishing attribution Enabling comparison and reproducibility of processes Provenance is ubiquitous: Business practice, cultural artifacts, science applications, etc Data integration, licensing and attribution, accountability Provenance is a pressing issue for the Semantic Web Linked Data eScience (HCLS) and eGovernment (data.gov.uk)

12 12 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 W3C Provenance Group: 2) Charter and Goals of the Incubator Group Provide state-of-the-art understanding and develop a roadmap for development and possible standardization Articulate requirements for accessing and reasoning about provenance information Develop use cases Identify issues in provenance that are direct concern to the Semantic Web Articulate relationships with other aspects of Web architecture Report on state-of-the-art work on provenance Report on a roadmap for provenance in the Semantic Web Identify starting points for provenance representations Identifying elements of a provenance architecture that would benefit from standardization

13 13 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 W3C Provenance Group: 3) Products of the Group to Date Group formed in September 2009 All information is public: http://www.w3.org/2005/Incubator/prov/wiki/ http://www.w3.org/2005/Incubator/prov/wiki/ Developed a set of key dimensions for provenance (11/09) Grouped into three major categories: content, management, use Developed use cases for provenance (12/09) More than 30 use cases, most were improved and curated Include ~10 eScience use cases, some motivated by caBIG Developed requirements for provenance that arise from the use cases (1/10) User requirements: what is the purpose/use of the provenance information Technical requirements: derived from the user requirements Currently developing state-of-the-art report (expected 6/10)

14 14 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 W3C Provenance Group: 4) Use Cases Developed 1. Result Differences 2. Anonymous Information 3. Information Quality Assessment for Linked Data 4. Timeliness 5. Simple Trustworthiness Assessment 6. Ignoring Unreliable Data 7. Answering user queries that require semantically annotated provenance 8. Provenance in Biomedicine 9. Closure of Experimental Metadata 10. Locating Biospecimens With Sufficient Quality 11. Using process provenance for assessing the quality of Information products 12. Provenance Tracking in the Blogosphere 13. Provenance of a Tweet 14. Provenance and Private Data Use 15. Provenance of Decision Making in Emergency Response 16. Provenance of Collections vs Objects in Cultural Heritage 17. Provenance at different levels in Cultural Heritage 18. Identifying attribution and associations 19. Determining Compliance with a License 20. Documenting axiom formulation 21. Evidence for public policy 22. Evidence for engineering design 23. Fulfilling Contractual Obligations 24. Attribution for a versioned document 25. Provenance for Environmental Marine Data 26. Crosswalk Maintenance 27. Metadata Merging 28. Mapping Digital Rights 29. Computer Assisted Research 30. Handling Scientific Measurement Anomaly 31. Human-Executed Processes 32. Semantic disambiguation of data provider identity 33. Hidden Bug

15 15 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 W3C Provenance Group (Y. Gil, chair): 5) Major Dimensions of Provenance Content Attribution - provenance as the sources or entities that were used to create a new result Responsibility - knowing who endorses a particular piece of information or result Origin - recorded vs reconstructed, verified vs non-verified, asserted vs inferred Process - provenance as the process that yielded an artifact Reproducibility (eg workflows, mashups, text extraction) Data Access (e.g. access time, accessed server, party responsible for accessed server) Evolution and versioning Republishing (e.g. retweeting, reblogging, republishing) Updates (eg a document with content from various sources and that changes over time) Justification for decisions – Includes argumentation, hypotheses, why-not questions Entailment - given the results to a particular query, what axioms or tuples led to those results Management Publication - Making provenance information available (expose, distribute) Access - Finding and querying provenance information Dissemination control – Track policies specified by creator for when/how an artifact can be used Access Control - incorporate access control policies to access provenance information Licensing - stating what rights the object creators and users have based on provenance Law enforcement (eg enforcing privacy policies on the use of personal information) Scale - how to operate with large amounts of provenance information

16 16 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 W3C Provenance Group (Y. Gil, chair): 5) Major Dimensions of Provenance (Cont’d) Use Understanding - End user consumption of provenance. abstraction, multiple levels of description, summary presentation, visualization Interoperability - combining provenance produced by multiple different systems Comparison - finding what's in common in the provenance of two or more entities (eg two experimental results) Accountability - the ability to check the provenance of an object with respect to some expectation Verification - of a set of requirements Compliance - with a set of policies Trust - making trust judgments based on provenance Information quality - choosing among competing evidence from diverse sources (eg linked data use cases) Incorporating reputation and reliability ratings with attribution information Imperfections - reasoning about provenance information that is not complete or correct Incomplete provenance Uncertain/probabilistic provenance Erroneous provenance Fraudulent provenance Debugging

17 17 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Outline Brief background on semantic workflows Semantic workflow representations in Wings Recommendations Workflow representations Provenance Workflow versioning Adaptive workflows

18 18 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Workflow Versioning Workflow is a first-class citizen of scientific research Has lifecycle: design, creation, evolution, Can be classified into meaningful categories and has relationships to other similar workflows (eg “more-modern-version-than”) Little work done to date on workflow evolution and versioning VizTrails, myExperiment look at workflows that evolve over time Crucial area to pursue if workflows are to be used daily in an organization Recommendation: Workflow versioning is a largely unchartered research area that caBIG could pioneer

19 19 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Outline Brief background on semantic workflows Semantic workflow representations in Wings Recommendations Workflow representations Provenance Workflow versioning Adaptive workflows

20 20 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Adaptive Workflow The environment where workflows live are highly dynamic New datasets, new analytic tools, new execution resources New services in the architecture Workflow systems tend to have a fixed behavior Some work on dynamic workflow steering based on execution Semantic workflow systems can have adaptive behaviors Find/discover recent datasets relevant to a user’s analysis Incorporate new analytic tools into a workflow Assign execution resources on the fly Recommendation: caBIG should explore the use of semantic technologies that can make workflow approaches more adaptive Recommendation: caBIg should develop semantic workflow system architectures that allow modular incorporation of new services in the architecture New data catalogs, new component catalogs, etc

21 21 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 Background on Linked Data http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData “The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources.” Adopted by many: data.gov.uk, NYT, etc Growing virally and very fast As of May’09, it included over 4.7 billion RDF triples, which are interlinked by around 142 million RDF links

22 22 Yolanda Gil (gil@isi.edu)USC Information Sciences InstituteJanuary 10, 2010 caBIG and Semantic Web, Linked Data Much content in Linked Data is on topics directly relevant to caBIG Colored in pink circles in the diagram of previous slide Linked Data offers light methodology (5 design principles) and flexible architecture (open semantic web technologies) Other flexible technologies in semantic web area (eg SKOS) Recommendation: caBIG should investigate the use of Linked Data and other open semantic web technologies to support biomedical research within caBIG Will provide greater flexibility needed in basic research projects Recommendation: caBIG should pioneer the use of semantic workflow technologies with Linked Data Workflow catalogs implemented with Linked Data, ie, semantic workflows as Linked Data objects


Download ppt "1 Yolanda Gil Information Sciences InstituteJanuary 10, 2010 Recommendations for caBIG to Support Semantic Workflows Yolanda Gil, PhD."

Similar presentations


Ads by Google