Recommendations for caBIG to Support Semantic Workflows. Yolanda Gil, PhD, Information Sciences Institute. January 10, 2010.

Presentation transcript:

Slide 1: Recommendations for caBIG to Support Semantic Workflows
Yolanda Gil, PhD
Information Sciences Institute and Department of Computer Science
University of Southern California
January 10, 2010

Slide 2: Outline
- Brief background on semantic workflows
- Semantic workflow representations in Wings
- Recommendations
  - Workflow representations
  - Provenance
  - Workflow versioning
  - Adaptive workflows

Slide 3: Benefits of Semantic Workflows [Gil JSP-09]
Execution management:
- Automation of workflow execution
- Managing distributed computation
- Managing large data sets
- Security and access control
- Provenance recording
- Low-cost high fidelity reproducibility
Semantics and reasoning:
- Workflow retrieval and discovery
- Automation of workflow generation
- Systematic exploration of design space
- Validation of workflows
- Automated generation of metadata
- Guarantees of data pedigree
- "Conceptual" reproducibility

Slide 4: Semantic Workflows in Wings [Kim et al CCPEJ 08; Gil et al IEEE eScience 09; Gil et al K-CAP 09; Kim et al IUI 06; Gil et al IEEE IS 2010]
Workflows augmented with semantic constraints
- Each workflow constituent has a variable associated with it
  - Nodes, links, workflow components, datasets
  - Workflow variables can represent collections of data as well as classes of software components
- Constraints are used to restrict variables, and include:
  - Metadata properties of datasets
  - Constraints across workflow variables
- Incorporate the function of workflow components: how data is transformed
Reasoning about semantic constraints in a workflow
- Algorithms for semantic enrichment of workflow templates
- Algorithms for matching queries against workflow catalogs
- Algorithms for generating workflows from high-level user requests
- Algorithms for generating metadata of new data products
- Algorithms for assisting users with the creation of valid workflow templates

Slide 5: Semantic Workflows in WINGS
Workflow templates
- Dataflow diagram
- Each constituent (node, link, component, dataset) has a corresponding variable
Semantic properties
- Constraints on workflow variables
    (TestData dcdom:isDiscrete false)
    (TrainingData dcdom:isDiscrete false)
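A minimal sketch of how constraints like the two on this slide could be used to select datasets for a workflow variable. It is not Wings code: the triple encoding, the `satisfies` and `candidates` helpers, and the example catalog entries are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (variable, property, value) triples; an assumed encoding, not the Wings format.
Triple = Tuple[str, str, object]

# The two constraints shown on the slide.
template_constraints: List[Triple] = [
    ("TestData", "dcdom:isDiscrete", False),
    ("TrainingData", "dcdom:isDiscrete", False),
]


def satisfies(variable: str, metadata: Dict[str, object],
              constraints: List[Triple]) -> bool:
    """True if the dataset metadata meets every constraint on the variable."""
    return all(
        metadata.get(prop) == value
        for var, prop, value in constraints
        if var == variable
    )


def candidates(variable: str, catalog: Dict[str, Dict[str, object]],
               constraints: List[Triple]) -> List[str]:
    """Names of catalog datasets that could be bound to the variable."""
    return sorted(
        name for name, metadata in catalog.items()
        if satisfies(variable, metadata, constraints)
    )


# Hypothetical data catalog with metadata properties for two datasets.
catalog = {
    "weather_continuous": {"dcdom:isDiscrete": False},
    "weather_binned": {"dcdom:isDiscrete": True},
}

print(candidates("TrainingData", catalog, template_constraints))
```

A variable with no constraints accepts any dataset, which is why `satisfies` returns True when nothing applies to it.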

Slide 6: Semantic Constraints as Metadata Properties
- Constraints on reusable template (shown below)
- Constraints on current user request (shown above)

    [modelerInput_not_equal_to_classifierInput:
        (:modelerInput wflow:hasDataBinding ?ds1)
        (:classifierInput wflow:hasDataBinding ?ds2)
        equal(?ds1, ?ds2)
        (?t rdf:type wflow:WorkflowTemplate)
        -> (?t wflow:isInvalid "true"^^xsd:boolean)]
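The rule on this slide marks a template as invalid when the modeler input and the classifier input are bound to the same dataset. The check can be sketched in plain Python as follows. This is only an illustration of the rule's logic, assuming bindings are held in a dictionary keyed by variable name; it is not how Wings or its rule engine evaluates rules.

```python
from typing import Dict, Optional


def is_invalid(bindings: Dict[str, str]) -> bool:
    """Mirror of the rule: invalid when both inputs share one data binding.

    `bindings` maps a workflow variable name to the dataset bound to it
    (an assumed representation for this sketch).
    """
    ds1: Optional[str] = bindings.get("modelerInput")
    ds2: Optional[str] = bindings.get("classifierInput")
    # The rule only fires when both bindings exist and are equal.
    return ds1 is not None and ds1 == ds2


same = {"modelerInput": "dataset_A", "classifierInput": "dataset_A"}
different = {"modelerInput": "dataset_A", "classifierInput": "dataset_B"}
print(is_invalid(same), is_invalid(different))
```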

Slide 8: Workflow Representations
State-of-the-art:
- Some standards for semantic workflow representations have been proposed by the community
  - OWL-S, WSMO, SWSL, SWSF, …
  - None has widespread use
  - None has been adopted by a standards body
  - So far, no clear application-driven requirements
Recommendation: caBIG should support the development of semantic workflow representation standards
- Provide requirements based on expected functionality to be delivered in caBIG
- Participate in and lead on standardization efforts
Recommendation: caBIG should develop (iteratively) applications that demonstrate the use of semantic workflow representations

Slide 10: Provenance
State-of-the-art:
- Some provenance representations have been proposed for provenance recording
  - Semantic: Provenance vocabulary (RDF-based)
  - Non-semantic: Open Provenance Model (XML-based)
  - None has widespread use
  - None has been adopted by a standards body
- Growing interest in provenance in the Semantic Web community: many requirements coming from eScience, eGovernment, Linked Data, etc.
- New W3C Incubator Group on Provenance is developing use cases, requirements, state-of-the-art, and recommendations for follow-on standards efforts
Recommendation: caBIG should support the development of provenance representation standards
- Provide requirements based on caBIG requirements
- Participate in and lead on standardization efforts
Recommendation: caBIG should develop (iteratively) applications that demonstrate the use of provenance representations

Slide 11: W3C Provenance Group (Y. Gil, chair): 1) Why Provenance
Provenance: sources of information + entities + processes involved in producing an artifact
Some uses of provenance:
- Making trust judgments when information sources are diverse and of varying quality (the Web!)
- Providing justifications for conclusions
- Establishing attribution
- Enabling comparison and reproducibility of processes
Provenance is ubiquitous:
- Business practice, cultural artifacts, science applications, etc.
- Data integration, licensing and attribution, accountability
Provenance is a pressing issue for the Semantic Web
- Linked Data
- eScience (HCLS) and eGovernment (data.gov.uk)
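To make the definition on this slide concrete (sources + entities + processes involved in producing an artifact), here is a minimal sketch of a provenance record and a function that traces every source an artifact depends on. The `ProvenanceRecord` class, its field names, and the example artifacts are assumptions for illustration; they do not follow the Provenance Vocabulary or the Open Provenance Model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProvenanceRecord:
    artifact: str                  # the thing that was produced
    process: str                   # the process that produced it
    agent: str                     # the entity responsible for the process
    sources: List[str] = field(default_factory=list)  # inputs that were used


def all_sources(artifact: str, records: Dict[str, ProvenanceRecord]) -> Set[str]:
    """Collect direct and indirect sources of an artifact."""
    seen: Set[str] = set()
    stack = [artifact]
    while stack:
        record = records.get(stack.pop())
        if record is None:
            continue  # no recorded provenance: treat as an original source
        for source in record.sources:
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# Hypothetical records for a two-step analysis.
records = {
    "model": ProvenanceRecord("model", "train", "analyst", ["training_data"]),
    "predictions": ProvenanceRecord(
        "predictions", "classify", "analyst", ["model", "test_data"]
    ),
}

print(sorted(all_sources("predictions", records)))
```

Tracing sources this way supports the attribution and comparison uses listed above: two results can be compared by intersecting their source sets.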

Slide 12: W3C Provenance Group: 2) Charter and Goals of the Incubator Group
Provide state-of-the-art understanding and develop a roadmap for development and possible standardization
- Articulate requirements for accessing and reasoning about provenance information
- Develop use cases
- Identify issues in provenance that are of direct concern to the Semantic Web
- Articulate relationships with other aspects of Web architecture
- Report on state-of-the-art work on provenance
- Report on a roadmap for provenance in the Semantic Web
- Identify starting points for provenance representations
- Identify elements of a provenance architecture that would benefit from standardization

Slide 13: W3C Provenance Group: 3) Products of the Group to Date
- Group formed in September 2009
- All information is public:
- Developed a set of key dimensions for provenance (11/09)
  - Grouped into three major categories: content, management, use
- Developed use cases for provenance (12/09)
  - More than 30 use cases, most were improved and curated
  - Include ~10 eScience use cases, some motivated by caBIG
- Developed requirements for provenance that arise from the use cases (1/10)
  - User requirements: what is the purpose/use of the provenance information
  - Technical requirements: derived from the user requirements
- Currently developing state-of-the-art report (expected 6/10)

Slide 14: W3C Provenance Group: 4) Use Cases Developed
1. Result Differences
2. Anonymous Information
3. Information Quality Assessment for Linked Data
4. Timeliness
5. Simple Trustworthiness Assessment
6. Ignoring Unreliable Data
7. Answering user queries that require semantically annotated provenance
8. Provenance in Biomedicine
9. Closure of Experimental Metadata
10. Locating Biospecimens With Sufficient Quality
11. Using process provenance for assessing the quality of Information products
12. Provenance Tracking in the Blogosphere
13. Provenance of a Tweet
14. Provenance and Private Data Use
15. Provenance of Decision Making in Emergency Response
16. Provenance of Collections vs Objects in Cultural Heritage
17. Provenance at different levels in Cultural Heritage
18. Identifying attribution and associations
19. Determining Compliance with a License
20. Documenting axiom formulation
21. Evidence for public policy
22. Evidence for engineering design
23. Fulfilling Contractual Obligations
24. Attribution for a versioned document
25. Provenance for Environmental Marine Data
26. Crosswalk Maintenance
27. Metadata Merging
28. Mapping Digital Rights
29. Computer Assisted Research
30. Handling Scientific Measurement Anomaly
31. Human-Executed Processes
32. Semantic disambiguation of data provider identity
33. Hidden Bug

Slide 15: W3C Provenance Group (Y. Gil, chair): 5) Major Dimensions of Provenance
Content
- Attribution - provenance as the sources or entities that were used to create a new result
  - Responsibility - knowing who endorses a particular piece of information or result
  - Origin - recorded vs reconstructed, verified vs non-verified, asserted vs inferred
- Process - provenance as the process that yielded an artifact
  - Reproducibility (e.g. workflows, mashups, text extraction)
  - Data Access (e.g. access time, accessed server, party responsible for accessed server)
- Evolution and versioning
  - Republishing (e.g. retweeting, reblogging, republishing)
  - Updates (e.g. a document with content from various sources and that changes over time)
- Justification for decisions - includes argumentation, hypotheses, why-not questions
- Entailment - given the results to a particular query, what axioms or tuples led to those results
Management
- Publication - making provenance information available (expose, distribute)
- Access - finding and querying provenance information
- Dissemination control - track policies specified by creator for when/how an artifact can be used
  - Access Control - incorporate access control policies to access provenance information
  - Licensing - stating what rights the object creators and users have based on provenance
  - Law enforcement (e.g. enforcing privacy policies on the use of personal information)
- Scale - how to operate with large amounts of provenance information

Slide 16: W3C Provenance Group (Y. Gil, chair): 5) Major Dimensions of Provenance (Cont'd)
Use
- Understanding - end-user consumption of provenance: abstraction, multiple levels of description, summary presentation, visualization
- Interoperability - combining provenance produced by multiple different systems
- Comparison - finding what's in common in the provenance of two or more entities (e.g. two experimental results)
- Accountability - the ability to check the provenance of an object with respect to some expectation
  - Verification - of a set of requirements
  - Compliance - with a set of policies
- Trust - making trust judgments based on provenance
  - Information quality - choosing among competing evidence from diverse sources (e.g. linked data use cases)
  - Incorporating reputation and reliability ratings with attribution information
- Imperfections - reasoning about provenance information that is not complete or correct
  - Incomplete provenance
  - Uncertain/probabilistic provenance
  - Erroneous provenance
  - Fraudulent provenance
- Debugging

Slide 18: Workflow Versioning
- A workflow is a first-class citizen of scientific research
  - Has a lifecycle: design, creation, evolution
  - Can be classified into meaningful categories and has relationships to other similar workflows (e.g. "more-modern-version-than")
- Little work done to date on workflow evolution and versioning
  - VisTrails and myExperiment look at workflows that evolve over time
- Crucial area to pursue if workflows are to be used daily in an organization
Recommendation: Workflow versioning is a largely uncharted research area that caBIG could pioneer
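One way to picture the "more-modern-version-than" relationship mentioned on this slide is as a chain of links from each workflow version to the one it replaces. The sketch below is purely illustrative: the `supersedes` mapping, the version names, and both helper functions are assumptions, not features of any existing workflow system.

```python
from typing import Dict, List, Optional

# Each version points to the version it supersedes (None for the first one).
# Hypothetical version identifiers.
supersedes: Dict[str, Optional[str]] = {
    "wf-v3": "wf-v2",
    "wf-v2": "wf-v1",
    "wf-v1": None,
}


def lineage(version: str, links: Dict[str, Optional[str]]) -> List[str]:
    """Return the version followed by all of its ancestors, newest first."""
    chain: List[str] = []
    current: Optional[str] = version
    # The membership check guards against accidental cycles in the links.
    while current is not None and current not in chain:
        chain.append(current)
        current = links.get(current)
    return chain


def is_more_modern_than(a: str, b: str, links: Dict[str, Optional[str]]) -> bool:
    """True if version a descends (directly or indirectly) from version b."""
    return a != b and b in lineage(a, links)


print(lineage("wf-v3", supersedes))
```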

Slide 20: Adaptive Workflow
- The environment where workflows live is highly dynamic
  - New datasets, new analytic tools, new execution resources
  - New services in the architecture
- Workflow systems tend to have a fixed behavior
  - Some work on dynamic workflow steering based on execution
- Semantic workflow systems can have adaptive behaviors
  - Find/discover recent datasets relevant to a user's analysis
  - Incorporate new analytic tools into a workflow
  - Assign execution resources on the fly
Recommendation: caBIG should explore the use of semantic technologies that can make workflow approaches more adaptive
Recommendation: caBIG should develop semantic workflow system architectures that allow modular incorporation of new services in the architecture
- New data catalogs, new component catalogs, etc.
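The second recommendation, modular incorporation of new catalogs, together with the "find recent datasets" behavior, can be sketched as a small registry that new data catalogs plug into at run time. Everything here is hypothetical: the `Dataset` and `CatalogRegistry` classes, the catalog names, and the sample entries are illustration only and do not describe caBIG or Wings interfaces.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Dataset:
    name: str
    topic: str
    published: date


# A catalog is modeled as any function that returns datasets for a topic.
CatalogQuery = Callable[[str], List[Dataset]]


class CatalogRegistry:
    """Holds pluggable data catalogs; new ones can be added at any time."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, CatalogQuery] = {}

    def register(self, name: str, query: CatalogQuery) -> None:
        self._catalogs[name] = query

    def find_recent(self, topic: str, since: date) -> List[Dataset]:
        """Datasets on a topic published on or after `since`, newest first."""
        hits = [
            dataset
            for query in self._catalogs.values()
            for dataset in query(topic)
            if dataset.published >= since
        ]
        return sorted(hits, key=lambda d: d.published, reverse=True)


# Two hypothetical catalogs with made-up entries.
CATALOG_A = [
    Dataset("expr_2009", "genomics", date(2009, 11, 2)),
    Dataset("expr_2007", "genomics", date(2007, 3, 1)),
]
CATALOG_B = [
    Dataset("expr_2010", "genomics", date(2010, 1, 5)),
    Dataset("scan_2010", "imaging", date(2010, 1, 6)),
]

registry = CatalogRegistry()
registry.register("catalog_a", lambda topic: [d for d in CATALOG_A if d.topic == topic])
registry.register("catalog_b", lambda topic: [d for d in CATALOG_B if d.topic == topic])

print([d.name for d in registry.find_recent("genomics", date(2009, 1, 1))])
```

Because the registry depends only on the query function, adding a catalog does not require changes to the code that searches for datasets.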

Slide 21: Background on Linked Data
"The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources."
- Adopted by many: data.gov.uk, NYT, etc.
- Growing virally and very fast
- As of May '09, it included over 4.7 billion RDF triples, which are interlinked by around 142 million RDF links

Slide 22: caBIG and Semantic Web, Linked Data
- Much content in Linked Data is on topics directly relevant to caBIG
  - Colored in pink circles in the diagram of the previous slide
- Linked Data offers a light methodology (5 design principles) and a flexible architecture (open semantic web technologies)
  - Other flexible technologies in the semantic web area (e.g. SKOS)
Recommendation: caBIG should investigate the use of Linked Data and other open semantic web technologies to support biomedical research within caBIG
- Will provide the greater flexibility needed in basic research projects
Recommendation: caBIG should pioneer the use of semantic workflow technologies with Linked Data
- Workflow catalogs implemented with Linked Data, i.e., semantic workflows as Linked Data objects
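As a rough illustration of "semantic workflows as Linked Data objects", the sketch below writes a workflow description as N-Triples lines. The `example.org` identifiers and the workflow itself are made up; the only real vocabulary term used is the Dublin Core `title` property. The serializer is deliberately simplified: it assumes any object starting with `http://` or `https://` is a resource and everything else is a plain string literal.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize (subject, predicate, object) strings as N-Triples lines."""
    lines = []
    for subject, predicate, obj in triples:
        if obj.startswith(("http://", "https://")):
            rendered = f"<{obj}>"  # treated as a link to another resource
        else:
            escaped = (
                obj.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            rendered = f'"{escaped}"'  # treated as a plain literal
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


# Hypothetical workflow catalog entry.
workflow = "http://example.org/workflows/classifier-training"
triples = [
    (workflow, "http://purl.org/dc/terms/title", "Classifier training"),
    (workflow, "http://example.org/vocab/usesDataset",
     "http://example.org/datasets/weather_continuous"),
]

print(to_ntriples(triples))
```

The second triple is the kind of RDF link the Linked Data quote describes: it connects the workflow to a data item that could live in a different data source.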