Linked Data and the Provenance Explosion Deborah L. McGuinness Tetherless World Constellation Chair Professor of Computer Science and Cognitive Science.


1 Linked Data and the Provenance Explosion Deborah L. McGuinness Tetherless World Constellation Chair Professor of Computer Science and Cognitive Science Director RPI Web Science Trust Network Lab Rensselaer Polytechnic Institute DPDM, March 18, 2011 Melbourne, Australia

2 Outline
–Motivating scenarios
Light-weight data connectivity in a health setting
–PopSciGrid
–Provenance-related questions
Mid-weight semantic modeling in interdisciplinary virtual observatories
–Virtual Solar Terrestrial Observatory -> Semantic Provenance Capture in Data Ingest Systems
Mobile platform, social networking enhanced advisor
–Semantic Sommelier
–Provenance Infrastructure and Directions
–Discussion and Directions

3 Selected Background
Bell Labs: designing Description Logics (DLs) and environments aimed at supporting applications such as configuration.
–Led to research on making DL-based systems useful, with a focus on explanation
Stanford University: focus on ontology-enabled xx, large hybrid systems, later X-informatics, often for eScience
–Led to ontology evolution and diagnostic environments, and to expanded explanation settings including “messy” hybrid systems with a new provenance emphasis

4 Background cont. Rensselaer Polytechnic Institute/ Tetherless World Constellation: next generation web, web science research center, open data, next generation semantic eScience –Led to more connections with social platforms, empowering collections (of users, data, etc.)

5 TWC

6 Population Sciences Grid Goals (with NIH/NCI, Northwestern)
Convey complex health-related information to consumer and public health decision makers for community health impact
Leverage the growing evidence base for communicating health information on the Internet
Inform the development of future research opportunities effectively utilizing cyberinfrastructure for cancer prevention and control

7 Computer Science Slant
How can semantic technologies be used to integrate, present, and analyze data for a wide range of users?
Can tools allow lay people to build their own demos and support public usage and accurate interpretation?
How do we facilitate collaboration and make applications “viral”?
Within PopSciGrid:
–Which policies (taxation, smoking bans, etc.) impact health and health care costs?
–What data should we display to help scientists and lay people evaluate related questions?
–What data might be presented so that people choose to make (positive) behavior changes?
–What does the data show? Why should someone believe that?
–What are appropriate follow-ups?

8 PopSciGrid

9

10 PopSciGrid II http://logd.tw.rpi.edu/demo/tax-cost-policy-prevalence

11 PopSciGrid III

12 Questions
Overall data:
–What data is used?
–How recent is it?
–What are the conditions under which it was obtained?
–Is it reliable for this purpose?
Pick one item, like prevalence. Is this the best parameter to focus on?
–What is prevalence (definition)?
–How is it measured (overall / in this data set)?
–Is there another or better proxy (e.g., packs sold)?
Do we need more data, more inference, more xxx…

13 Virtual Observatory (VSTO), November 9, 2006
General: Find data subject to certain constraints and plot appropriately
Specific: Plot the observed/measured Neutral Temperature as recorded by the Millstone Hill Fabry-Perot interferometer while looking in the vertical direction at any time of high geomagnetic activity in a way that makes sense for the data.
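The "general" request on this slide (find data subject to constraints) can be illustrated with a minimal sketch. This is not VSTO code: the `Observation` fields, the use of a Kp index as the measure of geomagnetic activity, the caller-supplied threshold, and all sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """One measurement record (hypothetical schema, not VSTO's)."""
    instrument: str
    parameter: str
    look_direction: str
    kp_index: float   # assumed proxy for geomagnetic activity
    value: float


def find_data(observations: List[Observation], instrument: str,
              parameter: str, look_direction: str,
              min_kp: float) -> List[Observation]:
    """Keep only the records that satisfy every constraint."""
    return [
        o for o in observations
        if o.instrument == instrument
        and o.parameter == parameter
        and o.look_direction == look_direction
        and o.kp_index >= min_kp
    ]


FPI = "Millstone Hill Fabry-Perot interferometer"

# Made-up records, only to exercise the filter.
observations = [
    Observation(FPI, "Neutral Temperature", "vertical", 6.0, 950.0),
    Observation(FPI, "Neutral Temperature", "vertical", 2.0, 800.0),
    Observation(FPI, "Neutral Temperature", "north", 6.3, 990.0),
]

matches = find_data(observations, FPI, "Neutral Temperature",
                    "vertical", min_kp=5.0)
```

In the deployed system the constraints are resolved against an ontology rather than by string equality; the sketch only shows the shape of the query.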

14 Partial exposure of Instrument class hierarchy - users seem to LIKE THIS

15 VSTO Results
Many Benefits:
–Reduced query formation from 8 to 3 steps and reduced choices at each stage
–Allowed scientists to get data from instruments they never knew of before (e.g., photometers in example)
–Supported augmentation and validation of data
–Useful and related data provided without having to be an expert to ask for it
–Integration and use (e.g., plotting) based on inference
–Ask and answer questions not possible before
BUT Needed Provenance
–Deborah McGuinness, Peter Fox, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. In the Proceedings of the Nineteenth Conference on Innovative Applications of Artificial Intelligence (IAAI-07). Vancouver, British Columbia, Canada, July 22-26, 2007.
–Peter Fox, Deborah L. McGuinness, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don Middleton. Ontology-supported Scientific Data Frameworks: The Virtual Solar-Terrestrial Observatory Experience. In Computers and Geosciences - Elsevier. Volume 35, Issue 4 (2009).

16 Inference Web (IW)
[Figure labels: End Users; End-User Interaction services (Explanation via Graph, Explanation via Customized Summary, Explanation via Annotation); Data Access & Data Analysis Services (Validate PML data, Access published PML data); Distributed PML data]
Inference Web is a semantic web-based knowledge provenance management infrastructure:
Uses a provenance interlingua (PML) for encoding and interchange of provenance metadata in distributed environments
Provides interactive explanation services for end-users
Provides data access and analysis services for enriching the value of knowledge provenance
It has been used in a wide range of applications

17 Making Systems Actionable using Knowledge Provenance
Mobile Wine Agent
GILA
Combining Proofs in TPTP
CALO
Knowledge Provenance in Virtual Observatories
Intelligence Analyst Tools
NOW including Data-gov

18 Proof/Provenance Markup Language (PML)
A kind of linked data on the Web
Modularized & extensible
–Provenance: annotate provenance properties
–Justification: encodes provenance relations (including support for multiple justifications)
–Trust: add trust annotation
Semantic Web based
[Figure labels: Enterprise Web; World Wide Web; D; PML data]

19 Users Require Provenance!
Users demand it! If users (humans and agents) are to use, reuse, and integrate system answers, they must trust them.
Intelligence analysts: (from DTO/IARPA’s NIMD)
Andrew Cowell, Deborah McGuinness, Carrie Varley, and David A. Thurman. Knowledge-Worker Requirements for Next Generation Query Answering and Explanation Systems. Proc. of Intelligent User Interfaces for Intelligence Analysis Workshop, Intl Conf. on Intelligent User Interfaces (IUI 2006), Sydney, Australia.
Intelligent Assistant Users: (from DARPA’s PAL/CALO)
Alyssa Glass, Deborah L. McGuinness, Paulo Pinheiro da Silva, and Michael Wolverton. Trustable Task Processing Systems. In Roth-Berghofer, T., and Richter, M.M., editors, KI Journal, Special Issue on Explanation, Künstliche Intelligenz, 2008.
Virtual Observatory Users: (from NSF’s VSTO)
Deborah McGuinness, Peter Fox, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. Proc. of the Nineteenth Conference on Innovative Applications of Artificial Intelligence (IAAI-07). Vancouver, British Columbia, Canada.
And… as systems become more diverse, distributed, and embedded, and depend on more varied data and communities, more provenance and more types of provenance are needed.

20 CHIP Pipeline (Chromospheric Helium Image Photometer)
[Figure labels: Mauna Loa Solar Observatory (MLSO), Hawaii; National Center for Atmospheric Research (NCAR) Data Center, Boulder, CO; Raw Data Capture: Raw Image Data Captured by CHIP (Chromospheric Helium-I Image Photometer); Follow-up Processing on Raw Data (e.g., Flat Field Calibration); Quality Checking (Images Graded: GOOD, BAD, UGLY); Publishes: Intensity Images (GIF), Velocity Images (GIF)]

21 Semantic Provenance Capture for Data Ingest Systems (SPCDIS)
Fact: Scientific data services are increasing in usage and scope, and with these increases comes a growing need for access to provenance information.
Provenance Project Goal: to design a reusable, interoperable provenance infrastructure.
Science Project Goal: design and implement an extensible provenance solution that is deployed at science data ingest/product generation time.
Outcome: implemented provenance solution in one science setting AND operational specification for other scientific data applications.
Extends vsto.org

22 ACOS Data Ingest
Typical science data processing pipelines
Distributed
Some metadata in silos
Much metadata lost
Many human-in-loop decisions, events
No metadata infrastructure for any user
Community is broadening
Chromospheric Helium Imaging Photometer (CHIP) Data Ingest
ACOS: Advanced Coronal Observing System

23 The Advanced Coronal Observing System case for Provenance
[Figure labels: Source, Processing, Product; Origin (observation) metadata; Processing metadata]
Provenance metadata currently not propagated with or linked to the data products
Data products are the result of “black box” systems
Most users do not know what calibrations, transformations, and QA processing have been applied to the data product

24 Advanced Coronal Observing System (ACOS) Provenance Use Cases
What were the cloud cover and seeing conditions during the observation period of this image?
What calibrations have been applied to this image?
Why does this image look bad?
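The three use-case questions can be read as lookups over a provenance record attached to each image. The sketch below is only an illustration under stated assumptions: the record layout, field names, step names, and sample values are hypothetical and are not the ACOS or SPCDIS schema. Note that `why_bad` does not diagnose anything; it only surfaces the recorded grade next to the recorded observing conditions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProcessingStep:
    name: str   # e.g. "flat_field_calibration" (hypothetical name)
    kind: str   # "calibration", "transformation", or "qa"


@dataclass
class ImageProvenance:
    """Provenance kept with one image (hypothetical layout)."""
    image_id: str
    cloud_cover: str            # assumed to come from the observer log
    seeing: str                 # assumed to come from the observer log
    steps: List[ProcessingStep]
    quality_grade: str          # GOOD, BAD, or UGLY


def observing_conditions(p: ImageProvenance) -> Dict[str, str]:
    """Use case 1: cloud cover and seeing during the observation."""
    return {"cloud_cover": p.cloud_cover, "seeing": p.seeing}


def calibrations_applied(p: ImageProvenance) -> List[str]:
    """Use case 2: names of the calibration steps, in order."""
    return [s.name for s in p.steps if s.kind == "calibration"]


def why_bad(p: ImageProvenance) -> str:
    """Use case 3: report the grade with the recorded conditions."""
    if p.quality_grade == "GOOD":
        return f"{p.image_id} was graded GOOD"
    return (f"{p.image_id} was graded {p.quality_grade}; "
            f"cloud cover: {p.cloud_cover}, seeing: {p.seeing}")


# Made-up record for illustration only.
record = ImageProvenance(
    image_id="chip_example_001",
    cloud_cover="heavy",
    seeing="poor",
    steps=[ProcessingStep("flat_field_calibration", "calibration"),
           ProcessingStep("quality_check", "qa")],
    quality_grade="BAD",
)
```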

25 PML Usage in SPCDIS
Justification
–Explanation
–Causality graph
Provenance
–Conclusion
–Source
–Engine
–Rule
Trust
–Trust/Belief metrics
[Figure labels: three NodeSets, each with a Justification and a Conclusion; Engine; Rule; SourceUsage; Source; DateTime; properties hasAntecedentList, hasSourceUsage, hasInferenceRule, hasInferenceEngine]

26 PML in Action in SPCDIS
This is the PML provenance encoding for a “quick look” gif file, which is generated from two image datasets
Node set for the quicklook gif file
–hasConclusion: a reference to the gif file itself
–InferenceStep: how the gif file was derived (hasAntecedents, hasInferenceRule, hasInferenceEngine)
The “antecedents” of the quicklook gif file are other node sets
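The node-set structure described on this slide can be mirrored in a few lines of Python. This is a sketch of the shape only, not PML itself (PML is an RDF/OWL vocabulary): the class and field names paraphrase the slide's labels, and the file, rule, and engine names are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InferenceStep:
    """How a conclusion was derived."""
    rule: str      # stands in for hasInferenceRule
    engine: str    # stands in for hasInferenceEngine
    antecedents: List["NodeSet"] = field(default_factory=list)  # other node sets


@dataclass
class NodeSet:
    """A conclusion plus an optional justification for it."""
    conclusion: str                              # stands in for hasConclusion
    justification: Optional[InferenceStep] = None


def lineage(node: NodeSet) -> List[str]:
    """List this conclusion, then everything it was derived from (depth-first)."""
    found = [node.conclusion]
    if node.justification is not None:
        for parent in node.justification.antecedents:
            found.extend(lineage(parent))
    return found


# Hypothetical names; the slide only says "two image datasets".
intensity = NodeSet("intensity_image.dat")
velocity = NodeSet("velocity_image.dat")
quicklook = NodeSet(
    "quicklook.gif",
    InferenceStep(rule="MakeQuickLook", engine="QuickLookGenerator",
                  antecedents=[intensity, velocity]),
)
```

Because antecedents are themselves node sets, walking the structure recursively recovers the full derivation chain of the quick-look image.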

27 A PML-Enhanced Image Provenance
[Figure panels: CHIP Quick-Look; CHIP PML-Enhanced Quick-Look]

28 Integrated View Observer log’s information added into quicklook image’s provenance

29 Provenance-aware faceted search (Tetherless World Constellation)

30 Current Issues
Successful interdisciplinary VO; needed provenance
Successful provenance integration for experts; needs to support a more diverse audience
–As the user base diversifies, what updates are needed?
–Will a domain ontology for MLSO/NCAR-affiliated staff be understandable by citizen scientists? No.
–How can our representational infrastructure be extended with contextual information relevant to user needs? E.g., linking data products from one part of the CHIP pipeline to specific solar events or events at MLSO (such as reports of bad weather)
–Should provenance ontologies provide extension capabilities to include domain-informed extensions? Yes.
–[1] Stephan Zednik, Peter Fox and Deborah L. McGuinness, “System Transparency, or How I Learned to Worry about Meaning and Love Provenance!” Proceedings of IPAW 2010
–[2] James R. Michaelis, Li Ding, Zhenning Shangguan, Stephan Zednik, Rui Huang, Paulo Pinheiro da Silva, Nicholas Del Rio and Deborah L. McGuinness, “Towards Usable and Interoperable Workflow Provenance: Empirical Case Studies Using PML” Proceedings of SWPM 2009
–[3] AGU 2010, with papers by Fox et al., McGuinness et al., Zednik et al., West et al., Michaelis et al., …

31 Wine Agent – Semantic Sommelier

32 Wine Agent for iPhone
Client application which talks to a SW service
Make requests for dishes and wines using auto-generated interfaces
Make recommendations to the system for others

33 Getting the Recommendation
Recommendations are made up of two classes: 1 dish, 1 wine
When the instance is realized, the agent looks up matching recommendations and returns the results
Tapping a particular recommendation causes the wine agent to look for pairings which match the recommendation
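One way to picture the lookup: a recommendation pairs a dish class with a wine class, and a concrete dish matches every recommendation whose dish class it falls under. The sketch below replaces the ontology reasoner with a hand-written subclass table; the table, the class names, and the recommendations are all hypothetical and are not taken from the Wine Agent.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical class hierarchy: child -> parent.
SUBCLASS_OF: Dict[str, str] = {
    "Salmon": "Fish",
    "Fish": "Seafood",
    "Seafood": "Dish",
    "Steak": "RedMeat",
    "RedMeat": "Dish",
}

# Each recommendation is one dish class paired with one wine class.
RECOMMENDATIONS: List[Tuple[str, str]] = [
    ("Seafood", "WhiteWine"),
    ("RedMeat", "RedWine"),
]


def ancestors(cls: str) -> Iterator[str]:
    """Yield the class itself, then each superclass up to the root."""
    yield cls
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        yield cls


def recommend(dish_class: str) -> List[str]:
    """Return wine classes recommended for the dish or any of its superclasses."""
    classes = set(ancestors(dish_class))
    return [wine for dish, wine in RECOMMENDATIONS if dish in classes]
```

For example, `recommend("Salmon")` matches the `Seafood` recommendation because `Salmon` is under `Fish`, which is under `Seafood`. A description-logic reasoner would derive such memberships from class definitions rather than from a fixed table.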

34 Our Position System Transparency supports user understanding and trust Our Research Goal: Provide interoperable infrastructure that supports explanations of sources, assumptions, and answers as an enabler for trust

35 Provenance Events
[Figure labels: CSV2RDF, SemDiff, Archive, Enhance, visualize, derive, create, derive, revision]
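As a rough illustration of a "derive" event at the CSV-to-RDF step, the sketch below turns CSV rows into N-Triples-style lines and adds one statement per row pointing back at the source file. It is not the CSV2RDF tool named on the slide: the `example.org` namespace, the `derivedFrom` property, and the sample data are placeholders, and it assumes column names are safe to embed in an IRI and values contain no quotes or backslashes.

```python
import csv
import io
from typing import List

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def csv_to_triples(text: str, source_name: str) -> List[str]:
    """Emit one triple per cell, plus one provenance triple per row."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        subject = f"<{BASE}{source_name}/row/{i}>"
        for column, value in row.items():
            triples.append(f'{subject} <{BASE}vocab/{column}> "{value}" .')
        # Record where the row came from (the "derive" event).
        triples.append(
            f"{subject} <{BASE}vocab/derivedFrom> <{BASE}source/{source_name}> ."
        )
    return triples


# Made-up sample data for illustration only.
sample = "state,tax\nNY,1.50\n"
triples = csv_to_triples(sample, "demo.csv")
```

Keeping the back-pointer next to the converted data is what later lets an aggregator get back to the portion of the source that was used.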

36 Challenges for Data Aggregators (with Tim Lebo, Greg Williams)

37 Discussion
Provenance is growing in acceptance, need, and type
Provenance data could easily dwarf other data in volume
Some interlinguas have emerged that have significant usage, have shown significant value, and are ready to be used (plus a standard is likely from W3C)
Interdisciplinary eScience and open data are increasing the need and pace

38 Discussion II
A few trends we have observed:
–Techniques for supporting interaction with large diverse communities are needed (we believe user annotation is one such critical technique)
–Data aggregators face additional challenges if provenance is not available… and may accelerate the demand for provenance and provenance standards
–Getting back to the portion of the source used is critical for some
–Tracking manipulations is critical for some
–Providing and creating provenance as part of a larger ecosystem is key
–Domain-specific extensions can be of value
Open (govt, science, etc.) data (along with semantic web applications with embedded information about knowledge provenance and term meaning) is providing many new opportunities and will continue to change our lives.
Questions? dlm cs rpi edu

