1 Shifting the Burden from the User to the Data Provider Peter Fox High Altitude Observatory, NCAR (***) With thanks to eGY and various NSF, DoE and NASA.

Presentation transcript:

1 Shifting the Burden from the User to the Data Provider Peter Fox High Altitude Observatory, NCAR (***) With thanks to eGY and various NSF, DoE and NASA projects

2 Outline
Background, definitions
Informatics -> e-Science
Data has lots of uses
–Virtual Observatories: use cases
–Data Framework: Examples
–Data ingest, integration, mining and …
Discussion
Fox HDF: Semantic Data Burden Shift Oct 15, 2008

3 Background
Scientists should be able to access a global, distributed knowledge base of scientific data that:
–appears to be integrated
–appears to be locally available
But… data is obtained by multiple instruments, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed.
And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

4 But data has Lots of Audiences
From “Why EPO?”, a NASA internal report on science education, 2005
[Figure labels: More Strategic; Less Strategic; Information; Information products; annotation: SCIENTISTS TOO]

5 The Information Era: Interoperability
Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:
–managing and accessing large data sets
–higher space/time resolution capabilities
–rapid response requirements
–data assimilation into models
–crossing disciplinary boundaries

6 Shifting the Burden from the User to the Provider

7 Modern capabilities

8 Mind the Gap!
As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination: there is/was still a gap between science and the underlying infrastructure and technology that is available.
Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Wikipedia)

9 Progression after progression
[Diagram labels: IT; Cyber Infrastructure; Cyber Informatics; Core Informatics; Science Informatics, aka Xinformatics; Science, SBAs; Informatics]

10 Virtual Observatories
Conceptual examples:
In-situ: Virtual measurements
–Related measurements
Remote sensing: Virtual, integrative measurements
–Data integration
Managing virtual data products/sets

11 Virtual Observatories
Make data and tools quickly and easily accessible to a wide audience.
Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in their preferred language: i.e. appear to be local and integrated.
Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains, along with database interfaces for access and storage and “smart” tools for evolution and maintenance.

12 Early days of discipline-specific VOs
[Diagram labels: VO 1, VO 2, VO 3, …; DB 1, DB 2, DB 3, … DB n; ?]

13 The Astronomy approach; data-types as a service
[Diagram labels: VO App 1, VO App 2, VO App 3, …; VO layer; DB 1, DB 2, DB 3, … DB n]
VO layer:
–VOTable
–Simple Image Access Protocol
–Simple Spectrum Access Protocol
–Simple Time Access Protocol
Limited interoperability
Lightweight semantics
Limited meaning, hard coded
Limited extensibility
Under review
Open Geospatial Consortium: Web {Feature, Coverage, Map} Service and Sensor Web Enablement: Sensor {Observation, Planning, Analysis} Service use the same approach

14 [Diagram labels: VO Portal, Web Serv., VO API, …; DB 1, DB 2, DB 3, … DB n]
Semantic mediation layer - VSTO - low level
Semantic mediation layer - mid-upper-level
Education, clearinghouses, other services, disciplines, etc.
Metadata, schema, data
Query, access and use of data
Semantic query, hypothesis and inference
Semantic interoperability
Added value
Mediation Layer
–Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
–Maps queries to underlying data
–Generates access requests for metadata, data
–Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
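
The mediation step described on this slide (a concept-level query mapped onto access requests against the underlying databases) can be sketched roughly as follows. This is a minimal illustration only: the parameter and instrument names, the `MEASURES` and `HOLDINGS` tables, and the `mediate` function are all hypothetical and are not taken from VSTO.

```python
from dataclasses import dataclass

# Hypothetical knowledge: which instruments measure which parameter, and
# which underlying database holds each instrument's data (assumed values).
MEASURES = {
    "NeutralTemperature": ["FabryPerotInterferometer", "IncoherentScatterRadar"],
    "PolarizationBrightness": ["Polarimeter"],
}
HOLDINGS = {
    "FabryPerotInterferometer": "DB1",
    "IncoherentScatterRadar": "DB2",
    "Polarimeter": "DB3",
}


@dataclass(frozen=True)
class AccessRequest:
    """One request the mediation layer would send to an underlying database."""
    database: str
    instrument: str
    parameter: str
    start: str
    end: str


def mediate(parameter, start, end):
    """Map a concept-level query (parameter + date range) to access requests."""
    instruments = MEASURES.get(parameter, [])
    return [AccessRequest(HOLDINGS[i], i, parameter, start, end) for i in instruments]


requests = mediate("NeutralTemperature", "2005-01-26", "2005-01-27")
for request in requests:
    print(request.database, request.instrument)
```

The point of the sketch is that the user names only a parameter and a date range; which databases and instruments are involved is resolved by the layer, not by the user.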

15 Content: Coupling Energetics and Dynamics of Atmospheric Regions WEB Community data archive for observations and models of Earth's upper atmosphere and geophysical indices and parameters needed to interpret them. Includes browsing capabilities by periods, > 310 instruments, models, > 820 parameters…

16 Content: Mauna Loa Solar Observatory
Near real-time data products from Hawaii from a variety of solar instruments. Source for space weather, solar variability, and basic solar physics.
Other content used too - Center for Integrated Space Weather Modeling

17 Semantic Web Methodology and Technology Development Process
Establish and improve a well-defined methodology vision for Semantic Technology based application development
Leverage controlled vocabularies, etc.
[Diagram labels: Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Science/Expert Review & Iteration; Develop model/ontology]

18 Science and technical use cases
Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
–Extract information from the use-case - encode knowledge
–Translate this into a complete query for data - inference and integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
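
A query service that accepts constraints "in any order and in any combination" can be sketched as a set of independent predicates applied together. The sketch below is hypothetical throughout: the records, field names and helper functions are invented for illustration, and the Kp >= 5 cutoff for "high geomagnetic activity" is an assumption, not something the slides specify.

```python
from datetime import datetime

# Hypothetical metadata records; values are illustrative, not real holdings.
RECORDS = [
    {"instrument": "FabryPerotInterferometer", "parameter": "NeutralTemperature",
     "time": datetime(2005, 1, 26, 21, 0), "altitude_km": 250.0,
     "latitude_deg": 67.0, "kp": 6.0},
    {"instrument": "FabryPerotInterferometer", "parameter": "NeutralWind",
     "time": datetime(2005, 1, 27, 3, 0), "altitude_km": 240.0,
     "latitude_deg": 40.0, "kp": 6.3},
    {"instrument": "IncoherentScatterRadar", "parameter": "NeutralTemperature",
     "time": datetime(2005, 2, 2, 12, 0), "altitude_km": 120.0,
     "latitude_deg": 69.6, "kp": 2.0},
]


def instrument_is(name):
    return lambda r: r["instrument"] == name


def parameter_is(name):
    return lambda r: r["parameter"] == name


def time_between(start, end):
    return lambda r: start <= r["time"] <= end


def field_above(field, threshold):
    return lambda r: r[field] > threshold


def search(records, *constraints):
    """Keep records satisfying every constraint; order and count do not matter."""
    return [r for r in records if all(c(r) for c in constraints)]


KP_HIGH = 5.0  # assumed cutoff for "high geomagnetic activity"

# The science use case expressed as three independent constraints.
use_case = [
    field_above("altitude_km", 100.0),
    field_above("latitude_deg", 45.0),
    lambda r: r["kp"] >= KP_HIGH,
]
hits = search(RECORDS, *use_case)
print([(r["instrument"], r["parameter"]) for r in hits])
```

Because each constraint is a standalone predicate, a caller can supply only an instrument, only a date range, or any mixture, and the result does not depend on the order in which they are given.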

19 VSTO - semantics and ontologies in an operational environment: vsto.hao.ucar.edu, Web Service
Fox RPI: Semantic Data Frameworks May 14,

20 Partial exposure of Instrument class hierarchy - users seem to LIKE THIS
Semantic filtering by domain or instrument hierarchy
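
Filtering by an instrument hierarchy amounts to asking which classes sit under a chosen ancestor. A minimal sketch, assuming a made-up hierarchy (the class names and the `SUBCLASS_OF` table below are illustrative and are not the VSTO ontology):

```python
# Hypothetical child -> parent links standing in for an instrument class hierarchy.
SUBCLASS_OF = {
    "OpticalInstrument": "Instrument",
    "Radar": "Instrument",
    "Interferometer": "OpticalInstrument",
    "FabryPerotInterferometer": "Interferometer",
    "Polarimeter": "OpticalInstrument",
    "IncoherentScatterRadar": "Radar",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or is reachable from it by subclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def instruments_under(ancestor):
    """All classes strictly below ancestor, sorted by name."""
    return sorted(c for c in SUBCLASS_OF if c != ancestor and is_a(c, ancestor))


print(instruments_under("OpticalInstrument"))
```

Selecting a node high in the hierarchy then narrows or widens the candidate instruments without the user having to know individual instrument names.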

21

22 Inferred plot type and return formats for data products

23 Inferred plot type and return required axes data

24 Semantic Web Benefits
Unified/abstracted query workflow: Parameters, Instruments, Date-Time
Decreased input requirements for query: in one case reducing the number of selections from eight to three
Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics
Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)
Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, now exposed as smart web services
–understanding of coordinate systems, relationships, data synthesis, transformations, etc.
–returns independent variables and related parameters
A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)

25 What is a Non-Specialist Use Case?
Teacher accesses the internet, goes to an Educational Virtual Observatory and enters a search for “Aurora”.
Someone should be able to query a virtual observatory without having specialist knowledge

26 What should the User Receive?
Teacher receives four groupings of search results:
1) Educational materials: and
2) Research, data and tools: via VSTO, VSPO and VITMO, knows to search for brightness, or green/red line emission
3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere) also known as Northern Lights
4) Did you mean?: Aurora Borealis or Aurora Australis, etc.

27 Semantic Information Integration: Concept map for educational use of science data in a lesson plan

28

29 Issues for Virtual Observatories
Scaling to large numbers of data providers and redefining the role(s)/relations with them
Crossing discipline boundaries
Security, access to resources, policies
Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?)
Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …)
Data quality, preservation, stewardship
These are currently burden areas for users

30 Problem definition
Data is coming in faster and in greater volumes, outstripping our ability to perform adequate quality control
Data is being used in new ways, and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision
We often fail to capture, represent and propagate manually generated information that needs to go with the data flows
Each time we develop a new instrument, we develop a new data ingest procedure, collect different metadata and organize it differently. It is then hard to use with previous projects
The task of event determination and feature classification is onerous and we don't do it until after we get the data

31 Use cases
Determine which flat field calibration was applied to the image taken on January 26, 2005 around 2100UT by the ACOS Mark IV polarimeter.
Which flat-field algorithm was applied to the set of images taken during the period November 1, 2004 to February 28, 2005?
How many different data product types can be generated from the ACOS CHIP instrument?
What images comprised the flat field calibration image used on January 26, 2007 for all ACOS CHIP images?
What processing steps were completed to obtain the ACOS PICS limb image of the day for January 26, 2005?
Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, :09UT taken by the ACOS Mark IV polarimeter?
What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?
Find all good images on March 21,
Why are the quick look images from March 21, 2008, 1900UT missing?
Why does this image look bad?

32 Provenance Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
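
Questions such as "which flat field calibration was applied" or "what processing steps were completed" become answerable once each processing step records what it consumed, what it produced and who ran it. The following is a rough sketch under that assumption; the `Step` record, the step names and the product identifiers are all hypothetical and do not describe the actual MLSO pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded processing step: who did what, from which inputs, to what output."""
    activity: str
    agent: str
    inputs: tuple
    output: str


# Hypothetical provenance log for a single image.
STEPS = [
    Step("acquire", "instrument", (), "raw_001"),
    Step("build_flat_field", "flat_program", ("flat_a", "flat_b"), "flat_2005_01_26"),
    Step("apply_flat_field", "calibration_program", ("raw_001", "flat_2005_01_26"), "cal_001"),
    Step("annotate", "observer", ("cal_001",), "final_001"),
]


def lineage(product, steps):
    """Return every step that contributed to product, earliest dependencies first."""
    produced_by = {s.output: s for s in steps}
    ordered, seen = [], set()

    def visit(name):
        step = produced_by.get(name)
        if step is None or name in seen:
            return  # an original input, or already handled
        seen.add(name)
        for parent in step.inputs:
            visit(parent)
        ordered.append(step)

    visit(product)
    return ordered


history = lineage("final_001", STEPS)
print([s.activity for s in history])

# "Which flat field was applied?" is then a lookup over the recorded steps.
flat_used = [i for s in history if s.activity == "apply_flat_field"
             for i in s.inputs if i.startswith("flat_")]
print(flat_used)
```

If the ingest system does not capture these records at processing time, the same questions fall back on the user, which is the burden the talk argues should move to the provider.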

33

34

35

36 Visual browse

37

38

39 Discussion (1)
Taken together, a growing body of collected experience manifests an emerging informatics core capability that is starting to take data intensive science into a new realm of realizability and, potentially, sustainability
–Use cases (i.e. real users)
–X-informatics
–Core Informatics
–Cyber Informatics
There are implications for data models

40 Progression after progression
[Diagram labels: IT; Cyber Infrastructure; Cyber Informatics; Core Informatics; Science Informatics; Science, SBAs; Informatics]
Example:
–CI = OPeNDAP server running over HTTP/HTTPS
–Cyberinformatics = Data (product) and service ontologies, triple store
–Core informatics = Reasoning engine (Pellet), OWL
–Science (X) informatics = Use cases, science domain terms, concepts in an ontology
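
To make the "triple store" layer in this example concrete, here is a toy in-memory version with simple pattern matching. It is a sketch only: the `ex:` terms are invented, no OWL reasoning is performed, and a real deployment would use an RDF store together with a reasoner such as the Pellet engine named on the slide.

```python
# Hypothetical (subject, predicate, object) statements; the ex: names are made up.
TRIPLES = {
    ("ex:FPI", "rdf:type", "ex:Instrument"),
    ("ex:FPI", "ex:measures", "ex:NeutralTemperature"),
    ("ex:ISR", "rdf:type", "ex:Instrument"),
    ("ex:ISR", "ex:measures", "ex:NeutralTemperature"),
    ("ex:NeutralTemperature", "rdf:type", "ex:Parameter"),
}


def match(subject=None, predicate=None, obj=None):
    """Return sorted triples matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return sorted(
        t for t in TRIPLES
        if all(want is None or want == got for want, got in zip(pattern, t))
    )


# Which things measure neutral temperature?
measuring = [s for s, _, _ in match(predicate="ex:measures", obj="ex:NeutralTemperature")]
print(measuring)
```

The science-level terms (instrument, parameter) live in the ontology, the statements live in the store, and the data server underneath stays unchanged, which is the layering the slide describes.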

41 Discussion (2)
Data and information science is becoming the ‘fourth’ column (along with theory, experiment and computation)
Semantics (of the data) are a key ingredient -> may imply richer data models

42 Summary
Informatics is playing a key role in filling the gap between the use and generation of data by scientists (and the spectrum of non-experts) and the underlying cyberinfrastructure, i.e. in shifting the burden
–This is evident from the emergence of Xinformatics (world-wide)
Our experience is in implementing informatics as semantics in Virtual Observatories (as a working paradigm) and Grid environments
–VSTO is only one example of success
–Data mining, data integration, smart search, provenance are close behind
Informatics is a profession and a community activity; it requires synergistic efforts in all 3 sub-areas (science, core, cyber)

43 More Information
Virtual Solar Terrestrial Observatory (VSTO):
Semantically-Enabled Science Data Integration (SESDI):
Semantic Provenance Capture in Data Ingest Systems (SPCDIS):
Semantic Knowledge Integration Framework (SKIF/SAM):
Semantic Web for Earth and Environmental Terminology (SWEET):
Conferences: AGU 2008, EGU 2009, ISWC 2008, CIKM 2008, …
Peter Fox