Shifting the Burden from the User to the Data Provider
Peter Fox, High Altitude Observatory, NCAR (***)
With thanks to eGY and various NSF, DoE and NASA projects

1 1 Shifting the Burden from the User to the Data Provider Peter Fox High Altitude Observatory, NCAR (***) With thanks to eGY and various NSF, DoE and NASA projects

2 2 Outline
Background, definitions
Informatics -> e-Science
Data has lots of uses
–Virtual Observatories: use cases
–Data Framework: Examples
–Data ingest, integration, mining and …
Discussion
Fox HDF: Semantic Data Burden Shift Oct 15, 2008

3 3 Background
Scientists should be able to access a global, distributed knowledge base of scientific data that:
–appears to be integrated
–appears to be locally available
But… data is obtained by multiple instruments, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.
And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

4 4 But data has Lots of Audiences
[Figure labels: More Strategic; Less Strategic; Information; Information products; annotation: SCIENTISTS TOO. From “Why EPO?”, a NASA internal report on science education, 2005]

5 5 The Information Era: Interoperability
Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:
–managing and accessing large data sets
–higher space/time resolution capabilities
–rapid response requirements
–data assimilation into models
–crossing disciplinary boundaries

6 6 Shifting the Burden from the User to the Provider

7 7 Modern capabilities

8 8 Mind the Gap!
As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination: there is/was still a gap between science and the underlying infrastructure and technology that is available.
Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Wikipedia)

9 9 Progression after progression
[Diagram labels: IT; Cyber Infrastructure; Cyber Informatics; Core Informatics; Science Informatics, aka Xinformatics; Science, SBAs; Informatics]

10 10 Virtual Observatories
Conceptual examples:
In-situ: Virtual measurements
–Related measurements
Remote sensing: Virtual, integrative measurements
–Data integration
Managing virtual data products/sets

11 11 Virtual Observatories
Make data and tools quickly and easily accessible to a wide audience.
Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer and in their preferred language: i.e. appear to be local and integrated.
Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains, along with database interfaces for access and storage and “smart” tools for evolution and maintenance.

12 12 Early days of discipline specific VOs
[Diagram labels: VO 1, VO 2, VO 3, …; DB 1, DB 2, DB 3, … DB n; ?]

13 13 The Astronomy approach; data-types as a service
[Diagram labels: VO App 1, VO App 2, VO App 3, …; VO layer; DB 1, DB 2, DB 3, … DB n]
VO layer:
–VOTable
–Simple Image Access Protocol
–Simple Spectrum Access Protocol
–Simple Time Access Protocol
Limited interoperability
Lightweight semantics
Limited meaning, hard coded
Limited extensibility
Under review
Open Geospatial Consortium: Web {Feature, Coverage, Mapping} Service and Sensor Web Enablement: Sensor {Observation, Planning, Analysis} Service use the same approach

14 14 [Diagram labels: VO Portal, Web Serv., VO API, …; DB 1, DB 2, DB 3, … DB n]
Semantic mediation layer - VSTO - low level
Semantic mediation layer - mid-upper-level
Education, clearinghouses, other services, disciplines, etc.
Metadata, schema, data
Query, access and use of data
Semantic query, hypothesis and inference
Semantic interoperability
Added value
Mediation Layer
–Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
–Maps queries to underlying data
–Generates access requests for metadata, data
–Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
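A minimal sketch of what "maps queries to underlying data" and "generates access requests" could look like. This is not the VSTO implementation: the `Holding` table, the instrument and parameter names, and the request dictionary layout are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical ontology fragment: which data source holds which
# instrument/parameter pair. Every name below is illustrative only.
@dataclass(frozen=True)
class Holding:
    source: str       # e.g. "DB 1" in the slide's diagram
    instrument: str
    parameter: str

HOLDINGS = [
    Holding("DB 1", "FabryPerotInterferometer", "NeutralTemperature"),
    Holding("DB 2", "IncoherentScatterRadar", "ElectronDensity"),
    Holding("DB 2", "IncoherentScatterRadar", "IonTemperature"),
    Holding("DB 3", "Polarimeter", "PolarizationBrightness"),
]

def mediate(parameter, start, end):
    """Map one semantic query onto per-source access requests."""
    requests = []
    for h in HOLDINGS:
        if h.parameter == parameter:
            requests.append({
                "source": h.source,
                "instrument": h.instrument,
                "parameter": parameter,
                "start": start,
                "end": end,
            })
    return requests

# The user names a parameter and a time range; the layer works out
# which database and instrument to ask.
requests = mediate("ElectronDensity", "2008-03-21", "2008-03-22")
```

The point of the sketch is that the user never names a database: the lookup from concept to source is the provider's responsibility.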

15 15 Content: Coupling Energetics and Dynamics of Atmospheric Regions WEB
Community data archive for observations and models of Earth's upper atmosphere and geophysical indices and parameters needed to interpret them. Includes browsing capabilities by periods, > 310 instruments, models, > 820 parameters…

16 16 Content: Mauna Loa Solar Observatory
Near real-time data products from Hawaii from a variety of solar instruments. Source for space weather, solar variability, and basic solar physics.
Other content used too - Center for Integrated Space Weather Modeling

17 17 Semantic Web Methodology and Technology Development Process
Establish and improve a well-defined methodology vision for Semantic Technology based application development
Leverage controlled vocabularies, etc.
[Diagram labels: Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Science/Expert Review & Iteration; Develop model/ontology]

18 18 Science and technical use cases
Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
–Extract information from the use-case - encode knowledge
–Translate this into a complete query for data - inference and integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
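The "any order, any combination" requirement can be sketched by treating each constraint as an independent predicate. This is only an illustration under assumed names: `Record`, the `by_*` helpers and the sample data are hypothetical, and a real service would evaluate the constraints against an ontology and remote holdings rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Record:
    instrument: str   # hypothetical instrument name
    parameter: str    # hypothetical parameter name
    time: datetime

# Each helper returns a predicate, so constraints can be mixed freely.
def by_instrument(name):
    return lambda r: r.instrument == name

def by_parameter(name):
    return lambda r: r.parameter == name

def by_time(start, end):
    return lambda r: start <= r.time <= end

def query(records, *constraints):
    """Keep records satisfying every constraint; order is irrelevant."""
    return [r for r in records if all(c(r) for c in constraints)]

RECORDS = [
    Record("FabryPerotInterferometer", "NeutralTemperature", datetime(2005, 1, 26, 21, 0)),
    Record("FabryPerotInterferometer", "NeutralWind", datetime(2005, 1, 27, 3, 0)),
    Record("IncoherentScatterRadar", "ElectronDensity", datetime(2005, 1, 26, 22, 0)),
]

jan26 = by_time(datetime(2005, 1, 26), datetime(2005, 1, 26, 23, 59))
a = query(RECORDS, by_instrument("FabryPerotInterferometer"), jan26)
b = query(RECORDS, jan26, by_instrument("FabryPerotInterferometer"))
```

Here `a` and `b` hold the same records, and calling `query(RECORDS)` with no constraints returns everything, which is the "any combination" case with an empty set.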

19 19 VSTO - semantics and ontologies in an operational environment: vsto.hao.ucar.edu, www.vsto.org
Web Service
Fox RPI: Semantic Data Frameworks May 14, 2008

20 20 Partial exposure of Instrument class hierarchy - users seem to LIKE THIS
Semantic filtering by domain or instrument hierarchy
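Filtering by instrument hierarchy amounts to a subclass (is-a) lookup. The sketch below uses a hand-written parent table in place of an ontology and reasoner; the class names are assumptions chosen for illustration, not the actual VSTO instrument classes.

```python
# Hypothetical fragment of an instrument class hierarchy: child -> parent.
SUBCLASS_OF = {
    "OpticalInstrument": "Instrument",
    "Radar": "Instrument",
    "Interferometer": "OpticalInstrument",
    "FabryPerotInterferometer": "Interferometer",
    "Photometer": "OpticalInstrument",
    "IncoherentScatterRadar": "Radar",
}

def is_a(cls, ancestor):
    """True if cls equals ancestor or descends from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # None once we pass the root
    return False

def instruments_under(ancestor):
    """All known classes at or below the given class, sorted by name."""
    names = set(SUBCLASS_OF) | set(SUBCLASS_OF.values())
    return sorted(n for n in names if is_a(n, ancestor))

optical = instruments_under("OpticalInstrument")
```

Selecting "OpticalInstrument" then matches the interferometers and the photometer without the user listing them one by one.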

21 21

22 22 Inferred plot type and return formats for data products

23 23 Inferred plot type and return required axes data
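One way to picture plot-type inference is a rule keyed on a parameter's independent variables. The slides do not say which rules the system used, so the rule below (one independent variable gives a line plot, two give a contour plot) and the function name `infer_plot` are purely assumptions.

```python
def infer_plot(parameter, independent_vars):
    """Pick a plot type and the axes data to return (hypothetical rule)."""
    axes = list(independent_vars)
    if len(axes) == 1:
        # e.g. a value sampled over time
        return {"plot": "line", "x": axes[0], "y": parameter}
    if len(axes) == 2:
        # e.g. a value sampled over time and altitude
        return {"plot": "contour", "x": axes[0], "y": axes[1], "z": parameter}
    raise ValueError(f"no plot rule for {len(axes)} independent variables")

series = infer_plot("NeutralTemperature", ["time"])
profile = infer_plot("ElectronDensity", ["time", "altitude"])
```

The user asks for a parameter; the service decides both the plot type and which axes data must accompany it.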

24 24 Semantic Web Benefits
Unified/abstracted query workflow: Parameters, Instruments, Date-Time
Decreased input requirements for query: in one case reducing the number of selections from eight to three
Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics
Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)
Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, now exposed as smart web services
–understanding of coordinate systems, relationships, data synthesis, transformations, etc.
–returns independent variables and related parameters
A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)

25 25 What is a Non-Specialist Use Case?
Teacher accesses the internet, goes to an Educational Virtual Observatory and enters a search for “Aurora”.
Someone should be able to query a virtual observatory without having specialist knowledge.

26 26 What should the User Receive?
Teacher receives four groupings of search results:
1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via VSTO, VSPO and VITMO, knows to search for brightness, or green/red line emission
3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere), also known as Northern Lights
4) Did you mean?: Aurora Borealis or Aurora Australis, etc.

27 27 Semantic Information Integration: Concept map for educational use of science data in a lesson plan

28 28

29 29 Issues for Virtual Observatories
Scaling to large numbers of data providers and redefining the role(s)/relations with them
Crossing discipline boundaries
Security, access to resources, policies
Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?)
Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …)
Data quality, preservation, stewardship
These are currently burden areas for users

30 30 Problem definition
Data is coming in faster, in greater volumes and outstripping our ability to perform adequate quality control
Data is being used in new ways and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision
We often fail to capture, represent and propagate manually generated information that needs to go with the data flows
Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata and organize it differently. It is then hard to use with previous projects
The task of event determination and feature classification is onerous and we don't do it until after we get the data

31 31 Use cases
Determine which flat field calibration was applied to the image taken on January 26, 2005 around 2100UT by the ACOS Mark IV polarimeter.
Which flat-field algorithm was applied to the set of images taken during the period November 1, 2004 to February 28, 2005?
How many different data product types can be generated from the ACOS CHIP instrument?
What images comprised the flat field calibration image used on January 26, 2007 for all ACOS CHIP images?
What processing steps were completed to obtain the ACOS PICS limb image of the day for January 26, 2005?
Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, 2005 1849:09UT taken by the ACOS Mark IV polarimeter?
What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?
Find all good images on March 21, 2008.
Why are the quick look images from March 21, 2008, 1900UT missing?
Why does this image look bad?

32 32 Provenance Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
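Questions such as "which flat field calibration was applied" or "who added the comments" become answerable when each processing step is recorded alongside the data product. The sketch below is one possible shape for such a record, not the SPCDIS design: the `Step` and `Product` classes, the step names and the file identifiers are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Step:
    name: str      # e.g. "flat_field" (hypothetical step name)
    agent: str     # person or program that performed the step
    inputs: tuple  # identifiers of the files or settings used

@dataclass
class Product:
    product_id: str
    steps: list = field(default_factory=list)

    def record(self, name, agent, *inputs):
        """Append one processing step to this product's history."""
        self.steps.append(Step(name, agent, tuple(inputs)))

def processing_steps(product):
    """What processing steps were completed, in order?"""
    return [s.name for s in product.steps]

def inputs_for(product, step_name):
    """Which inputs (e.g. calibration files) did a given step use?"""
    return [i for s in product.steps if s.name == step_name for i in s.inputs]

# Hypothetical history for one image.
image = Product("limb_image_example")
image.record("dark_subtract", "pipeline_v1", "dark_0001")
image.record("flat_field", "pipeline_v1", "flat_0042")
image.record("annotate", "observer", "comment.txt")

flat_used = inputs_for(image, "flat_field")
```

Capturing the record at ingest time, rather than reconstructing it later, is what moves this burden from the data user to the data provider.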

33 33

34 34

35 35

36 36 Visual browse

37 37

38 38

39 39 Discussion (1)
Taken together, an emerging set of collected experience manifests an emerging informatics core capability that is starting to take data intensive science into a new realm of realizability and, potentially, sustainability
–Use cases (i.e. real users)
–X-informatics
–Core Informatics
–Cyber Informatics
There are implications for data models

40 40 Progression after progression
[Diagram labels: IT; Cyber Infrastructure; Cyber Informatics; Core Informatics; Science Informatics; Science, SBAs; Informatics]
Example:
CI = OPeNDAP server running over HTTP/HTTPS
Cyberinformatics = Data (product) and service ontologies, triple store
Core informatics = Reasoning engine (Pellet), OWL
Science (X) informatics = Use cases, science domain terms, concepts in an ontology

41 41 Discussion (2)
Data and information science is becoming the ‘fourth’ column (along with theory, experiment and computation)
Semantics (of the data) are a key ingredient -> may imply richer data models

42 42 Summary
Informatics is playing a key role in filling the gap between science (and the spectrum of non-expert) use and generation and the underlying cyberinfrastructure, i.e. in shifting the burden
–This is evident due to the emergence of Xinformatics (world-wide)
Our experience is implementing informatics as semantics in Virtual Observatories (as a working paradigm) and Grid environments
–VSTO is only one example of success
–Data mining, data integration, smart search, provenance are close behind
Informatics is a profession and a community activity and requires efforts in all 3 sub-areas (science, core, cyber) and must be synergistic

43 43 More Information
Virtual Solar Terrestrial Observatory (VSTO): http://vsto.hao.ucar.edu, http://www.vsto.org
Semantically-Enabled Science Data Integration (SESDI): http://sesdi.hao.ucar.edu
Semantic Provenance Capture in Data Ingest Systems (SPCDIS): http://spcdis.hao.ucar.edu
Semantic Knowledge Integration Framework (SKIF/SAM): http://skif.hao.ucar.edu
Semantic Web for Earth and Environmental Terminology (SWEET): http://sweet.jpl.nasa.gov
Conferences: AGU 2008, EGU 2009, ISWC 2008, CIKM 2008, …
Peter Fox pfox@ucar.edu

