Fox CI and X-informatics - CSIG 2008, Aug 11 1 Community cyberinfrastructure and X-informatics - Assessment of convergence and innovation based on project.


1 Fox CI and X-informatics - CSIG 2008, Aug 11. Community cyberinfrastructure and X-informatics: Assessment of convergence and innovation based on project experience. Peter Fox, High Altitude Observatory, NCAR. Work performed in part with Deborah McGuinness (RPI), Rob Raskin (JPL), Krishna Sinha (VT), Luca Cinquini (NCAR), Patrick West (NCAR), Stephan Zednik (NCAR), Paulo Pinheiro da Silva (UTEP), Li Ding (RPI) and others.

2 Outline. Background and inevitabilities. Informatics -> e-Science. Informatics methodology, e.g. Semantic Web as an approach and a technology: –Virtual Observatories: use cases, some examples, and non-specialist use; –Data ingest, integration, mining and where we are heading. Discussion.

3 Background. Scientists should be able to access a global, distributed knowledge base of scientific data that appears to be integrated and appears to be locally available. But data is obtained by multiple instruments, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed. And there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, and inflexible and unsustainable implementation technology.

4 But data has Lots of Audiences. From “Why EPO?”, a NASA internal report on science education, 2005. Figure labels: More Strategic; Less Strategic; Information. Information products have SCIENTISTS TOO.

5 Shifting the Burden from the User to the Provider

6 The Astronomy approach: data types as a service. Diagram labels: VO App 1, VO App 2, VO App 3; VO layer; DB 1, DB 2, DB 3, ... DB n. VO layer protocols: VOTable, Simple Image Access Protocol, Simple Spectrum Access Protocol, Simple Time Access Protocol. Characteristics: limited interoperability; lightweight semantics; limited meaning, hard coded; limited extensibility; under review. The Open Geospatial Consortium Web {Feature, Coverage, Mapping} Services and the Sensor Web Enablement Sensor {Observation, Planning, Analysis} Services use the same approach.

7 Mind the Gap! As a result of finding out who is doing what, sharing experience and expertise, and substantial coordination: there is (or was) still a gap between science and the underlying infrastructure and technology that is available. Cyberinfrastructure is the new research environment that supports advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet. Informatics (information science) includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Source: Wikipedia.)

8 Progression after progression. Diagram labels: IT; Cyberinfrastructure; Cyber Informatics; Core Informatics; Science Informatics, aka Xinformatics; Science, SBAs; Informatics.

9 Virtual Observatories. Make data and tools quickly and easily accessible to a wide audience. Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in their preferred language: i.e. appear to be local and integrated. They are likely to provide controlled vocabularies that may be used for interoperation in appropriate domains, along with database interfaces for access and storage -> thus part IT, part CI, part Informatics.

10 Diagram labels: VO Portal, Web Serv., VO API; DB 1, DB 2, DB 3, ... DB n; Semantic mediation layer (VSTO, low level); Semantic mediation layer (mid-upper level); Education, clearinghouses, other services, disciplines, etc. Layer annotations: Metadata, schema, data; Query, access and use of data; Semantic query, hypothesis and inference; Semantic interoperability; Added value. Mediation Layer: an ontology capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes; maps queries to underlying data; generates access requests for metadata and data; allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.

11 Semantic Web Methodology and Technology Development Process. Establish and improve a well-defined methodology vision for Semantic Technology based application development. Leverage controlled vocabularies, etc. Diagram labels: Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Science/Expert Review & Iteration; Develop model/ontology.

12 Science and technical use cases. Find data which represents the state of the neutral atmosphere anywhere above 100 km and toward the Arctic Circle (above 45N) at any time of high geomagnetic activity. –Extract information from the use case: encode knowledge. –Translate this into a complete query for data: inference and integration of data from instruments, indices and models. Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
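The idea of filtering by Instrument, Date-Time, and Parameter "in any order and in any combination" can be illustrated with a small sketch. This is not the VSTO service or its SOAP interface; the `Record` type, the `query` function, and the catalog entries are hypothetical names and made-up values used only to show that each constraint is optional and independent of the others.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """One catalog entry (hypothetical structure, for illustration only)."""
    instrument: str
    parameter: str
    time: datetime


def query(records, instrument=None, parameter=None, start=None, end=None):
    """Return the records that satisfy every constraint that was supplied.

    Any constraint left as None is ignored, so callers may combine
    constraints freely and pass them in any order as keyword arguments.
    """
    matches = []
    for record in records:
        if instrument is not None and record.instrument != instrument:
            continue
        if parameter is not None and record.parameter != parameter:
            continue
        if start is not None and record.time < start:
            continue
        if end is not None and record.time > end:
            continue
        matches.append(record)
    return matches


# Made-up catalog entries; not real holdings of any observatory.
CATALOG = [
    Record("incoherent scatter radar", "neutral temperature", datetime(2005, 1, 26, 18, 0)),
    Record("Fabry-Perot interferometer", "neutral wind", datetime(2005, 1, 27, 2, 0)),
    Record("incoherent scatter radar", "electron density", datetime(2008, 3, 21, 19, 0)),
]
```

For example, `query(CATALOG, parameter="neutral wind")` and `query(CATALOG, start=datetime(2008, 1, 1), instrument="incoherent scatter radar")` both work without the caller having to fill in the other constraints.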

13 Inferred plot type and return required axes data

14 But data has Lots of Audiences. From “Why EPO?”, a NASA internal report on science education, 2005. Figure labels: More Strategic; Less Strategic.

15 What is a Non-Specialist Use Case? A teacher accesses the internet, goes to an Educational Virtual Observatory and enters a search for “Aurora”. Someone should be able to query a virtual observatory without having specialist knowledge.

16 What should the User Receive? The teacher receives four groupings of search results: 1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/ 2) Research, data and tools: via research VOs, but the search for brightness or green/red line emission is mediated for them. 3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere), also known as Northern Lights. 4) Did you mean?: Aurora Borealis or Aurora Australis, etc.

17 Semantic Information Integration: Concept map for educational use of science data in a lesson plan

19 Informatics issues for Virtual Observatories. Scaling to large numbers of data providers and redefining the roles and relations among them. Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?). Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …). Crossing discipline boundaries. Data quality, preservation, stewardship. Security, access to resources, policies.

20 Provenance: origin or source from which something comes, its intention for use, whom or what it was generated for, the manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery; documented in detail sufficient to allow reproducibility.

21 Use cases. Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, 2005, 18:49:09 UT taken by the ACOS Mark IV polarimeter? What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO? Find all good images on March 21, 2008. Why are the quick look images from March 21, 2008, 19:00 UT missing? Why does this image look bad?

25 Quick look browse. Yasukawa: Computer crash. Yasukawa: Rain, cloud.

27 Visual browse

30 Search

32 A Better Way to Access Data. The Problem: Scientists often use data from only a single instrument because it is difficult to access, process, and understand data from multiple instruments. A typical data query might be: “Give me the temperature, pressure, and water vapor from the AIRS instrument from Jan 2005 to Jan 2008” or “Search for MLS/Aura Level 2, SO2 Slant Column Density from 2/1/2007”. A Solution: Using a simple process, SESDI allows data from various sources to be registered in an ontology so that it can be easily accessed and understood. Scientists need to use only the ontology components that relate to their data. An SESDI query might look like: “Show all areas in California where sulfur dioxide (SO2) levels were above normal between Jan 2000 and Jan 2007”. This query will pull data from all available sources registered in the ontology and allow seamless data fusion. Because the query is measurement related, scientists do not need to understand the details of the instruments and data types.
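A measurement-related query of the kind described above depends on each data source having been registered against a shared concept. The sketch below is a toy stand-in for that registration step, not SESDI itself: `ConceptRegistry`, its methods, and the field names are assumptions made for illustration, while the source names are taken from the slides.

```python
from collections import defaultdict


class ConceptRegistry:
    """Toy stand-in for an ontology mapping concepts to registered data fields."""

    def __init__(self):
        self._fields_by_concept = defaultdict(list)

    def register(self, source, field, concept):
        """Record that `field` of `source` measures `concept`."""
        self._fields_by_concept[concept].append((source, field))

    def sources_for(self, concept):
        """Return every (source, field) pair registered for `concept`."""
        return list(self._fields_by_concept.get(concept, []))


registry = ConceptRegistry()
# Field names below are hypothetical; only the source names come from the slides.
registry.register("MLS/Aura Level 2", "SO2 slant column density", "sulfur dioxide")
registry.register("HVO vehicle-based survey", "SO2 emission rate", "sulfur dioxide")
registry.register("AIRS", "temperature", "temperature")
```

A query phrased in terms of the measurement, such as `registry.sources_for("sulfur dioxide")`, then finds both the satellite and the ground-based source without the user naming either instrument.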

33 Determine the statistical signatures of volcanic forcings on the height of the tropopause

34 Detection and attribution relations…

37 The VSTO semantic framework, leveraged to show how volcano and atmospheric parameters and databases can be plugged directly into the semantic data framework to enable data integration.

38 Data Registration Framework. Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity. Level 2: Data Registration at the Inventory Level, e.g. list of datasets by types, times, products. Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities. Ontology-based Data Integration. Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved; schema-based integration. Diagram labels: Data Discovery; Data Integration. (A. K. Sinha, Virginia Tech, 2006)

39 How to find the data? Think about it the way the data providers do.

40 SEDRE: Semantically Enabled Data Registration Engine (A. K. Sinha, A. Rezgui, Virginia Tech). SEDRE is an application that enables scientists to semantically register data sets for optimal querying and semantic integration. SEDRE enables mapping of heterogeneous data to concepts in domain ontologies.

41 Registering Atmospheric Data (2)

42 Discussion (1). Taken together, a growing body of collected experience manifests an emerging informatics core capability that is starting to take data-intensive science into a new realm of realizability and, potentially, sustainability: –Use cases –X-informatics –Core Informatics –Cyber Informatics. Evolvable technical infrastructure.

43 Progression after progression. Diagram labels: IT; Cyberinfrastructure; Cyber Informatics; Core Informatics; Science Informatics; Science, Societal Benefit Areas, Edu; Informatics. One example: CI = OPeNDAP server running over HTTP/HTTPS; Cyberinformatics = data (product) and service ontologies, triple store; Core informatics = reasoning engine (Pellet), OWL, CMAP; Science (X) informatics = use cases, science domain terms, concepts in an ontology.

44 Discussion (2). The data and information challenges are (almost) being identified as increasingly common. Data and information science is becoming the ‘fourth’ column (along with theory, experiment and computation). Semantics are a key ingredient for progress in informatics. A sustained involvement of key inter-disciplinary team members is very important -> leads to incentives, rewards, etc. and a balance of research and production.

45 Summary. Informatics is playing a key role in filling the gap between science use and generation (including the spectrum of non-expert use) and the underlying cyberinfrastructure. –This is evident from the emergence of Xinformatics (world-wide). Our experience is in implementing informatics as semantics in Virtual Observatories (as a working paradigm) and Grid environments. –VSTO is only one example of success. –Data mining, data integration, smart search, provenance. Informatics is a profession and a community activity; it requires efforts in all 3 sub-areas (science, core, cyber), and these must be synergistic.

46 More Information. Virtual Solar Terrestrial Observatory (VSTO): http://vsto.hao.ucar.edu and http://www.vsto.org ; Semantically-Enabled Science Data Integration (SESDI): http://sesdi.hao.ucar.edu ; Semantic Provenance Capture in Data Ingest Systems (SPCDIS): http://spcdis.hao.ucar.edu ; SAM/Semantic Knowledge Integration Framework (SKIF): http://skif.hao.ucar.edu ; Conferences: numerous. Journals: Earth Science Informatics. Texts: a few are in progress. Courses: –Semantic e-Science, fall 2008 course at RPI –Geoinformatics, at Purdue. Contact: Peter Fox, pfox@ucar.edu

47 Spare room

48 Translating the Use-Case - non-monotonic? Input. Physical properties: state of neutral atmosphere. Spatial: above 100 km; toward Arctic Circle (above 45N). Conditions: high geomagnetic activity. Action: return data. Specification needed for query to CEDARWEB: Instrument; Parameter(s); Operating Mode; Observatory; Date/time; Return-type: data. GeoMagneticActivity has ProxyRepresentation. GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere). Kp is a GeophysicalIndex; hasTemporalDomain: “daily”; hasHighThreshold: xsd_number = 8. Date/time when Kp >= 8.
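The last step of this translation, turning "high geomagnetic activity" into concrete dates via the Kp threshold, can be sketched in a few lines. This is only an illustration of the rule "Date/time when Kp >= 8" with the daily temporal domain stated on the slide; the function name and the sample index values are made up and are not real Kp measurements or CEDARWEB calls.

```python
from datetime import date

# hasHighThreshold from the slide: Kp values at or above 8 count as "high".
HIGH_KP_THRESHOLD = 8


def high_activity_days(daily_kp, threshold=HIGH_KP_THRESHOLD):
    """Return, in date order, the days whose daily Kp meets the threshold."""
    return sorted(day for day, kp in daily_kp.items() if kp >= threshold)


# Made-up sample values for illustration only; not observed indices.
SAMPLE_DAILY_KP = {
    date(2000, 1, 1): 3,
    date(2000, 1, 2): 8,
    date(2000, 1, 3): 9,
    date(2000, 1, 4): 7,
}
```

The resulting dates would then fill the Date/time slot of the query specification, alongside the instrument and parameter constraints inferred from the rest of the use case.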

49 VSTO - semantics and ontologies in an operational environment: vsto.hao.ucar.edu, www.vsto.org. Web Service.

50 Partial exposure of Instrument class hierarchy - users seem to LIKE THIS. Semantic filtering by domain or instrument hierarchy.

52 Semantic Web Services

53 Semantic Web Services. OWL document returned using VSTO ontology - can be used either syntactically or semantically.

54 Semantic Web Services

55 Semantic Web Services

56 VSTO achievements. Conceptual model and architecture developed by a combined team: KR experts, domain experts, and software engineers. Semantic framework developed and built with a small, cohesive, carefully chosen team in a relatively short time (deployments in 1st year). Production portal released, includes security, etc., with community migration (and so far endorsement). VSTO ontology version 1.2 (vsto.owl) in production, 2.0 in preparation. Web Services encapsulation of semantic interfaces in use. Solar Terrestrial use cases are driving the completion of the ontologies (e.g. instruments). Using ontologies and the overall framework in other applications (volcanoes, climate, oceans, water, …).

57 Semantic Web Basics. The triple: {subject-predicate-object}, e.g. Interferometer is-a optical instrument; Optical instrument has focal length. An ontology is a representation of this knowledge. W3C is the primary (but not sole) governing organization for languages, specifications, best practices, etc. –RDF - Resource Description Framework –OWL 1.0 - Web Ontology Language (OWL 1.1 on the way). Encode the knowledge in triples in a triple store; software is built to traverse the semantic network, which can be queried or reasoned upon. Put semantics between/in your interfaces, i.e. between layers and components in your architecture, i.e. between ‘users’ and ‘information’, to mediate the exchange.
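The two example triples on this slide are enough to show what "traversing the semantic network" means. The following is a minimal sketch using plain Python sets rather than RDF, OWL, or any real triple store; the function names are assumptions made for illustration. It infers that an interferometer has a focal length because it is an optical instrument.

```python
# Each triple is (subject, predicate, object), as on the slide.
TRIPLES = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
}


def objects(triples, subject, predicate):
    """Return all objects stated directly for the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def ancestors(triples, subject):
    """Return every class reachable from `subject` by following is-a links."""
    seen = set()
    frontier = [subject]
    while frontier:
        current = frontier.pop()
        for parent in objects(triples, current, "is-a"):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def inherited(triples, subject, predicate):
    """Return objects stated for `subject` or for any of its is-a ancestors."""
    result = set(objects(triples, subject, predicate))
    for cls in ancestors(triples, subject):
        result |= objects(triples, cls, predicate)
    return result
```

Here `objects(TRIPLES, "Interferometer", "has")` is empty because nothing is stated directly, while `inherited(TRIPLES, "Interferometer", "has")` yields `{"FocalLength"}`. A reasoner over an OWL ontology performs this kind of inference in a far more general way.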

58 Semantic Web Benefits. Unified/abstracted query workflow: Parameters, Instruments, Date-Time. Decreased input requirements for query: in one case reducing the number of selections from eight to three. Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics. Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services). Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, exposed as smart web services: –understanding of coordinate systems, relationships, data synthesis, transformations, etc. –returns independent variables and related parameters. A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields).

59 Example 1: Registration of Volcanic Data. SO2 emission from Kilauea east rift zone, vehicle-based (Source: HVO). Abbreviations: t/d = metric tonne (1000 kg) per day, SD = standard deviation, WS = wind speed, WD = wind direction east of true north, N = number of traverses. Location codes: U - above the 180° turn at Holei Pali (upper Chain of Craters Road); L - below Holei Pali (lower Chain of Craters Road); UL - individual traverses were made both above and below the 180° turn at Holei Pali; H - Highway 11.

60 Registering Volcanic Data (1)

61 Registering Volcanic Data (2). No explicit lat/long data. Volcano identified by name. Volcano ontology framework will link name to location.
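Linking a volcano name to a location can be pictured as a lookup applied at registration time. The sketch below uses a plain dictionary in place of the volcano ontology framework; `GAZETTEER`, `add_location`, the column name `so2_t_per_day`, and the emission value are all hypothetical, and the Kilauea coordinates are approximate.

```python
# Stand-in for the ontology's name-to-location link; coordinates are approximate.
GAZETTEER = {"Kilauea": (19.42, -155.29)}


def add_location(rows, gazetteer):
    """Return copies of `rows` with lat/lon filled in from the volcano name."""
    located = []
    for row in rows:
        name = row["volcano"]
        if name not in gazetteer:
            # Fail loudly rather than register data with an unknown location.
            raise KeyError(f"no location registered for volcano {name!r}")
        lat, lon = gazetteer[name]
        located.append({**row, "lat": lat, "lon": lon})
    return located


# Placeholder measurement row; the emission value is made up for illustration.
ROWS = [{"volcano": "Kilauea", "so2_t_per_day": 100.0}]
```

Once located this way, the ground-based rows can be compared spatially with satellite observations even though the original table carried no coordinates.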

62 Example 2: Registration of Atmospheric Data. Satellite data for SO2 emissions. Abbreviation: SCD = Slant Column Density (in Dobson Units, DU).

63 Registering Atmospheric Data (1)

64 SAM Project Objectives (S. Graves, R. Ramachandran). To create a prototype Semantic Analysis and Mining framework (SAM) comprising: –Data mining and knowledge extraction web services –Linked ontologies describing the mining services, data and the problem domain –Web-based client. To allow users to discover and explore existing data and services, compose workflows for mining and invoke these workflows: –Semantic search –Automated web service invocation –Automated web service composition.

65 Data Mining Ontology: Design. Courtesy: R. Ramachandran

66 Data Mining Ontology: Snapshot. Courtesy: R. Ramachandran

67 The Information Era: Interoperability. Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system: managing and accessing large data sets; higher space/time resolution capabilities; rapid response requirements; data assimilation into models; crossing disciplinary boundaries.

68 Virtual Observatories. Conceptual examples. In-situ: virtual measurements –Related measurements. Remote sensing: virtual, integrative measurements –Data integration. Managing virtual data products/sets.

69 Virtual Solar Terrestrial Observatory. A distributed, scalable education and research environment for searching, integrating, and analyzing observational, experimental, and model databases. Subject matter covers the fields of solar, solar-terrestrial and space physics. Provides virtual access to specific data, model, tool and material archives containing items from a variety of space- and ground-based instruments and experiments, as well as individual and community modeling and software efforts bridging research and educational use. 3 year NSF-funded (OCI/SCI) project, completed. Several follow-on projects.

70 Problem definition. Data is coming in faster and in greater volumes, outstripping our ability to perform adequate quality control. Data is being used in new ways, and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision. We often fail to capture, represent and propagate manually generated information that needs to go with the data flows. Each time we develop a new instrument, we develop a new data ingest procedure, collect different metadata and organize it differently; it is then hard to use with previous projects. The task of event determination and feature classification is onerous, and we don't do it until after we get the data.

71 Building blocks. Data formats and metadata: IAU standard FITS with SoHO keyword convention, JPEG, GIF. Ontologies: OWL-DL and RDF. The Proof Markup Language (PML) provides an interlingua for capturing the information agents need to understand results and to justify why they should believe the results. The Inference Web toolkit provides a suite of tools for manipulating, presenting, summarizing, analyzing, and searching PML, so that end users can understand information and its derivation, thereby facilitating trust in and reuse of information. Capturing semantics of data quality, event, and feature detection within suitable community ontology packages (SWEET, VSTO).

