Kitchen Sinks, Plumbing and Virtual Observatories Peter Fox June 4, 2010 – CSIRO Aspendale.

1 Kitchen Sinks, Plumbing and Virtual Observatories Peter Fox pfox@cs.rpi.edu June 4, 2010 – CSIRO Aspendale

2 Introduction Systems compared to frameworks? The need, and shifting the burden Virtual Observatories Architectures of VOs and semantics In the lower layers of VOs – Data access and transport – Formats, formats, formats – Sensor streams How do you/would you participate? Tetherless World Constellation

3 Frameworks vs. Systems Rough definitions – Systems have very well-defined entry and exit points. A user tends to know when they are using one. Options for extensions are limited and usually require engineering – Frameworks have many entry and use points. A user often does not know when they are using one. Extension points are part of the design Treat this as a working definition

4 Diversity, Integration, Size, … Not just large (well organized, long-lived, well-funded) projects/programs want to make their data available Data policies are emerging but are still highly variable (or non-existent) – How does a user deal with this? Need to manage data to solve challenging scientific or societal problems without the continued need for a scientist to know every detail of complex data management systems Large-scale, scientific data repositories: – Most data still created in a manner to simplify generation, not access or use – Very diverse organization of data; files, directories, metadata, emails, etc. – Source/origin management is driven by meta-mechanisms for integration, interoperability (but still need performance) Virtual Observatories Data Grids Increasing realization: need management for all forms of ‘data’, i.e. virtual data products are becoming the norm Size matters; personal data management is as big a problem as source data management, or bigger

5 Shifting the Burden from the User to the Provider (with the help of VOs)

6 Terminology Workshop: A Virtual Observatory (VO) is a suite of software applications on a set of computers that allows users to uniformly find, access, and use resources (data, software, document, and image products and services using these) from a collection of distributed product repositories and service providers. A VO is a service that unites services and/or multiple repositories. VxOs - x is one discipline, domain, community, country NB: VO also refers to Virtual Organization

7 What should a VO do? Make “standard” scientific research much more efficient. – Even the principal investigator (PI) teams should want to use them. – Must improve on existing services (mission and PI sites, etc.). VOs will not replace these, but will use them in new ways. Enable new, global problems to be solved. – Rapidly gain integrated views from the solar origin to the terrestrial effects of an event. – Find data related to any particular observation. – (Ultimately) answer “higher-order” queries such as “Show me the data from cases where a large coronal mass ejection observed by the Solar and Heliospheric Observatory was also observed in situ.” (science-speak) or “What happens when the Sun disrupts the Earth’s environment?” (general public)

8 Virtual Observatories Conceptual examples: In-situ: Virtual measurements – Related measurements Remote sensing: Virtual, integrative measurements – Data integration Both usage patterns lead to additional data management challenges at the source and for users; now managing virtual ‘datasets’

9 Virtual Observatories Make data and tools quickly and easily accessible to a wide audience. Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in their preferred language: i.e. appear to be local and integrated Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains along with database interfaces for access and storage

10 Early days of VxOs [Figure labels: VO 1, VO 2, VO 3; DB 1, DB 2, DB 3, … DB n; ?]

11 Federation [Figure labels: VO 1, VO 2, VO 3, VO 4; DB 1, DB 2, DB 3, … DB n]

12 The Astronomy approach; data-types as a service [Figure labels: VO App 1, VO App 2, VO App 3; VO layer; DB 1, DB 2, DB 3, … DB n] VO layer: – VOTable – Simple Image Access Protocol – Simple Spectrum Access Protocol – Simple Time Access Protocol Limited interoperability Lightweight semantics Limited meaning, hard coded Limited extensibility Under review OGC: {WFS, WCS, WMS} and SWE {SOS, SPS, SAS} use the same approach
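
To make the "data-types as a service" idea on this slide concrete, here is a minimal sketch of reading a VOTable with the Python standard library. The sample document, its table name and its values are invented for illustration, and it omits the XML namespace that real VOTable files declare, so treat it as a sketch under those assumptions rather than a full VOTable reader.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free VOTable fragment. Real documents declare an
# IVOA XML namespace on the VOTABLE element, which this sketch ignores.
SAMPLE_VOTABLE = """\
<VOTABLE version="1.2">
  <RESOURCE>
    <TABLE name="observations">
      <FIELD name="obs_id" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg" ucd="pos.eq.ra"/>
      <FIELD name="dec" datatype="double" unit="deg" ucd="pos.eq.dec"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>A001</TD><TD>10.5</TD><TD>-3.25</TD></TR>
          <TR><TD>A002</TD><TD>11.0</TD><TD>4.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_votable(text):
    """Return (field descriptions, rows) from the first TABLE in the document."""
    root = ET.fromstring(text)
    table = root.find("./RESOURCE/TABLE")
    # Each FIELD carries the column metadata (name, unit, ucd, ...).
    fields = [dict(field.attrib) for field in table.findall("FIELD")]
    names = [field["name"] for field in fields]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        rows.append(dict(zip(names, cells)))  # cell values stay as strings
    return fields, rows


fields, rows = read_votable(SAMPLE_VOTABLE)
```

The only meaning a client gets here is what the FIELD attributes carry (name, unit, ucd), which is one way to read the slide's "lightweight semantics" and "limited meaning, hard coded".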

13 Similarities to Astronomy Some disciplines have chosen a data format (some even use FITS) Common applications, community standards appearing Images, spectra (incl. multi-band), … More and more data is on-line, some (near) real-time Data flood - synoptic measurements, spatial/spectral resolution, number of instruments, cadence - all increasing (petabyte to exabyte is real), data mining and knowledge extraction are now real needs Don’t move (or replicate?) the data when possible Means for interoperation are being demanded - service-oriented architectures Some VOs even implementing IVOA standards (primarily heliophysics and space physics)

14 Differences with astronomy Data types (+station/point, irregular, multi-resolution, ragged arrays, swath, …) Data formats - many Lots of VOs Metadata conventions range from strict to non-existent Provenance, derivation and semantics being applied in (more) formal ways Geo-spatial dominates (cf helio-spatial), some standards but little/no enforcement - efforts at conventions/ standards are at data model level New to the theme of integration and inter-disciplinary Number and complexity of projects, systems, frameworks - need to interoperate at many levels Social, political and mission forces are immense

15 Fox - APAC 2007, Driving e-research: Grids and Semantics [Figure labels: VO Portal; Web Serv.; VO API; DB 1, DB 2, DB 3, … DB n; Semantic mediation layer - VSTO - low level; Semantic mediation layer - mid-upper-level; Education, clearinghouses, other services, disciplines, etc.; Metadata, schema, data; Query, access and use of data; Semantic query, hypothesis and inference; Semantic interoperability; Added value] Mediation Layer – Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes – Maps queries to underlying data – Generates access requests for metadata, data – Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
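
As an illustration of "maps queries to underlying data", the following is a toy sketch of a mediation step that takes a query phrased in terms of parameter, instrument class and date range and resolves it to dataset identifiers. The catalog entries, the class hierarchy and the function names are all hypothetical; a real mediation layer would use an ontology and a reasoner, not a hand-written dictionary.

```python
from datetime import date

# Hypothetical catalog: dataset ids, instruments and parameters are invented.
CATALOG = [
    {"dataset": "ds-001", "instrument": "fabry_perot_interferometer",
     "parameter": "neutral_temperature",
     "start": date(2001, 1, 1), "end": date(2003, 12, 31)},
    {"dataset": "ds-002", "instrument": "spectrometer",
     "parameter": "neutral_temperature",
     "start": date(2005, 1, 1), "end": date(2006, 12, 31)},
    {"dataset": "ds-003", "instrument": "radar",
     "parameter": "neutral_temperature",
     "start": date(2001, 1, 1), "end": date(2003, 12, 31)},
]

# Tiny stand-in for an ontology: instrument class -> direct subclasses.
SUBCLASSES = {
    "optical_instrument": ["fabry_perot_interferometer", "spectrometer"],
}


def expand(instrument_class):
    """Return the class itself plus all of its transitive subclasses."""
    found = [instrument_class]
    for child in SUBCLASSES.get(instrument_class, []):
        found.extend(expand(child))
    return found


def mediate(parameter, instrument_class, start, end):
    """Map a three-part query onto the datasets that can answer it."""
    instruments = set(expand(instrument_class))
    return [
        entry["dataset"]
        for entry in CATALOG
        if entry["parameter"] == parameter
        and entry["instrument"] in instruments
        # keep datasets whose coverage overlaps the requested interval
        and entry["start"] <= end and entry["end"] >= start
    ]


hits = mediate("neutral_temperature", "optical_instrument",
               date(2002, 1, 1), date(2002, 6, 30))
```

The user asks for a class of instrument; the expansion to concrete instruments happens in the mediation step, so the user does not need to know which datasets exist.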

16 Semantic Web Benefits Unified/abstracted query workflow: Parameters, Instruments, Date-Time Decreased input requirements for query: in one case reducing the number of selections from eight to three Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services) Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data whereas now semantic mediation provides the level of sensible data integration required, and exposed as smart web services – understanding of coordinate systems, relationships, data synthesis, transformations. – returns independent variables and related parameters A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)

17 Virtual Carbon Observatory

18 Environmental Assessment

19 Understand Communities Of Stakeholders

20 Tetherless World Constellation 20

21 Multi-domain Knowledge Base Provenance Science Data Processing Science

22 Vocabularies and Ontologies An underlying aspect of all VOs is the need to develop/agree on a common presentation of the (virtual) holdings, aka a catalog As discipline boundaries are crossed… (ecology) Vocabularies are increasingly important in this provision And, interestingly, there is a real push toward more explicit representations of semantics in the form of ontologies … and provision of vocabulary services*

23 Let’s turn to plumbing Data formats are of resurgent interest but not so much for exchange – For structural representation and efficiency – For transparency and preservation – However, a lot of end-users still care about formats immensely Data access and transport Implications of computing closer to the data

24 netCDF and similar Version 3 (classic) vs. version 4 (aka CDM) V4 - slow adoption to date (no specific reason) Conventions (e.g. units, CF-1) make it work Traditional focus on grids is now evolving as in-situ data and model comparisons are becoming common, i.e. unstructured data
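
The point that conventions "make it work" can be sketched as a small metadata check. This is a simplified illustration, not a CF compliance checker: it assumes the attributes have already been read into plain dictionaries, and the rule that every variable must carry `units` is stricter than the CF conventions themselves. The function name and the sample attributes are hypothetical.

```python
def check_cf_metadata(global_attrs, variables):
    """Return a list of human-readable problems; an empty list means none found.

    global_attrs: dict of file-level attributes.
    variables: dict mapping variable name -> dict of that variable's attributes.
    """
    problems = []
    # CF files announce the convention version in the global 'Conventions' attribute.
    conventions = global_attrs.get("Conventions", "")
    if not conventions.startswith("CF-"):
        problems.append("global attribute 'Conventions' does not name a CF version")
    # Simplified rule (an assumption of this sketch): require 'units' everywhere.
    for name, attrs in variables.items():
        if "units" not in attrs:
            problems.append(f"variable '{name}' has no 'units' attribute")
    return problems


sample_globals = {"Conventions": "CF-1.4"}
sample_variables = {
    "sst": {"units": "K", "standard_name": "sea_surface_temperature"},
    "flag": {},
}
report = check_cf_metadata(sample_globals, sample_variables)
```

A client that can rely on such attributes being present can label axes and convert units without knowing who produced the file, which is the interoperability benefit the slide alludes to.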

25 Discipline neutral access One such approach, since 1993, is the DAP – Data Access Protocol (NASA, NOAA standard) opendap.org (U.S. not-for-profit) OPeNDAP is the software – Core, server (version 4 – Hyrax), client, services
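
For readers who have not seen DAP in use, the sketch below builds DAP2-style request URLs: a dataset URL, a response suffix, and an optional constraint expression that selects one variable and an index range per dimension. The host name, path and variable name are hypothetical, the helper names are invented for this example, and only a few response suffixes are listed.

```python
# Response suffixes used in this sketch: structure (dds), attributes (das),
# binary data (dods) and text data (ascii).
DAP2_RESPONSES = {"dds", "das", "dods", "ascii"}


def hyperslab(start, stop, stride=1):
    """Format one dimension's index range in DAP2 bracket syntax."""
    if stride == 1:
        return f"[{start}:{stop}]"
    return f"[{start}:{stride}:{stop}]"


def dap2_url(dataset_url, response, variable=None, slabs=()):
    """Build a request URL; 'slabs' holds one (start, stop[, stride]) per dimension."""
    if response not in DAP2_RESPONSES:
        raise ValueError(f"unknown response type: {response}")
    url = f"{dataset_url}.{response}"
    if variable is not None:
        # The constraint expression follows '?'. Brackets are left unescaped
        # here; some HTTP clients may need them percent-encoded.
        url += "?" + variable + "".join(hyperslab(*slab) for slab in slabs)
    return url


# Hypothetical server and dataset.
BASE = "http://example.org/opendap/data/sst.nc"
structure_url = dap2_url(BASE, "dds")
subset_url = dap2_url(BASE, "ascii", "sst", [(0, 0), (10, 20), (0, 100, 5)])
```

Because the subset is expressed in the URL, the same request works for any client that can issue an HTTP GET, which is what makes the access discipline neutral.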

26 OPeNDAP Hyrax Architecture [Figure labels: Client; OLFS; BES; Data] OPeNDAP Lightweight Front end Server (OLFS) – Receives requests and asks the BES to fill them – Uses Java Servlets – Does not directly ‘touch’ data – Multi-protocol Back End Server (BES) – Reads data files, databases, etc., returns info – May return DAP2 objects or other data – Does not require web server
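
The division of labour on this slide can be sketched in a few lines: a front end that only interprets requests, and a back end that is the only component reading data. This is a toy model of the pattern, not Hyrax code; the class names, command names and in-memory "data store" are all hypothetical.

```python
class BackEnd:
    """Stands in for the BES: the only component that reads data."""

    def __init__(self, store):
        self._store = store  # dataset path -> {"metadata": ..., "values": ...}

    def execute(self, command, dataset):
        if dataset not in self._store:
            return {"status": "error", "message": f"no such dataset: {dataset}"}
        record = self._store[dataset]
        if command == "get_metadata":
            return {"status": "ok", "body": record["metadata"]}
        if command == "get_data":
            return {"status": "ok", "body": record["values"]}
        return {"status": "error", "message": f"unknown command: {command}"}


class FrontEnd:
    """Stands in for the OLFS: parses requests and never opens data itself."""

    # Hypothetical mapping from request suffix to back-end command.
    SUFFIX_TO_COMMAND = {".das": "get_metadata", ".ascii": "get_data"}

    def __init__(self, backend):
        self._backend = backend

    def handle(self, path):
        for suffix, command in self.SUFFIX_TO_COMMAND.items():
            if path.endswith(suffix):
                dataset = path[: -len(suffix)]
                return self._backend.execute(command, dataset)
        return {"status": "error", "message": f"unsupported request: {path}"}


store = {"/data/sst.nc": {"metadata": {"units": "K"}, "values": [271.5, 272.0]}}
front = FrontEnd(BackEnd(store))
reply = front.handle("/data/sst.nc.ascii")
```

Keeping all data access behind one interface is what lets the front end speak several protocols, and lets the back end run without a web server, as the slide notes.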

27 [Figure: OPeNDAP Lightweight Front end Server. Labels: Request from client; Response to client; Request Formulation**; THREDDS; HTML form; Info output; ASCII output; DAP2 over HTTP; DAP2 over GridFTP; BES. Response types: SOAP-DAP (HTTP); DAP2 (GridFTP, HTTP); RDF, OWL, JSON (HTTP); PML output]

28 Hyrax/Back-end Server [Figure: BES Framework. Labels: Network Protocol and Process start/stop activities; PPT*; Initialization/Termination; BES Commands/XML Documents; Commands**; Data Store Interfaces; DAP2 Access; NetCDF3; HDF4; RDF/SPARQL; …; Provenance; Data Catalogs] *PPT is built in (other protocols) **Some commands are built in

29 Status of the Community OPeNDAP Server Software Hyrax 1.6 provides support for NcML-based aggregation Faster THREDDS implementation (but not full featured) Full security audit and static code analysis certification to comply with NOAA and NASA requirements DAP4 (which includes netCDF 4 support) is not available yet AND other things

30 Earth System Grid Center for Enabling Technologies: (ESG-CET) Large data sets, numbers and sizes – High performance – Flexible architecture, both client and several types and numbers of servers – Aggregation – Server side operations – Multiple transport protocol options Full ESG security support as well as loose federation Full function client access via API (netCDF/CDM) To satisfy the new goals, the OPeNDAP services for ESG have been re-architected. We now use parts of the standard OPeNDAP framework Hyrax, focusing on high performance for the client side and extended flexibility.

31 Earth System Grid Center for Enabling Technologies: (ESG-CET) Requirements leading to OPeNDAP-g Separation of the core Data Access Protocol (DAP) from the transport protocol (HTTP). High Performance Computing. The previous CGI based servers did not have the capacity required by ESG. Error and memory handling added. Security. Once OPeNDAP was independent of the transport protocol, adding security was possible by relying on the Globus gsiFTP system. Aggregation. OPeNDAP 3.0 did not operate on aggregated datasets. OPeNDAP-g does. Transport protocol independence and HPC were incorporated back into OPeNDAP leading to the current version. Security and aggregation initially were ESG only features.

32 Earth System Grid Center for Enabling Technologies: (ESG-CET) The Remote NetCDF Invocation (RNI) – The client is the netCDF library. It has exactly the same API as the standard C netCDF library, but it can deal with local files or files reachable via HTTP, PPT or gridFTP. – The third tier, the BES server, can be reached only via PPT. NetCDF services for all NetCDF calls are implemented as a BES module. – The middle tier acts like a proxy between the RNI client and server and deals with security.

33 Earth System Grid Center for Enabling Technologies: (ESG-CET) RNI Architecture [Figure labels: CLIENT; DATA; GridFTP; OPeNDAP BES; NetCDF Library; RNI Module; connection; acts like; RNI Library]

34 Earth System Grid Center for Enabling Technologies: (ESG-CET) Characteristics of the RNI as part of a data access system Full support of standard OPeNDAP URLs. RNI is being developed with the integrated Unidata/OPeNDAP netCDF library (and CDM) Transparent access to either standard netCDF files or aggregated datasets via the NetCDF Markup Language (NcML). For remote containers, all write operations are disabled for security. That is, for HTTP/HTTPS, PPT and gridFTP/gsiFTP the RNI system is a read-only API. RNI utilizes Just in Time access. Caching is only for metadata. No pre-fetching of data. RNI transparently accesses secure (gsiFTP, HTTPS) or insecure (gridFTP, HTTP) remote data.
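
Three of the characteristics listed on this slide (read-only remote containers, metadata-only caching, and just-in-time data access) can be shown together in a short sketch. This is not the RNI API: the class, its method names, the scheme strings and the injected fetch functions are hypothetical stand-ins chosen for illustration.

```python
from urllib.parse import urlparse

# Scheme names are assumptions of this sketch, used only to tell remote from local.
REMOTE_SCHEMES = {"http", "https", "ppt", "gridftp", "gsiftp"}


class Dataset:
    """Toy client-side handle with cached metadata and on-demand data reads."""

    def __init__(self, location, fetch_metadata, fetch_data):
        self.location = location
        self.remote = urlparse(location).scheme in REMOTE_SCHEMES
        self._fetch_metadata = fetch_metadata
        self._fetch_data = fetch_data
        self._metadata = None  # filled on first use, then reused

    def metadata(self):
        # Metadata is the only thing cached.
        if self._metadata is None:
            self._metadata = self._fetch_metadata(self.location)
        return self._metadata

    def read(self, variable, start, count):
        # Just in time: every read goes to the source, nothing is pre-fetched.
        return self._fetch_data(self.location, variable, start, count)

    def write(self, variable, start, values):
        if self.remote:
            raise PermissionError("remote containers are read-only")
        raise NotImplementedError("local writes are outside this sketch")


# Fake transport functions standing in for a real network layer.
calls = {"metadata": 0}


def fake_metadata(location):
    calls["metadata"] += 1
    return {"dimensions": {"time": 4}}


def fake_data(location, variable, start, count):
    return list(range(start, start + count))


ds = Dataset("gsiftp://example.org/data/sst.nc", fake_metadata, fake_data)
ds.metadata()
ds.metadata()  # second call is served from the cache
values = ds.read("sst", 2, 3)
```

Caching only metadata keeps the client's view of the data current while still avoiding repeated round trips for dimension and attribute lookups.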

35 Other DAP client/API library status OPeNDAP-Unidata project to fold ‘libnc-dap’ into the standard netCDF distribution, i.e. you get ‘DAP’ for free New C-API for DAP – ‘oc’ replaces ocapi and will be the basis for rewrites of the IDL and Matlab (and other) client interfaces Earth System Grid Center for Enabling Technologies: (ESG-CET)

36 NOAA/IOOS DAP adopted by DMAC Gateway project for OPeNDAP – Support for WCS/WFS as source and response type in Hyrax – Implementation of AIS (Ancillary Information Service) for RDF return prototype – Initial DAP ontology data model

37 Cloud Microsoft ported OPeNDAP Hyrax to their Azure cloud – http://opendap.cloudapp.net/dap – Web-client/form is at http://opendap.cloudapp.net/dap/data/nc/contents.html Work on Azure Drive (Xdrive) underway No decisions on future or other cloud environments

38 Security (authn/z) Developed with Bryan Lawrence (BADC/STFC) for federation of OPeNDAP security Specified in May 2009, implementations presented at EGU in 2010 Will appear in ESG and community OPeNDAP releases AAF compatible?

39 Sensors Due to the increasing demand to process off the sensor: – Sky surveys – volume – Monitoring – for rapid response and decision support – As part of a network, or on the internet, a web There is a corresponding increase in need to ingest/publish data much earlier than has previously been needed Trend toward treating them as RT/NRT sensors

40 Directions for sensor and spatial standards (my view) Has grown out of a limited set of semantic constructs – Geography, features, coverages, maps, streams Integration needs are driving different (good) developments, e.g. WCS 2 v WFS 2. Transparency requirements are going to drive very different approaches, e.g. encapsulation can be a barrier Refactoring of standards: much as is happening in astronomy will be required

41 Who is developing? Your participation? VOs – U.S. – NASA, NSF, NOAA are developing/funding – EU – many, e.g. HELIO, SOTERIA DAP/OPeNDAP – World-wide community, strong Australian contributions/use Sensors – W3C recent – incubator for semantic sensor web – very, very important work Vocabulary servers (more than the vocabularies) – Interest in community-based (or W3C) effort

42 Issues for Virtual Observatories - Geo Scaling to large numbers of data providers Security, policy enforcement Data quality Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?) Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …) Sustainability

43 Summary/Discussion The VO paradigm is in widespread use in Earth and Space Sciences – Successful implementations in production and use (some even have evaluations) – New science is being enabled and performed – There are active programs at the agency level – Active communities; meeting, publishing, developing, implementing Data access and transport is an active field New attention to spatio-temporal standards and vocabularies in the context of services Substantial re-visiting of architectures due to the need to accommodate explicit semantics (esp. in regard to sensors)

44 Further Information http://tw.rpi.edu/ http://www.opendap.org and http://docs.opendap.org Lots of others (ask me) Contact: – pfox@cs.rpi.edu

