What is infrastructure (and who cares)?
Clearinghouse / Registry – the tracks
Portals – the terminals
Applications – the trains
Users – the passengers
Content – the baggage
SBAs – the destinations
Without metadata, SOA itself would be impossible
[Diagram labels: Datasets, Service Instances, Community Catalogs; Provisions; Clearinghouse Harvests / Cascades; Service / Dataset Description Metadata; Get Capabilities]
Annex B Use Case
Disaster Scenario
–Discover which data, products and services are available for this area of interest and this theme. Connect to the FedEO networks.
–After identifying products that match [user] needs, perform a catalogue search of these products to get a quick look and more information.
–Order the previously identified products directly from the providers.
Transverse Use Cases
1. Register organization
2. Define & publish component with services
3. Develop component metadata system
4. Register component & services
5. Search for components
6. Search community catalogs / metadata services
7. Bind client to service
8. Access services for application
9. Evaluate component quality / usability
10. Subscribe to alerts
11. Construct & publish workflow
12. Register standards & best practices
Define & Publish Component with Services
1. Precondition: Organization registration
2. Select component type to represent contribution
3. Select service types which represent component access
4. Expose service endpoints
5. Postcondition: Services expose component
Develop Component Metadata System
1. Precondition: Develop component and services
2. Choose component metalevels
3. Choose metadata standards and formats
4. Choose publishing method (e.g. community catalog)
5. Construct metadata document(s)
6. Deploy / publish metadata documents
7. Postcondition: Descriptions of components and services are available
Register component and services
1. Precondition: component and services are exposed and described adequately by metadata
2. Enter information into Registry application for component
3. Enter information into Registry applications for associated services
4. Postcondition: component and services are registered
Search for components and services (data first)
1. Precondition: appropriate components are registered in the Registry and are searchable through a Clearinghouse.
2. User opens a Clearinghouse client, e.g. Geo Portal
3. User enters one or more target values for queryable parameters and initiates search.
4. Alternative: user browses candidate components presented by the Geo Portal
5. Alternative: user issues search by way of a semantic mediator which expands / maps the search terms for finding components and services in disparate domains.
6. User refines search parameters to narrow the results
7. User searches / browses services associated with candidate components
8. Postcondition: set of candidate components with suitable service interfaces for drill down.
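The mediated search alternative in the use case above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Component` record, the `SYNONYMS` table standing in for a semantic mediator, and the function names are all hypothetical and do not describe the actual Registry or Geo Portal implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical registered component with keywords and service types."""
    name: str
    keywords: set
    services: list = field(default_factory=list)


# Hypothetical synonym table standing in for a semantic mediator.
SYNONYMS = {
    "rainfall": {"precipitation"},
    "shoreline": {"coastline"},
}


def expand_terms(terms):
    """Return the lower-cased search terms plus any mapped synonyms."""
    expanded = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        expanded |= SYNONYMS.get(term, set())
    return expanded


def search(registry, terms, mediate=False):
    """Find components whose keywords match the (optionally expanded) terms."""
    wanted = expand_terms(terms) if mediate else {t.lower() for t in terms}
    return [c for c in registry if c.keywords & wanted]


def with_service(candidates, service_type):
    """Keep only candidates that expose the requested service type."""
    return [c for c in candidates if service_type in c.services]
```

Without mediation, a query for "rainfall" misses a component tagged only "precipitation"; with `mediate=True` the expanded term set finds it, and `with_service` then narrows the candidates to those with a suitable interface for drill down.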
Search community catalogs
1. Precondition: community catalog is registered in Registry and findable through Clearinghouse and/or otherwise known to user.
2. User opens client application (e.g. community portal) with capabilities to access community catalog.
3. User searches community catalog for holdings of interest
4. Alternative: user searches Clearinghouse for holdings of community catalogs which have been harvested by or federated with the Clearinghouse.
5. Alternative: user finds candidate community catalog summary records through Clearinghouse, then drills down into more detailed metadata in the originating community catalog
6. Postcondition: user has identified resources of interest, both data products and the services which make them available via the Web or other communication medium.
Evaluate component quality / usability
Precondition: user has identified a component (e.g. dataset) relevant to their project needs and accessed it through appropriate services.
User develops decision support workflow to derive analysis / visualization of dataset(s).
User obtains metadata descriptions of dataset(s) sufficient to determine validity of derived result (e.g. statistical power of a decision).
Optional: user contacts provider of dataset to obtain additional metadata elements needed for evaluation
Optional: user iterates through workflow and evaluation to optimize decision validity.
Postcondition: user has determined the significance of an observation-based decision.
Registry Providers
Organization / Component / Service Registry – GMU
–ebRIM based, but user interface is limited to O / C / S
–Service interface is discovery only
Standard and Special Arrangements Registry – IEEE
–Coordination with Service registry being developed
User Requirements Registry – IEEE
–Role is loosely defined
Best Practices Wiki – IEEE
–Not an authoritative register
Component Registration
Observing System or Sensor Network
Exchange and Dissemination System
Modeling and Data Processing Center
Data set or Database
Catalog, Registry, Metadata Collection
Portal or website
Software or application
Computational model
Initiative or Programme
Information feed, RSS, or alert
Training or educational resources
Web-accessible document, file, or graphic
Other, enter information in box:
Registry Status
Component Types
GetRecordById
Standards Registry coordination
Content cleaning, testing, status
Work on attributes for harvest-able metadata and other links
Resource Discovery / Summary Needs
Datasets
–Data type / feature type
–Observable(s)
–Coverage in space and time
–Origin / authority
–Quality / usage
Services
–Service type
–Accessed content / data
–Functionality / operations / options
–Bindings
–Quality / availability
Catalogs
–Record types
–Holdings / collections
–Supported interfaces
–Queryable properties
–Response types / formats
–Tags / categories / relations
Portals / applications
–Functionality
–Client interfaces
–Supported workflow
–Intended users
–Technology platform
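The dataset summary needs listed above could be captured in a small record type. The sketch below is illustrative only: the class name `DatasetSummary`, its field names, and the `missing_fields` helper are assumptions made for this example, not an agreed schema.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class DatasetSummary:
    """Hypothetical summary record mirroring the dataset discovery needs."""
    title: str
    data_type: Optional[str] = None            # data type / feature type
    observables: Tuple[str, ...] = ()          # observable(s)
    bbox: Optional[Tuple[float, float, float, float]] = None  # west, south, east, north
    time_range: Optional[Tuple[str, str]] = None              # start, end (ISO 8601 strings)
    origin: Optional[str] = None               # origin / authority
    quality: Optional[str] = None              # quality / usage note


def missing_fields(record):
    """List the summary fields that are still empty on a record."""
    empty = (None, (), "")
    return [f.name for f in fields(record) if getattr(record, f.name) in empty]
```

A clearinghouse could use a check like `missing_fields` to report which discovery properties a harvested record still lacks before deciding how prominently to present it.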
Resource Description Relationships
[Diagram nodes: Dataset Description, Service Description, Collection Description, Product Description, Catalog Description, Application Description, Workflow Description, Derivative Description, Provision, Operation. Relationship labels: Operates on, Provided by]
Service – data model (Cat 2.0.2 ISO Profile)
Includes extensions to ISO 19119.
Not ingest-able from OGC Capabilities without constrained MetadataURL provision
Related to, but not the same as, the INSPIRE metadata profile
Question whether this is sufficient / needed for discovery
Community Catalog / Component Providers
JAXA EO Catalog
CNES / Erdas Catalog
FedEO Community Clearinghouse
ICAN Coastal Atlas / Mediator
WUSTL / ESIP AQ Catalog
NOAA SNAAP
IP3 Mediator
NOAA WAF
Clearinghouses

Clearinghouse Status – Archie (FGDC)
–"Simple Clearinghouse" harvests component and service records from the registry. Working on the ingest procedure for e.g. Z39.50 community catalogs, then moving on to CS/W catalogs and FGDC records from Landsat.
–Lots of non-functioning endpoints still in the service registry, but they are slowly being cleaned out and/or set up for testing.
–Not yet harvesting registered services other than catalogs and the service registry, but have experimented with some WMS instances.
–Doesn't yet implement a CS/W Discovery interface, but working on it.

Clearinghouse Status – Marten (ESRI)
–Still some issues with harvesting the service registry: complaints about CS/W capabilities not being valid. Need to sync up with Yuqi and Archie to resolve this.
–Able to harvest Ted Habermann's WAF records. Some metadata validity issues worked through, but how lenient should the ingest be? At what point does lenience interfere with "findability"? (Josh)
–Also harvested the Renewable Energy registered services and Biodiversity site.
–GeoGratis Z39.50 interface is still problematic (e.g. AVHRR imagery).
–Last week at the GEO meeting there were some folks from CEOS (EROS) with lots of medium-resolution imagery which is only searchable through Globus.

Clearinghouse Status – Robert (Compusult)
–Ingestion from Registry being run every few days. No notification yet if harvesting fails.
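One way to frame the ingest leniency question raised above is to separate fields a record must have to be findable at all from fields that are merely recommended. The sketch below assumes records are plain dictionaries; the `REQUIRED` and `RECOMMENDED` field lists are hypothetical choices for illustration, not a policy adopted by any of the clearinghouses.

```python
# Assumption: the minimum a record needs in order to be findable at all.
REQUIRED = ("id", "title")
# Assumption: fields that improve findability but should not block ingest.
RECOMMENDED = ("abstract", "bbox", "keywords")


def ingest(records):
    """Split harvested records into accepted (with warnings) and rejected."""
    accepted, rejected = [], []
    for rec in records:
        missing_required = [k for k in REQUIRED if not rec.get(k)]
        if missing_required:
            # Too incomplete to be found: reject and report what is missing.
            rejected.append((rec.get("id"), missing_required))
            continue
        # Lenient path: keep the record but note what would improve it.
        warnings = [k for k in RECOMMENDED if not rec.get(k)]
        accepted.append({**rec, "_warnings": warnings})
    return accepted, rejected
```

Moving a field from `RECOMMENDED` to `REQUIRED` makes the ingest stricter; the warnings list gives providers feedback without dropping their records.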
Clearinghouse: Distributed Search vs. Harvest
Harvest
–Advantage: quick searches.
–Disadvantage: metadata duplication and scale of processing for large catalogs / archives.
Distributed Search
–Advantage: metadata is maintained closer to the source.
–Disadvantage: searches take longer to complete and have more chances of not completing.
Recommend Harvest when possible
–Harvest only collection metadata at appropriate scale
–Policy of community catalogue must be respected
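The trade-off above can be illustrated with a toy model. This is a minimal sketch with in-memory catalogs; the `Catalog` class and the function names are hypothetical, and a real clearinghouse would talk to remote catalog services rather than Python lists.

```python
class Catalog:
    """Hypothetical community catalog holding record titles."""

    def __init__(self, name, records, available=True):
        self.name = name
        self.records = list(records)
        self.available = available

    def search(self, term):
        if not self.available:
            raise ConnectionError(f"{self.name} did not respond")
        return [r for r in self.records if term.lower() in r.lower()]


def harvest(catalogs):
    """Copy records into a local index ahead of query time."""
    index = []
    for cat in catalogs:
        if cat.available:
            index.extend((cat.name, r) for r in cat.records)
    return index


def search_harvested(index, term):
    """Query the local copy: fast, but only as fresh as the last harvest."""
    return [(name, r) for name, r in index if term.lower() in r.lower()]


def search_distributed(catalogs, term):
    """Query each catalog live: fresh results, but any catalog may fail."""
    hits, failed = [], []
    for cat in catalogs:
        try:
            hits.extend((cat.name, r) for r in cat.search(term))
        except ConnectionError:
            failed.append(cat.name)
    return hits, failed
```

If a catalog adds a record or goes offline after the harvest, the harvested index still answers from its (possibly stale) copy, while the distributed search sees the new record but returns nothing for the unreachable catalog.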
Integration Issues
Catalogues registered with GEOSS vary widely in the standards they implement. Protocols include:
–ISO 23950 (Z39.50) GEO Profile Version 2.2
  –FGDC (CSDGM Metadata)
  –ANZLIC Metadata
  –ISO 19115 Metadata
–OGC Catalogue Service for the Web (Version 2.0.1 and 2.0.2)
  –ebRIM Profile (incl. ISO and EO Extension Packages)
  –FGDC Profile
  –ISO 19115 Profile
–SRU/SRW / OpenSearch
–OAI-Protocol for Metadata Harvesting (OAI-PMH)
–Dublin/Darwin Core Metadata
–Web-accessible folder/ftp?
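For the OGC Catalogue Service entries above, a harvester typically starts by requesting the service's capabilities document. The sketch below only builds the key-value-pair request URL for each listed version; the endpoint `https://catalog.example.org/csw` is a made-up placeholder and the helper names are hypothetical. No request is sent.

```python
from urllib.parse import urlencode

# Versions named on the slide above.
CSW_VERSIONS = ("2.0.1", "2.0.2")


def csw_get_capabilities_url(endpoint, version="2.0.2"):
    """Build a CSW GetCapabilities request URL using KVP encoding."""
    params = {"service": "CSW", "version": version, "request": "GetCapabilities"}
    return f"{endpoint}?{urlencode(params)}"


def candidate_urls(endpoint):
    """One GetCapabilities URL per version a clearinghouse might try."""
    return [csw_get_capabilities_url(endpoint, v) for v in CSW_VERSIONS]
```

A clearinghouse could try each candidate URL in turn and record which version, if any, returns a valid capabilities document before attempting a harvest.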
Mediation Issues – TBD
Where should mediation occur and when?
What and how many controlled vocabularies, taxonomies, ontologies?
–Organizations, SBAs, CoPs
–Top-down, bottom-up, free-for-all
How to manage and leverage mappings?
Is there a role for knowledgebase inference?
Workplan Elements for Catalogue / Clearinghouse Thread
Persistence, completeness, findability
More resources and resource types, e.g. applications, workflows
Minimum interoperability measures, e.g. geoss:Record
Best practices for federated harvest and query
User requirements refinement and added registry / clearinghouse value
Controlled vocabularies, mediation resources, cross-community enablement
On-going role for search and discovery in scenarios and decision support applications
Facilitation of usable OpenSearch / GeoSearch entry points to the Clearinghouse
Role for publish-subscribe-notify interaction style in Clearinghouse