SAN DIEGO SUPERCOMPUTER CENTER
HYDROLOGIC METADATA CATALOG AND SEMANTIC SEARCH SERVICES IN CUAHSI HIS
CUAHSI HIS: Sharing hydrologic data
Thomas Whitenack, David Valentine, Ilya Zaslavsky, Michael Piasecki, David G. Tarboton, Jeffery S. Horsburgh, Timothy Whiteaker, Daniel Ames, David R. Maidment
CUAHSI HIS
The CUAHSI Hydrologic Information System (HIS) is an internet-based system to support the sharing of hydrologic data. It comprises hydrologic databases and servers connected through web services, as well as software for data publication, discovery, and access.
[Diagram: three components, HIS Central (data discovery and integration platform, like the search portals Google, Yahoo, Bing), HydroServer (data publication platform), and HydroDesktop (data synthesis and research platform). Link labels: Data Services (Water Data Services, Spatial Data Services), Metadata Services, Metadata Search, service registration, catalog harvesting, service and data theme metadata, data carts.]
What is the Hydrologic Metadata Catalog?
It is the database behind the HIS Central registry and Search Services. It stores Site, Variable, and Series information, plus general metadata for each registered service. Data Values are not in the Catalog. Its purpose is to enable searching across federated services and to return the information that leads client applications to the data values.
HIS Central
HIS Central is a web application where you can register Water Data Services into the Hydrologic Metadata Catalog.
Registering a WaterML Service at HIS Central
Hydrologic Metadata Catalog Harvesting
Each registered water data service is harvested using the standard Water Data Service methods:
◦ GetSites: returns a list of the site records available from the service.
◦ GetSiteInfo: requested once for each site. Returns all variables monitored at the site, the period of record for each variable, and the number of values available.
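The harvesting loop described above can be sketched roughly as follows. This is a minimal illustration, not the HIS Central harvester: only the method names GetSites and GetSiteInfo come from the slide, while the `WaterDataService` interface, the `SeriesInfo` fields, the `FakeService` class, and all data values are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class SeriesInfo:
    # Hypothetical summary of one variable at one site.
    variable_code: str
    begin_date: str
    end_date: str
    value_count: int


class WaterDataService(Protocol):
    # Hypothetical client interface; a real client would issue web service calls.
    def get_sites(self) -> List[str]: ...
    def get_site_info(self, site_code: str) -> List[SeriesInfo]: ...


def harvest(service: WaterDataService) -> Dict[str, List[SeriesInfo]]:
    """Call GetSites once, then GetSiteInfo for every site returned."""
    catalog: Dict[str, List[SeriesInfo]] = {}
    for site_code in service.get_sites():
        catalog[site_code] = service.get_site_info(site_code)
    return catalog


class FakeService:
    """In-memory stand-in for a registered service (made-up data)."""

    def __init__(self) -> None:
        self._data = {
            "site-A": [SeriesInfo("temp", "2000-01-01", "2009-12-31", 3650)],
            "site-B": [
                SeriesInfo("flow", "1995-06-01", "2009-12-31", 5000),
                SeriesInfo("temp", "2001-01-01", "2005-01-01", 1200),
            ],
        }

    def get_sites(self) -> List[str]:
        return list(self._data)

    def get_site_info(self, site_code: str) -> List[SeriesInfo]:
        return self._data[site_code]


catalog = harvest(FakeService())
# Only counts are stored; the data values themselves stay with the service.
total_values = sum(s.value_count for series in catalog.values() for s in series)
```

Note that the harvest stores only series metadata (variables, period of record, value counts), which matches the statement that data values are not kept in the catalog.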
Hydrologic Metadata Catalog Core Data Schema
Ontology
A keyword hierarchy used to categorize monitored variables and assist in their discovery. Each Variable is “tagged” to a keyword concept.
Storing the Ontology in the database
[Figure labels: Concepts, Hierarchy, ConceptPaths]
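The slide names three structures but does not describe their columns. One plausible reading, sketched below purely as an assumption, is that Concepts holds the keywords, Hierarchy holds child-to-parent links, and ConceptPaths holds a precomputed root-to-concept path for each keyword. Apart from “HydroSphere”, every keyword, id, and the path format here is made up for illustration.

```python
# Hypothetical rows: concept id -> keyword.
concepts = {1: "HydroSphere", 2: "Physical", 3: "Temperature", 4: "Chemical"}

# Hypothetical rows: child concept id -> parent concept id (the root has no entry).
hierarchy = {2: 1, 3: 2, 4: 1}


def concept_path(concept_id: int) -> str:
    """Walk parent links up to the root and join the keywords root-first."""
    names = []
    current = concept_id
    while current is not None:
        names.append(concepts[current])
        current = hierarchy.get(current)  # None once the root is reached
    return "/".join(reversed(names))


# Precompute one path per concept, as a path table might.
concept_paths = {cid: concept_path(cid) for cid in concepts}
```

Precomputing paths like this would let a search for a branch keyword find every variable tagged beneath it with a simple prefix match, instead of a recursive walk at query time.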
Ontology Service Methods
◦ getSearchableTerms: returns a list of all searchable keyword concepts. Searchable concepts include “branch” concepts as well as “leaf” concepts. Higher-level branches are not included, as they are too broad.
◦ getOntologyTree: given a “branch” concept, returns the ontology terms below it in a tree structure. (Passing “HydroSphere” returns the entire ontology.)
◦ getWordList: given a substring, such as “temp”, returns all keywords that contain that sequence of characters. This is intended as a usability feature for client applications.
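The behavior of the three methods can be illustrated on a toy in-memory ontology. This is a sketch under stated assumptions, not the service implementation: the keywords other than “HydroSphere”, the returned structures, the case-insensitive matching, and the `min_depth` cutoff for "too broad" branches are all hypothetical.

```python
from typing import Dict, List

# Toy ontology: keyword -> child keywords (made-up content under "HydroSphere").
tree: Dict[str, List[str]] = {
    "HydroSphere": ["Physical", "Chemical"],
    "Physical": ["Temperature", "Discharge"],
    "Chemical": ["Nitrogen"],
    "Temperature": [],
    "Discharge": [],
    "Nitrogen": [],
}


def get_ontology_tree(concept: str) -> dict:
    """Return the concept and everything below it as a nested structure."""
    return {
        "keyword": concept,
        "children": [get_ontology_tree(child) for child in tree[concept]],
    }


def get_word_list(substring: str) -> List[str]:
    """Return every keyword containing the substring (case-insensitive here)."""
    needle = substring.lower()
    return sorted(k for k in tree if needle in k.lower())


def get_searchable_terms(root: str = "HydroSphere", min_depth: int = 1) -> List[str]:
    """Return branch and leaf keywords, skipping branches above min_depth.

    The depth cutoff is an assumed stand-in for "too broad".
    """
    terms = []
    stack = [(root, 0)]
    while stack:
        concept, depth = stack.pop()
        if depth >= min_depth:
            terms.append(concept)
        stack.extend((child, depth + 1) for child in tree[concept])
    return sorted(terms)
```

A client could call `get_word_list("temp")` while the user types to offer keyword suggestions, then pass the chosen keyword to a catalog search.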
Search Service Methods (1/3)
◦ GetWaterOneFlowServiceInfo: returns a list of all the services that are registered with HIS Central.
◦ GetServicesInBox: same as GetWaterOneFlowServiceInfo, but restricted by a geographic envelope.
Both methods return the following information: WSDL endpoint for the Water Data service, title, name, organization, contact info, estimated number of values, number of sites, number of variables, and geographic extent.
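Restricting services by a geographic envelope amounts to a bounding-box test against each service's geographic extent. The sketch below assumes an intersection test; the slides do not say whether the real method uses intersection or containment. The `Box` type, the function name, the “local” service, and all coordinates are hypothetical (“nwis” appears in the slides only as an example service code).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    xmin: float  # west longitude
    ymin: float  # south latitude
    xmax: float  # east longitude
    ymax: float  # north latitude

    def intersects(self, other: "Box") -> bool:
        """True when the two envelopes overlap or touch."""
        return (
            self.xmin <= other.xmax
            and other.xmin <= self.xmax
            and self.ymin <= other.ymax
            and other.ymin <= self.ymax
        )


# Made-up geographic extents keyed by service code.
service_extents: Dict[str, Box] = {
    "nwis": Box(-125.0, 24.0, -66.0, 50.0),
    "local": Box(-112.0, 40.0, -111.0, 42.0),
}


def get_services_in_box(query: Box) -> List[str]:
    """Return the codes of services whose extent intersects the query box."""
    return sorted(code for code, extent in service_extents.items()
                  if extent.intersects(query))


southern_california = Box(-118.0, 32.0, -116.0, 34.0)
matching = get_services_in_box(southern_california)
```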
Search Service Methods (2/3)
GetSitesInBox requires:
◦ Geographic extent (box)
◦ Concept Keyword (can be empty)
◦ NetworkIDs (used to restrict returned values, can be empty)
Returns the information necessary to display sites on a map and request more information about series.
Search Service Methods (3/3)
GetSeriesCatalogForBox is the primary method for searching the catalog. It returns series record information, which the client application uses to request the data values from the registered service. You provide:
◦ Geographic extent (box)
◦ Temporal extent (begin/end dates)
◦ Concept Keyword (can be empty)
◦ NetworkIDs (used to restrict returned values, can be empty)
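The four search inputs combine as independent filters, with an empty keyword or empty NetworkIDs meaning "no restriction". The following is a minimal in-memory sketch of that logic, not the catalog's actual query: the record layout, the sample series, and the exact-match keyword comparison are assumptions (the real service may also match concepts beneath a branch keyword).

```python
from datetime import date
from typing import Iterable, List, Tuple

# Made-up series records for illustration.
series_catalog = [
    {"network_id": 1, "site": "A", "lat": 32.7, "lon": -117.2,
     "concept": "Temperature", "begin": date(2000, 1, 1), "end": date(2005, 1, 1)},
    {"network_id": 2, "site": "B", "lat": 40.8, "lon": -111.9,
     "concept": "Discharge", "begin": date(1990, 1, 1), "end": date(2009, 1, 1)},
    {"network_id": 1, "site": "C", "lat": 33.0, "lon": -117.0,
     "concept": "Discharge", "begin": date(2007, 1, 1), "end": date(2009, 1, 1)},
]


def get_series_catalog_for_box(
    box: Tuple[float, float, float, float],  # (xmin, ymin, xmax, ymax)
    begin: date,
    end: date,
    concept_keyword: str = "",
    network_ids: Iterable[int] = (),
) -> List[dict]:
    """Return series inside the box whose period overlaps [begin, end]."""
    xmin, ymin, xmax, ymax = box
    allowed_networks = set(network_ids)
    hits = []
    for s in series_catalog:
        in_box = xmin <= s["lon"] <= xmax and ymin <= s["lat"] <= ymax
        overlaps = s["begin"] <= end and begin <= s["end"]
        # Empty keyword or empty network list means "do not filter".
        keyword_ok = not concept_keyword or s["concept"] == concept_keyword
        network_ok = not allowed_networks or s["network_id"] in allowed_networks
        if in_box and overlaps and keyword_ok and network_ok:
            hits.append(s)
    return hits


san_diego_area = (-118.0, 32.0, -116.0, 34.0)
hits = get_series_catalog_for_box(san_diego_area, date(2004, 1, 1), date(2008, 1, 1))
```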
What info is in a Series Record, you ask?
Everything required to create a data cart.
SeriesRecord
◦ ServCode (string): the service's unique code, e.g. “nwis”
◦ ServURL (string): WSDL address of the service
◦ Location (string): site code
◦ VarCode (string): variable code associated with the series
◦ Varname (string): variable name
◦ beginDate (string): start date of the series
◦ endDate (string): end date of the series (as of the last harvest)
◦ Authtoken (string): unimplemented
◦ ValueCount (int): number of values in the series
◦ Sitename (string): site name
◦ Latitude (double)
◦ Longitude (double)
◦ datatype (string)
◦ valuetype (string)
◦ samplemedium (string)
◦ timeunits (string)
◦ conceptKeyword (string): ontology keyword to which this variable is tagged
◦ genCategory (string)
◦ TimeSupport (string)
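As a rough illustration of how a client might turn a series record into a request for data values, the sketch below models a subset of the fields listed above and extracts the ones a request needs. The field names follow the slide; the `to_data_request` helper, the shape of the request dictionary, and every sample value (including the example.org address) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SeriesRecord:
    # Subset of the fields listed on the slide; the rest are omitted for brevity.
    ServCode: str
    ServURL: str
    Location: str
    VarCode: str
    Varname: str
    beginDate: str
    endDate: str
    ValueCount: int
    Sitename: str
    Latitude: float
    Longitude: float
    conceptKeyword: str = ""


def to_data_request(record: SeriesRecord) -> Dict[str, str]:
    """Collect what a client needs to ask the registered service for values."""
    return {
        "wsdl": record.ServURL,
        "location": record.Location,
        "variable": record.VarCode,
        "start": record.beginDate,
        "end": record.endDate,
    }


# Made-up record; the URL is a placeholder, not a real service address.
record = SeriesRecord(
    ServCode="nwis",
    ServURL="http://example.org/nwis.asmx?WSDL",
    Location="SITE-001",
    VarCode="VAR-001",
    Varname="Discharge",
    beginDate="2000-01-01",
    endDate="2009-01-01",
    ValueCount=3288,
    Sitename="Example River at Example Town",
    Latitude=32.7,
    Longitude=-117.2,
    conceptKeyword="Discharge",
)
request = to_data_request(record)
```

Because the record carries the service address along with the site code, variable code, and date range, the client can go straight to the registered service without a second catalog lookup.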
Hydrologic Metadata Catalog Stats
◦ Services: 47
◦ Variables: 4,812
◦ Sites: 1,889,199
◦ Series: 8,516,440
◦ Values referenced: 4,622,778,988
Future Development
◦ Need to standardize the services on the WaterML data exchange format.
◦ Need to harvest data directly from HydroServer capabilities services.
◦ Need to extend the search to support geometries other than the envelope (HUCs, counties, etc.).
Conclusions
Searching across multiple, federated services is made possible by harvesting and indexing metadata from registered services. Metadata is data. The catalog pushes the limits of what counts as metadata.
Questions?
http://hiscentral.cuahsi.org/webservices/hiscentral.asmx