The Semantic Web for Spatial Data Search Femke Reitsma University of Maryland – College Park
Why Ontologies: Could the Semantic Web Meet Discovery Challenges? The semantic web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation (Tim Berners-Lee et al. 2001). The semantic web makes web pages machine understandable rather than just human understandable.
Semantic web components: 1. Basic components: ontology + semantically marked-up web page = semantic web. 2. Moving up the semantic web layers: Semantic Web layers presented by Tim Berners-Lee (figure, with the marker "We are here").
Semantic Web Languages. Primary languages: RDF (Resource Description Framework), RDFS (Resource Description Framework Schema), OWL (Web Ontology Language). Historical development: XML provides the basic syntax; RDF and RDFS add some tags to XML; DAML+OIL adds some tags to RDF; OWL extends and (almost) replaces DAML+OIL.
Basic Structure. Information is encoded as a triple: subject, predicate, and object. For example: All subjects and objects are identified with a Uniform Resource Identifier (URI): e.g.
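The slide's own triple example is not preserved in this transcript. As a minimal sketch of the structure, assuming hypothetical example.org URIs and made-up predicate names, two triples can be written and serialized in an N-Triples-like form in Python:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace and names, for illustration only.
EX = "http://example.org/"

triples = [
    Triple(EX + "dataset1", EX + "coversRegion", EX + "NorthAmerica"),
    Triple(EX + "Canada", EX + "partOf", EX + "NorthAmerica"),
]


def to_ntriples(t: Triple) -> str:
    """Serialize one triple in N-Triples style: <s> <p> <o> ."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


lines = [to_ntriples(t) for t in triples]
print("\n".join(lines))
```

Every part of each statement is a URI here, so a program can tell that both triples refer to the same NorthAmerica resource.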
What is an ontology? Big O Ontology vs little o ontology: Ontology = metaphysics, the essence of being, reality; ontology = a logical theory which gives an explicit, partial account of a conceptualization (Guarino and Giaretta, 1995).
ontology + semantic web = computer-parsable content and inference ability. State code > city code > address code. A computer agent could deduce that a Cornell University address, being in Ithaca, must be in New York State, which is in the U.S., and therefore must be formatted to U.S. standards.
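The Cornell deduction can be sketched as a walk along a chain of "located in" facts. This is only an illustrative Python sketch: the dictionary, the function names, and the formatting rule are assumptions, not part of any Semantic Web toolkit.

```python
from typing import List

# Only direct "located in" links are stated; everything else is inferred.
located_in = {
    "Cornell University": "Ithaca",
    "Ithaca": "New York State",
    "New York State": "U.S.",
}


def containers(place: str) -> List[str]:
    """Follow the located-in chain upward and return every enclosing place."""
    chain = []
    seen = {place}
    while place in located_in:
        place = located_in[place]
        if place in seen:  # guard against accidental cycles in the facts
            break
        seen.add(place)
        chain.append(place)
    return chain


def address_format(place: str) -> str:
    """Pick an address format from the inferred country (hypothetical rule)."""
    return "U.S. format" if "U.S." in containers(place) else "unknown format"


print(containers("Cornell University"))
print(address_format("Cornell University"))
```

No fact says directly that Cornell University is in the U.S.; the agent reaches that conclusion by chaining the three stated links.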
Example of application: Find me places to eat accessible via public transport? Over here ….
Objective: Explore the potential of the Semantic Web for distributing spatial data
Current GCMD Search North America? 2950 records matched your query
Future GCMD Search: North America? North America. Limit search by: spatial resolution, temporal resolution, GCMD keywords. Explore results by: Canada, USA, GCMD keywords …… Key = ability to determine relationships between keywords without explicitly encoding them.
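A small sketch of what such a search might do, assuming a hypothetical part-of hierarchy and three made-up records. Each record is tagged only with its most specific location, and the broader match is derived rather than stored:

```python
# Hypothetical part-of hierarchy and records, for illustration only.
part_of = {"Canada": "North America", "USA": "North America"}

records = {
    "record-1": "Canada",
    "record-2": "USA",
    "record-3": "Europe",
}


def within(location, region):
    """True if location equals region or lies inside it via part_of links."""
    while location is not None:
        if location == region:
            return True
        location = part_of.get(location)
    return False


def search(region):
    """Return ids of records whose location falls within the region."""
    return sorted(rid for rid, loc in records.items() if within(loc, region))


def explore(region):
    """Return the direct sub-regions a user could drill into."""
    return sorted(child for child, parent in part_of.items() if parent == region)


print(search("North America"))
print(explore("North America"))
```

Neither record-1 nor record-2 carries the keyword "North America", yet both match the query because the relationship between keywords lives in the hierarchy, not in the records.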
Progressing Towards Level 1. Components shown: the query "North America?", Java Application, Sesame, Ontology, GCMD Database. Sesame = open source RDF Schema-based repository and querying facility.
Keywords Ontologies. Keywords: Projects, Sensors, Sources, Locations, IDN Nodes, Data Centers, Science Keywords, Services Keywords, URL Content Types, Chronostratigraphic Units. Science keyword structure: CATEGORY > TOPIC > TERM > VARIABLE, e.g. Hydrosphere > Ground Water > Saltwater Intrusion. Importance of careful specification of relationships for ontology: for the purpose of the Semantic Web, the keyword structure may need modification, e.g. the Variable Fetch is a measurable property of the Term Ocean Waves; however, the Variable Fisheries is a sub-topic of the Term Agricultural Aquatic Sciences.
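One way to picture the modification is to replace the single ">" link with explicitly typed relations. The relation names below are hypothetical, chosen only to mirror the two cases on the slide:

```python
# Each Term-to-Variable link gets an explicit relation type instead of
# the undifferentiated ">" of the keyword hierarchy.
# Relation names are hypothetical.
relations = [
    ("Fetch", "measurablePropertyOf", "Ocean Waves"),
    ("Fisheries", "subTopicOf", "Agricultural Aquatic Sciences"),
]


def related(term, relation):
    """Return the keywords linked to the term by the given relation type."""
    return sorted(s for s, p, o in relations if p == relation and o == term)


print(related("Ocean Waves", "measurablePropertyOf"))
print(related("Agricultural Aquatic Sciences", "subTopicOf"))
```

With typed links, a query for measurable properties of Ocean Waves does not also return sub-topics, which a plain hierarchy cannot distinguish.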
DIF Schema: XSLT style sheet to create the DIF schema in a Semantic Web language. Mapping terms to the ontology. Avoiding a monolithic ontology by mapping terms to other ontologies, e.g. Dublin Core.
DIFs: XSLT style sheet to convert DIFs to a Semantic Web language. Mapping terms to the ontology and the DIF Schema. Recording keywords of finest granularity. Avoiding a monolithic ontology by mapping terms to other ontologies, e.g. Dublin Core, Cyc, DAML-time.
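The slides describe doing this conversion with an XSLT style sheet. The sketch below shows the same idea in Python instead, using the standard library XML parser. The DIF fragment is simplified and hypothetical, the example.org namespace and the `keyword` predicate are assumptions, and only the two Dublin Core element URIs are real terms.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical DIF fragment; real records carry many more fields.
DIF_XML = """
<DIF>
  <Entry_ID>EXAMPLE_ENTRY_01</Entry_ID>
  <Entry_Title>Example saltwater intrusion data set</Entry_Title>
  <Parameters>
    <Topic>Hydrosphere</Topic>
    <Term>Ground Water</Term>
    <Variable>Saltwater Intrusion</Variable>
  </Parameters>
</DIF>
"""

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
EX = "http://example.org/dif#"           # hypothetical namespace

# Reuse Dublin Core predicates where a matching term exists.
FIELD_TO_PREDICATE = {
    "Entry_ID": DC + "identifier",
    "Entry_Title": DC + "title",
}


def dif_to_triples(xml_text):
    """Turn one DIF-like record into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    subject = EX + root.findtext("Entry_ID").strip()
    triples = []
    for tag, predicate in FIELD_TO_PREDICATE.items():
        value = root.findtext(tag)
        if value is not None:
            triples.append((subject, predicate, value.strip()))
    for params in root.findall("Parameters"):
        # Record only the keyword of finest granularity that is present.
        for level in ("Variable", "Term", "Topic"):
            keyword = params.findtext(level)
            if keyword:
                triples.append((subject, EX + "keyword", keyword.strip()))
                break
    return triples


for triple in dif_to_triples(DIF_XML):
    print(triple)
```

Only the finest keyword is stored per record; the broader Term and Topic can be recovered from the keyword ontology rather than repeated in every record.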
Sesame: middleware, accessed through a GUI or an API. Database: PostgreSQL or Oracle.
Advantages for the GCMD: The Semantic Web presents the database structure in a machine-parsable format. Ability to search for the semantic relationships among any DIF terms within the ontology. No need to change the database structure when new classes and relationships are added. Real advantages = when the ontology is enriched.
UWG Assistance
Gene Major
– How do we handle the scalability issue with regard to populating the DIFs? We now have over 15,000 entries; updates require much work to (a) determine whether the data is still viable and (b) make revisions. Database revisions such as phone numbers are easy; content revisions are more labor intensive.
– How do we handle the same data sets being delivered from multiple systems (not data centers), like OPeNDAP, NOMADS, THREDDS, etc.? All may deliver the same data set, but how do we point to all those catalogs? How do we index the DIFs to do that? We could use Related_URL, but is that the right solution?
– How can we get interaction between data sets and publishers? In other words, what mechanisms can we use to link data sets with the current literature?
– How do you feel about potential privacy concerns over contact information within GCMD DIFs/SERFs?
Stephanie Leicester
– Suggestions about how to encourage DIF authors to review and update their records regularly.
– Direction and guidance on developing a metadata standard for archived samples.
Heather Weir
– Suggestions about how to increase the number of SERFs.
– Direction and guidance with the Learning Center and Astronomy keywords.
Scott Ritz
– To spread the word in their community (to data users and producers).
– Encourage data holders they meet to submit metadata.
– Closer interaction with Science Coordinators: new data notifications, contacts.
Monica Holland
– Continue to spread the word about GCMD.
Cheryl Solomon
– Suggest university sources for ecological datasets.
– Suggest international sources for metadata.
Tyler Stevens
– What direction we should take with GIS in the GCMD.
– More GIS contacts to work with to increase GIS within the GCMD.