Controlled Vocabularies: What, Why, How?


1 Controlled Vocabularies: What, Why, How?
Vocabulary Workshop, RAL, February 25, 2009

2 Metadata
Love it or hate it, without metadata automated data handling isn’t possible.
For automated data handling to be possible across distributed data sources, metadata standards are required.
Standardised metadata comprises fields that represent real-world entities such as location, time and phenomena.

3 Metadata
These fields need to be populated.
Plaintext may be used. It makes population easy, but it is next to useless for automated handling. Some real examples:
  A wide variety of chemical and biological parameters
  Amplitude de l'echo retrodiffuse
  Cu, Zn, Fe, Pb, Cd, Cr, Ni in biota
  MACR0-MEIOFAUNA,SED BIOCHEMISTRY,ZOOPLANKTON, CILIATES,BACT CELLS,BACT BIOMASS,LEUCINE UPT,PRIM. PROD,METABOL, COCCOLITH
Plaintext should be confined to abstracts.

4 Controlled Vocabularies
It is much better to use concepts labelled using universally agreed terms that have universally agreed meanings.
A collection of concepts designed to populate a given metadata field may be called a controlled vocabulary.
Controlled vocabularies:
  Ensure consistent spellings
  Ensure consistent syntax
Well-managed controlled vocabularies:
  Prevent metadata misunderstandings
  Maintain a static relationship between metadata fields and the real world
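Populating a field from a controlled vocabulary can be sketched as a simple membership check. The vocabulary keys and labels below are hypothetical, invented for illustration, and the case and whitespace normalisation is an assumption rather than something the slides prescribe.

```python
# Hypothetical vocabulary: concept keys mapped to their agreed labels.
PARAMETER_VOCAB = {
    "TEMP": "Temperature of the water column",
    "PSAL": "Salinity of the water column",
    "CPHL": "Chlorophyll concentration in the water column",
}


def validate_field(values, vocabulary):
    """Split metadata values into recognised keys and rejected entries."""
    accepted, rejected = [], []
    for value in values:
        # Assumption: keys are compared case-insensitively, ignoring padding.
        key = value.strip().upper()
        if key in vocabulary:
            accepted.append(key)
        else:
            rejected.append(value)
    return accepted, rejected


accepted, rejected = validate_field(
    ["TEMP", "psal ", "A wide variety of chemical and biological parameters"],
    PARAMETER_VOCAB,
)
print("accepted:", accepted)
print("rejected:", rejected)
```

The free-text entry is rejected because it does not resolve to any agreed concept, which is the point of the slide: software can only act on values it can look up.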

5 Thesauri
Concepts within a controlled vocabulary may be semantically connected using simple relationships:
  Blue broader colour
  Colour narrower blue
  Colour related pigmentation
Concepts from different controlled vocabularies describing the same type of thing may be semantically connected using simple mapping relationships:
  Bacillariophyceae exactMatch diatoms
  IPTS68 temperature closeMatch ITS90 temperature
  Nutrients in rivers relatedMatch nitrate in water bodies
  Salinity broadMatch physical oceanography
  Physical oceanography narrowMatch salinity
The results may be termed thesauri.
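The broader, narrower and related relationships can be sketched as a small in-memory structure that records the inverse of each link automatically. The `Thesaurus` class and its method names are hypothetical, chosen for illustration only.

```python
from collections import defaultdict

# Each relationship and the relationship that holds in the other direction.
INVERSE = {"broader": "narrower", "narrower": "broader", "related": "related"}


class Thesaurus:
    """Stores links between concepts and keeps both directions in step."""

    def __init__(self):
        self._links = defaultdict(set)

    def add(self, subject, relation, obj):
        # Record the stated link and its inverse together.
        self._links[(subject, relation)].add(obj)
        self._links[(obj, INVERSE[relation])].add(subject)

    def get(self, subject, relation):
        return sorted(self._links.get((subject, relation), set()))


thesaurus = Thesaurus()
thesaurus.add("blue", "broader", "colour")
thesaurus.add("colour", "related", "pigmentation")

print(thesaurus.get("colour", "narrower"))
print(thesaurus.get("pigmentation", "related"))
```

Stating "blue broader colour" is enough for the sketch to answer "colour narrower ?", so the two directions cannot drift apart.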

6 Ontologies
But what if the controlled vocabularies describe different types of thing?
We can relate them by increasing the semantic richness of the relationships.
For example:
  We could have a controlled vocabulary of instruments
  We could also have a controlled vocabulary of parameters

7 Ontologies
We can link these up using relationships such as:
  Thermosalinograph measures salinity
  Fluorometer measures chlorophyll
  Air temperature measuredBy psychrometer
The result may be termed an ontology.

8 Ontologies
Ontology relationships are:
  Semantically rich
  Potentially abundant
Software agents need to have some understanding of the relationships to exploit the knowledge encoded in the ontology.
This is achieved through relationships describing relationships, called rules.
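One kind of rule, stating that one relationship is the inverse of another, can be sketched as follows. This is a minimal illustration, assuming triples are stored as plain tuples; the function names are hypothetical and real ontology reasoners support many more rule types.

```python
def apply_inverse_rules(triples, inverse_of):
    """Return the triples plus every triple implied by an inverse rule."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in inverse_of:
            inferred.add((obj, inverse_of[predicate], subject))
    return inferred


def instruments_for(parameter, triples):
    """List instruments stated or inferred to measure a parameter."""
    return sorted(s for s, p, o in triples if p == "measures" and o == parameter)


FACTS = {
    ("thermosalinograph", "measures", "salinity"),
    ("fluorometer", "measures", "chlorophyll"),
    ("air temperature", "measuredBy", "psychrometer"),
}

# The rule: a relationship about relationships.
RULES = {"measures": "measuredBy", "measuredBy": "measures"}

knowledge = apply_inverse_rules(FACTS, RULES)
print(instruments_for("air temperature", FACTS))
print(instruments_for("air temperature", knowledge))
```

Without the rule, a query phrased with "measures" misses the psychrometer because that fact was recorded with "measuredBy"; with the rule applied, both phrasings give the same answer.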

9 Knowledge Representation
Relationships between concepts may be expressed using the Resource Description Framework (RDF).
W3C standard, commonly encoded as XML, having ‘triples’ as its basic building block.
Each triple has a subject, a predicate and an object. For example:
  Colour related pigmentation
  Thermosalinograph measures salinity
Familiar?
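The subject, predicate, object structure can be sketched with plain tuples. The example below writes the triples in N-Triples, a line-based RDF serialisation, rather than XML, and the `http://example.org/vocab/` namespace is a placeholder assumed for illustration.

```python
BASE = "http://example.org/vocab/"  # hypothetical namespace, not a real vocabulary

TRIPLES = [
    ("colour", "related", "pigmentation"),
    ("thermosalinograph", "measures", "salinity"),
]


def to_ntriples(triples, base=BASE):
    """Write each (subject, predicate, object) as one N-Triples line."""
    lines = []
    for subject, predicate, obj in triples:
        lines.append(f"<{base}{subject}> <{base}{predicate}> <{base}{obj}> .")
    return "\n".join(lines)


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


print(to_ntriples(TRIPLES))
print(match(TRIPLES, predicate="measures"))
```

Querying is pattern matching over the three positions, which is why the same structure can hold thesaurus links and ontology links alike.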

10 Knowledge Representation
Controlled vocabularies (concept collections) and thesauri may be represented using the Simple Knowledge Organization System (SKOS).
W3C standard XML schema based on RDF.
Jointly developed by STFC and Manchester University Computer Science.
2008 version is the one to use.

11 Knowledge Representation
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="..." xmlns:skos="..." xmlns:dc="...">
  <skos:Concept rdf:about="...">
    <skos:externalID>SDN:P011:116:TEMPS901</skos:externalID>
    <skos:prefLabel>Temperature (ITS-90) of the water column by CTD or STD</skos:prefLabel>
    <skos:altLabel>CTDTmp90</skos:altLabel>
    <skos:definition>Unavailable</skos:definition>
    <dc:date> T10:45: </dc:date>
    <skos:broadMatch rdf:resource="..." />
  </skos:Concept>
</rdf:RDF>
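A SKOS document of this shape can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`. The concept URIs under `http://example.org/` are placeholders, and the SKOS namespace shown is the `2004/02/skos/core#` one, which is an assumption since the namespace values are not legible in the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",  # assumed namespace
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Illustrative document; the example.org URIs are hypothetical.
SKOS_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/TEMPS901">
    <skos:prefLabel>Temperature (ITS-90) of the water column by CTD or STD</skos:prefLabel>
    <skos:altLabel>CTDTmp90</skos:altLabel>
    <skos:broadMatch rdf:resource="http://example.org/concept/temperature"/>
  </skos:Concept>
</rdf:RDF>"""


def read_concepts(xml_text):
    """Extract the URI, labels and broadMatch targets of each concept."""
    root = ET.fromstring(xml_text)
    concepts = []
    for node in root.findall("skos:Concept", NS):
        concepts.append({
            "uri": node.get(RDF_ABOUT),
            "prefLabel": node.findtext("skos:prefLabel", namespaces=NS),
            "altLabels": [e.text for e in node.findall("skos:altLabel", NS)],
            "broadMatch": [
                e.get(RDF_RESOURCE) for e in node.findall("skos:broadMatch", NS)
            ],
        })
    return concepts


concepts = read_concepts(SKOS_DOC)
print(concepts[0]["prefLabel"])
print(concepts[0]["broadMatch"])
```

A general XML parser is enough for a fixed layout like this one; a dedicated RDF library would be the safer choice where the same triples may arrive in a different XML arrangement.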

12 Knowledge Representation
Ontologies may be represented using the Web Ontology Language (OWL).
W3C standard XML schema based on RDF.
Example OWL document
Alternative simple text encodings are available, such as Open Biomedical Ontologies (OBO).
OBO is used for the NERC-related EnvO ontology.

13 Knowledge Management Tools
RDF
  Tools abound. Jena is one of the better known.
SKOS
  See the SKOS Tool Shed.
  Note this includes a Protégé plugin.

14 Knowledge Management Tools
Protégé with the appropriate plugin is the most widely used.
There are commercial alternatives such as TopBraid Composer.
MMI has developed a vocabulary to OWL converter (voc2OWL).
OBO
  Text, so text tools work.
  OWL and SKOS converters available.
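Because OBO is plain text, a few lines of ordinary string handling can read it. The sketch below handles only a simplified subset of the OBO flat-file layout (`[Term]` stanzas with `tag: value` lines and `!` comments), and the `EX:` identifiers are hypothetical rather than taken from EnvO.

```python
# Illustrative OBO-style text; the EX: identifiers are made up.
OBO_TEXT = """format-version: 1.2

[Term]
id: EX:0000001
name: water body

[Term]
id: EX:0000002
name: river
is_a: EX:0000001 ! water body
"""


def parse_obo_terms(text):
    """Collect the tag/value pairs of each [Term] stanza as a dict of lists."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines and non-Term stanzas are skipped
        tag, value = line.split(":", 1)
        value = value.split("!", 1)[0].strip()  # drop trailing comment
        current.setdefault(tag.strip(), []).append(value)
    return terms


terms = parse_obo_terms(OBO_TEXT)
for term in terms:
    print(term["id"][0], term["name"][0], term.get("is_a", []))
```

The sketch ignores escapes, trailing modifiers and other stanza types, so it is a demonstration of the text-friendliness of the format rather than a complete parser.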

15 Knowledge Management Tools
Mapping
  MMI have developed a mapping tool (VINE) to build maps from two OWL files.
Visualisation
  Concept maps are useful.
  CmapTools is very good.
  FreeMind (open source).

