Towards Terminology Services: Reflections from the FACET Project Doug Tudhope Hypermedia Research Unit University of Glamorgan OCLC seminar, April, 2006
Presentation
FACET Project
– Faceted Knowledge Organisation Systems (KOS)
– Semantic query expansion
– Web Demonstrator
– Evaluation
– Need for standard representations and API
Current work
– Terminology Services
– Pilot KOS web service browser
– Semantic expansion service?
Role for KOS in the Semantic Web?
– Need to articulate context/rationale for KOS
– What kind of Semantic Web?
FACET - Faceted Access to Cultural hEritage Terminology FACET - a collaborative project investigating the potential of semantic expansion in retrieval Aims: Integration of thesaurus into search process / interface Semantic query expansion taking advantage of facet structure
FACET Collaborators Research Council Funding: EPSRC 3 years National Museum of Science and Industry (NMSI): National Railway Museum and Science Museum Collections Database J. Paul Getty Trust Art and Architecture Thesaurus (AAT) Museum Documentation Association (MDA) Railway Thesaurus Canadian Heritage Information Network (CHIN) Advisors
Semantic Expansion
Expanding over relationships in thesauri and related KOS allows the system to play an active role:
– Ranking of matching results by semantic closeness
– Query Expansion (automatic/interactive)
– Augmented Browsing tools
Underpinning technologies:
– Measures of distance over the semantic index space
– Multi-concept Matching Function
The immediate application is controlled vocabulary indexing, but the approach is also relevant to free text query expansion.
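As a rough illustration of expansion over thesaurus relationships with a distance measure, the sketch below walks a toy thesaurus from a starting term and converts accumulated traversal cost into a closeness score. The relationship costs, the threshold and the thesaurus fragment are all assumptions made for the example, not FACET's actual values or data.

```python
import heapq

# Hypothetical traversal costs per relationship type (assumed, not FACET's values).
COSTS = {"NT": 1.0, "BT": 1.5, "RT": 2.0}

# Toy thesaurus fragment: term -> list of (relationship, neighbouring term).
THESAURUS = {
    "chairs": [("NT", "armchairs"), ("NT", "rocking chairs"), ("BT", "seating furniture")],
    "armchairs": [("BT", "chairs"), ("NT", "Carver chairs")],
    "Carver chairs": [("BT", "armchairs")],
    "rocking chairs": [("BT", "chairs")],
    "seating furniture": [("NT", "chairs"), ("NT", "sofas")],
    "sofas": [("BT", "seating furniture")],
}


def expand(term, threshold):
    """Return {term: closeness} for all terms reachable within `threshold` cost."""
    best = {term: 0.0}
    queue = [(0.0, term)]
    while queue:
        dist, current = heapq.heappop(queue)
        if dist > best.get(current, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for relation, neighbour in THESAURUS.get(current, []):
            new_dist = dist + COSTS[relation]
            if new_dist <= threshold and new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    # Closeness falls linearly from 1.0 (the term itself) to 0.0 (at the threshold).
    return {t: 1.0 - d / threshold for t, d in best.items()}


if __name__ == "__main__":
    ranked = sorted(expand("armchairs", 3.0).items(), key=lambda item: -item[1])
    for name, closeness in ranked:
        print(f"{closeness:.2f}  {name}")
```

Terms beyond the threshold are simply not returned, which is one way of keeping expansion bounded; a real system might instead weight costs by hierarchy depth or facet.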
Faceted Knowledge Organisation Systems Faceted systems based on primary division into fundamental, high-level categories (facets) Compound descriptors (multi-concept headings) are synthesised by combination of terms from limited number of fundamental facets In constructing AAT, adjectival noun phrases very common: e.g. painted oak furniture Rather than enumerate the nearly infinite number of object and subject descriptions needed by thesaurus users, the AAT decided to pursue the building blocks of these descriptors in the form of a faceted vocabulary (Guide to Indexing and Cataloging with the Art & Architecture Thesaurus)
Compound Descriptors and Queries (e.g. painted oak furniture)
– Multi-concept subject headings allow highly specific descriptions and offer the promise of precise queries
– However, the practical focus has tended to be on cataloguing rather than searching
– This poses problems for recall in retrieval and for browsing
– Full potential yet to be exploited in retrieval
Matching Problem
"The major problem lies in developing a system whereby individual parts of subject headings containing multiple AAT terms are broken apart, individually exploded hierarchically, and then reintegrated to answer a query with relevance" (Toni Petersen, AAT Director)
Query: mahogany, dark yellow, brocading, Edwardian, armchair (focus term must match after expansion)
Descriptor: oak, light yellow, crests, ovals, brocade, Victorian, Carver chair
Potentially extra / missing / partially and non-matching terms
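The matching problem above can be sketched as a small scoring function: each query term is compared against every term in the indexed descriptor, the best partial match per query term is kept, and the descriptor is rejected outright if the focus term finds no match. The closeness figures, the averaging rule and the choice of "armchair" as focus term are assumptions for illustration only, not the matching function FACET implemented.

```python
# Hypothetical closeness values, as if produced by expanding each query term
# over the thesaurus. Illustrative only, not figures from FACET.
CLOSENESS = {
    ("mahogany", "oak"): 0.6,
    ("dark yellow", "light yellow"): 0.8,
    ("brocading", "brocade"): 0.7,
    ("Edwardian", "Victorian"): 0.5,
    ("armchair", "Carver chair"): 0.9,
}


def closeness(query_term, index_term):
    """Closeness between two terms: 1.0 if identical, 0.0 if unrelated."""
    if query_term == index_term:
        return 1.0
    return CLOSENESS.get((query_term, index_term), 0.0)


def match_score(query, focus, descriptor):
    """Score a multi-concept descriptor against a multi-concept query (0.0 to 1.0)."""
    def best(term):
        # Best partial match for one query term among all descriptor terms.
        return max((closeness(term, d) for d in descriptor), default=0.0)

    if best(focus) == 0.0:
        return 0.0  # the focus term must match after expansion
    # Extra descriptor terms are ignored; missing ones contribute 0 to the mean.
    return sum(best(term) for term in query) / len(query)


QUERY = ["mahogany", "dark yellow", "brocading", "Edwardian", "armchair"]
DESCRIPTOR = ["oak", "light yellow", "crests", "ovals", "brocade", "Victorian", "Carver chair"]

if __name__ == "__main__":
    print(round(match_score(QUERY, "armchair", DESCRIPTOR), 3))
```

Averaging over query terms means a descriptor with missing concepts still ranks, only lower, which fits the "extra / missing / partially matching" cases listed on the slide.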
Query expansion (on query as a whole)
FACET Queries with Results
Evaluation and user study with standalone version Exploratory to assess how people search for information and how thesauri can inform this process. Formative to support further development of the research prototype. Dorothee Blocks PhD Thesis A qualitative study of thesaurus integration for end-user searching.
About the Evaluation (from Blocks 2004) Qualitative evaluation methodology employed Participants were professionals in collaborating institutions 20 sessions totalling 22 hours were conducted Each participant was given 3 tasks to complete, e.g. Please search the collection for something similar to the item in the photograph. Please try to be specific.
Some issues from the evaluation Initial allocation of functionality to interface elements did not support the stages of the search process Breaking down tasks into components from different facets Reformulating queries Expansion control on individual terms Model of controlled vocabulary search process
The complete model diagram (Blocks 2004) Matching user terms to KOS Selecting suitable KOS terms Including terms in the query Setting up the query Executing the query Retrieval of results Evaluating the success of the query Inspecting individual results – using information can lead to query reformulation
Interactive/Automatic Thesaurus Query Expansion
Statistical IR:
– Uncontrolled vocabulary, auto-indexing
– IQE/AQE: terms added to query
– Exact match with probabilistic weighting
– Tampere experiments with thesaurus AQE and strongly-structured queries support faceted approach
– Greenberg recent experiments on QE by thesaurus relationships
FACET:
– Controlled vocabulary, intellectual indexing
– Hybrid I/A QE: user selects terms to expand, then AQE
– Semantic degree-of-match with faceted queries
FACET Web Demonstrator Illustrates thesaurus based expansion and faceted search Intended as an exploration of FACET research outcomes via dynamically generated Web components rather than a complete final interface Based on custom API for thesaurus programmatic access Browser-based interface (ASP application), using a combination of server-side scripting and compiled components
FACET Web Demonstrator
Semantic Query Expansion
Some lessons learned
Results show potential of faceted KOS for
– Query expansion with semantically ranked results
– Real-time implementation of the multi-concept matching function
– Semantic expansion as a browsing tool
– Potential to combine with statistical and linguistic techniques
How to generalise? Need for common KOS representations and APIs
Towards Terminology Services
– KOS-based services as elements of applications with some form of search/indexing component
– Next phase of work looks at common KOS representation formats and API protocols, making content available via programmatic interfaces
– E.g. SKOS Core (RDF/XML) Schema and SKOS API, deliverables of the SWAD-Europe Thesaurus Activity
– Experiments with XPath-based KOS interfaces (using XML and SKOS schemas) are promising for relatively small KOS held within the web browser, e.g. interactive possibilities such as rollover
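To give a feel for path-based access to a small KOS held as SKOS Core RDF/XML, here is a minimal sketch using the limited XPath support in Python's standard library. It is a stand-in for the browser-side experiments, not a reconstruction of them; the example.org URIs and the three-concept fragment are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF = "{%s}" % NS["rdf"]  # Clark-notation prefix for rdf: attributes

# Invented SKOS fragment; the example.org URIs are placeholders.
SKOS_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/kos/chairs">
    <skos:prefLabel>chairs</skos:prefLabel>
    <skos:narrower rdf:resource="http://example.org/kos/armchairs"/>
    <skos:narrower rdf:resource="http://example.org/kos/rocking-chairs"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/kos/armchairs">
    <skos:prefLabel>armchairs</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/kos/chairs"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/kos/rocking-chairs">
    <skos:prefLabel>rocking chairs</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/kos/chairs"/>
  </skos:Concept>
</rdf:RDF>
"""


def index_concepts(root):
    """Map each concept URI to its skos:Concept element."""
    return {c.get(RDF + "about"): c for c in root.findall("skos:Concept", NS)}


def label(concept):
    """Preferred label of a concept element."""
    return concept.findtext("skos:prefLabel", namespaces=NS)


def linked_labels(concepts, uri, relation):
    """Labels of concepts linked from `uri` by a SKOS relation such as 'narrower'."""
    links = concepts[uri].findall("skos:" + relation, NS)
    return [label(concepts[link.get(RDF + "resource")]) for link in links]


if __name__ == "__main__":
    concepts = index_concepts(ET.fromstring(SKOS_XML))
    print(linked_labels(concepts, "http://example.org/kos/chairs", "narrower"))
```

Holding the whole document in memory like this only makes sense for relatively small vocabularies, which is the case the slide singles out.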
SKOS API
– SKOS Core (RDF/XML) Schema and SKOS API are deliverables of the SWAD-Europe Thesaurus Activity
– SKOS API designed to provide programmatic access to thesauri and related KOS in SKOS Core; builds on previous NKOS work on KOS protocols
Example SKOS API calls
– getConcept (uri)
– getConceptsMatchingKeyword/Regex (string)
– getAllConceptRelatives (concept)
– getSupportedSemanticRelations
– getAllConceptRelatives (concept, relation)
– getAllConceptsByPath (concept, relation, distance)
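The slide gives only call names, so the sketch below is one assumed reading of two of them: getAllConceptRelatives (concept, relation) as "directly linked concepts for one relation", and getAllConceptsByPath (concept, relation, distance) as "everything reachable by following that relation up to a given number of steps". The in-memory data and the Python function names are hypothetical and are not the SKOS API itself.

```python
from collections import deque

# Hypothetical in-memory KOS: (concept, relation) -> directly related concepts.
RELATIONS = {
    ("furniture", "narrower"): ["seating furniture"],
    ("seating furniture", "narrower"): ["chairs", "sofas"],
    ("chairs", "narrower"): ["armchairs"],
}


def get_all_concept_relatives(concept, relation):
    """Concepts directly linked to `concept` by `relation` (assumed semantics)."""
    return list(RELATIONS.get((concept, relation), []))


def get_all_concepts_by_path(concept, relation, distance):
    """Concepts reachable via `relation` in at most `distance` steps (assumed semantics)."""
    found = []
    seen = {concept}
    frontier = deque([(concept, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == distance:
            continue  # do not expand beyond the requested distance
        for relative in get_all_concept_relatives(current, relation):
            if relative not in seen:
                seen.add(relative)
                found.append(relative)
                frontier.append((relative, depth + 1))
    return found


if __name__ == "__main__":
    print(get_all_concepts_by_path("furniture", "narrower", 2))
```

Building the path call on top of the single-step call shows why a server-side version matters: done from the client, each step would be a separate round trip.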
Pilot KOS Browser Client Web Service Developed pilot to work with a remote server as an initial experiment with the SKOS API, a 'rich client' browser displaying details for thesaurus concepts via web service calls Uses GEMET - GEneral Multilingual Environmental Thesaurus DREFT demonstration web services server based on SKOS API developed at ILRT, Bristol University Only a subset of SKOS API calls were available at time of work due to local requirements So we investigated possibilities with just 2 API calls
Pilot SKOS API Web Service Browser
– API calls used: getConcept, getAllConceptRelatives
– Shows semantically connected concepts but not relationships
– Navigation history and local cache of retrieved concepts implemented
– API needs more work but is a possible basis for web services
Caching Thesaurus data relatively static - change unlikely during a session Caching of concepts helps prevent unnecessary repeated server calls. Implementation of concept caching made a significant difference to apparent speed of operation
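A minimal sketch of the concept caching idea, assuming the client wraps some function that performs the remote call. The class name, the fetch function and the returned record are hypothetical; the point is only that a repeated request for the same concept is answered locally.

```python
class CachingConceptClient:
    """Wraps a remote concept lookup with a per-session cache (illustrative)."""

    def __init__(self, fetch):
        self._fetch = fetch      # function performing the actual server call
        self._cache = {}         # uri -> concept record
        self.server_calls = 0    # counts calls that really reached the server

    def get_concept(self, uri):
        if uri not in self._cache:
            self.server_calls += 1
            self._cache[uri] = self._fetch(uri)
        return self._cache[uri]


def fake_fetch(uri):
    """Stand-in for a web service call; returns an invented concept record."""
    return {"uri": uri, "prefLabel": uri.rsplit("/", 1)[-1]}


if __name__ == "__main__":
    client = CachingConceptClient(fake_fetch)
    client.get_concept("http://example.org/kos/chairs")
    client.get_concept("http://example.org/kos/chairs")  # served from the cache
    print(client.server_calls)
```

The cache is never invalidated here, which relies on the slide's observation that thesaurus data is unlikely to change during a session.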
Future issues
More complex services as API protocol elements:
more advanced natural language functionality
cross-mapping provision
data-dependent filters (such as number of postings)
semantic expansion as a service
– different configurations
KOS interface displays by single call
– novel interfaces, such as navigation via semantic expansion
– Query expansion for various ranked result query services
– Term suggestion to assist indexing/annotation
– More details: KOS at your Service: Programmatic Access to Knowledge Organisation Systems
Taxonomy of Knowledge Organisation Systems (Gail Hodge)
Term Lists: Authority Files, Glossaries, Gazetteers, Dictionaries
Classification and Categorization: Subject Headings; Classification Schemes and Taxonomies (e.g. DDC, scientific taxonomies)
Relationship Schemes: Thesauri, Semantic Networks (e.g. WordNet), (Ontologies)
Bridge/migration between KOS and Ontologies? KOS as elements of higher level ontologies and schemas –can help leverage them. Eg map a thesaurus to a top Ontology SKOS RDF/XML Schemas as a possible bridging step Ontologies (taken as formal precise definition of relationships) can be combined with inference rules and logic systems in applications with well defined objects and operations But rationale behind KOS not well understood in Semantic Web How do intended contexts of use compare?
Types of Knowledge Organisation System (KOS), from Zeng & Salaba: FRBR Workshop, OCLC 2005
[Diagram: KOS types arranged along two axes, Natural language to Controlled language and Weakly-structured to Strongly-structured]
Term Lists: Pick lists, Synonym Rings, Authority Files, Glossaries/Dictionaries, Gazetteers
Classification & Categorization: Subject Headings, Classification schemes, Taxonomies, Categorization schemes
Relationship Groups: Thesauri, Semantic networks, Ontologies
Ontology and Information Systems (Barry Smith) Philosophical ontology as I shall conceive it here is what is standardly called descriptive or realist ontology. It seeks not explanation but rather a description of reality in terms of a classification of entities that is exhaustive in the sense that it can serve as an answer to such questions as: What classes of entities are needed for a complete description and explanation of all the goings-on in the universe? Ontological Commitment Some philosophers have thought that the way to do ontology is exclusively through the investigation of scientific theories. With the work of Quine (1953) there arose in this connection a new conception of the proper method of ontology, according to which the ontologist's task is to establish what kinds of entities scientists are committed to in their theorizing.
Two Types of Ontology Systems (Barry Smith) Perhaps we can resolve our puzzle as to the degree to which information systems ontologists are indeed concerned to provide theories which are true of reality – as Patrick Hayes would claim – by drawing on a distinction made by Andrew Frank (1997) between two types of information systems ontology. On the one hand there are ontologies – like Ontek's PACIS and IFOMIS's BFO – which were built to represent some pre-existing domain of reality. Such ontologies must reflect the properties of the objects within its domain in such a way that there obtain substantial and systematic correlations between reality and the ontology itself. On the other hand there are administrative information systems, where (as Frank sees it) there is no reality other than the one created through the system itself. The system is thus, by definition, correct.
AI Ontology Background (Barry Smith) Knowledge Representation Ontologies growing out of background in: –Database Tower of Babel Problem (e-commerce) –Modelling of scientific theories (Gene ontology etc) AI goal radically extending scope of automation Generally, and in part for reasons of computational efficiency rather than ontological adequacy, information systems ontologists have devoted the bulk of their efforts to constructing concept-hierarchies; they have paid much less attention to the question of how the concepts represented within such hierarchies are in fact instantiated in the real world of what happens and is the case.
What is an Ontology? (T. Gruber) - In the context of knowledge sharing, I use the term ontology to mean a specification of a conceptualization. That is, an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. Practically, an ontological commitment is an agreement to use a vocabulary (i.e., ask queries and make assertions) in a way that is consistent (but not complete) with respect to the theory specified by an ontology. We build agents that commit to ontologies. We design ontologies so we can share knowledge with and among these agents. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly. For AI systems, what "exists" is that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse.
Semiotic Triangle (Ogden and Richards, 1923) reproduced in Campbell et al. 1998, Representing Thoughts, Words, and Things in the UMLS Often referred to in Semantic Web literature Needs to be problematised Only indirect link via an interpreter
Semiotic Triangle (Ogden and Richards, 1923) reproduced in Campbell et al. 1998, Representing Thoughts, Words, and Things in the UMLS (AI) Ontology tends to be … Instance of scientific concept Fact in a possible world
Semiotic Triangle (Ogden and Richards, 1923) reproduced in Campbell et al. 1998, Representing Thoughts, Words, and Things in the UMLS information retrieval (subject) KOS tends to be Probable relevance - aboutness Inter/Intra indexer consistency ? (eg Bates 1986) typically a complex entity
KOS as metadata - Index (or classify) a resource
Semiotic Triangle (after Ogden & Richards): Term (Symbol), Concept (Thought), Resource (Referent); SubjectOf relationship
– Indexed resource traditionally a complex entity such as a document or image; Semantic Web a wider context for resource
– Resource probably about concept, to some extent, based on probable relevance judgments
– SubjectOf is via aboutness, not a clear-cut instance relationship
– Indexer (searcher) vocabulary consistency (eg Bates 1986): likely to differ in terminology judgments
– One reason for informal modelling approach of KOS
KOS - Informal by design? KOS designed to assist perceived needs of information retrieval users rather than modelling a simplified reality of a domain –basis of (much) KOS construction is intended assistance in indexing/ searching/browsing and generalised retrieval as much as logical properties of attributes –implications: levels of specialisation granularity of relationships Many KOS by design informal structures –pragmatic compromises for different uses –semantic relationships often fuzzy Semantic organisation understood as conventional –could be otherwise, different viewpoints inevitable –users assisted to explore and appropriate
Distributed KOS meaningful?
– Meaning of a concept depends on its semantic context within a KOS (and indexing practice, relevance judgements)
– E.g. KOS fragment (Getty AAT in FACET Web Demonstrator)
– Not necessarily straightforward to apply KOS concepts out of this context (eg magenta) or to link in to other distributed structures and contexts
– Some open world Semantic Web implications problematic?
How to apply KOS? What is the purpose of a given KOS? - we need to specify/articulate more clearly Domain dependent level of precision in concept use Important to take into account how applications will process concepts Current KOS relationships at a useful level of generality for many retrieval-based applications (with some specialisation?) Cost/benefit issues for KOS applications in granularity of relationships and degree of formalisation
KOS in what kind of Semantic Web? Role for knowledge-based interactive tools in semantic web applications (in addition to emphasis on AI machine reasoning) –Reminiscent of old debates on appropriate limits to automation –A balance between system and human agency –Expert Systems or … Systems for Experts ? Smart, interactive tools allowing scope for tacit knowledge, informal representations
Contact Information Doug Tudhope School of Computing University of Glamorgan Pontypridd CF37 1DL Wales, UK
References
Bates M. 1986. Subject access in online catalogs: a design model. Journal of the American Society for Information Science, 37(6).
Binding C., Tudhope D. KOS at your Service: Programmatic Access to Knowledge Organisation Systems. JoDI 4(4).
Blocks D., Cunliffe D., Tudhope D. A reference model for user-system interaction in thesaurus-based searching (in press). Journal of the American Society for Information Science and Technology.
Campbell K., Oliver D., Spackman K., Shortliffe E. 1998. Representing Thoughts, Words, and Things in the UMLS. Journal of the American Medical Informatics Association, 5(5).
FACET Web demonstrator
FACET Xpath browsers
Greenberg J. Automatic query expansion via lexical-semantic relationships. Journal of the American Society for Information Science and Technology, 52(5).
Gruber T. What is an ontology?
Hendler J. Ontologies on the Semantic Web. In (S. Staab, Ed.) Trends & Controversies, IEEE Intelligent Systems.
Järvelin K., Kekäläinen J., Niemi T. ExpansionsTool: concept-based query extension and construction. Information Retrieval, 4(3/4).
Smith B. Ontology. In: L. Floridi (ed.), Blackwell Guide to the Philosophy of Computing and Information, Oxford: Blackwell, 2003, 155–166. (Longer draft at
Tudhope D., Binding C., Blocks D., Cunliffe D. Compound Descriptors in Context: A Matching Function for Classifications and Thesauri. JCDL 2002, full paper (pdf).
Tudhope D., Binding C. Towards Terminology Services: experiences with a pilot web service thesaurus browser. Proc. International Conference on Dublin Core and Metadata Applications (DC 2005) (version forthcoming in ASIST Bulletin).
Tudhope D., Binding C., Blocks D., Cunliffe D. Query expansion via conceptual distance in thesaurus indexed collections (in press). Journal of Documentation.