
1 Controlled vocabularies: Thesauri and information retrieval
Michael Middleton
QUT School of Information Systems, Brisbane, Australia
m.middleton@qut.edu.au
for STIMULATE 5, Vrije Universiteit Brussel, Brussels, Belgium, July 2005
www.fit.qut.edu.au, Queensland University of Technology, Faculty of Information Technology, CRICOS No. 00213J

2 Introduction
Context ….. History
Vocabulary principles
Thesaurus software
Thesaurus building …. application
Thesaurus evaluation
The future

3 Context: Information life cycle
Organise to maintain
create, distribute, use, maintain, recall, reuse, store, dispose

4 Context: Information management
Domains: Operational, Analytical, Strategic

5 Context: indexing
Producing representations of records or documents that constitute a finding aid to the records in a database or to part of a document
  – Assigned indexing
  – Derived indexing

6 Indexer qualities
The ‘Art’ of assigned indexing:
  – Empathy
  – Meticulousness
  – Consistency
  – General knowledge
  – Patience

7 Indexing guidelines
Conceptual analysis and assigning
Aboutness
Elements of the document to consider
Exhaustivity
Specificity
Index what is in the item
Co-ordination

8 Assigned index representations
Alphabetical Subject
Classified
  – Alphabetical
  – Notation
Chain

9 Indexing exercise
How consistent is database indexing?
Example: the same paper in multiple databases:
Middleton, M. Skills expectations of library graduates. http://eprints.qut.edu.au/archive/00000094/
1. Index it yourself
2. Compare your indexing with others
3. Compare the indexing in ERIC and INSPEC

10 Context: metadata
Agent
  – Document description
  – Responsibility
  – Administrative
  – Provenance
  – Connections
  – Conditions of use

11 Context: metadata
Content
  – Topic (application of vocabulary control)
  – Coverage
  – Role

12 Controlled vocabulary
Thesaurus
  – A controlled vocabulary of terms in natural language that are designed for post-coordination
Classification scheme
  – A scheme for organisation by categories in a systematic manner; this may involve grouping by subject, function or other criteria, or determining document naming conventions
  – Often involves notation
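Post-coordination means that single-concept terms are assigned at indexing time and only combined at search time. The following is a minimal Python sketch of that idea, assuming a hypothetical inverted index (`postings`) from thesaurus terms to record ids; the ids and the function name `search_all` are invented for illustration and do not come from the slides.

```python
# Hypothetical postings: thesaurus term -> ids of records indexed with that term.
postings = {
    "CAMERAS": {1, 2, 3, 5},
    "DIVING": {2, 4, 5},
    "PHOTOGRAPHY": {1, 5, 6},
}


def search_all(terms, index):
    """Return the ids of records indexed with every term (Boolean AND)."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


# The searcher, not the indexer, coordinates the two concepts.
print(sorted(search_all(["CAMERAS", "DIVING"], postings)))  # [2, 5]
```

A pre-coordinated scheme would instead need a single combined heading (for example, one for underwater photography) to have been assigned in advance.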

13 Purpose
Indexing by translating diverse natural language to consistent terminology
Establishing relationships among terms
Information retrieval: improving precision and recall

14 History
Bibliographic databases
  – Many applications; a list of online associated thesauri and classification schemes is at http://sky.fit.qut.edu.au/~middletm/cont_voc.html
Standards
  – ISO 2788; ISO 5964
  – ANSI Z39.19

15 Thesaurus principles
Term relationships
Continuing evolution
Internally consistent hierarchies to support database searching

16 The Thesaurus
The vocabulary of a controlled indexing language, formally organised so that the a priori relationships between concepts are made explicit.
A thesaurus is an example of metadata

17 Thesaurus extract (ISO sample)
35 mm CAMERAS
  BT MINIATURE CAMERAS
CAMERAS
  BT OPTICAL EQUIPMENT
  NT MOVING PICTURE CAMERAS
     STEREO CAMERAS
     STILL CAMERAS
     UNDERWATER CAMERAS
  RT PHOTOGRAPHY
CINE CAMERAS
  BT MOVING PICTURE CAMERAS
  NT UNDERWATER CINE CAMERAS
  RT CINEMA
CINEMA
  RT CINE CAMERAS
DIVING
  RT UNDERWATER CAMERAS
INSTANT PICTURE CAMERAS
  SN Cameras which produce a finished print directly
  BT STILL CAMERAS
Land cameras
  USE VIEW CAMERAS
MICROSCOPES
  BT OPTICAL EQUIPMENT
MINIATURE CAMERAS
  BT STILL CAMERAS
  NT 35 mm CAMERAS
MOVING PICTURE CAMERAS
  BT CAMERAS
  NT CINE CAMERAS
     TELEVISION CAMERAS
OPTICAL EQUIPMENT
  NT CAMERAS
     MICROSCOPES
PHOTOGRAPHY
  RT CAMERAS
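Entries like those in the extract can be held in a simple mapping and checked mechanically. The sketch below is an illustration only: it transcribes a subset of the extract (some narrower terms are left out so that the sample is closed), and the names `thesaurus`, `missing_reciprocals` and `broader_chain` are assumptions made here, not part of any standard or package.

```python
# Subset of the ISO sample extract. BT = broader term, NT = narrower term,
# RT = related term. Some NT references are omitted to keep the sample closed.
thesaurus = {
    "OPTICAL EQUIPMENT": {"NT": ["CAMERAS", "MICROSCOPES"]},
    "CAMERAS": {
        "BT": ["OPTICAL EQUIPMENT"],
        "NT": ["MOVING PICTURE CAMERAS"],
        "RT": ["PHOTOGRAPHY"],
    },
    "MICROSCOPES": {"BT": ["OPTICAL EQUIPMENT"]},
    "MOVING PICTURE CAMERAS": {"BT": ["CAMERAS"], "NT": ["CINE CAMERAS"]},
    "CINE CAMERAS": {"BT": ["MOVING PICTURE CAMERAS"], "RT": ["CINEMA"]},
    "CINEMA": {"RT": ["CINE CAMERAS"]},
    "PHOTOGRAPHY": {"RT": ["CAMERAS"]},
}

# Every BT should be answered by an NT, and every RT by an RT.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}


def missing_reciprocals(th):
    """List (term, relation, target) references that lack a reciprocal entry."""
    problems = []
    for term, relations in th.items():
        for relation, targets in relations.items():
            for target in targets:
                back = th.get(target, {}).get(RECIPROCAL[relation], [])
                if term not in back:
                    problems.append((term, relation, target))
    return problems


def broader_chain(term, th):
    """Follow the first BT of each term upwards until a top term is reached."""
    chain = []
    while th.get(term, {}).get("BT"):
        term = th[term]["BT"][0]
        if term in chain:  # guard against a circular hierarchy
            break
        chain.append(term)
    return chain


print(missing_reciprocals(thesaurus))            # []
print(broader_chain("CINE CAMERAS", thesaurus))  # climbs to OPTICAL EQUIPMENT
```

A check of this kind is one way to keep hierarchies internally consistent as a vocabulary evolves.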


19 Standardising the Vocabulary
Types of entities & forms of terms
Singular vs plural
Homonyms
Choice of terms
Scope notes and history notes

20 Compound terms
Terms should be factored into simpler elements to improve the user’s understanding.
Semantic factoring
Syntactic factoring

21 Semantic Relationships
Equivalence
  – Establishing relationships between preferred (postable) and non-preferred (non-postable) terms
Hierarchical
  – Establishing relationships between subordinate and superordinate terms. These may be distinguished as:
      Generic
      Whole-part
      Instance
Associative
  – Establishing relationships between terms that are mentally associated, but not equivalent or hierarchical
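The equivalence relationship is the one most directly applied at indexing and search time: a non-preferred entry term is redirected to its preferred term. The following Python sketch assumes a hypothetical lookup table `use_refs` built from USE references that appear elsewhere in these slides ("Land cameras USE VIEW CAMERAS" and the Use For terms listed under PERSONNEL); the function names are invented for illustration.

```python
# Non-preferred (non-postable) entry term -> preferred (postable) term.
# Keys are lower-cased so that lookup ignores capitalisation.
use_refs = {
    "land cameras": "VIEW CAMERAS",
    "employees": "PERSONNEL",
    "public servants": "PERSONNEL",
    "staff": "PERSONNEL",
}


def preferred(term, refs):
    """Translate an entry term to its preferred term.

    Assumption: a term with no USE reference is treated as already preferred.
    """
    key = term.strip().lower()
    return refs.get(key, term.strip().upper())


def used_for(preferred_term, refs):
    """Return the reciprocal UF (Use For) list of a preferred term."""
    return sorted(entry for entry, target in refs.items() if target == preferred_term)


print(preferred("Staff", use_refs))      # PERSONNEL
print(used_for("PERSONNEL", use_refs))   # ['employees', 'public servants', 'staff']
```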

22 … but, the Functions thesaurus
Whereas agenda papers might have
  – broader term documents
In a functions thesaurus agenda papers might have
  – broader term meetings

23 Applying a functional thesaurus
Top Term
  PERSONNEL
Scope Notes
  The function of managing all employees ……
Related Terms
  COMPENSATION
  ESTABLISHMENT
  INDUSTRIAL RELATIONS
  etc, etc
Narrower Terms
  ALLOWANCES
  APPEALS (Decisions)
  APPOINTMENT
  ARRANGEMENTS
  AUTHORISATION
  COMMITTEES
  COMPLIANCE
  etc, etc
Use For Terms
  Employees
  Public Servants
  Staff


25 Thesaurus Display
Alphabetical hierarchies
  – One level above and below entry term
  – Complete hierarchy for each term or separate TT display
Permuted term lists
Combination with classification notation
Graphic Displays
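A permuted term list gives every word of a multi-word term its own entry point, so a term can be found from any of its words. The sketch below is a simplified Python illustration of that idea; the function name `permuted_list` and the column layout are assumptions, and real thesaurus packages typically produce more elaborate rotated displays.

```python
def permuted_list(terms):
    """Return (keyword, term) pairs, one for each word of each term, sorted."""
    entries = []
    for term in terms:
        for word in term.split():
            entries.append((word.lower(), term))
    return sorted(entries)  # by keyword first, then by full term


terms = ["CINE CAMERAS", "UNDERWATER CAMERAS", "UNDERWATER CINE CAMERAS"]
for keyword, term in permuted_list(terms):
    # Keyword in a fixed-width column, followed by the full term.
    print(f"{keyword:<12}{term}")
```

With the three sample terms, the keyword "cine" leads to both CINE CAMERAS and UNDERWATER CINE CAMERAS, which a plain alphabetical display would file far apart.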

26 Applying a thesaurus
Download Term Tree from http://www.termtree.com.au
Free trial download from http://www.termtree.com.au

27 Thesaurus software
Assigned
Integrated database
Deriving terminology

28 Thesaurus software - assigned
Terms are assigned by vocabulary specialists in independent database
a.k.a.™
  – Synercon Management Consulting
MultiTes
OpenCyc
SuperTHES
  – from THESmain/THESshow for mono-/multilingual thesauri
Term Tree 2000
WebChoir
Wordmap

29 Thesaurus software – integrated database
Terms are assigned by specialists; the thesaurus works like an active data dictionary to control the database
BASIS
InMagic
Bibliotech PRO
BRS/Search
STAR

30 Thesaurus software for deriving terminology
Terms are created automatically from text
Entrieva
  – SemioTagger™, SemioMap™ and SemioSkyline™ for viewing
Intology
  – taxonomy builder
Verity
  – Thematic Mapping
Autonomy
  – taxonomy generation & categorization

31 Thesaurus Building - 1
Users
  – Define
  – Identify needs
  – Define Thesaurus range & depth
Raw vocabulary building
  – Identify sources
  – Collect and record terms

32 Thesaurus Building - 2
Vocabulary organisation
  – Cluster terms
  – Establish relationships using symbols
Maintenance

33 Business application
Not long term collaborative efforts of classification specialists
  – Instead, adapt to business changes
Not just descriptions of present business processes
  – Instead, reflect strategic planning, competitors
Not necessarily a single taxonomy
  – Instead, multiple overlapping taxonomies

34 Content management
Describe content as it’s being created rather than classify after creation
User-needs orientation

35 Integrating taxonomies
Accurate reporting
Exchange of data
Assist resource discovery
  – Information retrieval

36 Thesaurus evaluation
Qualities
Information retrieval evaluation

37 Thesaurus Qualities
Scope and features description
Display forms
Correctness of hierarchies
Use of scope, history and qualification
Adherence to standards
Syndetic measures
  – Connectedness
  – Accessibility
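The slide names the two syndetic measures without defining them. The Python sketch below uses one common reading, stated here as an assumption: connectedness as the proportion of terms that take part in at least one cross-reference, and accessibility as the average number of other terms that lead to a term. The sample `links` data and function names are invented for illustration.

```python
# Hypothetical cross-reference structure: term -> terms it refers to
# (BT, NT and RT references are not distinguished here).
links = {
    "CAMERAS": {"OPTICAL EQUIPMENT", "PHOTOGRAPHY"},
    "OPTICAL EQUIPMENT": {"CAMERAS"},
    "PHOTOGRAPHY": {"CAMERAS"},
    "DIVING": set(),  # an orphan term with no references
}


def connectedness(links):
    """Proportion of terms that make or receive at least one reference."""
    linked = {term for term, targets in links.items() if targets}
    for targets in links.values():
        linked |= targets & set(links)
    return len(linked) / len(links)


def accessibility(links):
    """Average number of other terms from which a term can be reached."""
    incoming = {term: 0 for term in links}
    for source, targets in links.items():
        for target in targets:
            if target in incoming and target != source:
                incoming[target] += 1
    return sum(incoming.values()) / len(links)


print(connectedness(links))  # 0.75
print(accessibility(links))  # 1.0
```

Other definitions of both measures exist, so figures are only comparable between thesauri when computed the same way.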

38 Thesauri & Retrieval evaluation
Cranfield experiments & since
Recall and precision
Influence on indexing
  – Conceptual analysis
  – Translation failure
  – Omissions
  – Exhaustivity/Specificity
  – Syntax and ‘false drops’
Maintenance costs
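Recall and precision have standard definitions: precision is the fraction of retrieved records that are relevant, and recall is the fraction of relevant records that were retrieved. A minimal Python sketch follows; the record ids are invented, and the choice to return 0.0 for an empty set is a convention assumed here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four records retrieved, two of them relevant; five relevant records exist.
print(precision_recall({1, 2, 3, 4}, {2, 4, 5, 6, 7}))  # (0.5, 0.4)
```

In this example the two irrelevant retrieved records would be ‘false drops’, and the three relevant records that were missed lower recall.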

39 Post-controlled vocabularies
Use of a ‘Hedge’ of terms to represent a broad concept, eg:
  – ‘psychological aspects of..........’
  – ‘........in Australia’
  – ‘....review items on.....’
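A hedge can be pictured as a stored set of free-text alternatives that is ORed together and then combined with the searcher's topic. The Python sketch below assumes a generic Boolean query syntax; the contents of each hedge and the function name `expand_hedge` are invented for illustration and are not taken from any particular search service.

```python
# Hypothetical hedges: label -> free-text terms that stand for the broad concept.
hedges = {
    "psychological aspects of": ["psychology", "psychological", "attitudes", "motivation"],
    "in Australia": ["Australia", "Australian", "Queensland"],
}


def expand_hedge(hedge_name, topic, hedges):
    """Build a Boolean query: (alternative OR alternative ...) AND topic."""
    alternatives = " OR ".join(f'"{term}"' for term in hedges[hedge_name])
    return f'({alternatives}) AND "{topic}"'


print(expand_hedge("in Australia", "thesauri", hedges))
# ("Australia" OR "Australian" OR "Queensland") AND "thesauri"
```

The control is applied at search time rather than at indexing time, which is what distinguishes a post-controlled vocabulary from an assigned one.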

40 Still to come ……
Research areas
Metathesauri
  – Super – interlinked vocabularies (e.g. NLM)
Semantic Web
  – Enhancing word association with usage statistics like links (e.g. THESUS)

41 Review
Controlled vocabulary types
Software support
Business processes
Website
  – http://sky.fit.qut.edu.au/~middletm/cont_voc.html
  – (about to move to database driven site – redirection will be applied)

42 Questions?

