www.fit.qut.edu.au Queensland University of Technology Faculty of Information Technology Michael Middleton CRICOS No. 00213J Controlled vocabularies.


1 Queensland University of Technology Faculty of Information Technology CRICOS No. 00213J
Controlled vocabularies: Thesauri and information retrieval
Michael Middleton, QUT School of Information Systems, Brisbane, Australia
for STIMULATE 5, Vrije Universiteit Brussel, Brussels, Belgium, July 2005

2 Introduction
Context …..
History
Vocabulary principles
Thesaurus software
Thesaurus building …. application
Thesaurus evaluation
The future

3 Context: Information life cycle
Organise to maintain
create, distribute, use, maintain, recall, reuse, store, dispose

4 Context: Information management
Domains
Operational
Analytical
Strategic

5 Context: indexing
Producing representations of records or documents that constitute a finding aid to the records in a database or to part of a document
–Assigned indexing
–Derived indexing

6 Indexer qualities
The ‘Art’ of assigned indexing:
–Empathy
–Meticulousness
–Consistency
–General knowledge
–Patience

7 Indexing guidelines
Conceptual analysis and assigning
Aboutness
Elements of the document to consider
Exhaustivity
Specificity
Index what is in the item
Co-ordination

8 Assigned index representations
Alphabetical Subject
Classified
–Alphabetical
–Notation
Chain

9 Indexing exercise
How consistent is database indexing?
Example: the same paper in multiple databases:
Middleton, M. Skills expectations of library graduates
1. Index it yourself
2. Compare your indexing with others
3. Compare the indexing in ERIC and INSPEC
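One way to put a number on steps 2 and 3 is the overlap between two sets of assigned terms. The sketch below uses the ratio of shared terms to all distinct terms, a measure commonly used for inter-indexer consistency and identical to the Jaccard coefficient. The term sets are invented for illustration; they are not the actual ERIC or INSPEC indexing of the paper.

```python
def indexing_consistency(terms_a, terms_b):
    """Shared terms divided by all distinct terms assigned by either indexer."""
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    if not a and not b:
        return 1.0  # two empty indexings agree trivially
    return len(a & b) / len(a | b)


# Hypothetical term sets, invented for illustration only.
indexer_1 = ["Librarians", "Job skills", "Employer attitudes", "Graduates"]
indexer_2 = ["Job skills", "Graduates", "Library education"]

score = indexing_consistency(indexer_1, indexer_2)
print(f"consistency = {score:.2f}")  # 2 shared terms out of 5 distinct: 0.40
```

A score of 1.0 means both indexers chose exactly the same terms; low scores are typical when indexers work from different vocabularies.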

10 Context: metadata
Agent
–Document description
–Responsibility
–Administrative
–Provenance
–Connections
–Conditions of use

11 Context: metadata
Content
–Topic (application of vocabulary control)
–Coverage
–Role

12 Controlled vocabulary
Thesaurus
–A controlled vocabulary of terms in natural language that are designed for post-coordination
Classification scheme
–A scheme for organisation by categories in a systematic manner; this may involve grouping by subject, function or other criteria, or determining document naming conventions
–Often involves notation
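Post-coordination means that single-concept terms are assigned at indexing time and combined only at search time. A minimal sketch, assuming a toy inverted index (the `POSTINGS` table and document ids are hypothetical):

```python
# Hypothetical postings: thesaurus term -> ids of documents indexed with it.
POSTINGS = {
    "LIBRARIANS": {"doc1", "doc2", "doc4"},
    "JOB SKILLS": {"doc2", "doc3", "doc4"},
    "GRADUATES": {"doc2", "doc5"},
}


def search_all(*terms):
    """Return documents indexed with every one of the given terms (Boolean AND)."""
    if not terms:
        return set()
    sets = [POSTINGS.get(term, set()) for term in terms]
    return set.intersection(*sets)


print(sorted(search_all("LIBRARIANS", "JOB SKILLS")))               # ['doc2', 'doc4']
print(sorted(search_all("LIBRARIANS", "JOB SKILLS", "GRADUATES")))  # ['doc2']
```

The searcher, not the indexer, decides which concepts to coordinate, which is why the thesaurus terms can stay simple.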

13 Purpose
Indexing by translating diverse natural language to consistent terminology
Establishing relationships among terms
Information retrieval improving precision and recall

14 History
Bibliographic databases
–Many applications, list of online associated thesauri and classification schemes at
Standards
–ISO 2788; ISO 5964
–ANSI Z39.19

15 Thesaurus principles
Term relationships
Continuing evolution
Internally consistent hierarchies to support database searching

16 The Thesaurus
The vocabulary of a controlled indexing language formally organised so that the a priori relationships between concepts are made explicit.
A thesaurus is an example of metadata

17 Thesaurus extract (ISO sample)
35 mm CAMERAS
  BT MINIATURE CAMERAS
CAMERAS
  BT OPTICAL EQUIPMENT
  NT MOVING PICTURE CAMERAS
     STEREO CAMERAS
     STILL CAMERAS
     UNDERWATER CAMERAS
  RT PHOTOGRAPHY
CINE CAMERAS
  BT MOVING PICTURE CAMERAS
  NT UNDERWATER CINE CAMERAS
  RT CINEMA
CINEMA
  RT CINE CAMERAS
DIVING
  RT UNDERWATER CAMERAS
INSTANT PICTURE CAMERAS
  SN Cameras which produce a finished print directly
  BT STILL CAMERAS
Land cameras
  USE VIEW CAMERAS
MICROSCOPES
  BT OPTICAL EQUIPMENT
MINIATURE CAMERAS
  BT STILL CAMERAS
  NT 35 mm CAMERAS
MOVING PICTURE CAMERAS
  BT CAMERAS
  NT CINE CAMERAS
     TELEVISION CAMERAS
OPTICAL EQUIPMENT
  NT CAMERAS
     MICROSCOPES
PHOTOGRAPHY
  RT CAMERAS
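The extract can be held as simple (term, relation, target) records, which makes the reciprocity of BT/NT and RT links easy to check. The following is a sketch only: it loads a subset of the entries shown in the extract, and the function names are illustrative rather than taken from any thesaurus package.

```python
# A subset of the extract as (term, relation, target) triples.
RECORDS = [
    ("CAMERAS", "BT", "OPTICAL EQUIPMENT"),
    ("CAMERAS", "NT", "MOVING PICTURE CAMERAS"),
    ("CAMERAS", "RT", "PHOTOGRAPHY"),
    ("CINE CAMERAS", "BT", "MOVING PICTURE CAMERAS"),
    ("CINE CAMERAS", "RT", "CINEMA"),
    ("CINEMA", "RT", "CINE CAMERAS"),
    ("MOVING PICTURE CAMERAS", "BT", "CAMERAS"),
    ("MOVING PICTURE CAMERAS", "NT", "CINE CAMERAS"),
    ("OPTICAL EQUIPMENT", "NT", "CAMERAS"),
    ("PHOTOGRAPHY", "RT", "CAMERAS"),
]

# Each relation implies a link in the opposite direction.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}


def build(records):
    """Group records as {term: {relation: set of targets}}."""
    thesaurus = {}
    for term, rel, target in records:
        thesaurus.setdefault(term, {}).setdefault(rel, set()).add(target)
    return thesaurus


def missing_reciprocals(records):
    """List the reverse links that should exist but are not recorded."""
    present = set(records)
    return sorted(
        (target, RECIPROCAL[rel], term)
        for term, rel, target in records
        if (target, RECIPROCAL[rel], term) not in present
    )


def broader_chain(thesaurus, term):
    """Follow BT links upward from a term until a top term is reached."""
    chain, seen = [], {term}
    while True:
        broader = sorted(thesaurus.get(term, {}).get("BT", ()))
        if not broader or broader[0] in seen:
            return chain
        term = broader[0]
        seen.add(term)
        chain.append(term)


thesaurus = build(RECORDS)
print(missing_reciprocals(RECORDS))  # [] for this subset
print(broader_chain(thesaurus, "CINE CAMERAS"))
# ['MOVING PICTURE CAMERAS', 'CAMERAS', 'OPTICAL EQUIPMENT']
```

Running the same check on a partial extract will flag links whose reciprocal entry lies outside the sample, for example a related term with no entry of its own.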

19 Standardising the Vocabulary
Types of entities & forms of terms
Singular vs plural
Homonyms
Choice of terms
Scope notes and history notes

20 Compound terms
Terms should be factored into simpler elements to improve the user’s understanding.
Semantic factoring
Syntactic factoring

21 Semantic Relationships
Equivalence
–Establishing relationships between preferred (postable) and non-preferred (non-postable) terms
Hierarchical
–Establishing relationships between subordinate and superordinate terms. These may be distinguished as: Generic; Whole-part; Instance
Associative
–Establishing relationships between terms that are mentally associated, but not equivalent or hierarchical
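At search time the three relationship types play different roles: equivalence maps an entry term to its preferred term, hierarchy lets a search be broadened to all narrower terms, and association suggests optional extras. A minimal sketch, using a few terms from the ISO camera sample; the dictionary layout and function names are assumptions for illustration.

```python
# Equivalence: non-preferred entry term -> preferred term.
USE = {"LAND CAMERAS": "VIEW CAMERAS"}

# Hierarchical: preferred term -> narrower terms (partial, for illustration).
NT = {
    "OPTICAL EQUIPMENT": ["CAMERAS", "MICROSCOPES"],
    "CAMERAS": ["MOVING PICTURE CAMERAS", "STEREO CAMERAS",
                "STILL CAMERAS", "UNDERWATER CAMERAS"],
    "MOVING PICTURE CAMERAS": ["CINE CAMERAS", "TELEVISION CAMERAS"],
    "CINE CAMERAS": ["UNDERWATER CINE CAMERAS"],
}

# Associative: preferred term -> related terms.
RT = {"CAMERAS": ["PHOTOGRAPHY"], "CINE CAMERAS": ["CINEMA"]}


def preferred(term):
    """Resolve an entry term to its preferred (postable) form."""
    key = term.upper()
    return USE.get(key, key)


def expand(term, include_related=False):
    """Return the preferred term plus all of its narrower terms, breadth first."""
    start = preferred(term)
    found, queue = [start], [start]
    while queue:
        current = queue.pop(0)
        for child in NT.get(current, []):
            if child not in found:
                found.append(child)
                queue.append(child)
    if include_related:
        for related in RT.get(start, []):
            if related not in found:
                found.append(related)
    return found


print(preferred("Land cameras"))         # VIEW CAMERAS
print(expand("Moving picture cameras"))
# ['MOVING PICTURE CAMERAS', 'CINE CAMERAS', 'TELEVISION CAMERAS', 'UNDERWATER CINE CAMERAS']
```

Expanding down the hierarchy tends to raise recall; related terms are left optional because they can lower precision.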

22 … but, the Functions thesaurus
Whereas agenda papers might have
–broader term documents
In a functions thesaurus agenda papers might have
–broader term meetings

23 Applying a functional thesaurus
Top Term: PERSONNEL
Scope Notes: The function of managing all employees ……
Related Terms: COMPENSATION ESTABLISHMENT INDUSTRIAL RELATIONS etc, etc
Narrower Terms: ALLOWANCES APPEALS (Decisions) APPOINTMENT ARRANGEMENTS AUTHORISATION COMMITTEES COMPLIANCE etc, etc
Use For Terms: Employees Public Servants Staff

25 Thesaurus Display
Alphabetical hierarchies
–One level above and below entry term
–Complete hierarchy for each term or separate TT display
Permuted term lists
Combination with classification notation
Graphic Displays
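A permuted term list gives access to a multi-word term under each of its words. The sketch below shows one simple way to generate such a list; the layout of the output is an assumption, since printed thesauri format permuted displays in different ways.

```python
def permuted_index(terms):
    """Return sorted (access word, full term) pairs, one per distinct word in each term."""
    entries = set()
    for term in terms:
        for word in term.split():
            entries.add((word.lower(), term))
    return sorted(entries)


terms = ["MOVING PICTURE CAMERAS", "CINE CAMERAS", "UNDERWATER CINE CAMERAS"]
for word, term in permuted_index(terms):
    print(f"{word:<12}{term}")
```

With this list a user who looks under "cine" or "underwater" still finds UNDERWATER CINE CAMERAS, although the alphabetical display files it only under U.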

26 Applying a thesaurus
Download Term Tree from
Free trial download from

27 Thesaurus software
Assigned
Integrated database
Deriving terminology

28 Thesaurus software - assigned
Terms are assigned by vocabulary specialists in independent database
a.k.a.™
–Synercon Management Consulting
MultiTes
OpenCyc
SuperTHES
–from THESmain/THESshow for mono-/multilingual thesauri
Term Tree 2000
WebChoir
Wordmap

29 Thesaurus software – integrated database
Terms are assigned by specialists, thesaurus works like active data dictionary to control database
BASIS
InMagic
Bibliotech PRO
BRS/Search
STAR

30 Thesaurus software for deriving terminology
Terms are created automatically from text
Entrieva
–SemioTagger™, SemioMap™ and SemioSkyline™ for viewing
Intology
–taxonomy builder
Verity
–Thematic Mapping
Autonomy
–taxonomy generation & categorization

31 Thesaurus Building - 1
Users
–Define
–Identify needs
–Define Thesaurus range & depth
Raw vocabulary building
–Identify sources
–Collect and record terms

32 Thesaurus Building - 2
Vocabulary organisation
–Cluster terms
–Establish relationships using symbols
Maintenance

33 Business application
Not long term collaborative efforts of classification specialists
–Instead, adapt to business changes
Not just descriptions of present business processes
–Instead, reflect strategic planning, competitors
Not necessarily a single taxonomy
–Instead, multiple overlapping taxonomies

34 Content management
Describe content as it’s being created rather than classify after creation
User-needs orientation

35 Integrating taxonomies
Accurate reporting
Exchange of data
Assist resource discovery
–Information retrieval

36 Thesaurus evaluation
Qualities
Information retrieval evaluation

37 Thesaurus Qualities
Scope and features description
Display forms
Correctness of hierarchies
Use of scope, history and qualification
Adherence to standards
Syndetic measures
–Connectedness
–Accessibility

38 Thesauri & Retrieval evaluation
Cranfield experiments & since
Recall and precision
Influence on indexing
–Conceptual analysis
–Translation failure
–Omissions
–Exhaustivity/Specificity
–Syntax and ‘false drops’
Maintenance costs
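Recall and precision are the two ratios that retrieval evaluation in the Cranfield tradition reports: recall is the share of relevant documents that were retrieved, and precision is the share of retrieved documents that are relevant. A short sketch with invented document ids:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search against a set of known relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search result and relevance judgements.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]

p, r = precision_recall(retrieved, relevant)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.50, recall = 0.40
```

In this example d1 and d3 are ‘false drops’ that lower precision, while d5, d6 and d7 are misses that lower recall.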

39 Post-controlled vocabularies
Use of a ‘Hedge’ of terms to represent a broad concept, eg:
–‘psychological aspects of ’
–‘ in Australia’
–‘....review items on.....’
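In practice a hedge is a stored list of natural-language alternatives that is OR-ed together and attached to a topic at search time. The sketch below builds such a query string; the hedge contents and the generic Boolean syntax are assumptions for illustration, not the syntax of any particular database.

```python
# Hypothetical hedge: a broad concept -> natural-language terms that express it.
HEDGES = {
    "in Australia": ["Australia", "Australian", "Queensland", "Victoria", "Tasmania"],
}


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def apply_hedge(topic, hedge_name, hedges=HEDGES):
    """Combine a topic with every alternative in the named hedge."""
    alternatives = " OR ".join(quote(term) for term in hedges[hedge_name])
    return f"{quote(topic)} AND ({alternatives})"


print(apply_hedge("indexing", "in Australia"))
# indexing AND (Australia OR Australian OR Queensland OR Victoria OR Tasmania)
```

The vocabulary control happens after indexing: documents are searched as free text, and the hedge supplies the consistency that a thesaurus term would otherwise give.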

40 Still to come ……
Research areas
Metathesauri
–Super – interlinked vocabularies (e.g. NLM)
Semantic Web
–Enhancing word association with usage statistics like links (e.g. THESUS)

41 Review
Controlled vocabulary types
Software support
Business processes
Website
–http://sky.fit.qut.edu.au/~middletm/cont_voc.html
–(about to move to database driven site – redirection will be applied)

42 Questions?

