Presentation is loading. Please wait.

Presentation is loading. Please wait.

Text information storage and retrieval and the CDS/ISIS program

Similar presentations


Presentation on theme: "Text information storage and retrieval and the CDS/ISIS program"— Presentation transcript:

1 Text information storage and retrieval and the CDS/ISIS program
*** Text information storage and retrieval and the CDS/ISIS program Paul NIEUWENHUYSEN University Library, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium

2 *** What is a database? A database is a collection of similar data records stored in a common file (or collection of files).

3 Software type = information retrieval software
*** Software type = information retrieval software Software for information storage and retrieval (ISR software) Text(-oriented) database management systems (Text-DBMS) Text information management systems (TIMS) Document retrieval systems Document management systems

4 Information retrieval: via a database to the user
*** Information retrieval: via a database to the user Information content Linear file Inverted file Database Search engine Search interface User

5 Information retrieval: the basic processes in search systems
*** Information retrieval: the basic processes in search systems Information problem Text documents Representation Representation Evaluation and feedback Query Indexed documents Comparison Retrieved documents

6 Information retrieval systems: many components make up a system
*** Information retrieval systems: many components make up a system Any retrieval system is built up of many more or less independent components. These components can be modified to increase the quality of the results more or less independently.

7 Information retrieval systems: important components
*** Information retrieval systems: important components the information content system to describe formal aspects of information items system to describe the subjects of information items concrete descriptions of information items = application of the used information description systems information storage and retrieval computer program(s) computer system used for retrieval type of medium or information carrier used for distribution

8 Information retrieval systems: the information content
*** Information retrieval systems: the information content The information content is the information that is created or gathered by the producer. The information content is independent of software and of distribution media. The information content is input into the retrieval system using a system (rules) to describe the formal aspects a system (rules) to describe the contents (classification, thesaurus,...)

9 Information retrieval systems: media used for distribution
*** Information retrieval systems: media used for distribution Hard copy (for information retrieval systems only in the broad sense) Print Microfiche For computers: (for information retrieval systems strictu sensu) Magnetic tape Floppy disk; optical disk (CD-ROM, CD-i, Photo-CD,...) Online

10 Information retrieval systems: the computer program
*** Information retrieval systems: the computer program The information retrieval program consists of several modules, including: The module that allows the creation of the inverted file(s) = index file(s) = dictionary file(s). The search engine provides the search features and power that allow the inverted file(s) to be searched. The interface between the system and the user determines how they (can) interact to search the database (using menus and/or icons and/or templates and/or commands).

11 What determines the results of a search in a retrieval system?
*** What determines the results of a search in a retrieval system? the information retrieval system ( = contents + system) the user of the retrieval system and the search strategy applied to the system Result of a search

12 Characteristics / definition of structured text-information
*** Characteristics / definition of structured text-information The text information is structured. (files, records, fields, sub-fields, links/relations among records,...) The length of records and fields can be “long”. Some fields are multi-valued, i.e. they occur more than once.

13 Layered structure of a database
*** Layered structure of a database Database File Records Fields Characters + in many systems: relations / links between records

14 Structure of a bibliographic file
*** Structure of a bibliographic file Record No. 1 Title Author 1: name + first name Author 2: ... Source Descriptor 1 Descriptor 2 Record No. 2 Sub- fields Repeated

15 Thesaurus: description
*** Thesaurus: description Thesaurus = system to control a vocabulary + the contents of this vocabulary Thesaurus program = program to create, manage, modify and/or search a thesaurus using a computer

16 Thesaurus relations *** Term(s) with broader meaning
BT (= Broader Term) RT (Related Term) UF (= Use For) Other term(s) Term Synonym(s) NT (= Narrower Term) Term(s) with narrower meaning

17 Thesaurus applications
*** Thesaurus applications To find/choose index terms to add these to items, when terms are taken from a controlled vocabulary To find more and/or better terms to search a database (to increase recall and precision) To find more and/or better terms during writing To understand the meaning of a term, by inspecting the scope note of the term and/or the relations with other terms

18 Thesaurus examples **-Examples
General systems / universal systems / on all subjects Library of Congress Subject Headings (LCSH) Focused on a particular subject domain ERIC INSPEC Medical Subject Headings (MeSH) Psychological Abstracts / PsycInfo Sociological Abstracts / SocioFile

19 Database systems: why study this subject briefly ?
*** Database systems: why study this subject briefly ? To achieve a better understanding of the inner workings of the external information retrieval systems that you use, so that you can exploit these more efficiently To be able to evaluate the quality of database systems you are confronted with, so that you can make better choices among available systems, offer constructive suggestions to the manager, ...

20 Database systems: why study this subject in detail?
**- Database systems: why study this subject in detail? To acquire the knowledge and skills to create / set up / manage your own local database system on a computer

21 Database systems: definition
*** Database systems: definition A database (management) system is a program or set of programs, providing a means by which a user can easily store and retrieve data in the form of “databases”.

22 Information retrieval software: related terms
**- Information retrieval software: related terms Software for information storage and retrieval (ISR software) Text(-oriented) database management systems (Text-DBMS) Text information management systems (TIMS) Document retrieval systems Document management systems

23 Information retrieval software: applications (Part 1)
**- Information retrieval software: applications (Part 1) Documentation centres Archives Libraries Musea Medical files Marketing departments Schools Bibliographic databases Documents Archived documents Books / Documents Objects / Books / ... Patient’s histories Clients / Potential clients Courses / Teachers Publications / ...

24 Information retrieval software: applications (Part 2)
**- Information retrieval software: applications (Part 2) Meeting calendars Product information Laboratories Personal documentation Patent office Co-operating information networks ... Meetings = conferences Product descriptions Recipes Documents Patents Documents / Persons / Institutes / Events / ...

25 Cataloguing: hard copy versus computer-based
**- Cataloguing: hard copy versus computer-based Hard copy “Input” , i.e. cataloguing, on cards determines directly the “ouput”, i.e. the format of the data on the card as presented to the user Summarized: INPUT=OUTPUT Computer-based Input in the database in fields allows later output in various formats for presentation Summarized: 1. INPUT, 2. various OUTPUTs

26 Text-information management systems: characteristics and definition
*** Text-information management systems: characteristics and definition The information in the database is text oriented. Therefore, several features are required: ability to store relatively long blocks of texts ability to retrieve items in which specific words or terms occur anywhere

27 Text-information management: from free-form to structure
**- Text-information management: from free-form to structure Free form text information without structure Text database with information structured in files, records, fields, sub-fields, with links/relations among records,... (Ideally, each fields is repeatable = can be multi-valued, = can occur more than once in each record.)

28 Text-information management: types of software
*** Text-information management: types of software Software type Word processing software Free-form or structured text information database software Features Must be learnt anyway. Slow sequential searching. Additional software to be purchased and learnt. Fast searching via index(es).

29 Advantages of structured text-retrieval versus X-base systems
**- Advantages of structured text-retrieval versus X-base systems Feature Many long fields, forming long records Repeatable fields Subfields Variable field lengths Fast searching any word in all fields Thesaurus to help searching Text- retrieval Yes X-base systems No

30 Hierarchy in the use of a database
*** Hierarchy in the use of a database Database structure Input / Editing Searching / Output

31 Functions of database management software
*** Functions of database management software Input / edit using keyboard or batch input Indexing of the database(s) Browse / Search / Select / Retrieve data from database Output (Sort / Display / Print to file / Print to paper) + Export / Import

32 The various formats of records in a database
**- The various formats of records in a database Format to facilitate retrieval = inverted file Input format = Edit format Internal format for long term storage Display format for output to display, printer, file Format for exchange purposes

33 Structure of records / Field tags
**- Structure of records / Field tags Field tags: Examples from the Common Communication Format (supported by UNESCO): Title Author(s) Notes Abstract

34 Which advantages offers a document management system on computer?
*** !? Question !? Task !? Problem !? Which advantages offers a document management system on computer?

35 Advantages of a document system on computer, for the user(s)
*** Advantages of a document system on computer, for the user(s) Access to information is easier. Access to information is faster. Online access is possible even when centre is closed. Online access is possible from a distance. Integration in search module with data on loan status. More elements of the records can serve as search term. Combinations of search terms can be used. Results /selections can be stored as computer files.

36 Advantages of a document system on computer, for the manager(s)
**- Advantages of a document system on computer, for the manager(s) Multiplication / distribution / exchange is easier. Available computer data can be input / incorporated. Global changes are easier. The system takes less physical space. Output to printer allows production of cards, listings,... Sorting of records in output is easier. The system is resistant to physical aging. Parts are better integrated (for instance books and loans) The system can offer statistical information.

37 Drawbacks of a document system on computer
**- Drawbacks of a document system on computer L The costs of software and hardware can be high. Training related to computers is required. Evolution in computer applications should be considered. Some systems do not provide a backup.

38 Unesco’s involvement with information management
**- Unesco’s involvement with information management Unesco - General Information Programme Computer programs for information management Standards (for instance CCF) Various subject-oriented information projects Libraries and archives Unesco’s other programmes and divisions

39 Tools for information management by Unesco
**- Tools for information management by Unesco IDAMS DOS program for numeric data analysis CDS/ISIS Program for storage and retrieval of structured text-oriented information Interface between IDAMS and CDS/ISIS To use both programs efficiently in one project Common Communication Format (CCF) Guidelines on how to format a database

40 The IDAMS analysis program
*-- The IDAMS analysis program Software to create, manage and analyse local, in-house, number-oriented databases For DOS (or systems emulating DOS) Developed by an international team Distributed free of charge by the Unesco - General Information Programme (PGI) Detailed manual is available

41 The CDS/ISIS text database management program
**- The CDS/ISIS text database management program Software to create and manage local, in-house databases with primarily structured text as contents (NOT numbers, graphics, sound,...) Versions available for Mainframes (IBM) Minicomputers (Digital VAX) Microcomputers (DOS )

42 Micro-CDS/ISIS: options on the original main menu
*-- Micro-CDS/ISIS: options on the original main menu ________________________________________________________________________________ ________________________ Micro CDS/ISIS - Version 3.0 ________________________ C - Change data base L - Change dialogue language E - ISISENT - Data entry services S - ISISRET - Information retrieval services P - ISISPRT - Sorting and printing services I - ISISINV - Inverted file services D - ISISDEF - Data base definition services M - ISISXCH - Master file services U - ISISUTL - System utility services A - ISISPAS - Advanced programming services X - Exit (to MSDOS)

43 Micro-CDS/ISIS: original main menu on the display
*-- Micro-CDS/ISIS: original main menu on the display

44 Micro-CDS/ISIS running in Microsoft Windows
*-- Micro-CDS/ISIS running in Microsoft Windows

45 Micro-CDS/ISIS running in Microsoft Windows (full screen)

46 CDS/ISIS: general features
*-- CDS/ISIS: general features Available for several operating systems Multi-user editing and searching in a network Unlimited number of databases can be stored No practical limitation in the number of records per database Multiple field-occurrences are possible Only few limitations in a database structure Can be applied on CD-ROM

47 CDS/ISIS: input, indexing, searching, output
*-- CDS/ISIS: input, indexing, searching, output More than one input worksheet can be applied Powerful word-, phrase- and field-indexing Powerful, fast searching Powerful in output formats

48 CDS/ISIS: positive non-technical characteristics
*-- CDS/ISIS: positive non-technical characteristics Good, detailed manual in English, French,... Used in more than 4000 institutes Used internationally, worldwide National user-groups are active in many countries User interface available in English, French, Spanish, Arabian, Chinese,... Suitable database structures are available Free forum about CDS/ISIS by electronic mail

49 CDS/ISIS is available free of charge
*-- CDS/ISIS is available free of charge From National distributors approved by Unesco From subject-oriented distributors approved by Unesco From the Unesco - General Information Programme in Paris (From the secretariat of the Unesco - International Hydrological Programme in Paris, for water-related projects)

50 CDS/ISIS database structures
*-- CDS/ISIS database structures example database structured according to CCF, on diskette, distributed by the Unesco - General Information Programme example database structured according to CCF, published in a guide by the Unesco - International Hydrological Programme databases and manuals by other organisations

51 Interface to link CDS/ISIS with IDAMS
*-- Interface to link CDS/ISIS with IDAMS To use both packages efficiently together, an interface program has also been developed by the Unesco - General Information Programme (PGI)

52 Important new features in CDS/ISIS version 3
*-- Important new features in CDS/ISIS version 3 More than one user can edit data at the same time in a computer network. User / program can call external, non-CDS/ISIS programs. Better Pascal programming language is included in CDS/ISIS .

53 CDS/ISIS and libraries: examples of applications
*-- CDS/ISIS and libraries: examples of applications A library automated with CDS/ISIS Central library providing bibliographic data for computers Another library automated with CDS/ISIS Smaller department libraries and documentation systems automated using CDS/ISIS Network

54 CDS/ISIS database structure
*-- CDS/ISIS database structure General CDS/ISIS Database Record Field Subfield Character Name < 5 characters MFN = Master File Number Tag = 1, 2, 3, ..., 999 ^a ^b ^c ... ABC...abc accented characters Unlimited 16 million per database 250 per record - 8000 per record 8000 per field

55 CDS/ISIS database files, defined by the user
*-- CDS/ISIS database files, defined by the user Name in DOS .ANY .FDT .FMT .FST .PFT .STW # used with 1 database 1 > 1 = or > 1 Full name ANY file Field Definition Table Worksheet(s) Field Select Table(s) Print Format(s) STOPword list Purpose Searching Structure Input Indexing Output and sorting

56 CDS/ISIS database files: the database contents
*-- CDS/ISIS database files: the database contents Name in DOS .MST .XRF .IFP .L01 .L02 .N01 .N02 Full name Master file Cross reference file B-tree index files

57 CDS/ISIS database files to change using a text editing program
*-- CDS/ISIS database files to change using a text editing program Where dbn = data base name Name in DOS dbn.ANY dbn.STW # used with a database 1 Full name ANY file STOPword list Purpose Indexing and sorting Searching

58 *-- CDS/ISIS database files, which can be changed using a text editing program Name in DOS .FDT .FST .PFT # used with a database 1 1,... 1 or several Full name Field Definition Table Field Select Table(s) Print (or display )Format(s) Purpose Structure Indexing Output

59 Advantages of using CDS/ISIS with Windows (Part 1)
*-- Advantages of using CDS/ISIS with Windows (Part 1) Multitasking in several windows: CDS/ISIS and other programs start CDS/ISIS from the program manager view CDS/ISIS and the file manager at the same time search or edit a CDS/ISIS database together with a thesaurus or classification scheme in another program switch easily between CDS/ISIS and a word processing program to produce output ...

60 Advantages of using CDS/ISIS with Windows (Part 2)
*-- Advantages of using CDS/ISIS with Windows (Part 2) Multitasking in several windows: multiple instances of CDS/ISIS view several databases at the same time in several instances of CDS/ISIS running at the same time ...

61 Advantages of using CDS/ISIS with Windows (Part 3)
*-- Advantages of using CDS/ISIS with Windows (Part 3) Copy and paste copy data from a document in a program for text processing, and paste into a CDS/ISIS database copy data from a CDS/ISIS database displayed on screen by CDS/ISIS, and paste into a document in another program copy data from one CDS/ISIS database displayed on screen, and paste into another database through the editing worksheet ...

62 Advantages of using CDS/ISIS with Windows (Part 4)
*-- Advantages of using CDS/ISIS with Windows (Part 4) Associations of file name extensions with programs associate the following CDS/ISIS file name extensions with a program for word processing: .any .fdt .par .pft .stw

63 CDS/ISIS: some wishes of users (Part 1)
*-- CDS/ISIS: some wishes of users (Part 1) General aspects: better use of Microsoft Windows possibility to open and work with more than one file/database on screen simultaneously, and to exchange data among those open files client-server architecture for the database management system better availability of CDS/ISIS applications developed by other users (including additional ISIS-Pascal programs)

64 CDS/ISIS: some wishes of users (Part 2)
*-- CDS/ISIS: some wishes of users (Part 2) Database structure: records longer than 8 KBytes support of non-text fields (graphics, audio,...) better availability of database structures developed by users Input: access to and copy from one or more authority files spell-check of database contents (language independent) direct import of non-ISO-structured ASCII files

65 CDS/ISIS: some wishes of users (Part 3)
*-- CDS/ISIS: some wishes of users (Part 3) Indexing: multiple inverted files Searching: save search statements (queries) for future runs Output: emphasis of search terms in the output on display and paper better use of the features of various printers direct output to PostScript printers

66 CDS/ISIS: some wishes of users (Part 4)
*-- CDS/ISIS: some wishes of users (Part 4) Interface with user more help messages in context online tutorial support for mouse

67 CDS/ISIS: further development going on
*-- CDS/ISIS: further development going on versions for various Unix computers version for Windows splitting into a client and a server package

68 CDS/ISIS database definition services: display menu
*-- CDS/ISIS database definition services: display menu

69 CDS/ISIS database modification services: display menu
*-- CDS/ISIS database modification services: display menu

70 CDS/ISIS database definition table: display of an example
*-- CDS/ISIS database definition table: display of an example

71 Copying and renaming a CDS/ISIS database structure
*-- Copying and renaming a CDS/ISIS database structure Copy XCOPY (not COPY) all files, using DOS or Copy using Windows If required: rename FDT file, FMT file(s), PFT file(s), ANY file, FST file(s),... AND (!): Change first lines in xxxxx.FDT to make these agree with new names of FST(s), FMT(s), PFT(s)

72 Limitations to the structure of a database
**- Limitations to the structure of a database What is the maximum number of databases managed by the program? of records in a database? of characters in a record? of fields in a record? of characters in a field? Can fields contain subfields? Can the user define / modify the structure? Is the amount of memory taken on disk by each record minimal or fixed? ...

73 Limitations to the contents of a database
**- Limitations to the contents of a database Only plain 7-bit or extended ASCII? §©¥¤£¢ Can documents be managed, such as texts from word processing programs with formatting codes? pictures / graphics / bitmaps ? movies / video /...? ...

74 Types of field contents to control input
**- Types of field contents to control input Type (Example) Alphanumeric (Title) Alphabetic (Country code) Numeric (Year) Pattern (Date)

75 CDS/ISIS manual data entry, editing / input services: display menu
*-- CDS/ISIS manual data entry, editing / input services: display menu

76 Manual inputting and editing using a keyboard
**- Manual inputting and editing using a keyboard Can more than 1 form / worksheet be used for a database? Are pre-fabricated database structures with input worksheets included and / or available? Can more than one user edit the same database at the same time?

77 Batch input / Import **- Is batch input possible?
Is a format conversion program included or available? ...

78 Activities related to indexing
**- Activities related to indexing Activity Intellectual, human indexing Develop an automatic indexing method Automatic indexing Who does it? Database producer / Thesaurus producer Database producer / Software features Computer with program Concrete action Attribute subject terms to records Making an index method file Making inverted file(s)

79 Indexes in books and databases: a comparison
**- Indexes in books and databases: a comparison Book Database Index_term_1 page x1, y1, z1,... Index_term_2 page x2, y2, z2,... ... Printed Invisible Index_term_1 record nr. x1 / field type nr. x1 / field occurrence x1 / position x1 record nr. y1 / field type nr. y1 / field occurrence x1 / position y1 ... Index_term_2 record nr. x2 / field type nr. x2 / field occurrence x2 / position x2 record nr. x2 / field type nr. x2 / field occurrence x2 / position x2

80 Index in a text retrieval system (such as CDS/ISIS)
**- Index in a text retrieval system (such as CDS/ISIS) Terminology: Index = Inverted file = Dictionary database dictionary on display database complete inverted file

81 Methods of inverted file creation
**- Methods of inverted file creation Æ Word indexing J Simple / automatic / no indication required L Loss of word context J A field structure is not required Æ Phrase indexing L Indication of phrases during input is required J Richer than separate words Æ Field indexing J Context is better preserved L A field structure is required

82 CDS/ISIS inverted file services: display menu
*-- CDS/ISIS inverted file services: display menu

83 Automatic indexing (file inversion)
**- Automatic indexing (file inversion) Word indexing? with proximity indexing? Field indexing? Sub-field indexing? Phrase indexing? Æ Maximum length of index entry? Æ List of stopwords available? Æ Immediately after input or in batch? (Slow down...?) Æ Indexing speed? Æ Adding prefixes/tags possible? Æ Modification of indexing possible? Possible? Obligatory?

84 !? Question !? Task !? Problem !? **-
Why can the index of a database be so large in comparison with the size of the database?

85 CDS/ISIS information retrieval services: display menu
*-- CDS/ISIS information retrieval services: display menu

86 CDS/ISIS information retrieval: example of a dictionary on the display
*-- CDS/ISIS information retrieval: example of a dictionary on the display

87 CDS/ISIS: features related to retrieval
*-- CDS/ISIS: features related to retrieval Browsing Dictionary > Searching /selecting Direct searching + mix of these methods Boolean operators: OR * AND (^ NOT) Previous search result: #n Field qualifier: search term / (n,m,...)

88 *-- CDS/ISIS: an additional, user-friendly search interface program, Heurisko Offers a more limited but more user-friendly interface with drop-down menus, to choose among available CDS/ISIS databases to search the chosen database and to display selected records on the video display to print search results / selections Is available since 1993, free of charge from CDS/ISIS distributors, with a manual.

89 Interactive searching of a database / Retrieval
**- Interactive searching of a database / Retrieval Browse in index(es)? Select from index(es)? Combine search terms? Proximity operators? (Adjacency / Same paragraph / ...) Truncated search term(s)? Limit search to specific field(s)? Highlighting of search terms in selected records? Ranking of output? Speed? Save search strategy?

90 Output from a database to various “devices”
**- Output from a database to various “devices” to video display to printer to computer file (“printing” to a file) =< ;

91 CDS/ISIS output (sorting and printing) services: display menu
*-- CDS/ISIS output (sorting and printing) services: display menu

92 CDS/ISIS printing worksheet: display of an example
*-- CDS/ISIS printing worksheet: display of an example

93 CDS/ISIS sorting worksheet: display of an example
*-- CDS/ISIS sorting worksheet: display of an example

94 Formatting of output from a database
*-- Formatting of output from a database Formatting aspects / levels Æ data in each record Æ lay-out of records on the printed page or in the output computer file Æ sorting of records in output In CDS/ISIS Æ .PFT file(s) Æ printing worksheet(s) Æ sorting worksheet(s)

95 Formatting of data within each record in output
**- Formatting of data within each record in output Independent of output device: Determine the sequence of the fields in each record. Omit specific fields from each record. Add field names or tags to the fields in each record. Indicate the search term(s) in each record. Dependent of output device: Specify character formats in each (sub)field: typeface + size + bold/italic/underline

96 Sorting / arranging of records in the whole output
**- Sorting / arranging of records in the whole output Can the user determine the sequence of the records? Which elements can be used as a basis for sorting? Can stopwords be omitted as a basis for sorting? What is the maximum number of sort levels? Can the user choose between ascending or descending order? Can duplicate records be eliminated? (If yes: Can the user determine the meaning of duplicate?) Can output formats (styles) be stored?

97 Advanced and experimental retrieval systems
*-- Advanced and experimental retrieval systems The system accords weights to terms in the database Frequency of occurrence in the database + ... The searcher accords weights to terms in his query Based on importance of the term Natural language interface between user and system The system derives word stems + word meanings + ... Relevance feedback and query reformulation User assesses relevance and the system refines query Dynamic user profile is a part of the system System understands the user and his query better

98 Additional programs for CDS/ISIS
*-- Additional programs for CDS/ISIS CDS/ISIS CDS/ISIS Pascal programming language compiler Additional program(s) Source code of additional program(s) in CDS/ISIS Pascal

99 Global modification program for CDS/ISIS
*-- Global modification program for CDS/ISIS GMOD.PAS allows the modification of a string in a specific field of all records, throughout the whole CDS/ISIS database.

100 Thesaurus program module: purpose
**- Thesaurus program module: purpose Does the database management program offer a thesaurus module which allows the user to create, modify, store, and delete relations between terms used in the database? This is mainly used to establish relations among controlled subject indexing terms. If more than one controlled vocabulary is used, these should be managed separately.

101 Structure of a thesaurus database record (Fields for “good” terms)
**- Structure of a thesaurus database record (Fields for “good” terms) “Good” term Controlled vocabulary to which the term belongs (if more than 1 is used in the same database) Scope note (= definition of the controlled term) Date of creation or modification of the term Notes

102 Structure of a thesaurus database record (Fields for relations)
**- Structure of a thesaurus database record (Fields for relations) BT (= broader term) term(s) with broader meaning TT (= top term) term highest in the hierarchy NT (= narrower term) term(s) with narrower meaning RT (= related term) other term(s) related to this one UF (= use for) synonym(s)

103 Structure of a thesaurus database record (Fields for forbidden terms)
**- Structure of a thesaurus database record (Fields for forbidden terms) Forbidden term US (= use instead) “good” term in the controlled vocabulary

104 Structure of a thesaurus database record (Fields for candidate terms)
**- Structure of a thesaurus database record (Fields for candidate terms) Candidate “good” term in the controlled vocabulary (Other fields as in the case of “good” terms)

105 Structure of a multilingual thesaurus database record
**- Structure of a multilingual thesaurus database record Each type of field in a thesaurus record occurs for each language.

106 Thesaurus program: desirable properties (Part 1)
**- Thesaurus program: desirable properties (Part 1) Multilingual user interface = menus and messages in more than 1 language Multilingual contents = terms in more than 1 language When a term in the thesaurus database is added, changed or deleted, the program automatically makes the corresponding changes throughout the whole thesaurus database, there where that term occurs The program controls the creation of impossible (= forbidden) or undesirable relations

107 Thesaurus program: desirable properties (Part 2)
**- Thesaurus program: desirable properties (Part 2) Can the thesaurus contents be formatted and printed or sent to file? Can more than 1 thesaurus be managed, linked to the same database? Can a thesaurus database can be used with more than 1 primary database? Can the program signal the presence of orphan terms (= terms without relation)?

108 **- Thesaurus program: integration with input/editing of the primary database How simply and quickly can the user search the thesaurus during manual input/editing? (for instance to use it as an authority list) copy a term from a thesaurus and paste into a database record? copy a term from the database and paste into a thesaurus? ...

109 Thesaurus program: integration with searching of the primary database
**- Thesaurus program: integration with searching of the primary database Can the user browse the thesaurus during a search in the database? Can the program automatically formulate a query, when the user selects terms in the thesaurus module? Does the program allow to include easily and quickly synonyms, narrower terms and broader terms in a query? ...

110 Automatic creation, deletion or adaptation of the reciprocal relation
**- Automatic creation, deletion or adaptation of the reciprocal relation Does a change by the user of a relation in one record cause an automatic change by the thesaurus program of the reciprocal relation in the corresponding record of the thesaurus database? Examples: change of BT changes NT in the corresponding record change of NT changes BT in the corresponding record change of RT changes RT in the corresponding record change of UF changes US in the corresponding record change of US changes UF in the corresponding record

111 **- Automatic control of the creation of impossible or undesirable relations Does the thesaurus program avoid the creation of impossible or undesirable relations, or does it warn the user? Examples of this kind of relations: circular hierarchy (a NT b, b NT c, c NT a, or longer) circular synonym relation (a UF b, b UF a) iterative synonym relations (a US b, b US c, or longer) incomplete relations (a RT b, while b does not exist) term related to itself (for instance: a NT a) ...

112 Trilingual thesaurus program module for CDS/ISIS: properties
*-- Trilingual thesaurus program module for CDS/ISIS: properties It is an additional program in CDS/ISIS Pascal language Usage is free of charge, as in the case of CDS/ISIS Thesaurus database management is based on CDS/ISIS The thesaurus program, as well as CDS/ISIS, offers a user interface in English, French, and Spanish The contents of a thesaurus database is trilingual : each term in English, French, and Spanish (each one replaceable by another language)

113 Trilingual thesaurus program for CDS/ISIS: the relations among terms
*-- Trilingual thesaurus program for CDS/ISIS: the relations among terms The available relations are: US, UF, NT, BT, TT, RT Unlimited number of occurrences for each type of relations in each record After a change of a relation, the program automatically adapts the corresponding relation in the corresponding thesaurus term records

114 Trilingual thesaurus program for CDS/ISIS: control of relations
*-- Trilingual thesaurus program for CDS/ISIS: control of relations The program avoids the creation of some impossible or undesirable relations: circular synonym relation (a UF b, b UF a) iterative synonym relations (a US b, b US c, or longer) incomplete relations (a RT b, while b does not exist)

115 Trilingual thesaurus for CDS/ISIS: integration with searching
*-- Trilingual thesaurus for CDS/ISIS: integration with searching The user can browse the thesaurus during a search in the primary database. The program automatically formulates a query in the primary database, when the user selects terms in the thesaurus module. The program allows to include easily and quickly synonyms, narrower terms and broader terms in a query. The thesaurus database can be used for searching with more than 1 primary database.

116 Trilingual thesaurus program module for CDS/ISIS: further properties
*-- Trilingual thesaurus program module for CDS/ISIS: further properties In each record describing a term, a field for a scope note is present. A field for date of term creation is present. Several printout formats are included.

117 How to obtain the trilingual thesaurus program for CDS/ISIS?
*-- How to obtain the trilingual thesaurus program for CDS/ISIS? the national distributor in your country UNESCO Headquarters, General Information Programme, 1 rue Miollis, Paris, France ...

118 Trilingual thesaurus program module for CDS/ISIS: conclusions
*-- Trilingual thesaurus program module for CDS/ISIS: conclusions - Negative: Not well integrated with the input/editing module of CDS/ISIS + Positive: Exceptionally interesting price/quality ratio

119 Security / privacy / protection of databases
**- Security / privacy / protection of databases Password for searching specific database(s) and / or fields and / or record Password for editing specific database(s) and / or fields and / or records Password for changing database structure input and modification work sheets sort and print formats of data in records sort and print formats of records in a selection

120 Security / privacy / protection provided by DOS
*-- Security / privacy / protection provided by DOS DOS can make files read-only hidden

121 Security / privacy / protection in CDS/ISIS
*-- Security / privacy / protection in CDS/ISIS SYSPAR.PAR file (entry 0) asks for a password, which can limit access to a particular database set of worksheets set of menus set of additional CDS/ISIS programs Using the read-only version, named ISISCD.EXE, prevents modifications. Menus can be changed or removed to prevent access.

122 Passwords and usage tracking
**- Passwords and usage tracking Does the use of passwords linked to users or user groups allow usage tracking by a systems manager? “Usage” = for instance, number and types of search and/or edit actions. This can be useful for studies and system management.

123 Data export in the case of CDS/ISIS
*-- Data export in the case of CDS/ISIS CDS/ISIS Database Database structure Contents Copy of all database files “Export” of data “Print” data to file Other CDS/ISIS user without database Other CDS/ISIS user with same database structure Other database management system

124 Manual versus batch import of data in a database
**- Manual versus batch import of data in a database Information items Manual input Batch input

125 Conversion and batch input in the case of a CDS/ISIS database
*-- Conversion and batch input in the case of a CDS/ISIS database File with database records in ASCII with field tags Fangorn program + Conversion specification file File with records in format of the CDS/ISIS database Import module in CDS/ISIS Records in the CDS/ISIS database

126 Format conversion program Fangorn
*-- Format conversion program Fangorn Authors: Besemer and Nieuwenhuysen Available via anonymous ftp from PCWS1.SCI.SNS.IT ftp.vub.ac.be in the directory \pub\projects\Docinfo\paul\cursus\isis\

127 *-- Specification of a format conversion in the case of Fangorn for CDS/ISIS

128 !? Question !? Task !? Problem !? **-
Which software packages for storage and retrieval of structured text do YOU know?

129 **-Examples Microcomputers software packages for for structured text retrieval: examples askSam Bib-Search CAIRS Cardbox-Plus CDS / ISIS Headfast IdeaList Inmagic Notes (Lotus / IBM) Personal Librarian Pro-Cite Reference Manager Strix STATUS Topic (Verity) ...

130 !? Question !? Task !? Problem !? **-
How can you use a word processing program together with a text retrieval system?

131 Word processing program to assist a retrieval program
**- Word processing program to assist a retrieval program F To polish text data before import in the database managed by the retrieval program $ To inspect output to printer before real printing 2 To accept output from the retrieval program for further and better formatting, followed by printing

132 Which benefits offers a field structure to databases?
**- !? Question !? Task !? Problem !? Which benefits offers a field structure to databases?

133 Field structure in records: benefits concerning input
**- Field structure in records: benefits concerning input The indication of fields in input worksheets guides the input. Default values can be assigned to fields which can avoid errors and can make input faster. The existence of fields allows control of the contents format of each specific field during input. ...

134 Field structure in records: benefits concerning searching
**- Field structure in records: benefits concerning searching User can limit search to specific fields. Field type adds information to contents. Field-indexing keeps data together in index. ...

135 Field structure in records: benefits concerning output
**- Field structure in records: benefits concerning output Field structure makes output easier to understand. In output, each field can be indicated with tag/prefix. Records can be sorted based on contents of a field. In output, the fields can be sorted in each record. In output, some fields can be omitted. ...

136 !? Question !? Task !? Problem !? **-
Besides all the benefits offered by a field structure in a database, which problems does this cause?

137 Field structure in records: problems (Part 1)
**- Field structure in records: problems (Part 1) In the short term, it is more expensive and time consuming, than handling less structured data. Initially, the database manager who wants to create a new database has to make decisions: which fields to create to subdivide the database records, which field tags or names to use for the internal housekeeping of the database by the chosen database management software package.

138 Field structure in records: problems (Part 2)
**- Field structure in records: problems (Part 2) The exchange of data, i.e. importing data in a database, which have been exported from another database, is hindered when the databases structures are not identical or compatible. ...

139 Exchange formats and standards for text database systems
*** Exchange formats and standards for text database systems Usage and aims: to allow efficient exchange of information among databases without loss of structural information to guide database managers in the creation of a database structure (records divided in fields and subfields) Examples: (MARC = machine readable catalogue) LC-MARC (=Library of Congress MARC); UNIMARC Common Communication Format (of UNESCO) SGML

140 Common Communication Format (CCF): description
**- Common Communication Format (CCF): description Developed by the Unesco - General Information Programme for international application Includes a system of numeric tags indicating the location of fields and subfields in the records the meaning of the fields and subfields

141 Common Communication Format (CCF): availability
**- Common Communication Format (CCF): availability Published and made available free of charge by the Unesco - General Information Programme Printed manuals Printed implementation notes Example CDS/ISIS database structured according to the Common Communication Format

142 Exchange of data among systems: requirements
**- Exchange of data among systems: requirements Subject thesaurus (relation-structure + contents) Subject classification scheme + level of usage Contents of fields (and subfields) in the records (in the case of bibliographic databases: cataloguing input rules) Database structure: records, fields, subfields,... as seen by the database manager Version of the program for database management Type of program for database management Alphabet used for the data

143 Compatibility among databases: an example
Library of Congress Subject Headings (LCSH) (a thesaurus) Universal Decimal Classification (UDC) Anglo American Cataloguing Rules (AACR) Common Communication Format (CCF) Version 3.0 CDS/ISIS program Extension of ASCII by IBM ISO standard for record storage !

144


Download ppt "Text information storage and retrieval and the CDS/ISIS program"

Similar presentations


Ads by Google