Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN University Library, Vrije Universiteit Brussel,


2 Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN University Library, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium

3 What is a database? A database is a collection of similar data records stored in a common file (or collection of files). ***

4 Software type = information retrieval software
Software for information storage and retrieval (ISR software)
Text(-oriented) database management systems (Text-DBMS)
Text information management systems (TIMS)
Document retrieval systems
Document management systems
***

5 Information retrieval: via a database to the user
[Diagram labels: information content; database; linear file; inverted file; search engine; search interface; user]
***

6 Information retrieval: the basic processes in search systems
[Diagram labels: information problem; representation; query; text documents; representation; indexed documents; comparison; retrieved documents; evaluation and feedback]
***

7 Information retrieval systems: many components make up a system
Any retrieval system is built up of many more or less independent components.
These components can be modified to increase the quality of the results more or less independently.
***

8 Information retrieval systems: important components
» the information content
» system to describe formal aspects of information items
» system to describe the subjects of information items
» concrete descriptions of information items = application of the used information description systems
» information storage and retrieval computer program(s)
» computer system used for retrieval
» type of medium or information carrier used for distribution
***

9 Information retrieval systems: the information content
The information content is the information that is created or gathered by the producer.
The information content is independent of software and of distribution media.
The information content is input into the retrieval system using
» a system (rules) to describe the formal aspects
» a system (rules) to describe the contents (classification, thesaurus,...)
***

10 Information retrieval systems: media used for distribution
Hard copy (for information retrieval systems only in the broad sense)
» Print
» Microfiche
For computers (for information retrieval systems stricto sensu)
» Magnetic tape
» Floppy disk; optical disk (CD-ROM, CD-i, Photo-CD,...)
» Online
***

11 Information retrieval systems: the computer program
The information retrieval program consists of several modules, including:
» The module that allows the creation of the inverted file(s) = index file(s) = dictionary file(s).
» The search engine, which provides the search features and power that allow the inverted file(s) to be searched.
» The interface between the system and the user, which determines how they (can) interact to search the database (using menus and/or icons and/or templates and/or commands).
***

12 What determines the results of a search in a retrieval system?
The result of a search is determined by:
» the information retrieval system (= contents + system)
» the user of the retrieval system and the search strategy applied to the system
***

13 Characteristics / definition of structured text-information
The text information is structured (files, records, fields, sub-fields, links/relations among records,...).
The length of records and fields can be “long”.
Some fields are multi-valued, i.e. they occur more than once.
***

14 Layered structure of a database Database File Records Fields Characters + in many systems: relations / links between records ***

15 Structure of a bibliographic file
Record No. 1
» Title
» Author 1: name + first name
» Author 2:...
» Source
» Descriptor 1
» Descriptor 2...
Record No. 2
[Diagram labels: subfields; repeated fields]
***

16 Thesaurus: description
Thesaurus =
» system to control a vocabulary +
» the contents of this vocabulary
Thesaurus program = program to create, manage, modify and/or search a thesaurus using a computer
***

17 Thesaurus relations
Relations between a term and other terms:
» BT (= Broader Term): term(s) with broader meaning
» NT (= Narrower Term): term(s) with narrower meaning
» RT (= Related Term): other term(s)
» UF (= Use For): synonym(s)
***
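The four relations above can be held in a very small data structure. The following Python sketch is only an illustration: the terms, the `thesaurus` dictionary and the `expand` function are hypothetical and are not taken from any real thesaurus or from CDS/ISIS.

```python
# Hypothetical mini-thesaurus: each term maps relation codes to lists of terms.
thesaurus = {
    "databases": {
        "BT": ["information systems"],    # broader term
        "NT": ["text databases"],         # narrower term
        "RT": ["information retrieval"],  # related term
        "UF": ["data bases"],             # synonym that this term replaces
    },
}

def expand(term, relations=("UF", "NT")):
    """Return the term plus the terms linked to it by the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for linked in entry.get(relation, []):
            if linked not in terms:
                terms.append(linked)
    return terms

print(expand("databases"))
print(expand("databases", relations=("BT", "RT")))
```

Expanding a search term with its synonyms and narrower terms is one way in which a thesaurus can help to find more terms for searching.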

18 Thesaurus applications
To find/choose index terms to add to items, when terms are taken from a controlled vocabulary
To find more and/or better terms to search a database (to increase recall and precision)
To find more and/or better terms during writing
To understand the meaning of a term, by inspecting
» the scope note of the term and/or
» the relations with other terms
***

19 Thesaurus examples
General systems / universal systems / on all subjects
» Library of Congress Subject Headings (LCSH)
Focused on a particular subject domain
» ERIC
» INSPEC
» Medical Subject Headings (MeSH)
» Psychological Abstracts / PsycInfo
» Sociological Abstracts / SocioFile
**-

20 Database systems: why study this subject briefly?
To achieve a better understanding of the inner workings of the external information retrieval systems that you use, so that you can exploit these more efficiently
To be able to evaluate the quality of database systems you are confronted with, so that you can
» make better choices among available systems,
» offer constructive suggestions to the manager,
» ...
***

21 Database systems: why study this subject in detail? To acquire the knowledge and skills to create / set up / manage your own local database system on a computer **-

22 Database systems: definition A database (management) system is a program or set of programs, providing a means by which a user can easily store and retrieve data in the form of “databases”. ***

23 Information retrieval software: related terms
Software for information storage and retrieval (ISR software)
Text(-oriented) database management systems (Text-DBMS)
Text information management systems (TIMS)
Document retrieval systems
Document management systems
**-

24 Information retrieval software: applications (Part 1)
Application | Items described
Documentation centres | Documents
Archives | Archived documents
Libraries | Books / Documents
Museums | Objects / Books /...
Medical files | Patients’ histories
Marketing departments | Clients / Potential clients
Schools | Courses / Teachers
Bibliographic databases | Publications /...
**-

25 Information retrieval software: applications (Part 2)
Application | Items described
Meeting calendars | Meetings = conferences
Product information | Product descriptions
Laboratories | Recipes
Personal documentation | Documents
Patent office | Patents
Co-operating information networks | Documents / Persons / Institutes / Events /...
**-

26 Cataloguing: hard copy versus computer-based
Hard copy
» “Input”, i.e. cataloguing, on cards directly determines the “output”, i.e. the format of the data on the card as presented to the user
» Summarized: INPUT = OUTPUT
Computer-based
» Input in the database in fields allows later output in various formats for presentation
» Summarized: 1. INPUT, 2. various OUTPUTs
**-
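The principle "one input, various outputs" can be illustrated with a small Python sketch. The record, its field names and the two format functions are hypothetical examples, not a CDS/ISIS feature; they only show that data entered once in fields can be presented in several ways.

```python
# Hypothetical record: data is entered once, field by field.
record = {
    "title": "An example title",
    "authors": ["Author, A.", "Writer, B."],
    "year": "1990",
}

def short_format(r):
    """One-line citation style output."""
    return f"{r['authors'][0]} ({r['year']}). {r['title']}."

def labelled_format(r):
    """One field per line, each preceded by a field name."""
    lines = [f"TITLE: {r['title']}"]
    lines += [f"AUTHOR: {author}" for author in r["authors"]]
    lines.append(f"YEAR: {r['year']}")
    return "\n".join(lines)

print(short_format(record))
print(labelled_format(record))
```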

27 Text-information management systems: characteristics and definition The information in the database is text oriented. Therefore, several features are required: »ability to store relatively long blocks of texts »ability to retrieve items in which specific words or terms occur anywhere ***

28 Text-information management: from free-form to structure
Free-form text information without structure
Text database with information structured in files, records, fields, sub-fields, with links/relations among records,...
(Ideally, each field is repeatable = can be multi-valued = can occur more than once in each record.)
**-

29 Text-information management: types of software
Software type | Features
Word processing software | Must be learnt anyway. Slow sequential searching.
Free-form or structured text information database software | Additional software to be purchased and learnt. Fast searching via index(es).
***

30 Advantages of structured text-retrieval versus X-base systems
Feature | Text-retrieval | X-base systems
Many long fields, forming long records | Yes | No
Repeatable fields | Yes | No
Subfields | Yes | No
Variable field lengths | Yes | No
Fast searching any word in all fields | Yes | No
Thesaurus to help searching | Yes | No
**-

31 Hierarchy in the use of a database Database structure Input / Editing Searching / Output ***

32 Functions of database management software
Input / edit using keyboard or batch input
Indexing of the database(s)
Browse / Search / Select / Retrieve data from database
Output (Sort / Display / Print to file / Print to paper) +
Export / Import
***

33 The various formats of records in a database Input format = Edit format Internal format for long term storage Display format for output to display, printer, file Format for exchange purposes Format to facilitate retrieval = inverted file **-

34 Structure of records / Field tags
Field tags: examples from the Common Communication Format (supported by UNESCO):
200 Title
300 Author(s)
500 Notes
600 Abstract
**-

35 !? Question !? Task !? Problem !? Which advantages does a document management system on computer offer? ***

36 Advantages of a document system on computer, for the user(s)
Access to information is easier.
Access to information is faster.
Online access is possible even when centre is closed.
Online access is possible from a distance.
Integration in search module with data on loan status.
More elements of the records can serve as search term.
Combinations of search terms can be used.
Results / selections can be stored as computer files.
***

37 Advantages of a document system on computer, for the manager(s)
Multiplication / distribution / exchange is easier.
Available computer data can be input / incorporated.
Global changes are easier.
The system takes less physical space.
Output to printer allows production of cards, listings,...
Sorting of records in output is easier.
The system is resistant to physical aging.
Parts are better integrated (for instance books and loans).
The system can offer statistical information.
**-

38 Drawbacks of a document system on computer
The costs of software and hardware can be high.
Training related to computers is required.
Evolution in computer applications should be considered.
Some systems do not provide a backup.
**-

39 Unesco’s involvement with information management
Unesco - General Information Programme
» Computer programs for information management
» Standards (for instance CCF)
» Various subject-oriented information projects
» Libraries and archives
Unesco’s other programmes and divisions
**-

40 Tools for information management by Unesco
IDAMS
» DOS program for numeric data analysis
CDS/ISIS
» Program for storage and retrieval of structured text-oriented information
Interface between IDAMS and CDS/ISIS
» To use both programs efficiently in one project
Common Communication Format (CCF)
» Guidelines on how to format a database
**-

41 The IDAMS analysis program
Software to create, manage and analyse local, in-house, number-oriented databases
For DOS (or systems emulating DOS)
Developed by an international team
Distributed free of charge by the Unesco - General Information Programme (PGI)
Detailed manual is available
*--

42 The CDS/ISIS text database management program
Software to create and manage local, in-house databases with primarily structured text as contents (NOT numbers, graphics, sound,...)
Versions available for
» Mainframes (IBM)
» Minicomputers (Digital VAX)
» Microcomputers (DOS)
**-

43 Micro-CDS/ISIS: options on the original main menu
Micro CDS/ISIS - Version 3.0
C - Change data base
L - Change dialogue language
E - ISISENT - Data entry services
S - ISISRET - Information retrieval services
P - ISISPRT - Sorting and printing services
I - ISISINV - Inverted file services
D - ISISDEF - Data base definition services
M - ISISXCH - Master file services
U - ISISUTL - System utility services
A - ISISPAS - Advanced programming services
X - Exit (to MSDOS)
*--

44 Micro-CDS/ISIS: original main menu on the display *--

45 Micro-CDS/ISIS running in Microsoft Windows *--

46 Micro-CDS/ISIS running in Microsoft Windows (full screen)

47 CDS/ISIS: general features
Available for several operating systems
Multi-user editing and searching in a network
Unlimited number of databases can be stored
No practical limitation in the number of records per database
Multiple field-occurrences are possible
Only few limitations in a database structure
Can be applied on CD-ROM
*--

48 CDS/ISIS: input, indexing, searching, output
More than one input worksheet can be applied
Powerful word-, phrase- and field-indexing
Powerful, fast searching
Powerful in output formats
*--

49 CDS/ISIS: positive non-technical characteristics
Good, detailed manual in English, French,...
Used in more than 4000 institutes
Used internationally, worldwide
National user-groups are active in many countries
User interface available in English, French, Spanish, Arabic, Chinese,...
Suitable database structures are available
Free forum about CDS/ISIS by electronic mail
*--

50 CDS/ISIS is available free of charge
From national distributors approved by Unesco
From subject-oriented distributors approved by Unesco
From the Unesco - General Information Programme in Paris
(From the secretariat of the Unesco - International Hydrological Programme in Paris, for water-related projects)
*--

51 CDS/ISIS database structures
example database structured according to CCF, on diskette, distributed by the Unesco - General Information Programme
example database structured according to CCF, published in a guide by the Unesco - International Hydrological Programme
databases and manuals by other organisations
*--

52 Interface to link CDS/ISIS with IDAMS
To use both packages efficiently together, an interface program has also been developed by the Unesco - General Information Programme (PGI)
*--

53 Important new features in CDS/ISIS version 3
More than one user can edit data at the same time in a computer network.
User / program can call external, non-CDS/ISIS programs.
Better Pascal programming language is included in CDS/ISIS.
*--

54 CDS/ISIS and libraries: examples of applications A library automated with CDS/ISIS Central library providing bibliographic data for computers Smaller department libraries and documentation systems automated using CDS/ISIS Another library automated with CDS/ISIS Network *--

55 CDS/ISIS database structure
General | CDS/ISIS | Limit
Database | Name < 5 characters | Unlimited
Record | MFN = Master File Number | 16 million per database
Field | Tag = 1, 2, 3,..., 999 | 250 per record
Subfield | ^a ^b ^c... | per record
Character | ABC...abc accented characters | 8000 per field
*--
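The layering of record, field, field occurrence and subfield can be sketched in Python as follows. This is only an illustration, not the internal storage format of CDS/ISIS: the function `parse_subfields`, the layout of the `record` dictionary and the meaning given to the subfield codes ^a and ^b are assumptions; the tags 200 and 300 are borrowed from the field tag examples earlier in this presentation.

```python
def parse_subfields(field_value):
    """Split a field such as '^aNieuwenhuysen^bPaul' into its subfields.

    Text before the first '^' (if any) is stored under the empty code ''.
    If a code occurs twice, the later value overwrites the earlier one.
    """
    parts = field_value.split("^")
    subfields = {}
    if parts[0]:
        subfields[""] = parts[0]
    for part in parts[1:]:
        if part:  # skip empty pieces caused by a trailing or doubled '^'
            subfields[part[0].lower()] = part[1:]
    return subfields

# Hypothetical record: MFN plus, for each tag, a list of field occurrences.
record = {
    "mfn": 1,
    "fields": {
        200: ["Text information storage and retrieval"],
        300: ["^aNieuwenhuysen^bPaul"],  # assumed: ^a = name, ^b = first name
    },
}

authors = [parse_subfields(value) for value in record["fields"][300]]
print(authors)
```

Storing each tag as a list of occurrences is one simple way to represent repeatable fields.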

56 CDS/ISIS database files, defined by the user
Name in DOS | Full name | Purpose
.ANY | ANY file | Searching
.FDT | Field Definition Table | Structure
.FMT | Worksheet(s) | Input
.FST | Field Select Table(s) | Indexing
.PFT | Print Format(s) | Output
.STW | STOPword list | Indexing and sorting
# used with 1 database: 1 > 1 1 = or > 1 1
*--

57 CDS/ISIS database files: the database contents
Name in DOS | Full name
.MST | Master file
.XRF | Cross reference file
.IFP .L01 .L02 .N01 .N02 | B-tree index files
*--

58 CDS/ISIS database files to change using a text editing program
Name in DOS | Full name | # used with a database | Purpose
dbn.ANY | ANY file | 1 | Searching
dbn.STW | STOPword list | 1 | Indexing and sorting
Where dbn = data base name
*--

59 CDS/ISIS database files, which can be changed using a text editing program
Name in DOS | Full name | # used with a database | Purpose
.FDT | Field Definition Table | 1 | Structure
.FST | Field Select Table(s) | 1,... | Indexing
.PFT | Print (or display) Format(s) | 1 or several | Output
*--

60 Advantages of using CDS/ISIS with Windows (Part 1)
Multitasking in several windows: CDS/ISIS and other programs
» start CDS/ISIS from the program manager
» view CDS/ISIS and the file manager at the same time
» search or edit a CDS/ISIS database together with a thesaurus or classification scheme in another program
» switch easily between CDS/ISIS and a word processing program to produce output
» ...
*--

61 Advantages of using CDS/ISIS with Windows (Part 2)
Multitasking in several windows: multiple instances of CDS/ISIS
» view several databases at the same time in several instances of CDS/ISIS running at the same time
» ...
*--

62 Advantages of using CDS/ISIS with Windows (Part 3)
Copy and paste
» copy data from a document in a program for text processing, and paste into a CDS/ISIS database
» copy data from a CDS/ISIS database displayed on screen by CDS/ISIS, and paste into a document in another program
» copy data from one CDS/ISIS database displayed on screen, and paste into another database through the editing worksheet
» ...
*--

63 Advantages of using CDS/ISIS with Windows (Part 4)
Associations of file name extensions with programs
» associate the following CDS/ISIS file name extensions with a program for word processing: .any, .fdt, .par, .pft, .stw
*--

64 CDS/ISIS: some wishes of users (Part 1)
General aspects:
» better use of Microsoft Windows
» possibility to open and work with more than one file/database on screen simultaneously, and to exchange data among those open files
» client-server architecture for the database management system
» better availability of CDS/ISIS applications developed by other users (including additional ISIS-Pascal programs)
*--

65 CDS/ISIS: some wishes of users (Part 2)
Database structure:
» records longer than 8 KBytes
» support of non-text fields (graphics, audio,...)
» better availability of database structures developed by users
Input:
» access to and copy from one or more authority files
» spell-check of database contents (language independent)
» direct import of non-ISO-structured ASCII files
*--

66 CDS/ISIS: some wishes of users (Part 3)
Indexing:
» multiple inverted files
Searching:
» save search statements (queries) for future runs
Output:
» emphasis of search terms in the output on display and paper
» better use of the features of various printers
» direct output to PostScript printers
*--

67 CDS/ISIS: some wishes of users (Part 4)
Interface with user
» more help messages in context
» online tutorial
» support for mouse
*--

68 CDS/ISIS: further development going on
versions for various Unix computers
version for Windows
splitting into a client and a server package
*--

69 CDS/ISIS database definition services: display menu *--

70 CDS/ISIS database modification services: display menu *--

71 CDS/ISIS database definition table: display of an example *--

72 Copying and renaming a CDS/ISIS database structure
Copy all files, using XCOPY (not COPY) in DOS, or copy them using Windows.
If required: rename FDT file, FMT file(s), PFT file(s), ANY file, FST file(s),...
AND (!): change the first lines in xxxxx.FDT to make these agree with the new names of FST(s), FMT(s), PFT(s).
*--

73 Limitations to the structure of a database
What is the maximum number
» of databases managed by the program?
» of records in a database? of characters in a record?
» of fields in a record? of characters in a field?
Can fields contain subfields?
Can the user define / modify the structure?
Is the amount of memory taken on disk by each record minimal or fixed?
**-

74 Limitations to the contents of a database
Only plain 7-bit or extended ASCII? §©¥¤£¢
Can documents be managed, such as
» texts from word processing programs with formatting codes?
» pictures / graphics / bitmaps?
» movies / video /...?
» ...
**-

75 Types of field contents to control input
Type (Example)
Alphanumeric (Title)
Alphabetic (Country code)
Numeric (Year)
Pattern (Date)
**-
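How these four field types could control input is sketched below in Python. The function `is_valid`, the restriction of "alphabetic" to unaccented letters, and the template convention for patterns ('9' stands for a digit, 'A' for a letter, any other character for itself) are assumptions made for this illustration; the exact rules of a given program may differ.

```python
import re

# Assumed rules for the three simple types.
FIELD_TYPES = {
    "alphanumeric": re.compile(r".+", re.DOTALL),  # any non-empty text
    "alphabetic": re.compile(r"[A-Za-z]+"),        # unaccented letters only
    "numeric": re.compile(r"[0-9]+"),              # digits only
}

def is_valid(value, field_type, pattern=None):
    """Check a value against a field type; 'pattern' needs a template."""
    if field_type == "pattern":
        # Translate the hypothetical template into a regular expression.
        regex = "".join(
            "[0-9]" if c == "9" else "[A-Za-z]" if c == "A" else re.escape(c)
            for c in pattern
        )
        return re.fullmatch(regex, value) is not None
    return FIELD_TYPES[field_type].fullmatch(value) is not None

print(is_valid("1998", "numeric"))                      # year
print(is_valid("BE", "alphabetic"))                     # country code
print(is_valid("1998-05-17", "pattern", "9999-99-99"))  # date
```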

76 CDS/ISIS manual data entry, editing / input services: display menu *--

77 Manual inputting and editing using a keyboard
Can more than 1 form / worksheet be used for a database?
Are pre-fabricated database structures with input worksheets included and / or available?
Can more than one user edit the same database at the same time?
**-

78 Batch input / Import
Is batch input possible?
Is a format conversion program included or available?
**-

79 Activities related to indexing
Activity | Who does it? | Concrete action
Intellectual, human indexing | Database producer / Thesaurus producer | Attribute subject terms to records
Develop an automatic indexing method | Database producer / Software features | Making an index method file
Automatic indexing | Computer with program | Making inverted file(s)
**-

80 Indexes in books and databases: a comparison
Book (printed index):
Index_term_1 page x1, y1, z1,...
Index_term_2 page x2, y2, z2,...
Database (invisible index):
Index_term_1
» record nr. x1 / field type nr. x1 / field occurrence x1 / position x1
» record nr. y1 / field type nr. y1 / field occurrence y1 / position y1
» ...
Index_term_2
» record nr. x2 / field type nr. x2 / field occurrence x2 / position x2
» record nr. y2 / field type nr. y2 / field occurrence y2 / position y2
» ...
**-
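The database side of this comparison can be sketched in Python: each index term points to a list of postings, and each posting gives record number, field tag, field occurrence and word position. The function `build_inverted_file` and the sample records are hypothetical; the sketch illustrates the principle only and does not reproduce the file format of any particular program.

```python
import re
from collections import defaultdict

def build_inverted_file(records):
    """Map each word to postings (record nr, field tag, occurrence, position)."""
    index = defaultdict(list)
    for mfn, fields in records.items():
        for tag, occurrences in fields.items():
            for occ, text in enumerate(occurrences, start=1):
                words = re.findall(r"\w+", text.lower())
                for pos, word in enumerate(words, start=1):
                    index[word].append((mfn, tag, occ, pos))
    return dict(sorted(index.items()))  # alphabetical, like a dictionary

# Hypothetical records: record nr -> field tag -> list of field occurrences.
records = {
    1: {200: ["Text retrieval systems"], 300: ["Smith", "Jones"]},
    2: {200: ["Retrieval of text"]},
}

index = build_inverted_file(records)
for term, postings in index.items():
    print(term, postings)
```

Every word of every indexed field gets its own posting, which is one reason why such an index takes considerable space.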

81 Index in a text retrieval system (such as CDS/ISIS) Terminology: Index = Inverted file = Dictionary database dictionary on display database complete inverted file **-

82 Methods of inverted file creation
Word indexing
» Simple / automatic / no indication required
» Loss of word context
» A field structure is not required
Phrase indexing
» Indication of phrases during input is required
» Richer than separate words
» A field structure is not required
Field indexing
» Simple / automatic / no indication required
» Context is better preserved
» A field structure is required
**-
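The difference between the three methods can be shown with a short Python sketch that extracts index terms from the same text in three ways. The three functions are hypothetical, and the use of angle brackets to indicate phrases during input is an assumption made for this example.

```python
import re

def word_index_terms(text):
    """Word indexing: every word becomes a separate index term."""
    return [word.upper() for word in re.findall(r"\w+", text)]

def phrase_index_terms(text):
    """Phrase indexing: only phrases marked as <...> become index terms."""
    return [p.strip().upper() for p in re.findall(r"<([^<>]+)>", text)]

def field_index_terms(text):
    """Field indexing: the complete field content becomes one index term."""
    return [text.strip().upper()]

marked = "<Information retrieval> in <text databases>"
plain = "Information retrieval in text databases"

print(word_index_terms(plain))
print(phrase_index_terms(marked))
print(field_index_terms(plain))
```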

83 CDS/ISIS inverted file services: display menu *--

84 Automatic indexing (file inversion)
Possible? Obligatory?
Word indexing? with proximity indexing?
Field indexing?
Sub-field indexing?
Phrase indexing?
Maximum length of index entry?
List of stopwords available?
Immediately after input or in batch? (Slow down...?)
Indexing speed?
Adding prefixes/tags possible?
Modification of indexing possible?
**-

85 !? Question !? Task !? Problem !? Why can the index of a database be so large in comparison with the size of the database? **-

86 CDS/ISIS information retrieval services: display menu *--

87 CDS/ISIS information retrieval: example of a dictionary on the display *--

88 CDS/ISIS: features related to retrieval
Browsing Dictionary > Searching / selecting
Direct searching
+ mix of these methods
Boolean operators: + OR, * AND, (^ NOT)
Previous search result: #n
Field qualifier: search term / (n,m,...)
*--
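The three Boolean operators correspond to set operations on the lists of record numbers found in the inverted file. The Python sketch below is a simplification and not the CDS/ISIS search engine: the function `search` and the sample inverted file are hypothetical, terms and operators must be separated by spaces, evaluation is strictly from left to right, and parentheses, #n references and field qualifiers are not handled.

```python
# Operator symbols as listed above, mapped to set operations.
OPERATORS = {
    "+": set.union,         # OR
    "*": set.intersection,  # AND
    "^": set.difference,    # NOT
}

def search(expression, inverted_file):
    """Evaluate 'term op term op term ...' from left to right."""
    tokens = expression.split()
    result = set(inverted_file.get(tokens[0], ()))
    for op, term in zip(tokens[1::2], tokens[2::2]):
        result = OPERATORS[op](result, set(inverted_file.get(term, ())))
    return result

# Hypothetical inverted file: term -> set of record numbers.
inverted_file = {
    "text": {1, 2, 3},
    "retrieval": {2, 3, 4},
    "numeric": {3},
}

print(search("text + retrieval", inverted_file))
print(search("text * retrieval", inverted_file))
print(search("text * retrieval ^ numeric", inverted_file))
```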

89 CDS/ISIS: an additional, user-friendly search interface program, Heurisko
Offers a more limited but more user-friendly interface with drop-down menus,
» to choose among available CDS/ISIS databases
» to search the chosen database and to display selected records on the video display
» to print search results / selections
Has been available since 1993, free of charge from CDS/ISIS distributors, with a manual.
*--

90 Interactive searching of a database / Retrieval
Browse in index(es)? + Select from index(es)?
Combine search terms?
Proximity operators? (Adjacency / Same paragraph /...)
Truncated search term(s)?
Limit search to specific field(s)?
Highlighting of search terms in selected records?
Ranking of output?
Speed?
Save search strategy?
**-
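Searching with a truncated term can be illustrated on an alphabetically sorted dictionary of index terms: all terms that start with the same stem are adjacent, so they can be located with a binary search. The function `truncated_search` and the sample dictionary in this Python sketch are hypothetical.

```python
from bisect import bisect_left

def truncated_search(dictionary, stem):
    """Return all terms in the sorted dictionary that start with the stem."""
    start = bisect_left(dictionary, stem)  # first term >= stem
    matches = []
    for term in dictionary[start:]:
        if not term.startswith(stem):
            break  # terms are sorted, so no later term can match
        matches.append(term)
    return matches

dictionary = sorted(["INDEX", "INDEXING", "INFORMATION", "INVERTED", "THESAURUS"])

print(truncated_search(dictionary, "INDEX"))
print(truncated_search(dictionary, "IN"))
```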

91 Output from a database to various “devices”
to video display
to printer
to computer file (“printing” to a file)
**-

92 CDS/ISIS output (sorting and printing) services: display menu *--

93 CDS/ISIS printing worksheet: display of an example *--

94 CDS/ISIS sorting worksheet: display of an example *--

95 Formatting of output from a database
Formatting aspect / level | In CDS/ISIS
data in each record | .PFT file(s)
lay-out of records on the printed page or in the output computer file | printing worksheet(s)
sorting of records in output | sorting worksheet(s)
*--

96 Formatting of data within each record in output
Independent of output device:
» Determine the sequence of the fields in each record.
» Omit specific fields from each record.
» Add field names or tags to the fields in each record.
» Indicate the search term(s) in each record.
Dependent on output device:
» Specify character formats in each (sub)field: typeface + size + bold/italic/underline
**-
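The device-independent formatting steps on this slide (field sequence, omitting fields, adding field names) can be sketched in Python. This is only an illustration and not the CDS/ISIS .PFT formatting language; the record layout (field name mapped to a list of occurrences) and the function name `format_record` are assumptions.

```python
def format_record(record, sequence, labels=None, omit=()):
    """Render one record as text lines.

    record   -- dict mapping a field name to a list of occurrences (assumed layout)
    sequence -- the order in which fields should appear in the output
    labels   -- optional dict mapping a field name to a label to print before it
    omit     -- fields to leave out of the output
    """
    labels = labels or {}
    lines = []
    for tag in sequence:
        if tag in omit:
            continue
        for value in record.get(tag, []):
            prefix = f"{labels[tag]}: " if tag in labels else ""
            lines.append(prefix + value)
    return "\n".join(lines)

example = {"TI": ["Text retrieval"], "AU": ["A", "B"], "NOTE": ["internal"]}
print(format_record(example, ["AU", "TI", "NOTE"], labels={"AU": "Author"}, omit={"NOTE"}))
```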

97 Sorting / arranging of records in the whole output
Can the user determine the sequence of the records?
Which elements can be used as a basis for sorting?
Can stopwords be omitted as a basis for sorting?
What is the maximum number of sort levels?
Can the user choose between ascending or descending order?
Can duplicate records be eliminated? (If yes: Can the user determine the meaning of duplicate?)
Can output formats (styles) be stored?
**-
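Several of these questions (sort levels, stopwords, ascending or descending order, duplicates) can be made concrete with a small Python sketch. It assumes records with hypothetical `author` and `title` fields, a made-up stopword list, and one possible definition of "duplicate": same author and same title after removing a leading stopword.

```python
STOPWORDS = {"a", "an", "the"}  # assumed stopword list

def sort_key(title):
    """Build a sort key that ignores letter case and a leading stopword."""
    words = title.lower().split()
    if words and words[0] in STOPWORDS:
        words = words[1:]
    return " ".join(words)

def sort_records(records, descending=False):
    """Two-level sort: author first, then title without its leading stopword.

    Records with an identical (author, title key) pair count as duplicates;
    only the first one encountered is kept.
    """
    seen = set()
    unique = []
    for rec in records:
        ident = (rec["author"].lower(), sort_key(rec["title"]))
        if ident not in seen:
            seen.add(ident)
            unique.append(rec)
    return sorted(
        unique,
        key=lambda r: (r["author"].lower(), sort_key(r["title"])),
        reverse=descending,
    )
```

Letting the user supply the function that builds `ident` would be one way to let the user "determine the meaning of duplicate".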

98 Advanced and experimental retrieval systems
The system accords weights to terms in the database
» Frequency of occurrence in the database +...
The searcher accords weights to terms in his query
» Based on importance of the term
Natural language interface between user and system
» The system derives word stems + word meanings +...
Relevance feedback and query reformulation
» User assesses relevance and the system refines query
Dynamic user profile is a part of the system
» System understands the user and his query better
*--
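The first two points (system weights based on frequency, searcher weights in the query) can be sketched as follows. The slide does not name a formula; this Python sketch assumes inverse document frequency as one common frequency-based weight, and the function names are hypothetical.

```python
import math

def idf_weights(documents):
    """Weight each term by log(N / document frequency): rarer terms weigh more."""
    n = len(documents)
    df = {}
    for doc in documents:
        for term in set(doc.lower().split()):
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in df.items()}

def rank(documents, query_weights):
    """Rank record numbers (1-based) by the sum over matching query terms of
    (weight given by the searcher) * (weight given by the system)."""
    weights = idf_weights(documents)
    scored = []
    for number, doc in enumerate(documents, start=1):
        terms = set(doc.lower().split())
        score = sum(
            qw * weights.get(term, 0.0)
            for term, qw in query_weights.items()
            if term in terms
        )
        scored.append((score, number))
    # Highest score first; ties keep record-number order.
    return [number for score, number in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

A term that occurs in every record receives weight 0 under this assumption, so it does not influence the ranking.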

99 Additional programs for CDS/ISIS
[Diagram: CDS/ISIS; CDS/ISIS Pascal programming language compiler; source code of additional program(s) in CDS/ISIS Pascal; additional program(s)]
*--

100 Global modification program for CDS/ISIS GMOD.PAS allows the modification of a string in a specific field of all records, throughout the whole CDS/ISIS database. *--
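The idea of a global modification can be sketched in Python. This is not the GMOD.PAS source: it only illustrates the operation described, assuming each record is a dict that maps a field tag to a list of occurrences. The tags and the function name `global_replace` are hypothetical.

```python
def global_replace(records, tag, old, new):
    """Replace `old` by `new` in every occurrence of field `tag`.

    Only the named field is touched; other fields keep their contents.
    Returns the number of records that were changed.
    """
    changed = 0
    for record in records:
        occurrences = record.get(tag, [])
        updated = [value.replace(old, new) for value in occurrences]
        if updated != occurrences:
            record[tag] = updated
            changed += 1
    return changed
```

Returning the count of changed records gives the database manager a quick check that the modification touched as many records as expected.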

101 Thesaurus program module: purpose
Does the database management program offer a thesaurus module which allows the user to create, modify, store, and delete relations between terms used in the database?
This is mainly used to establish relations among controlled subject indexing terms.
If more than one controlled vocabulary is used, these should be managed separately.
**-

102 Structure of a thesaurus database record (Fields for “good” terms)
“Good” term
Controlled vocabulary to which the term belongs (if more than 1 is used in the same database)
Scope note (= definition of the controlled term)
Date of creation or modification of the term
Notes
**-

103 Structure of a thesaurus database record (Fields for relations)
BT (= broader term): term(s) with broader meaning
TT (= top term): term highest in the hierarchy
NT (= narrower term): term(s) with narrower meaning
RT (= related term): other term(s) related to this one
UF (= use for): synonym(s)
**-

104 Structure of a thesaurus database record (Fields for forbidden terms)
Forbidden term
US (= use instead): “good” term in the controlled vocabulary
**-

105 Structure of a thesaurus database record (Fields for candidate terms)
Candidate “good” term in the controlled vocabulary
(Other fields as in the case of “good” terms)
**-

106 Structure of a multilingual thesaurus database record Each type of field in a thesaurus record occurs for each language. **-

107 Thesaurus program: desirable properties (Part 1)
Multilingual user interface = menus and messages in more than 1 language
Multilingual contents = terms in more than 1 language
When a term in the thesaurus database is added, changed or deleted, the program automatically makes the corresponding changes throughout the whole thesaurus database, wherever that term occurs
The program controls the creation of impossible (= forbidden) or undesirable relations
**-

108 Thesaurus program: desirable properties (Part 2)
Can the thesaurus contents be formatted and printed or sent to file?
Can more than 1 thesaurus be managed, linked to the same database?
Can a thesaurus database be used with more than 1 primary database?
Can the program signal the presence of orphan terms (= terms without relation)?
**-

109 Thesaurus program: integration with input/editing of the primary database
How simply and quickly can the user
» search the thesaurus during manual input/editing? (for instance to use it as an authority list)
» copy a term from a thesaurus and paste into a database record?
» copy a term from the database and paste into a thesaurus?
» ...
**-

110 Thesaurus program: integration with searching of the primary database
Can the user browse the thesaurus during a search in the database?
Can the program automatically formulate a query, when the user selects terms in the thesaurus module?
Does the program allow the user to include synonyms, narrower terms and broader terms in a query easily and quickly?
**-

111 Automatic creation, deletion or adaptation of the reciprocal relation
Does a change by the user of a relation in one record cause an automatic change by the thesaurus program of the reciprocal relation in the corresponding record of the thesaurus database?
Examples:
» change of BT changes NT in the corresponding record
» change of NT changes BT in the corresponding record
» change of RT changes RT in the corresponding record
» change of UF changes US in the corresponding record
» change of US changes UF in the corresponding record
**-
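The reciprocal pairs in these examples can be captured in a small table. The Python sketch below shows one way a thesaurus program might keep both records in step; it assumes the thesaurus is held as a dict of the form term -> {relation: set of terms}, and the function names are hypothetical.

```python
# Reciprocal relation for each relation type, as listed in the examples above.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "US", "US": "UF"}

def add_relation(thesaurus, term, relation, target):
    """Record `term relation target` and the reciprocal relation in the target record."""
    thesaurus.setdefault(term, {}).setdefault(relation, set()).add(target)
    thesaurus.setdefault(target, {}).setdefault(RECIPROCAL[relation], set()).add(term)

def remove_relation(thesaurus, term, relation, target):
    """Remove a relation and its reciprocal, if they are present."""
    thesaurus.get(term, {}).get(relation, set()).discard(target)
    thesaurus.get(target, {}).get(RECIPROCAL[relation], set()).discard(term)
```

Because every update goes through the same two functions, the user never has to edit the corresponding record by hand.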

112 Automatic control of the creation of impossible or undesirable relations
Does the thesaurus program avoid the creation of impossible or undesirable relations, or does it warn the user?
Examples of such relations:
» circular hierarchy (a NT b, b NT c, c NT a, or longer)
» circular synonym relation (a UF b, b UF a)
» iterative synonym relations (a US b, b US c, or longer)
» incomplete relations (a RT b, while b does not exist)
» term related to itself (for instance: a NT a)
» ...
**-
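Three of these checks (circular hierarchy, incomplete relations, a term related to itself) can be sketched in Python. The sketch assumes the thesaurus is a dict of the form term -> {relation: set of terms}; the function names and problem labels are hypothetical, and the synonym checks are left out for brevity.

```python
def find_problems(thesaurus):
    """Return a sorted list of (problem, term) pairs found in the thesaurus."""
    problems = set()
    for term, relations in thesaurus.items():
        for relation, targets in relations.items():
            for target in targets:
                if target == term:
                    problems.add(("self-relation", term))
                elif target not in thesaurus:
                    problems.add(("incomplete", term))
        # A term that can reach itself through NT links sits in a circular hierarchy.
        if _reaches(thesaurus, term, term):
            problems.add(("circular hierarchy", term))
    return sorted(problems)

def _reaches(thesaurus, start, goal):
    """Follow NT links from `start`; True if `goal` is reached via at least one link."""
    stack = list(thesaurus.get(start, {}).get("NT", ()))
    visited = set()
    while stack:
        current = stack.pop()
        if current == goal:
            return True
        if current not in visited:
            visited.add(current)
            stack.extend(thesaurus.get(current, {}).get("NT", ()))
    return False
```

A program could run such a check before storing a new relation and either refuse it or warn the user, which are the two behaviours the slide asks about.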

113 Trilingual thesaurus program module for CDS/ISIS: properties
It is an additional program written in the CDS/ISIS Pascal language
Usage is free of charge, as in the case of CDS/ISIS
Thesaurus database management is based on CDS/ISIS
The thesaurus program, as well as CDS/ISIS, offers a user interface in English, French, and Spanish
The contents of a thesaurus database are trilingual: each term in English, French, and Spanish (each one replaceable by another language)
*--

114 Trilingual thesaurus program for CDS/ISIS: the relations among terms
The available relations are: US, UF, NT, BT, TT, RT
Unlimited number of occurrences for each type of relation in each record
After a change of a relation, the program automatically adapts the corresponding relation in the corresponding thesaurus term records
*--

115 Trilingual thesaurus program for CDS/ISIS: control of relations
The program avoids the creation of some impossible or undesirable relations:
» circular synonym relation (a UF b, b UF a)
» iterative synonym relations (a US b, b US c, or longer)
» incomplete relations (a RT b, while b does not exist)
*--

116 Trilingual thesaurus for CDS/ISIS: integration with searching
The user can browse the thesaurus during a search in the primary database.
The program automatically formulates a query in the primary database, when the user selects terms in the thesaurus module.
The program allows the user to include synonyms, narrower terms and broader terms in a query easily and quickly.
The thesaurus database can be used for searching with more than 1 primary database.
*--
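Automatic query formulation from selected thesaurus terms can be sketched as follows. This is not the code of the trilingual thesaurus program: it is a minimal Python illustration that assumes a thesaurus stored as term -> {relation: set of terms} and joins the terms with the CDS/ISIS OR operator `+`. The function name `expand_query` is hypothetical.

```python
def expand_query(thesaurus, term, include=("UF", "NT")):
    """Return an OR query over the term and its selected related terms.

    By default synonyms (UF) and narrower terms (NT) are included;
    pass include=("BT",) to add broader terms instead.
    """
    terms = [term]
    for relation in include:
        for related in sorted(thesaurus.get(term, {}).get(relation, ())):
            if related not in terms:
                terms.append(related)
    return " + ".join(terms)

example = {"DOGS": {"UF": {"CANINES"}, "NT": {"TERRIERS", "POODLES"}, "BT": {"MAMMALS"}}}
print(expand_query(example, "DOGS"))
```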

117 Trilingual thesaurus program module for CDS/ISIS: further properties
In each record describing a term, a field for a scope note is present.
A field for date of term creation is present.
Several printout formats are included.
*--

118 How to obtain the trilingual thesaurus program for CDS/ISIS?
the national distributor in your country
UNESCO Headquarters, General Information Programme, 1 rue Miollis, Paris, France
*--

119 Trilingual thesaurus program module for CDS/ISIS: conclusions
- Negative: Not well integrated with the input/editing module of CDS/ISIS
+ Positive: Exceptionally interesting price/quality ratio
*--

120 Security / privacy / protection of databases
Password for searching specific database(s) and / or fields and / or records
Password for editing specific database(s) and / or fields and / or records
Password for changing
» database structure
» input and modification work sheets
» sort and print formats of data in records
» sort and print formats of records in a selection
**-

121 Security / privacy / protection provided by DOS DOS can make files »read-only »hidden *--

122 Security / privacy / protection in CDS/ISIS
SYSPAR.PAR file (entry 0) asks for a password, which can limit access to a particular
» database
» set of worksheets
» set of menus
» set of additional CDS/ISIS programs
Using the read-only version, named ISISCD.EXE, prevents modifications.
Menus can be changed or removed to prevent access.
*--

123 Passwords and usage tracking
Does the use of passwords linked to users or user groups allow usage tracking by a systems manager?
“Usage” = for instance, number and types of search and/or edit actions.
This can be useful for studies and system management.
**-
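Counting the "number and types of search and/or edit actions" per user group can be sketched in a few lines of Python. The log format (pairs of user group and action type) and the function name `summarize_usage` are assumptions made for illustration; no particular system records usage this way.

```python
from collections import Counter

def summarize_usage(log):
    """Count actions per (user group, action type) from a list of such pairs."""
    return Counter(log)

# Made-up log entries: (user group identified by its password, type of action)
log = [
    ("students", "search"),
    ("staff", "edit"),
    ("students", "search"),
    ("staff", "search"),
]
summary = summarize_usage(log)
for (group, action), count in sorted(summary.items()):
    print(group, action, count)
```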

124 Data export in the case of CDS/ISIS
[Diagram: a CDS/ISIS database (contents + structure) and three destinations: other CDS/ISIS user with same database structure; other CDS/ISIS user without database; other database management system. Transfer routes shown: “Export” of data; “Print” data to file; copy of all database files.]
*--

125 Manual versus batch import of data in a database Information items Manual input Batch input **-

126 Conversion and batch input in the case of a CDS/ISIS database
File with database records in ASCII with field tags
> Fangorn program + Conversion specification file
> File with records in format of the CDS/ISIS database
> Import module in CDS/ISIS
> Records in the CDS/ISIS database
*--
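The first conversion step, reading tagged ASCII records and mapping their labels to the tags of the target database, can be sketched in Python. This is not Fangorn and does not use its specification format: the input layout ("LABEL: value" lines, records separated by a blank line), the tag numbers, and the function name `parse_tagged_text` are all assumptions.

```python
def parse_tagged_text(text, tag_map):
    """Parse 'LABEL: value' lines into records keyed by the target field tag.

    A blank line ends a record. Labels that are absent from tag_map are
    skipped, and repeated labels become repeated occurrences of the field.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                records.append(current)
                current = {}
            continue
        label, sep, value = line.partition(":")
        tag = tag_map.get(label.strip())
        if sep and tag is not None:
            current.setdefault(tag, []).append(value.strip())
    if current:
        records.append(current)
    return records
```

Here `tag_map` plays the role that a conversion specification file plays in the workflow above: it states which source label ends up in which field of the target database.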

127 Format conversion program Fangorn
Authors: Besemer and Nieuwenhuysen
Available via anonymous ftp from
» PCWS1.SCI.SNS.IT
» ftp.vub.ac.be in the directory \pub\projects\Docinfo\paul\cursus\isis\
» …
*--

128 Specification of a format conversion in the case of Fangorn for CDS/ISIS *--

129 !? Question !? Task !? Problem !? Which software packages for storage and retrieval of structured text do YOU know? **-

130 Microcomputer software packages for structured text retrieval: examples
askSam
Bib-Search
CAIRS
Cardbox-Plus
CDS / ISIS
Headfast
IdeaList
Inmagic
Notes (Lotus / IBM)
Personal Librarian
Pro-Cite
Reference Manager
Strix
STATUS
Topic (Verity)
...
**-

131 !? Question !? Task !? Problem !? How can you use a word processing program together with a text retrieval system? **-

132 Word processing program to assist a retrieval program
» To polish text data before import in the database managed by the retrieval program
» To inspect output to printer before real printing
» To accept output from the retrieval program for further and better formatting, followed by printing
**-

133 !? Question !? Task !? Problem !? Which benefits does a field structure offer to databases? **-

134 Field structure in records: benefits concerning input
The indication of fields in input worksheets guides the input.
Default values can be assigned to fields, which can avoid errors and make input faster.
The existence of fields allows control of the contents format of each specific field during input.
**-
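Default values and format control per field can be illustrated with a short Python sketch. The field names, patterns and defaults in `FIELD_RULES` are hypothetical, and the sketch does not reproduce how CDS/ISIS worksheets define them.

```python
import re

# Hypothetical field rules: a pattern the content must match and an optional default.
FIELD_RULES = {
    "year": {"pattern": r"\d{4}", "default": None},
    "language": {"pattern": r"[a-z]{3}", "default": "eng"},
}

def check_input(entry):
    """Fill in default values and check formats.

    Returns (record, errors): the accepted fields and a list of error messages.
    """
    record, errors = {}, []
    for field, rule in FIELD_RULES.items():
        value = entry.get(field) or rule["default"]
        if value is None:
            errors.append(f"{field}: missing")
        elif not re.fullmatch(rule["pattern"], value):
            errors.append(f"{field}: bad format '{value}'")
        else:
            record[field] = value
    return record, errors
```

Such checks are only possible because the record is divided into fields: unstructured text offers no place to attach a rule or a default.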

135 Field structure in records: benefits concerning searching
User can limit search to specific fields.
Field type adds information to contents.
Field-indexing keeps data together in index.
**-

136 Field structure in records: benefits concerning output
Field structure makes output easier to understand.
In output, each field can be indicated with tag/prefix.
Records can be sorted based on contents of a field.
In output, the fields can be sorted in each record.
In output, some fields can be omitted.
**-

137 !? Question !? Task !? Problem !? Besides all the benefits offered by a field structure in a database, which problems does this cause? **-

138 Field structure in records: problems (Part 1)
In the short term, it is more expensive and time consuming than handling less structured data.
Initially, the database manager who wants to create a new database has to make decisions:
» which fields to create to subdivide the database records,
» which field tags or names to use for the internal housekeeping of the database by the chosen database management software package.
**-

139 Field structure in records: problems (Part 2)
The exchange of data, i.e. importing into one database data that have been exported from another database, is hindered when the database structures are not identical or compatible.
**-

140 Exchange formats and standards for text database systems
Usage and aims:
» to allow efficient exchange of information among databases without loss of structural information
» to guide database managers in the creation of a database structure (records divided in fields and subfields)
Examples: (MARC = machine readable catalogue)
» LC-MARC (= Library of Congress MARC); UNIMARC
» Common Communication Format (of UNESCO)
» SGML
***

141 Common Communication Format (CCF): description
Developed by the Unesco - General Information Programme for international application
Includes a system of numeric tags indicating
» the location of fields and subfields in the records
» the meaning of the fields and subfields
**-

142 Common Communication Format (CCF): availability
Published and made available free of charge by the Unesco - General Information Programme
» Printed manuals
» Printed implementation notes
» Example CDS/ISIS database structured according to the Common Communication Format
**-

143 Exchange of data among systems: requirements
Subject thesaurus (relation-structure + contents)
Subject classification scheme + level of usage
Contents of fields (and subfields) in the records (in the case of bibliographic databases: cataloguing input rules)
Database structure: records, fields, subfields,... as seen by the database manager
Version of the program for database management
Type of program for database management
Alphabet used for the data
**-

144 Compatibility among databases: an example
Library of Congress Subject Headings (LCSH) (a thesaurus)
Universal Decimal Classification (UDC)
Anglo American Cataloguing Rules (AACR)
Common Communication Format (CCF)
Version 3.0
CDS/ISIS program
Extension of ASCII by IBM
ISO standard for record storage !
**-
