International Atomic Energy Agency
INIS Training Seminar, November 2009

INIS Training Seminar: Subject Analysis, Thesaurus and Computer Assisted Indexing
23 – 27 November 2009, Vienna, Austria
Alexander Nevyjel, Head, Content Management Group

Introduction to Subject Analysis
Subject analysis should be carried out, whenever possible, by subject specialists with a good knowledge of the subject matter and familiarity with the subject analysis tools of the respective database (subject categories, thesaurus, subject analysis rules).
Steps of subject analysis:
  subject classification
  abstracting
  subject indexing

Subject Classification
The main topic of the document determines the primary subject category.
If there are other significant topics, one or more secondary subject categories can be assigned in addition.

Abstracting
Each input item should contain an English abstract (exception: short communications).
Abstracts in other languages are optional.
If an author abstract is available, it should be checked by the subject specialist and edited if necessary.
An abstract should be as informative as possible.
Emphasize what is novel about the information in the original document.

Thesaurus
"A thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained system language. It is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge."
This definition has been adopted by UNESCO: "Guidelines for the establishment and development of monolingual thesauri", UNESCO, SC/W/255, Paris, September 1973.

The Thesaurus and its Structure
Relationship   Symbol  Cross reference
hierarchical   BT      broader term (level 1, 2, ...)
hierarchical   NT      narrower term (level 1, 2, ...)
affinitive     RT      related term
preferential   UF      used for (reciprocally USE ...)
preferential   UF+     used for multiple (reciprocally USE ... AND ...)
preferential   SF      seen for (reciprocally SEE ... OR ...)

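The reciprocal relationships in the table above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and method names (`Entry`, `Thesaurus`, `add_bt`, `add_uf`) are hypothetical, the SF/SEE relation is left out for brevity, and the sample terms are placeholders rather than entries quoted from the INIS Thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One valid descriptor with its word block (hypothetical layout)."""
    name: str
    bt: set = field(default_factory=set)  # broader terms
    nt: set = field(default_factory=set)  # narrower terms
    rt: set = field(default_factory=set)  # related terms
    uf: set = field(default_factory=set)  # forbidden terms pointing here


class Thesaurus:
    def __init__(self):
        self.entries = {}  # descriptor name -> Entry
        self.use = {}      # forbidden term -> list of descriptors (USE / USE ... AND ...)

    def entry(self, name):
        return self.entries.setdefault(name, Entry(name))

    def add_bt(self, narrower, broader):
        # BT and NT are reciprocal: record both directions at once.
        self.entry(narrower).bt.add(broader)
        self.entry(broader).nt.add(narrower)

    def add_rt(self, a, b):
        # RT is symmetric.
        self.entry(a).rt.add(b)
        self.entry(b).rt.add(a)

    def add_uf(self, forbidden, *descriptors):
        # One descriptor models UF/USE; several model UF+/USE ... AND ...
        for d in descriptors:
            self.entry(d).uf.add(forbidden)
        self.use[forbidden] = list(descriptors)


# Placeholder terms, for illustration only.
thesaurus = Thesaurus()
thesaurus.add_bt("NARROW TERM", "BROAD TERM")
thesaurus.add_rt("NARROW TERM", "NEIGHBOUR TERM")
thesaurus.add_uf("OLD NAME", "NARROW TERM")
thesaurus.add_uf("COMPOUND NAME", "NARROW TERM", "NEIGHBOUR TERM")
```

Storing both directions of each relationship keeps lookups cheap in either direction, at the cost of having to update two entries on every change.
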
Subject Indexing
Subject indexing means analysing the information content of a piece of literature and expressing the meaningful information content in the language of the database, using the controlled vocabulary of the Thesaurus.
Understanding of the content --> subject specialist
Familiarity with the Thesaurus and indexing rules
Select a set of descriptors that describes the subject content of the piece of literature

Procedures for Indexing
Carefully read the title and abstract and scan the body of the piece of literature
  scan the full text (introduction, table of contents, tables, graphs, figures, conclusion) to find information items missing from the abstract or requiring more precision
Identify the concept(s) about which the piece of literature contains useful information
Translate the concepts into descriptors
Avoid overindexing

Proposed Terms (Technical Note 175)
If no suitable descriptor exists in the Thesaurus for the retrieval of a useful concept, make a proposal for a new one, containing the following:
  Proposed term
  Proposed word block of the term (in particular proposed BTs)
  Potential forbidden terms pointing to this proposed descriptor
  Scope note when appropriate
  Explanation and justification for the proposal
  One or more sample records

The purpose of subject indexing is to enable useful retrieval

Computer-assisted Indexing - CAI
Kick-off Meeting                          Jan 2004
Implementation and Customisation          Jun 2004
Production Indexing                       from Jun 2004 ongoing
CAI version 1.0 final acceptance          Aug 2004
Tuning of the system                      from Aug 2004 ongoing
CAI batch processing for Member States    Dec 2004
CAI online from remote for MS             Nov 2007

CAI Thesaurus Extension
"Hidden terms" are character patterns representing the different appearances in free text of a concept which is indexed by one or more descriptors.
Hidden terms are:
  handled similarly to "forbidden terms", with one or more USE relations
  CAI internal only
  not exported to the INIS production system
  not exported to FIBRE
  not printed in any appearance of the thesaurus
  used to support identification of descriptors in the free text

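A minimal sketch of how hidden terms with USE relations might drive descriptor suggestions. The sample pairs are taken from the hidden-term tables of this seminar; the matching strategy (case-insensitive whole-phrase search) and the names `HIDDEN_TERMS` and `suggest_descriptors` are assumptions made for illustration, not a description of the actual CAI software.

```python
import re

# hidden term -> descriptors it points to via USE (one or more)
HIDDEN_TERMS = {
    "behaviour": ["BEHAVIOR"],
    "mossbauer effect": ["MOESSBAUER EFFECT"],
    "ivory coast": ["COTE D'IVOIRE"],
    "analogue computers": ["ANALOG COMPUTERS"],
}


def suggest_descriptors(text):
    """Return the sorted descriptors whose hidden terms occur in the text."""
    found = set()
    for hidden, descriptors in HIDDEN_TERMS.items():
        # Whole-phrase, case-insensitive match (an assumed matching rule).
        pattern = r"\b" + re.escape(hidden) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.update(descriptors)
    return sorted(found)


suggestions = suggest_descriptors("Mossbauer effect studies of magnetic behaviour")
```

Because the hidden terms live in a separate table, they can feed the matcher without ever appearing in the exported or printed thesaurus.
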
Hidden Terms: Compounds
Descriptor            hidden term       free text
MAGNESIUM BORIDES     MgB_2             MgB₂
MAGNESIUM CARBONATES  MgCO_3            MgCO₃
MAGNESIUM HYDRIDES    MgH_2             MgH₂
IRON BROMIDES         iron dibromide
IRON BROMIDES         iron tribromide
ARSENIC IONS          As"3"-            As³⁻
ACETYLENE             C_2H_2            C₂H₂
ACETALDEHYDE          C_2H_4O           C₂H₄O
ACETIC ACID           C_2H_4O_2         C₂H₄O₂
approx hidden terms (expected 3000)

Hidden Terms: Isotopes
Descriptor   hidden term    free text
CESIUM 137                  Cesium 137, Cesium-137
CESIUM 137   "1"3"7cs       ¹³⁷Cs
CESIUM 137   137 caesium    137 Caesium, 137-Caesium
CESIUM 137   caesium 137    Caesium 137, Caesium-137
CESIUM 137   137 cesium     137 Cesium, 137-Cesium
CESIUM 137   137 cs         137 Cs, 137-Cs
CESIUM 137   cs 137         Cs 137, Cs-137
CESIUM 137   cs"1"3"7       Cs¹³⁷
CESIUM 137   cs137          Cs137
CESIUM 138   "1"3"8"mcs     ¹³⁸ᵐCs
CESIUM 138   cs"1"3"8"m     Cs¹³⁸ᵐ
approx hidden terms

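The isotope table enumerates every spelling as a separate hidden term. As a contrast, the sketch below folds the plain-text variants into a single regular expression. It is a swapped-in technique for illustration, not the CAI approach described on the slide; the alias table covers cesium only, superscript forms such as ¹³⁷Cs are not handled, and all names are hypothetical.

```python
import re

# Spellings of the element accepted in free text (cesium only, as in the table).
ELEMENT_ALIASES = {"cs": "CESIUM", "cesium": "CESIUM", "caesium": "CESIUM"}

_NAME = "|".join(sorted(ELEMENT_ALIASES, key=len, reverse=True))
_MASS = r"\d{1,3}m?"  # mass number, optionally flagged metastable

ISOTOPE = re.compile(
    rf"\b(?:(?P<mass1>{_MASS})[\s-]?(?P<name1>{_NAME})"
    rf"|(?P<name2>{_NAME})[\s-]?(?P<mass2>{_MASS}))\b",
    re.IGNORECASE,
)


def isotope_descriptors(text):
    """Map isotope mentions to descriptors of the form 'ELEMENT MASS'."""
    result = []
    for match in ISOTOPE.finditer(text):
        name = match.group("name1") or match.group("name2")
        mass = match.group("mass1") or match.group("mass2")
        # The table maps metastable forms to the plain descriptor (CESIUM 138).
        descriptor = f"{ELEMENT_ALIASES[name.lower()]} {mass.rstrip('mM')}"
        if descriptor not in result:
            result.append(descriptor)
    return result


found = isotope_descriptors("137Cs, Cs-137 and Caesium 137 in soil")
```

An enumerated list is easier to audit and tune term by term, which may be why the slide shows explicit hidden terms; a pattern like this one trades that control for compactness.
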
Hidden Terms: Elementary Particles
Descriptor                  hidden term     free text
B QUARKS                    bottom quarks
T QUARKS                    top quarks
ELECTRON NEUTRINOS          #nu#_e          ν e
MUON NEUTRINOS              #nu#_#mu#       ν μ
TAU NEUTRINOS               #nu#_#tau#      ν τ
RHO-770 MESONS              #rho#-770       ρ-770
OMEGA-782 MESONS            #omega#-782     ω-782
KAONS NEUTRAL               K"0             K 0
KAONS NEUTRAL SHORT-LIVED   K"0_S           K 0 S
KAONS NEUTRAL LONG-LIVED    K"0_L           K 0 L
approx. 300 hidden terms

Hidden Terms: UK/US Spellings
Descriptor                  hidden term
A CENTERS                   a centres
ACTIVITY METERS             activity metres
ANALOG COMPUTERS            analogue computers
ANESTHESIA                  anaesthesia
ARCHAEOLOGY                 archeology
AUSTRIAN ORGANIZATIONS      austrian organisations
BALLISTIC MISSILE DEFENSE   ballistic missile defence
BAYARD-ALPERT GAGES         bayard-alpert gauges
BEAM ANALYZERS              beam analysers
BEHAVIOR                    behaviour
CATALOGS                    catalogues
approx. 800 hidden terms

Hidden Terms: Diacritics and Countries
Descriptor                 hidden term
Diacritics:
BAECKLUND TRANSFORMATION   backlund transformation
BRUECKNER MODEL            bruckner model
BRUNSBUETTEL REACTOR       brunsbuttel reactor
MOESSBAUER EFFECT          mossbauer effect
Country Names:
CAMBODIA                   kampuchea
COTE D'IVOIRE              ivory coast
GREECE                     hellas
MYANMAR                    burma
SYRIA                      syrian arab republic
THAILAND                   siam
approx. 250 hidden terms

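The diacritics entries above pair an unaccented spelling with a descriptor that uses the transliterated form. One way free text such as "Mössbauer" could be brought to the unaccented hidden-term spelling is Unicode decomposition, sketched below. Whether CAI normalises text this way is not stated in the source; the function names are hypothetical.

```python
import unicodedata

# Hidden terms from the diacritics table, keyed by the unaccented spelling.
DIACRITIC_HIDDEN_TERMS = {
    "backlund transformation": "BAECKLUND TRANSFORMATION",
    "bruckner model": "BRUECKNER MODEL",
    "brunsbuttel reactor": "BRUNSBUETTEL REACTOR",
    "mossbauer effect": "MOESSBAUER EFFECT",
}


def strip_diacritics(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lookup(phrase):
    """Return the descriptor for a phrase, or None if no hidden term matches."""
    return DIACRITIC_HIDDEN_TERMS.get(strip_diacritics(phrase).lower())


descriptor = lookup("Mössbauer effect")
```

Characters without a canonical decomposition, such as "ß" or "ø", pass through unchanged and would still need explicit hidden terms.
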
Hidden Terms: Other Spellings
Descriptor                 hidden term
Singular/Plural:
FUNGI                      fungus
FUNGI                      funguses
G MATRIX                   g matrices
G MATRIX                   g matrixes
Reverse Sequence:
ATOM-MOLECULE COLLISIONS   atom-molecule scattering
ATOM-MOLECULE COLLISIONS   molecule-atom scattering
ATOM-MOLECULE COLLISIONS   atom-molecule reactions
ATOM-MOLECULE COLLISIONS   molecule-atom reactions
ATOM-MOLECULE COLLISIONS   atom-molecule interactions
ATOM-MOLECULE COLLISIONS   molecule-atom interactions
approx. 900 hidden terms

CAI Thesaurus Extension
Thesaurus: Valid Descriptors, Forbidden Terms
CAI: Hidden Terms
Total: Terminological Knowledge Base

Further Improvements Necessary
"+" and "-" signs
  K⁺ --> KAONS PLUS, KAONS MINUS, POTASSIUM IONS
Case sensitivity
  TiN --> TIN (instead of TITANIUM NITRIDES)
  gas --> GALLIUM SULFIDES
  "... who is the ..." --> WHO (World Health Organization)
Verbs versus nouns
  "... this leads us to ..." --> LEAD
  "... this leaves it ..." --> LEAVES
Homographic terms
  Solutions --> SOLUTIONS or MATHEMATICAL SOLUTIONS
Nuclear reactions, e.g. ¹⁴N(γ,α)¹⁰B
  Targets
  Beams
  Reactions

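The case-sensitivity problems listed above (TiN read as TIN, "gas" read as GaS, "who" read as WHO) suggest treating formulas and acronyms differently from ordinary words. The following is a sketch of that idea only, not the remedy actually adopted for CAI: the two lookup tables and the function `match_terms` are hypothetical, and only descriptors named on the slide are used.

```python
import re

# Formulas and acronyms: matched only with exactly this capitalisation.
CASE_SENSITIVE = {
    "TiN": "TITANIUM NITRIDES",
    "GaS": "GALLIUM SULFIDES",
    "WHO": "WHO",
}

# Ordinary words: matched regardless of capitalisation.
CASE_INSENSITIVE = {
    "tin": "TIN",
}


def match_terms(text):
    """Return sorted descriptors, checking exact-case entries first."""
    found = set()
    for token in re.findall(r"[A-Za-z]+", text):
        if token in CASE_SENSITIVE:
            found.add(CASE_SENSITIVE[token])
        elif token.lower() in CASE_INSENSITIVE:
            found.add(CASE_INSENSITIVE[token.lower()])
    return sorted(found)


coating = match_terms("TiN coatings on steel")
```

Titles printed entirely in capitals would defeat this rule, and it does nothing for the verb/noun and homograph cases, which need context rather than spelling.
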
CAI-Workflow
  Interactive CAI Processing
  Batch Mode
  Conventional Processing

CAI Batch and Online Processing
Input: MemSt-CC-yymmdd-xxxxxxxxxxx
  MemSt is a standard prefix (meaning "member state")
  CC is the country code
  yymmdd is the date when the file was generated
  xxxxxxxxxxx is any additional identification
Examples:
  MemSt-AR thisismytestfile
  MemSt-FR fileidentification

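The input naming convention above can be checked mechanically before a file is submitted. The sketch below assumes a two-letter upper-case country code and a valid calendar date; the function name `parse_input_name` and the sample file name (including its date) are hypothetical.

```python
import re
from datetime import datetime

# MemSt-CC-yymmdd-xxxxxxxxxxx; two upper-case letters for CC is an assumption.
INPUT_NAME = re.compile(
    r"^MemSt-(?P<cc>[A-Z]{2})-(?P<date>\d{6})-(?P<ident>.+)$"
)


def parse_input_name(name):
    """Split a CAI input file name into country code, date and identification."""
    match = INPUT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a MemSt-CC-yymmdd-... name: {name!r}")
    # strptime raises ValueError for impossible dates such as 091341.
    generated = datetime.strptime(match.group("date"), "%y%m%d").date()
    return match.group("cc"), generated, match.group("ident")


# Hypothetical file name, for illustration only.
country, generated, ident = parse_input_name("MemSt-AR-091123-thisismytestfile")
```

Parsing the date instead of only counting digits catches transposed day and month fields early.
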
CAI Batch Processing
Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
These files carry the CAI-suggested descriptors in tag 800, preceded by the string ##CAI suggestions##;
Example: 800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3; ...
The files are sent back to the Member State for reviewing.

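A reviewer's tool might read the tag 800 line shown above as follows. This is a sketch that assumes the whole field sits on one line in exactly the form of the slide's example; `parse_tag_800` and `format_tag_800` are hypothetical helpers, and the unmarked output form is an assumption about what a reviewed field looks like.

```python
MARKER = "##CAI suggestions##"


def parse_tag_800(line):
    """Return (has_marker, descriptors) for a line like '800^...; ...; ...'."""
    tag, separator, value = line.partition("^")
    if tag.strip() != "800" or not separator:
        raise ValueError(f"not a tag 800 line: {line!r}")
    parts = [part.strip() for part in value.split(";")]
    has_marker = bool(parts) and parts[0] == MARKER
    if has_marker:
        parts = parts[1:]
    # Drop empty pieces left by a trailing semicolon.
    return has_marker, [part for part in parts if part]


def format_tag_800(descriptors):
    """Write the descriptors back without the marker (assumed reviewed form)."""
    return "800^" + "; ".join(descriptors)


marked, descriptors = parse_tag_800(
    "800^##CAI suggestions##; CESIUM 137; MOESSBAUER EFFECT;"
)
reviewed_line = format_tag_800(descriptors)
```

Keeping the marker as a separate flag makes it easy to list the records that still await review.
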
CAI Batch and Online Processing: Reviewing Process
Delete all suggested descriptors which are too general
Add relevant descriptors which were not found:
  numerical values, e.g. pressure ranges, temperature ranges, ...
  nuclear reactions
  chemical compounds, alloys, etc.
CAI cleans up BT/NTs in its own suggestions
  clean up BT/NTs arising from manual additions
Clean up suggestions arising from homographic terms

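One plausible reading of "cleaning up BT/NTs" is that a descriptor is dropped when a narrower term of it is already assigned. The sketch below implements that reading only; the source does not spell out the rule, the hierarchy shown is illustrative rather than quoted from the Thesaurus, and the function names are hypothetical.

```python
# descriptor -> direct broader terms (illustrative hierarchy)
BROADER = {
    "CESIUM 137": {"CESIUM ISOTOPES"},
    "CESIUM ISOTOPES": {"ISOTOPES"},
}


def all_broader(term, broader=BROADER):
    """Collect broader terms of every level for one descriptor."""
    seen = set()
    stack = [term]
    while stack:
        for bt in broader.get(stack.pop(), ()):
            if bt not in seen:
                seen.add(bt)
                stack.append(bt)
    return seen


def drop_redundant_broader(descriptors):
    """Remove descriptors already implied by a narrower assigned descriptor."""
    implied = set()
    for descriptor in descriptors:
        implied |= all_broader(descriptor)
    return [d for d in descriptors if d not in implied]


cleaned = drop_redundant_broader(["ISOTOPES", "CESIUM 137", "BEHAVIOR"])
```

Running the same step again after manual additions would cover the second half of the reviewing bullet, since a manually added narrower term can make an earlier suggestion redundant.
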
CAI Batch and Online Processing: Finalisation Process
CAI batch
  When reviewing of a record is completed: delete "##CAI suggestions##"
  When reviewing of all records is completed: submit the file to the "INIS Input Box"
CAI online
  When reaching the last record: press the "export and exit" button
  The file goes directly to the INIS production system or, if required, is sent back to the Member State for reviewing

CAI Production Statistics
[Table; cut-off date and figures not recoverable. Columns: Total, Jun-Dec, Jan-Aug. Rows: AIP, ANS, Elsevier, IOPP, IAEA, Springer, MemSt, Total]

CAI Batch Processing Statistics, 2005 onwards
[Table; most figures not recoverable. Last columns: a partial-year column "/1-8" and Total. Rows: AR, AU, BG, CN, DE, ET, FR, JP, LT, MY, US, UZ, VN, others, Total. The only surviving figure is 224 in row AU.]

CAI online for Member States
Introduced in July 2007
Tested by:
  China, Germany, France, India, Japan, Switzerland, Uruguay
Regularly in use by:
  Argentina, Brazil, China, Czech Republic, Japan, Switzerland
CAI online and CAI batch are now regular services for Member States