INIS Training Seminar: Subject Analysis, Thesaurus and Computer Assisted Indexing (International Atomic Energy Agency, November 2009)

1 INIS Training Seminar: Subject Analysis, Thesaurus and Computer Assisted Indexing
International Atomic Energy Agency
23 – 27 November 2009, Vienna, Austria
Alexander Nevyjel, Head, Content Management Group

2 Introduction to Subject Analysis
Subject analysis should be carried out whenever possible by subject specialists with a good knowledge of the subject matter and a familiarity with the subject analysis tools of the respective database (subject categories, thesaurus, subject analysis rules).
Steps of subject analysis:
- subject classification
- abstracting
- subject indexing

3 Subject Classification
- The main topic of the document determines the primary subject category
- If there are other significant topics, one or more secondary subject categories can be assigned in addition

4 Abstracting
- Each input item should contain an English abstract (exception: short communications)
- Abstracts in other languages are optional
- If an author abstract is available, it should be checked by the subject specialist, and edited, if necessary
- An abstract should be as informative as possible
- Emphasize what is novel about the information in the original document

5 Thesaurus
"A thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained system language. It is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge"
This definition has been adopted by UNESCO: "Guidelines for the establishment and development of monolingual thesauri", UNESCO, SC/W/255, Paris, September 1973

6 The Thesaurus and its Structure
Relationship   Symbol   Cross reference
hierarchical   BT       broader term (level 1, 2, ...)
hierarchical   NT       narrower term (level 1, 2, ...)
affinitive     RT       related term
preferential   UF       used for (reciprocally USE ...)
preferential   UF+      used for multiple (reciprocally USE ... AND ...)
preferential   SF       seen for (reciprocally SEE ... OR ...)
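
The relationship types on this slide can be pictured as a small data structure. The Python sketch below is illustrative only: the `Term` class, the `resolve` helper and all sample entries are hypothetical names invented for this example, not the INIS thesaurus format or real thesaurus content.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry with its word block (sample data is hypothetical)."""
    name: str
    bt: list[str] = field(default_factory=list)   # BT: broader terms
    nt: list[str] = field(default_factory=list)   # NT: narrower terms
    rt: list[str] = field(default_factory=list)   # RT: related terms
    use: list[str] = field(default_factory=list)  # USE ... AND ...: set on forbidden terms
    see: list[str] = field(default_factory=list)  # SEE ... OR ...: candidates to choose from

    @property
    def is_descriptor(self) -> bool:
        # A valid descriptor carries no USE or SEE pointer to other terms.
        return not self.use and not self.see


def resolve(thesaurus: dict[str, Term], name: str) -> list[str]:
    """Return the descriptor(s) to index with for a given term name."""
    term = thesaurus[name]
    if term.is_descriptor:
        return [term.name]
    if term.use:
        # UF / UF+: every USE target applies.
        return list(term.use)
    # SF: a SEE ... OR ... reference needs a human choice, so nothing is returned.
    return []


# Hypothetical entries, one per relationship type.
thesaurus = {
    "GENERIC TERM": Term("GENERIC TERM", nt=["SPECIFIC TERM"]),
    "SPECIFIC TERM": Term("SPECIFIC TERM", bt=["GENERIC TERM"]),
    "OLD NAME": Term("OLD NAME", use=["SPECIFIC TERM"]),
    "COMPOUND NAME": Term("COMPOUND NAME", use=["GENERIC TERM", "SPECIFIC TERM"]),
    "AMBIGUOUS NAME": Term("AMBIGUOUS NAME", see=["GENERIC TERM", "SPECIFIC TERM"]),
}

print(resolve(thesaurus, "OLD NAME"))
```

Returning an empty list for SEE references is a design choice of the sketch: it keeps the "pick one of several" decision with the indexer instead of guessing.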

7 Subject Indexing
Subject indexing means analysing the information content of a piece of literature and expressing the meaningful information content in the language of the database, using the controlled vocabulary of the Thesaurus.
- Understanding of the content --> subject specialist
- Familiarity with Thesaurus and indexing rules
- Select a set of descriptors that describes the subject content of the piece of literature

8 Procedures for Indexing
- Carefully read the title and abstract and scan the body of the piece of literature
  - scan the full text (introduction, table of contents, tables, graphs, figures, conclusion) to find information items missing from the abstract or requiring more precision
- Identify the concept(s) about which the piece of literature contains useful information
- Translate the concepts into descriptors
- Avoid overindexing

9 Proposed Terms (Technical Note 175)
If no suitable descriptor exists in the Thesaurus for the retrieval of a useful concept, make a proposal for a new one, containing the following:
- Proposed term
- Proposed word block of the term (in particular proposed BTs)
- Potential forbidden terms pointing to this proposed descriptor
- Scope note when appropriate
- Explanation and justification for the proposal
- One or more sample records

10 The purpose of subject indexing is to enable useful retrieval

11 Computer-assisted Indexing - CAI
Kick-off Meeting                           Jan 2004
Implementation and Customisation           Jun 2004
Production Indexing                        from Jun 2004, ongoing
CAI version 1.0 final acceptance           Aug 2004
Tuning of the system                       from Aug 2004, ongoing
CAI batch processing for Member States     Dec 2004
CAI online from remote for Member States   Nov 2007

12 CAI Thesaurus Extension
"Hidden terms" are character patterns representing the different appearances of a concept in the free text, which is indexed by one or more descriptors. They are:
- handled similarly to "forbidden terms", with one or more USE relations
- CAI internal only
  - not exported to the INIS production system
  - not exported to FIBRE
  - not printed in any appearance of the thesaurus
- used to support identification of descriptors in the free text
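
A minimal Python sketch of the hidden-term idea described above: free-text patterns point to one or more descriptors through USE-style relations. The term pairs are taken from the examples in this seminar; the `HIDDEN_TERMS` table layout and the `suggest_descriptors` function are assumptions of this sketch and say nothing about how the actual CAI software is implemented.

```python
import re

# Hidden term (lower-case pattern) -> descriptor(s) it points to.
# The pairs come from the seminar examples; the dict layout is an assumption.
HIDDEN_TERMS = {
    "behaviour": ["BEHAVIOR"],
    "ivory coast": ["COTE D'IVOIRE"],
    "mossbauer effect": ["MOESSBAUER EFFECT"],
    "bottom quarks": ["B QUARKS"],
}


def suggest_descriptors(text, hidden_terms=HIDDEN_TERMS):
    """Return descriptors whose hidden terms occur in the free text."""
    found = []
    lowered = text.lower()
    for pattern, descriptors in hidden_terms.items():
        # Whole-word match, so a pattern does not fire inside a longer word.
        if re.search(r"\b" + re.escape(pattern) + r"\b", lowered):
            for descriptor in descriptors:
                if descriptor not in found:
                    found.append(descriptor)
    return found


print(suggest_descriptors("The behaviour of bottom quarks in the Ivory Coast study"))
```

Lower-casing the whole text keeps the sketch short, but it is exactly the kind of shortcut that causes the case-sensitivity problems listed later under "Further Improvements necessary".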

13 Hidden Terms: Compounds
Descriptor            hidden term       free text
MAGNESIUM BORIDES     MgB_2             MgB₂
MAGNESIUM CARBONATES  MgCO_3            MgCO₃
MAGNESIUM HYDRIDES    MgH_2             MgH₂
IRON BROMIDES         iron dibromide
IRON BROMIDES         iron tribromide
ARSENIC IONS          As"3"-            As³⁻
ACETYLENE             C_2H_2            C₂H₂
ACETALDEHYDE          C_2H_4O           C₂H₄O
ACETIC ACID           C_2H_4O_2         C₂H₄O₂
--> approx. 1,400 hidden terms (expected 3,000)

14 Hidden Terms: Isotopes
Descriptor   hidden term    free text
CESIUM 137                  Cesium 137, Cesium-137
             "1"3"7cs       ¹³⁷Cs
             137 caesium    137 Caesium, 137-Caesium
             caesium 137    Caesium 137, Caesium-137
             137 cesium     137 Cesium, 137-Cesium
             137 cs         137 Cs, 137-Cs
             cs 137         Cs 137, Cs-137
             cs"1"3"7       Cs¹³⁷
             cs137          Cs137
CESIUM 138   "1"3"8"mcs     ¹³⁸ᵐCs
             cs"1"3"8"m     Cs¹³⁸ᵐ
--> approx. 22,400 hidden terms
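
The isotope variants above follow a regular pattern (element name or symbol, optional space or hyphen, mass number, in either order), so they can also be recognised with a single expression instead of an enumerated list. The sketch below is one hypothetical way to do that in Python; `ELEMENT_FORMS` covers only cesium, and neither the regular expression nor the function name comes from the CAI system, which according to the slide stores the variants as hidden terms.

```python
import re

# Spellings and symbol for one element only; a full table is assumed to exist elsewhere.
ELEMENT_FORMS = {"cesium": "CESIUM", "caesium": "CESIUM", "cs": "CESIUM"}

# Longest names first, so "caesium" is tried before "cs".
_NAMES = "|".join(sorted(ELEMENT_FORMS, key=len, reverse=True))

# Either "<name>[space or hyphen]<mass>" or "<mass>[space or hyphen]<name>".
_ISOTOPE = re.compile(
    rf"\b(?:(?P<n1>{_NAMES})[ -]?(?P<a1>\d{{1,3}})"
    rf"|(?P<a2>\d{{1,3}})[ -]?(?P<n2>{_NAMES}))\b",
    re.IGNORECASE,
)


def isotope_descriptors(text):
    """Map isotope mentions in free text to descriptors such as 'CESIUM 137'."""
    result = []
    for match in _ISOTOPE.finditer(text):
        name = (match.group("n1") or match.group("n2")).lower()
        mass = match.group("a1") or match.group("a2")
        descriptor = f"{ELEMENT_FORMS[name]} {mass}"
        if descriptor not in result:
            result.append(descriptor)
    return result


print(isotope_descriptors("Uptake of 137Cs and caesium-137 in soil; Cs137 levels"))
```

The sketch ignores metastable states such as 138m and the coded superscript forms, both of which appear in the table and would need extra handling.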

15 Hidden Terms: Elementary Particles
Descriptor                   hidden term    free text
B QUARKS                     bottom quarks
T QUARKS                     top quarks
ELECTRON NEUTRINOS           #nu#_e         ν_e
MUON NEUTRINOS               #nu#_#mu#      ν_μ
TAU NEUTRINOS                #nu#_#tau#     ν_τ
RHO-770 MESONS               #rho#-770      ρ-770
OMEGA-782 MESONS             #omega#-782    ω-782
KAONS NEUTRAL                K"0            K^0
KAONS NEUTRAL SHORT-LIVED    K"0_S          K^0_S
KAONS NEUTRAL LONG-LIVED     K"0_L          K^0_L
--> approx. 300 hidden terms

16 Hidden Terms: UK/US Spellings
Descriptor                   hidden term
A CENTERS                    a centres
ACTIVITY METERS              activity metres
ANALOG COMPUTERS             analogue computers
ANESTHESIA                   anaesthesia
ARCHAEOLOGY                  archeology
AUSTRIAN ORGANIZATIONS       austrian organisations
BALLISTIC MISSILE DEFENSE    ballistic missile defence
BAYARD-ALPERT GAGES          bayard-alpert gauges
BEAM ANALYZERS               beam analysers
BEHAVIOR                     behaviour
CATALOGS                     catalogues
--> approx. 800 hidden terms

17 Hidden Terms: Diacritics and Countries
Descriptor                 hidden term
Diacritics:
BAECKLUND TRANSFORMATION   backlund transformation
BRUECKNER MODEL            bruckner model
BRUNSBUETTEL REACTOR       brunsbuttel reactor
MOESSBAUER EFFECT          mossbauer effect
Country Names:
CAMBODIA                   kampuchea
COTE D'IVOIRE              ivory coast
GREECE                     hellas
MYANMAR                    burma
SYRIA                      syrian arab republic
THAILAND                   siam
--> approx. 250 hidden terms
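
The diacritics rows suggest that text such as "Mössbauer effect" is reduced to its unaccented form before lookup. A small Python sketch of that normalisation step, using the standard `unicodedata` module, is given below. The `fold_diacritics` and `lookup` helpers are hypothetical, and whether CAI strips diacritics this way is an assumption; only the term pairs come from the slide.

```python
import unicodedata

# Hidden term -> descriptor, copied from the diacritics examples above.
HIDDEN_TERMS = {
    "backlund transformation": "BAECKLUND TRANSFORMATION",
    "bruckner model": "BRUECKNER MODEL",
    "brunsbuttel reactor": "BRUNSBUETTEL REACTOR",
    "mossbauer effect": "MOESSBAUER EFFECT",
}


def fold_diacritics(text):
    """Lower-case the text and drop combining marks (e.g. 'ö' becomes 'o')."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def lookup(phrase):
    """Return the descriptor for a phrase, or None if no hidden term matches."""
    return HIDDEN_TERMS.get(fold_diacritics(phrase))


print(lookup("M\u00f6ssbauer effect"))
```

Note that the descriptor itself uses the transliterated spelling (OE, AE, UE), so a text that already writes "Moessbauer effect" would be covered by the descriptor directly rather than by a hidden term.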

18 Hidden Terms: Other Spellings
Descriptor                 hidden term
Singular/Plural:
FUNGI                      fungus
FUNGI                      funguses
G MATRIX                   g matrices
G MATRIX                   g matrixes
Reverse Sequence:
ATOM-MOLECULE COLLISIONS   atom-molecule scattering
ATOM-MOLECULE COLLISIONS   molecule-atom scattering
ATOM-MOLECULE COLLISIONS   atom-molecule reactions
ATOM-MOLECULE COLLISIONS   molecule-atom reactions
ATOM-MOLECULE COLLISIONS   atom-molecule interactions
ATOM-MOLECULE COLLISIONS   molecule-atom interactions
--> approx. 900 hidden terms

19 CAI Thesaurus Extension
Thesaurus Valid Descriptors   21,826
Forbidden Terms                9,009
CAI Hidden Terms              34,381
Total                         65,216
--> Terminological Knowledge Base

20 Further Improvements necessary
- "+" and "-" signs
  - K+ --> KAONS PLUS, KAONS MINUS, POTASSIUM IONS
- Case sensitivity
  - TiN --> TIN (instead of TITANIUM NITRIDES)
  - gas --> GALLIUM SULFIDES
  - "… who is the …" --> WHO (World Health Organization)
- Verbs versus nouns
  - "… this leads us to …" --> LEAD
  - "… this leaves it …" --> LEAVES
- Homographic terms
  - Solutions --> SOLUTIONS or MATHEMATICAL SOLUTIONS
- Nuclear reactions, e.g. ¹⁴N(γ,α)¹⁰B
  - Targets
  - Beams
  - Reactions
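
The case-sensitivity examples above (TiN read as TIN, "gas" read as GaS, "who" read as WHO) all arise when everything is matched without regard to case. One possible mitigation, sketched below in Python, is to match formulas and acronyms case-sensitively first and only then match ordinary words case-insensitively. The two-table split and the `suggest` function are assumptions of this sketch, not a description of how CAI was actually improved.

```python
import re

# Patterns that must match exactly as written (formulas, acronyms).
CASE_SENSITIVE = {
    "TiN": "TITANIUM NITRIDES",
    "GaS": "GALLIUM SULFIDES",
    "WHO": "WHO",
}
# Ordinary words, matched regardless of case.
CASE_INSENSITIVE = {"tin": "TIN"}


def suggest(text):
    """Suggest descriptors, giving exact-case patterns priority over plain words."""
    found = []
    remaining = text
    for pattern, descriptor in CASE_SENSITIVE.items():
        regex = re.compile(rf"\b{re.escape(pattern)}\b")
        if regex.search(remaining):
            found.append(descriptor)
            # Blank out the match so the case-insensitive pass cannot reuse it.
            remaining = regex.sub(" ", remaining)
    for pattern, descriptor in CASE_INSENSITIVE.items():
        if re.search(rf"\b{re.escape(pattern)}\b", remaining, re.IGNORECASE):
            found.append(descriptor)
    return found


print(suggest("TiN coatings on steel"))
print(suggest("tin coatings on steel"))
```

This does not help with text written entirely in capitals (for example titles), where "TIN" remains ambiguous, nor with the verb/noun and homograph cases, which need linguistic context rather than pattern matching.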

21 CAI Workflow (diagram): Interactive CAI Processing, Batch Mode, Conventional Processing

23 CAI Batch and Online Processing
Input: MemSt-CC-yymmdd-xxxxxxxxxxx
- MemSt is a standard prefix (meaning "member state")
- CC is the country code
- yymmdd is the date when the file was generated
- xxxxxxxxxxx is any additional identification
Examples:
MemSt-AR-041203-thisismytestfile
MemSt-FR-041212-fileidentification
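
The naming convention above can be checked mechanically before a file is submitted. The following Python sketch parses a name into its parts; the function name `parse_input_name` is hypothetical, and two details are assumptions not stated on the slide: that the country code is exactly two upper-case letters, and that the additional identification may be any non-empty string.

```python
import re
from datetime import datetime

# MemSt-CC-yymmdd-identification (CC format and identification charset are assumed).
_INPUT_NAME = re.compile(r"^MemSt-(?P<cc>[A-Z]{2})-(?P<date>\d{6})-(?P<ident>.+)$")


def parse_input_name(name):
    """Split a CAI input file name into (country code, generation date, identification)."""
    match = _INPUT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid CAI input file name: {name!r}")
    # strptime also rejects impossible dates such as month 13.
    generated = datetime.strptime(match.group("date"), "%y%m%d").date()
    return match.group("cc"), generated, match.group("ident")


print(parse_input_name("MemSt-AR-041203-thisismytestfile"))
```

With the two examples from the slide, the parsed dates are 3 December 2004 for the AR file and 12 December 2004 for the FR file.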

24 CAI Batch Processing
Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
- These files carry the CAI suggested descriptors in tag 800, preceded by the string ##CAI suggestions##;
- Example: 800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3; …
- The files are sent back to the member state for reviewing
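
A reviewer's tool might read the suggestions out of tag 800 as follows. This Python sketch relies only on the example line shown above: the tag number, the `^` after it, the `##CAI suggestions##` marker and semicolon-separated descriptors. Everything else about the record layout is unknown here, and `parse_suggestions` is a hypothetical helper name.

```python
MARKER = "##CAI suggestions##"


def parse_suggestions(line):
    """Return the suggested descriptors from a tag 800 line, or None if it is not one."""
    tag, separator, value = line.partition("^")
    if tag.strip() != "800" or not separator:
        return None
    parts = [part.strip() for part in value.split(";")]
    if parts[0] != MARKER:
        # Tag 800 without the marker: already reviewed or entered manually.
        return None
    # Drop the marker and any empty piece left by a trailing semicolon.
    return [part for part in parts[1:] if part]


print(parse_suggestions("800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3;"))
```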

25 CAI Batch and Online Processing: Reviewing Process
- Delete all suggested descriptors which are too general
- Add relevant descriptors which were not found:
  - numerical values, e.g. pressure ranges, temperature ranges, ...
  - nuclear reactions
  - chemical compounds, alloys, etc.
- CAI cleans up BT/NTs --> clean up BT/NTs arising from manual additions
- Clean up suggestions from homographic terms
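
The slide does not spell out what "cleaning up BT/NTs" involves. One plausible reading, assumed here, is that a descriptor is dropped when a narrower term of it is also assigned, since the narrower term already implies the broader one. The Python sketch below implements that reading; the hierarchy, the term names and both function names are hypothetical.

```python
def all_broader(term, broader):
    """Collect every broader term of `term`, following BT links at all levels."""
    seen = set()
    stack = list(broader.get(term, ()))
    while stack:
        candidate = stack.pop()
        if candidate not in seen:
            seen.add(candidate)
            stack.extend(broader.get(candidate, ()))
    return seen


def drop_broader_terms(descriptors, broader):
    """Remove descriptors that are a broader term of another assigned descriptor."""
    implied = set()
    for descriptor in descriptors:
        implied |= all_broader(descriptor, broader)
    return [d for d in descriptors if d not in implied]


# Hypothetical three-level hierarchy: NARROW -> MIDDLE -> BROAD.
BROADER = {"NARROW": ["MIDDLE"], "MIDDLE": ["BROAD"]}

print(drop_broader_terms(["BROAD", "NARROW", "OTHER"], BROADER))
```

After a manual addition, running the same check again would catch a newly added descriptor that makes an existing broader term redundant.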

26 CAI Batch and Online Processing: Finalisation Process
CAI batch
- When reviewing of a record is completed: delete "##CAI suggestions##"
- When reviewing of all records is completed: submit the file to the "INIS Input Box"
CAI online
- When reaching the last record: press the "export and exit" button
- The file goes directly to the INIS production system or, if required, is sent back to the Member State for reviewing
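
For the batch route, the two finalisation steps can be expressed as a marker removal per record and a check over the whole file. The Python sketch below assumes that a reviewed tag 800 keeps the same semicolon-separated layout without the marker; that output layout and both function names are assumptions of this example.

```python
MARKER = "##CAI suggestions##"


def finalise_tag_800(line):
    """Remove the CAI marker from a reviewed tag 800 line, keeping the descriptors."""
    tag, separator, value = line.partition("^")
    parts = [part.strip() for part in value.split(";")]
    kept = [part for part in parts if part and part != MARKER]
    return f"{tag}{separator}{'; '.join(kept)}"


def ready_to_submit(lines):
    """A file is ready for the INIS Input Box once no line carries the marker."""
    return not any(MARKER in line for line in lines)


reviewed = finalise_tag_800("800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2;")
print(reviewed)
print(ready_to_submit([reviewed]))
```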

27 CAI Production Statistics (01-06-2004 until 31-08-2009)
              2004     2005     2006     2007     2008     2009    Total
           Jun-Dec                                      Jan-Aug
AIP          19859    17827    19557     9657     8249     4108    79257
ANS              -        -      813     1256        -        -     2069
Elsevier      3124    23809    35716    32175    26993    18625   140442
IOPP          3291     8751     8059     7973    10526     8355    46955
IAEA          2131     2171     3984     4445     4843     2532    20106
Springer         -        -        -        -     6113     1000     7113
MemSt            -        -      660       65     3045     3105     6875
Total        28405    52558    68789    55571    59769    37725   302817

28 CAI Batch Processing Statistics (2005 until 31-08-2009)
            2005     2006     2007     2008  2009/1-8    Total
AR           141        4       53        -         -      198
AU           224        -        -        -         -      224
BG            32        -      199      151        43      425
CN           299     2319     2314     2959      3059    10950
DE           363      644     1019      879       607     3512
ET             -        -    13017     9186      4062    26265
FR           138      721        -        -         -      859
JP            11        -        -       32         -       43
LT             -       39       69        -         -      108
MY           133      270      205      112        61      781
US             -       97       46        -         -      143
UZ           359      396       43        -         -      798
VN             8       16        -       83        82      189
others       306      105        -        -         -      411
Total       2014     4611    16965    13402      7914    44906

29 CAI online for Member States
- Introduced in July 2007
- Tested by: China, Germany, France, India, Japan, Switzerland, Uruguay
- Regularly in use by: Argentina, Brazil, China, Czech Republic, Japan, Switzerland
- CAI online and CAI batch are now regular services for Member States

