SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA.



INTRODUCTION/REVIEW One of the major functions of an information retrieval system is to match the contents of documents with users' queries. The system personnel have to prepare a surrogate for every document, and all such surrogates must be maintained in an organized manner. This process is indexing.

INTRODUCTION/REVIEW TASK: analyze the content of the given document and represent this analysis by some content identifiers or keywords. Lancaster: indexing involves two quite distinct steps, conceptual analysis and representation. In subject classification, the basic objective of which is to arrange documents according to their subject contents, the result of the conceptual analysis is represented by some artificial language or notational symbol.

Subject Analysis What’s it all about, Garry?

What is it? Subject analysis
• Examination of a bibliographic item by a trained subject specialist to determine the most specific subject heading(s) or descriptor(s) that fully describe its content, to serve in the bibliographic record as access points in a subject search of a library catalog, index, abstracting service, or bibliographic database. When no applicable subject heading can be found in the existing headings list or thesaurus of indexing terms, a new one must be created.

What is it? It means the presence, identification and expression of subject matter in document texts, databases, controlled and natural languages, information requests and search strategies.

Say what?

Why do all that? If we don’t, we can’t find stuff!
• “Subject analysis is [essentially] all methods and processes which can be described as representation for retrieval of information by its subjects, be they names, geographic locations, or topical subjects.”
• Quoted from Williamson, N. J. (1997). The Importance of Subject Analysis in Library and Information Science Education. Technical Services Quarterly 15(1/2): 67-87, by Pamela Hill in LS 500 Organization of Information, Tuesday, February 24, 2004.

Why use a standardized list? Why Subject Headings?
• Subject headings often indicate the contents of books in terms that their titles, which may be nondescriptive or very general, do not use. Subject headings in online databases are often referred to as descriptors, but they serve the same purpose in locating valuable resources.
• Along with their subdivisions, subject headings provide a clear and systematic way of scanning the catalog for what is needed. Assigned headings are usually the dominant, and most important, subjects of a given item.
• Subject headings bring like materials together, so the searcher does not have to try the wide variety of synonymous terms that may describe a single concept (teen, youth, adolescent, young adult, etc.).
(Using Subject Headings in PantherCat)

BS factors in choosing subject of document. Does the document deal with a specific product, condition or phenomenon? Does the subject contain an action concept, an operation or a process? Is the object or patient affected by the action identified? Does the document deal with the agent of this action?

BS factors in choosing subject of document. Does it refer to a particular means for accomplishing the action? Were these factors considered in the context of a particular location or environment? Are any independent or dependent variables identified? Was the subject considered from a special viewpoint not normally associated with that field of study?

SUBJECT INDEXING is the act of describing a document by index terms to indicate what the document is about or to summarize its content. Indexes are constructed, separately, on three distinct levels: terms in a document such as a book; objects in a collection such as a library; and documents (such as books and articles) within a field of knowledge. Subject indexing systems are broadly classified as pre-coordinate and post-coordinate systems. The major objective of any indexing system is to represent the contents of documents through keywords or descriptors.

Exhaustivity and Specificity An exhaustive index is one which lists all possible index terms. Greater exhaustivity gives higher recall, that is, a greater likelihood that all the relevant articles are retrieved; however, this comes at the expense of precision. The user may retrieve a larger number of irrelevant documents, or documents which deal with the subject only in little depth. In a manual system, greater exhaustivity also brings greater cost, as more man-hours are required. Specificity describes how closely the index terms match the topics they represent. An index is said to be specific if the indexer uses descriptors that parallel the concepts of the document and reflect those concepts precisely.

Recall vs Precision
Precision = (Number of relevant documents retrieved) / (Total number of documents retrieved)
Recall = (Number of relevant documents retrieved) / (Number of relevant documents in the collection)
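The two ratios can be checked on a toy search result. The following is a minimal sketch, not part of the original slides; the function name `precision_recall` and the document numbers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one search.

    retrieved: document numbers returned by the search (illustrative).
    relevant:  document numbers in the collection that are actually relevant.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 2 of them relevant; 5 relevant documents exist in total.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6, 8, 10])
print(p, r)  # precision = 2/4, recall = 2/5
```

Retrieving more documents (a more exhaustive search) can only keep recall the same or raise it, while precision usually falls as irrelevant documents are pulled in.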

Manual indexing
Analysis of subject
Identification of keywords
Standardization of keywords
Choice of an indexing system
• If the chosen system is a post-coordinate one, then:
• Preparation of entries under each term with reference to the document identification number.
• Preparation of reference entries.

Manual indexing
• If the chosen system is a pre-coordinate one, then:
• Preparation of an entry (main entry) using all the keywords organized in a way prescribed by the system.
• Preparation of index entries by using each significant term as an entry element and the full entry (main entry) as the context, or by rotation/permutation of the significant terms in the main entry according to the rules prescribed by the system chosen.
• Preparation of reference entries.
• Filing of entries.
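The two ways of deriving index entries from a main entry can be sketched in a few lines. This is a generic illustration under stated assumptions, not the rule set of any particular indexing system: the function names, the " : " separator and the sample terms are all invented here.

```python
def lead_with_context(terms):
    """One entry per significant term: the term leads, the full main entry is the context."""
    main_entry = " : ".join(terms)
    return [(lead, main_entry) for lead in terms]


def cyclic_rotations(terms):
    """Rotate the main entry so that every term takes the lead position once."""
    return [terms[i:] + terms[:i] for i in range(len(terms))]


terms = ["Libraries", "Cataloguing", "Automation"]  # hypothetical main entry
for lead, context in lead_with_context(terms):
    print(f"{lead}\n    {context}")
for rotation in cyclic_rotations(terms):
    print(" : ".join(rotation))
```

Real pre-coordinate systems add their own rules on which terms count as significant and how the rotated string is punctuated.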

STEPS IN MANUAL INDEXING SYSTEM

Pre-coordinate indexing system: Chain indexing. Dr. S.R. Ranganathan developed this method of pre-coordinate indexing. It attempts to represent, in natural language, the chain of concepts that constitutes a subject.

Pre-coordinate indexing system. Basic steps in chain indexing may be represented as follows:
• Take the class number prepared for the given document.
• Consult the corresponding classification schedule and write the notation at each step and the corresponding term or phrase (from the schedule). This will produce a chain of concepts from the general to the specific.

Basic steps in chain indexing may be represented as follows:
• Identify the sought, unsought, and false links. Sought links denote the concepts that the user is likely to use as access points; unsought links are those that are not likely to be used as access points; and false links are those that really do not represent any valid concepts.
• Invert the chain, and this will generate the index entries.
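The inversion step can be sketched as follows. This is a simplified illustration, not Ranganathan's full rule set. It assumes that false links are dropped entirely and that unsought links never lead an entry but may still appear as context. The notations and terms in the example chain are hypothetical.

```python
from typing import List, Tuple

# One link of the chain: (notation, term, kind), kind in {"sought", "unsought", "false"}.
Link = Tuple[str, str, str]


def chain_index_entries(chain: List[Link]) -> List[str]:
    """Invert a general-to-specific chain into index entries, most specific first."""
    usable = [link for link in chain if link[2] != "false"]  # assumption: drop false links
    entries = []
    for i in range(len(usable) - 1, -1, -1):
        notation, term, kind = usable[i]
        if kind != "sought":
            continue  # assumption: unsought links never lead an entry
        # Context = the broader links above this one, read from specific to general.
        context = [t for _, t, _ in reversed(usable[:i])]
        entries.append(", ".join([term] + context) + " " + notation)
    return entries


# Hypothetical chain; the notation is invented for illustration only.
chain = [
    ("8", "Literature", "sought"),
    ("82", "English literature", "sought"),
    ("82-", "", "false"),
    ("82-1", "Poetry", "sought"),
]
for entry in chain_index_entries(chain):
    print(entry)
```

Each entry carries the class number of its lead link, so a user who looks up a broader term is directed to the broader class.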

Pre-coordinate indexing system: Relational indexing
• J.E.L. Farradane devised this scheme. The system was first developed in the early 1950s and has been modified several times since then. The latest changes may be noted from Farradane’s own papers. According to Farradane, any subject can be represented by identifying the relationship between each pair of its constituent concepts and expressing it in the form of what he called analets (pairs of terms interposed by an operator). He suggested that any possible relationship can be represented by one of nine relational operators.

Pre-coordinate indexing system: PRECIS (PREserved Context Index System)
• Developed by Derek Austin. Major tasks:
• Analysing the document concerned and identifying key concepts.
• Organizing the concepts into a subject statement based on the principle of context dependency.
• Assigning codes (operators) which signify the syntactical function of each term.
• Deciding which terms should be the access points and which terms would be in other positions in the index entries, and assigning further codes to achieve these results.
• Adding further prepositions, auxiliaries or phrases which would result in clarity and expressiveness of the resulting index entries.
• Making supporting reference entries from semantically related terms taken from a thesaurus.

Pre-coordinate indexing system: POPSI (POstulate-based Permuted Subject Indexing)
• Developed by Bhattacharyya. It uses the analytico-synthetic method for string formulation and permutation of the constituent terms in order to satisfy different approach points to the document.
• An entry has two parts: the lead heading, which contains the index term or access term, and the context heading, which generally appears on the line following the lead heading and contains the subject words, with auxiliary words, denoting the context in which the lead term has been discussed in the given document.

Rules that govern POPSI
• A manifestation of property follows immediately the manifestation in relation to which it is a property.
• A manifestation of action follows immediately the manifestation in relation to which it is an action.
• A property or action can itself have another property and/or action directly related to it.
• A species or part follows immediately the manifestation in relation to which it is a species or part; "part" is used to denote the whole-part relationship.
• A modifier follows immediately the manifestation in relation to which it is a modifier.

Post-coordinate indexing system
Uniterm
• Developed by Mortimer Taube. A card is prepared for each term that is considered to be an appropriate index term for a given document. Searching relies on the ability of the searcher to notice matching document numbers on the cards that are retrieved.
Optical coincidence/peek-a-boo cards
• Developed to overcome the problem of manual searching. Each card is divided into small numbered squares, each square bearing a specific number, and a document number is recorded by punching a hole at the corresponding position on the card.
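Post-coordinate searching amounts to intersecting the document numbers recorded on the term cards. The sketch below models each card as a set of document numbers; the function names and the sample data are assumptions made for illustration, not part of any historical system.

```python
from collections import defaultdict


def build_term_cards(postings):
    """postings: {document number: [index terms]} -> {term: set of document numbers}."""
    cards = defaultdict(set)
    for doc_no, terms in postings.items():
        for term in terms:
            cards[term].add(doc_no)  # "write" the document number on the term's card
    return cards


def coordinate_search(cards, query_terms):
    """Return document numbers present on every requested card (coordination at search time)."""
    sets = [cards.get(term, set()) for term in query_terms]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical collection of three indexed documents.
cards = build_term_cards({
    101: ["indexing", "libraries"],
    102: ["indexing", "computers"],
    103: ["indexing", "libraries", "computers"],
})
print(coordinate_search(cards, ["libraries", "computers"]))  # documents on both cards
```

With peek-a-boo cards the same intersection is done optically: stacking the cards and holding them to the light shows holes only at positions punched on every card.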

PROBLEMS OF MANUAL INDEXING Salton, and Salton and McGill, point out two major shortcomings:
• It is not quite clear that all the complexities and refinements, exemplified by the categorization of terms and assignment of relations between terms, are really beneficial.
• Even if the indexing process is carried out accurately, and at the right level of detail, it is not possible to maintain consistency, since more than one indexer will be needed in practice.

Theory of indexing
1st level: the concordance, which consists of references to all words in the original text arranged in alphabetical order.
2nd level: the information-theoretical level, which calculates the likelihood of a word being chosen for indexing based on its frequency of occurrence in a given text document.
3rd level: the linguistic level, which attempts to explain how meaningful words are extracted from large units of text.
4th level: the textual or skeletal framework. The text is prepared by the author in an organized manner and held together by a skeletal structure.
5th level: the inferential level. An indexer should be able to make inferences about the relationships between words and phrases by observing the sentence and paragraph structure, and by stripping the sentence of extraneous details.
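The first two levels are mechanical enough to sketch in code. The example below builds an alphabetical concordance and raw word counts. Using raw frequency as a stand-in for the second level's "likelihood of being chosen" is a simplifying assumption of this sketch, and the tokenizer and sample sentence are invented for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(text):
    """Level 1: every word, in alphabetical order, with the positions where it occurs."""
    positions = defaultdict(list)
    for i, word in enumerate(tokenize(text)):
        positions[word].append(i)
    return dict(sorted(positions.items()))


def word_frequencies(text):
    """Level 2 (simplified): how often each word occurs in the text."""
    return Counter(tokenize(text))


text = "Indexing describes documents; indexing aids retrieval."
print(concordance(text))
print(word_frequencies(text).most_common(1))
```

The higher levels (linguistic, structural, inferential) depend on judgment about meaning and are not captured by counts like these.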

Fugmann proposes a theory based on axioms
Axiom of definability: compiling information relevant to a topic can only be accomplished to the degree to which the topic can be defined.
Axiom of order: any compilation of information relevant to a topic is an order-creation process.
Axiom of the sufficient degree of order: the demands made on the degree of order increase as the size of a collection and the frequency of searches increase.
Axiom of predictability: the success of any directed search for relevant information hinges on how readily predictable or reconstructible the modes of expression for concepts and statements in the search file are.
Axiom of fidelity: the success of any directed search for relevant information depends on the fidelity with which concepts and statements are expressed in the search file.
