INFO624 -- Week 8 Subject Indexing & Knowledge Representation Dr. Xia Lin Assistant Professor College of Information Science and Technology Drexel University.


Effective Information Retrieval
- Data structures
- Knowledge representation
  - From document representation to knowledge representation
- User interface and user interaction

Document Representation
- Vocabulary
- Semantics
- Implementation

Vocabulary
- Controlled vocabulary
  - A list of terms selected for indexing purposes.
  - The terms are processed to reduce inconsistency and ambiguity.
  - Established selection rules and indexing rules.
- Uncontrolled vocabulary
  - Subject keywords
  - Metadata

Example: ACM record

Metadata
- Data about data
- Descriptive metadata
  - External to the meaning of the document
  - Dublin Core Metadata Element Set
  - Author, title, publisher, etc.
- Semantic metadata
  - Subject keywords
- Challenge: automatic generation of metadata for documents
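A minimal sketch of the two kinds of metadata, assuming a record keyed by Dublin Core element names (only 3 of the 15 elements are shown). The descriptive fields are copied in as given, while the subject field is filled by a deliberately naive "automatic generation" step: the most frequent non-stopword terms. The stopword list, function names, and sample values are hypothetical, and frequency counting is only a stand-in for real subject analysis.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; real systems use much larger ones.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def suggest_subject_keywords(text, k=3):
    """Naive semantic metadata: the k most frequent non-stopword terms in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]


def make_record(title, creator, publisher, body):
    """Build a record keyed by Dublin Core element names."""
    return {
        # Descriptive metadata: external to the meaning of the document.
        "title": title,
        "creator": creator,
        "publisher": publisher,
        # Semantic metadata: what the document is about, suggested automatically here.
        "subject": suggest_subject_keywords(title + " " + body),
    }


record = make_record(
    title="Thesaurus indexing",
    creator="Example Author",          # hypothetical values
    publisher="Example Publisher",
    body="Thesaurus terms support indexing. Indexing with thesaurus terms improves retrieval.",
)
print(record["subject"])  # ['thesaurus', 'indexing', 'terms']
```

The gap between these keywords and what a human indexer would assign is exactly the challenge the slide names.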

Semantics
- Semantics is the study of meaning.
  - Relational semantics
    - Synonymy, hierarchical relationships, etc.
  - Referential semantics
    - Homonyms; techniques used to limit the meanings or referents of terms
  - Category semantics
    - Facets or other partitions

Example: Mercury?
- Mercury (car)
- Mercury (planet)
- Mercury (metal)
- Mercury (Roman god)
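The parenthetical qualifiers are a referential-semantics technique: one homonym becomes several distinct headings. A minimal sketch of choosing among them, assuming hypothetical cue words for each qualified heading; counting cue-word overlap with the surrounding text is a simple stand-in, not how any particular vocabulary resolves homonyms.

```python
import re

# Hypothetical qualified headings for one homonym.
QUALIFIED = {
    "mercury": ["Mercury (car)", "Mercury (planet)", "Mercury (metal)", "Mercury (Roman god)"],
}

# Hypothetical context cue words per qualified heading, for illustration only.
CUES = {
    "Mercury (car)": {"dealer", "engine", "sedan"},
    "Mercury (planet)": {"orbit", "solar", "sun"},
    "Mercury (metal)": {"element", "thermometer", "toxic"},
    "Mercury (Roman god)": {"jupiter", "messenger", "myth"},
}


def disambiguate(term, context):
    """Pick the qualified heading whose cue words overlap the context most."""
    candidates = QUALIFIED.get(term.lower())
    if not candidates:
        return None  # not a known homonym: nothing to qualify
    words = set(re.findall(r"[a-z]+", context.lower()))
    # max() returns the first candidate on ties, so ties fall back to list order.
    return max(candidates, key=lambda heading: len(CUES[heading] & words))


print(disambiguate("Mercury", "a fast orbit around the sun"))  # Mercury (planet)
```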

Implementation
- Standards
  - AACR2 (Anglo-American Cataloguing Rules, 2nd edition)
  - ISO standard for indexing (ISO 5963)
  - ISO standard for thesaurus construction (ISO 2788)
- Rules
  - Classification rules
  - Evaluation rules

Subject Indexing
- A human analytic process for identifying, selecting, and representing document concepts
  - Create indexing languages
    - Using standardized, limited vocabularies for indexing purposes.
  - Assign indexing terms to documents
    - Using only the terms in the selected indexing language.

Basic Processes of Subject Indexing
- Identifying concepts that represent the subject and purpose of a document.
- Deciding which of these concepts are important for retrieval of this document.
- Expressing the concepts needed for retrieval in the indexing languages used.
- Using uncontrolled vocabulary for concepts that are not represented, or not represented specifically enough, in the indexing languages.
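The four steps can be sketched as a small pipeline, assuming step 1 (identifying concepts) has already been done by a human indexer who also gave each concept an importance weight. The indexing-language table, the weights, and the 0.5 threshold are all hypothetical.

```python
# Hypothetical indexing language: lead-in terms mapped to preferred descriptors.
INDEXING_LANGUAGE = {
    "cancer": "Neoplasms",
    "tumor": "Neoplasms",
    "vitamin c": "Ascorbic Acid",
}


def index_document(concepts, threshold=0.5):
    """concepts: (concept, importance) pairs from step 1, identified by the indexer.

    Returns (controlled descriptors, uncontrolled keywords).
    """
    descriptors, keywords = [], []
    for concept, importance in concepts:
        # Step 2: keep only concepts judged important for retrieval.
        if importance < threshold:
            continue
        # Step 3: express the concept in the indexing language when possible.
        descriptor = INDEXING_LANGUAGE.get(concept.lower())
        if descriptor is not None:
            if descriptor not in descriptors:
                descriptors.append(descriptor)
        # Step 4: otherwise record it as an uncontrolled keyword.
        elif concept not in keywords:
            keywords.append(concept)
    return descriptors, keywords


print(index_document([("Cancer", 0.9), ("Tumor", 0.8), ("CRISPR screening", 0.7), ("History", 0.2)]))
# (['Neoplasms'], ['CRISPR screening'])
```

Note how two different lead-in terms collapse onto one descriptor, which is the consistency the controlled vocabulary is meant to enforce.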

Controlled Vocabulary
- Goals:
  - To permit easy location of documents by topic.
  - To define topic areas, and hence relate one document to another.
  - To provide multiple access points to documents.
  - To enforce uniformity throughout an information retrieval system.

Controlled Vocabulary
- Formats:
  - Hierarchical classified list
    - Hierarchical subject descriptors
    - Associative cross-references
    - Classification notation (codes)
  - Alphabetical list
    - Includes both descriptors and other lead-in terms

Main Components in a Controlled Vocabulary
- Keyword/descriptor
- Synonymous term
- Broader term
- Narrower term
- Related term

Example: Cancer
- Diagram (not reproduced) showing the descriptor Cancer with its synonyms, broader terms, narrower terms, and related terms.
- Terms shown: Malignancy, Malignant tumor, Cancer morphology, Diseases, Neoplasms, Malignant neoplasm of skin, Breast cancer, Primary malignant neoplasm of liver, Abdominal neoplasms, Hyperplasia, Seminoma.
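A controlled-vocabulary entry maps naturally onto a small record with one field per relationship type. This is a minimal sketch; the UF/BT/NT/RT labels follow common thesaurus practice, and the assignment of terms to relationships below is an illustrative assumption, not a transcription of the slide's diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """One controlled-vocabulary entry: a preferred term and its relationships."""
    term: str
    synonyms: list[str] = field(default_factory=list)  # lead-in terms (UF)
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT

    def display(self):
        """Render the entry in a conventional thesaurus layout."""
        lines = [self.term]
        for label, terms in (("UF", self.synonyms), ("BT", self.broader),
                             ("NT", self.narrower), ("RT", self.related)):
            if terms:
                lines.append(f"  {label}: {', '.join(terms)}")
        return "\n".join(lines)


# Hypothetical grouping of some of the diagram's terms (an assumption, not the slide's).
cancer = Descriptor(
    term="Cancer",
    synonyms=["Malignancy", "Malignant tumor"],
    broader=["Neoplasms"],
    narrower=["Breast cancer"],
    related=["Hyperplasia"],
)
print(cancer.display())
```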

Example: MeSH (Medical Subject Headings)
- 22,568 descriptors
- 139,000 headings in Supplementary Concept Records
- Thousands of cross-references
  - e.g., Vitamin C see Ascorbic Acid
- Used for indexing MEDLINE
- MeSH Browser

MeSH Tree Structures
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]
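MeSH places each descriptor at one or more positions in these trees, identified by a tree number: the category letter followed by dot-separated levels. Because a position's broader positions are its prefixes, hierarchy can be navigated with plain string operations. A minimal sketch, using the category letters listed above; the tree number in the example is hypothetical.

```python
# Top-level categories as listed on the slide (letter -> category name).
CATEGORIES = {
    "A": "Anatomy", "B": "Organisms", "C": "Diseases", "D": "Chemicals and Drugs",
    "E": "Analytical, Diagnostic and Therapeutic Techniques and Equipment",
    "F": "Psychiatry and Psychology", "G": "Biological Sciences",
    "H": "Physical Sciences",
    "I": "Anthropology, Education, Sociology and Social Phenomena",
    "J": "Technology and Food and Beverages", "K": "Humanities",
    "L": "Information Science", "M": "Persons", "N": "Health Care",
    "Z": "Geographic Locations",
}


def category(tree_number):
    """The first letter of a tree number names its top-level category."""
    return CATEGORIES[tree_number[0]]


def broader_positions(tree_number):
    """Broader positions, nearest first, by dropping trailing dot-separated levels."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def is_below(a, b):
    """True if tree position a lies anywhere beneath position b."""
    return a.startswith(b + ".")


tn = "C04.588.274"  # hypothetical tree number, for illustration
print(category(tn), broader_positions(tn))  # Diseases ['C04.588', 'C04']
```

Matching on `b + "."` rather than `b` alone keeps a sibling such as `C040` from being mistaken for a child of `C04`.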

ERIC Thesaurus
- More than 10,000 terms or subject headings used in indexing and searching ERIC records.
- A supplemental list of over 55,000 terms or subject headings, including
  - proper names (e.g., geographic, personal, institutional, project, equipment, and test names), or
  - concepts not yet represented by the controlled vocabulary of the ERIC Thesaurus.

Controlled Vocabulary
- Examples:
  - Case studies: Descriptor
    - SN: Detailed analyses, usually focusing on a particular problem of an individual, group, or organization (note: do not confuse with "medical case histories")
    - NT: Cross sectional studies; Longitudinal studies

Examples (Case Studies)
- BT: Evaluation methods; Research
- RT: Case records; Counseling; Qualitative research
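Thesaurus construction guidelines such as ISO 2788 expect relationships to be reciprocal: if A has BT B, then B has NT A, and RT links run both ways. A minimal sketch that checks and repairs this for the "Case studies" entry above, assuming the thesaurus is held as a dict of term -> relation -> set of terms (a hypothetical layout chosen for illustration).

```python
# The "Case studies" entry from the slides, as relation -> set of terms.
thesaurus = {
    "Case studies": {
        "BT": {"Evaluation methods", "Research"},
        "NT": {"Cross sectional studies", "Longitudinal studies"},
        "RT": {"Case records", "Counseling", "Qualitative research"},
    },
}

# BT and NT are inverses; RT is symmetric.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}


def missing_reciprocals(th):
    """Return the (term, relation, target) links needed to make every link reciprocal."""
    missing = []
    for term, relations in th.items():
        for rel, targets in relations.items():
            for target in targets:
                if term not in th.get(target, {}).get(INVERSE[rel], set()):
                    missing.append((target, INVERSE[rel], term))
    return sorted(missing)


def add_reciprocals(th):
    """Insert the missing inverse links in place."""
    for term, rel, target in missing_reciprocals(th):
        th.setdefault(term, {}).setdefault(rel, set()).add(target)


print(missing_reciprocals(thesaurus)[0])  # ('Case records', 'RT', 'Case studies')
add_reciprocals(thesaurus)
print(missing_reciprocals(thesaurus))     # []
```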

Advantages of Subject Indexing
- Facilitates concept search
  - Search by topics/subjects, not just by words
  - Link related documents by subject terms
  - Make implicit information explicit
- Provides a standard terminology to index and search documents
  - Uses a small indexing vocabulary
  - Helps the searcher find related terms
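Concept search can be sketched as query expansion over the vocabulary: a search on a descriptor also matches its synonyms and everything narrower, so a document indexed only under a specific term is still found. A minimal sketch; the thesaurus fragment, document ids, and subject assignments are all hypothetical.

```python
# Hypothetical thesaurus fragment: descriptor -> narrower terms, descriptor -> synonyms.
NARROWER = {
    "Neoplasms": ["Breast Neoplasms", "Abdominal Neoplasms"],
    "Abdominal Neoplasms": ["Liver Neoplasms"],
}
SYNONYMS = {"Neoplasms": ["Cancer", "Tumor"]}


def expand(term):
    """The term, its synonyms, and all narrower terms, followed recursively."""
    found, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t in found:
            continue
        found.add(t)
        found.update(SYNONYMS.get(t, []))
        stack.extend(NARROWER.get(t, []))
    return found


def concept_search(term, subject_index):
    """subject_index: doc id -> set of subject terms assigned to that document."""
    wanted = expand(term)
    return sorted(doc for doc, terms in subject_index.items() if terms & wanted)


docs = {"d1": {"Liver Neoplasms"}, "d2": {"Counseling"}, "d3": {"Cancer"}}
print(concept_search("Neoplasms", docs))  # ['d1', 'd3']
```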

Disadvantages of Subject Indexing
- Expensive manual operations
  - To construct the controlled vocabulary
  - To assign terms to documents
- Difficult to keep up to date
  - Terminology changes very fast.
  - New terms are added daily.
- Inconsistent process of human indexing
  - The same document may be assigned different indexing terms by different indexers.
  - The user may not use the same terms to find documents as the indexer used to index them.
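Inter-indexer inconsistency can be quantified. One common measure, usually attributed to Hooper, divides the number of terms both indexers assigned by the number of distinct terms either assigned: common / (a + b - common). A minimal sketch with hypothetical assignments; treating two empty indexings as fully consistent is a convention chosen here.

```python
def indexing_consistency(terms_a, terms_b):
    """Overlap between two indexers' term sets: common / (a + b - common)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # convention: two empty indexings agree trivially
    common = len(a & b)
    return common / (len(a) + len(b) - common)


indexer_1 = ["Case studies", "Research", "Counseling"]   # hypothetical assignments
indexer_2 = ["Case studies", "Qualitative research"]
print(indexing_consistency(indexer_1, indexer_2))  # 0.25
```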

Document Representation
- Inverted indexing
  - Represent a document as a list of terms that occur in the document
  - Computer-based indexing
  - Statistically based indexing
- Subject indexing
  - Represent a document as a list of subject terms drawn from a controlled vocabulary.
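Both representations can be stored in the same structure, a map from term to the documents containing it; what differs is where the terms come from. A minimal sketch: the word index takes terms from the text itself, while the subject index takes them from (hypothetical) indexer assignments, which is what separates the two senses of "Mercury".

```python
import re
from collections import defaultdict


def invert(doc_terms):
    """doc id -> iterable of terms  =>  term -> sorted list of doc ids (postings)."""
    postings = defaultdict(set)
    for doc_id, terms in doc_terms.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {"d1": "Mercury orbits the Sun", "d2": "Mercury is a toxic metal"}

# Inverted indexing: terms come from the text itself (computer-based).
word_index = invert({d: re.findall(r"[a-z]+", text.lower()) for d, text in docs.items()})

# Subject indexing: terms come from a controlled vocabulary, chosen by an indexer.
# These assignments are hypothetical.
subject_index = invert({"d1": ["Mercury (planet)"], "d2": ["Mercury (metal)"]})

print(word_index["mercury"])               # ['d1', 'd2']
print(subject_index["Mercury (planet)"])   # ['d1']
```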

Considerations of Document Representation
- Any format of document representation needs to maintain a balance among its
  - Discriminating power
  - Descriptiveness
  - Similarity identification
  - Conciseness

Considerations of DR: Discriminating Power
- To identify a document uniquely
- To reduce ambiguity
- Examples:
  - ISBN for books
  - Bar codes for products

Considerations of DR: Descriptiveness
- Describe all the information as completely as possible
  - Full text
  - Abstracts
  - Extracts
  - Reviews
- Completeness and correctness

Considerations of DR: Similarity Identification
- To group similar documents
  - Keywords or subject indexing
  - Book classification numbers
- Difficulty: it is hard for the computer to assign keywords, subject descriptors, or classification numbers to documents

Considerations of DR: Conciseness
- Simple and clear
- Reduces processing time and storage space
- Examples:
  - Authors and titles

Relationships of the Four Considerations
- Higher discriminating power may lower the capability of identifying similarities among documents.
- Good descriptiveness may defeat conciseness.
- What's good for the computer may not always be good for the user.
- A good representation should seek a balance among the four and take both the computer and the user into consideration.
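The trade-off between discrimination and similarity can be made concrete by grouping the same records under two representations. A minimal sketch over a hypothetical four-book catalog (the ISBN values are placeholders, not real ISBNs): the identifier separates every record but groups nothing, while the subject heading groups similar books but identifies none of them uniquely.

```python
from collections import defaultdict

# Hypothetical catalog records; the ISBN values are placeholders, not real ISBNs.
books = [
    {"isbn": "isbn-0001", "subject": "Thesauri", "title": "Book A"},
    {"isbn": "isbn-0002", "subject": "Thesauri", "title": "Book B"},
    {"isbn": "isbn-0003", "subject": "Metadata", "title": "Book C"},
    {"isbn": "isbn-0004", "subject": "Metadata", "title": "Book D"},
]


def group_by(records, key):
    """Group record titles by the value of one representation field."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["title"])
    return dict(groups)


def profile(records, key):
    """(share of records uniquely identified, average group size) for a field."""
    groups = group_by(records, key)
    unique = sum(1 for titles in groups.values() if len(titles) == 1)
    return unique / len(records), len(records) / len(groups)


print(profile(books, "isbn"))     # (1.0, 1.0): full discrimination, no grouping
print(profile(books, "subject"))  # (0.0, 2.0): no discrimination, similar books grouped
```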

What's Missing in DR?
- Intelligent reasoning!
- Knowledge base
  - Ontology
  - Semantic networks
- Uncertainty (imprecision) handling

Knowledge Representation
- Encoding human knowledge, in all its various forms, in such a way that the knowledge can be used.
  - A successful representation of some knowledge must be in a form that is understandable by humans, and must cause the system using the knowledge to behave as if it knows it.

Knowledge Representation
- A knowledge representation (KR) is most fundamentally a surrogate, a substitute for the thing itself.
- It is a set of ontological commitments, i.e., an answer to the question: In what terms should I think about the world?
- It is a fragmentary theory of intelligent reasoning, expressed in terms of three components: (i) the representation's fundamental conception of intelligent reasoning; (ii) the set of inferences the representation sanctions; and (iii) the set of inferences it recommends.

Knowledge Representation
- It is a medium for pragmatically efficient computation, i.e., the computational environment in which thinking is accomplished. One contribution to this pragmatic efficiency is supplied by the guidance a representation provides for organizing information so as to facilitate making the recommended inferences.
- It is a medium of human expression, i.e., a language in which we say things about the world.
- From http://medg.lcs.mit.edu/ftp/psz/k-rep.html
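To make "the set of inferences the representation sanctions" concrete, here is a toy semantic network held as (subject, relation, object) triples, with two sanctioned inferences: is_a is transitive, and a property stated for a class is inherited by its subclasses. The triples, relation names, and the inheritance rule are hypothetical and deliberately naive (not every cancer is treated by chemotherapy); the sketch also does no uncertainty handling.

```python
# Hypothetical semantic network as (subject, relation, object) triples.
TRIPLES = {
    ("Breast cancer", "is_a", "Cancer"),
    ("Cancer", "is_a", "Neoplasms"),
    ("Neoplasms", "is_a", "Diseases"),
    ("Cancer", "treated_by", "Chemotherapy"),
}


def objects(subject, relation):
    """Facts stated explicitly in the network."""
    return {o for s, r, o in TRIPLES if s == subject and r == relation}


def ancestors(concept):
    """Sanctioned inference 1: is_a is transitive."""
    seen, frontier = set(), [concept]
    while frontier:
        for parent in objects(frontier.pop(), "is_a"):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def inherited(concept, relation):
    """Sanctioned inference 2: a property of a class also holds for its subclasses."""
    result = set(objects(concept, relation))
    for parent in ancestors(concept):
        result |= objects(parent, relation)
    return result


print(sorted(ancestors("Breast cancer")))        # ['Cancer', 'Diseases', 'Neoplasms']
print(inherited("Breast cancer", "treated_by"))  # {'Chemotherapy'}
```

Nothing in the triples says directly that breast cancer is a disease; the system "behaves as if it knows it" only because the representation sanctions the transitive inference.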

Intelligent Information Retrieval
- Information retrieval supported by knowledge representation, rather than document representation.
- Useful links:
  - Stanford
  - Agent-based IR