Indexing Knowledge Daniel Vasicek 2014 March 27

Introduction
Basic topic: all human knowledge
Who cares?
Simple examples

Basic Ideas
Concepts instead of key words
  – Thesauri instead of key words
  – Recognize emerging concepts
  – Classification
Facilitate communication between environments (data translation)
Metadata for publications (XML, SQL, txt)
  – Indexing information

Topics to Cover
Programming language constructs needed
What functionality do we need?
What do people pay Access Innovations to do?
Typical programming problems that I encounter

Input Data
Formats
  – XML-tagged metadata for publications
  – SQL database
  – Raw text
  – Pictures of text
Quantities
  – AIP
      304,910 authors as XML files
      807,005 XML files containing title, abstract, and metadata
  – NICEM (National Information Center for Educational Media)
      503,534 XML files describing available educational media
      26,144 XML files describing suppliers of educational media

Programming Languages Used
Visual Basic (1990s)
C++
Java (currently)

Who Cares?
AIP – American Institute of Physics (17 journals + conference proceedings)
IEEE – Institute of Electrical and Electronics Engineers (journals, standards, patents, …)
SPIE – International Society for Optics and Photonics
ACM – Association for Computing Machinery
Wolters Kluwer
PubMed

More Clients
Parliament of Victoria (5000 articles per day)
JSTOR (~10 million documents, some journals back to 1665)
PLOS (quick path to electronic publication)
DuPont
Dow
Council of Europe
Triumph Learning
ASCE, SAGE, SafetyLit, OSA, NICEM, NPR …

Useful Tools
Controlled Vocabulary – an organizational tool for capturing concepts
Proximity – a tool for capturing context
Hash Table (Content Addressable Array)
  – Convenience
  – Uniqueness
  – Fast access
Regular Expressions

What’s a taxonomy?
Knowledge organization system
Words
  – Controlled vocabulary for a subject area
Descriptive labels
Hierarchy
  – Simple hierarchical view of a thesaurus
Storage and retrieval aid

Thesaurus Elements
Hierarchy
  – Broader and narrower concepts
  – Multiply connected “treelike” structure
Nodes in the thesaurus structure contain descriptions of concepts and links to broader, narrower, related, and similar concepts
Subject specific?

Structure of Controlled Vocabularies
[Figure: vocabulary types in order of increasing meaning and control: Flat List, Synonym Ring, Taxonomy, Thesaurus, Ontology. Controls labeled in the figure: Ambiguity, Synonym, Hierarchy, Relationships, Additional Types of Relationships. After ANSI/NISO Z , Figure 5]

Thesaurus Node (Term)
[Diagram: the term Biology, with links labeled Broader Term (Science), Synonym (Science of Life), and Narrower Term]

Thesaurus Implementation
Terms (Concepts, Preferred Terms)
Broader Terms
Narrower Terms
Related Terms
Other Concepts
  – Synonyms
  – History
  – Responsibility
  – Backup
Rules to help identify the concept in text
Methods for maintaining the thesaurus

Thesaurus Text Representation Biology Science Science of Life Science Biology Science of Life
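
The markup on this slide did not survive extraction, so the original layout is unknown. As a rough sketch only, one common way to write a thesaurus as text is a term followed by indented relationship lines. The class name ThesaurusText, the two-space indent, and the BT/NT/UF labels (borrowed from the Content Addressable Array slide) are assumptions, not the author's format.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: renders a thesaurus as indented text.
final class ThesaurusText {
    /**
     * Input: term -> (relationship label -> linked terms).
     * Output: each term on its own line, followed by indented
     * "LABEL target" lines; terms and labels are sorted alphabetically.
     */
    static String render(Map<String, Map<String, List<String>>> thesaurus) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<String, List<String>>> term
                : new TreeMap<>(thesaurus).entrySet()) {
            sb.append(term.getKey()).append('\n');
            for (Map.Entry<String, List<String>> rel
                    : new TreeMap<>(term.getValue()).entrySet()) {
                for (String target : rel.getValue()) {
                    sb.append("  ").append(rel.getKey())
                      .append(' ').append(target).append('\n');
                }
            }
        }
        return sb.toString();
    }
}
```

Sorting through TreeMap keeps the output stable between runs, which makes two versions of a thesaurus easy to compare line by line.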

Thesaurus Problems
Missing terms – pointer links to a term that is not present
Broken loops
  – Narrower term without matching broader term
  – Broader term without matching narrower term
  – Related term without a matching return relationship
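
The problems listed above can be found mechanically. The following is a minimal sketch, assuming the thesaurus is held as a set of terms plus three maps from a term to its broader (BT), narrower (NT), and related (RT) terms. The class name ThesaurusChecker and the message wording are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical checker for missing terms and non-reciprocal links.
final class ThesaurusChecker {
    /** Returns one message per problem found; an empty list means no problems. */
    static List<String> check(Set<String> terms,
                              Map<String, Set<String>> bt,
                              Map<String, Set<String>> nt,
                              Map<String, Set<String>> rt) {
        List<String> problems = new ArrayList<>();
        checkLinks("BT", bt, "NT", nt, terms, problems); // BT needs a matching NT
        checkLinks("NT", nt, "BT", bt, terms, problems); // NT needs a matching BT
        checkLinks("RT", rt, "RT", rt, terms, problems); // RT needs a return RT
        return problems;
    }

    private static void checkLinks(String name, Map<String, Set<String>> links,
                                   String inverseName, Map<String, Set<String>> inverse,
                                   Set<String> terms, List<String> problems) {
        for (Map.Entry<String, Set<String>> e : links.entrySet()) {
            String from = e.getKey();
            for (String to : e.getValue()) {
                if (!terms.contains(to)) {
                    // The link points at a term that is not in the thesaurus.
                    problems.add("missing term: " + from + " " + name + " " + to);
                } else if (!inverse.getOrDefault(to, Set.of()).contains(from)) {
                    // The target exists but does not link back.
                    problems.add("no matching " + inverseName + ": "
                            + from + " " + name + " " + to);
                }
            }
        }
    }
}
```

The sketch only inspects link targets. A fuller check would also confirm that every key in the three maps is itself a known term.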

Proximity of Words
Adjacent
  – Before
  – After
Same sentence
Same paragraph
Within 50 words
Phrases (n-grams)
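
Two of these tests, "adjacent" and "within N words", can be sketched over a simple token list. This is an illustration under assumptions: the class name Proximity, the tokenizer (lowercase, split on anything that is not a letter or digit), and the restriction to single-word terms are all choices made here, not taken from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical proximity tests over single-word terms.
final class Proximity {
    /** Lowercases the text and splits it on runs of non-letter, non-digit characters. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    /** True if some occurrence of a lies at most maxGap token positions from some occurrence of b. */
    static boolean within(String text, String a, String b, int maxGap) {
        List<String> t = tokens(text);
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        for (int i = 0; i < t.size(); i++) {
            if (!t.get(i).equals(x)) continue;
            int lo = Math.max(0, i - maxGap);
            int hi = Math.min(t.size() - 1, i + maxGap);
            for (int j = lo; j <= hi; j++) {
                if (j != i && t.get(j).equals(y)) return true;
            }
        }
        return false;
    }

    /** True if a occurs immediately before b somewhere in the text. */
    static boolean precedes(String text, String a, String b) {
        List<String> t = tokens(text);
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 1 < t.size(); i++) {
            if (t.get(i).equals(x) && t.get(i + 1).equals(y)) return true;
        }
        return false;
    }
}
```

With this sketch, within(text, a, b, 50) corresponds to the "within 50 words" test. Same-sentence and same-paragraph tests would need the text split into those units first.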

Content Addressable Array
T["Science"] = 1;
T["Biology"] = 1;
T["Science of Life"] = 1;
BT["Biology"] = "Science";
NT["Science"] = "Biology";
UF["Science of Life"] = "Biology";
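
In Java, the content addressable arrays above map naturally onto HashMap and HashSet. The sketch below mirrors the slide's six assignments. The class name ThesaurusTables and the preferred helper are hypothetical, and the single-valued maps follow the slide; a real thesaurus would need several broader or narrower terms per entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical Java rendering of the slide's T, BT, NT, and UF tables.
final class ThesaurusTables {
    final Set<String> terms = new HashSet<>();             // T[...] = 1 (membership)
    final Map<String, String> broader = new HashMap<>();   // BT
    final Map<String, String> narrower = new HashMap<>();  // NT
    final Map<String, String> usedFor = new HashMap<>();   // UF: synonym -> term

    /** Builds the three-term example from the slide. */
    static ThesaurusTables slideExample() {
        ThesaurusTables t = new ThesaurusTables();
        t.terms.add("Science");
        t.terms.add("Biology");
        t.terms.add("Science of Life");
        t.broader.put("Biology", "Science");
        t.narrower.put("Science", "Biology");
        t.usedFor.put("Science of Life", "Biology");
        return t;
    }

    /** Resolves a synonym to its term; any other label is returned unchanged. */
    String preferred(String label) {
        return usedFor.getOrDefault(label, label);
    }
}
```

The hash tables give the three properties named under Useful Tools: lookup by content is convenient, a key can appear only once, and access takes roughly constant time.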

Regular Expressions
9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,4})$/ – addresses?
/ [A-Z][a-z]* / – Capitalized words
/[A-Z][a-zA-Z0-9,\"\- ]*\. / – Sentence? Paragraph?
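
The capitalized-word and sentence patterns can be tried with java.util.regex. This is a sketch with two deliberate changes from the slide, both assumptions made here: the capitalized-word pattern uses \b word boundaries instead of surrounding spaces, and the sentence pattern drops the trailing space so the last sentence of a text also matches. The class name SlideRegexes is hypothetical, and the truncated address pattern is left out because its beginning is missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around two patterns adapted from the slide.
final class SlideRegexes {
    // Slide: / [A-Z][a-z]* / ; adapted here to use word boundaries.
    static final Pattern CAPITALIZED = Pattern.compile("\\b[A-Z][a-z]*\\b");
    // Slide: /[A-Z][a-zA-Z0-9,\"\- ]*\. / ; trailing space dropped here.
    static final Pattern SENTENCE = Pattern.compile("[A-Z][a-zA-Z0-9,\"\\- ]*\\.");

    /** Collects every non-overlapping match of p in text, left to right. */
    static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) out.add(m.group());
        return out;
    }

    static List<String> capitalizedWords(String text) {
        return findAll(CAPITALIZED, text);
    }

    static List<String> sentences(String text) {
        return findAll(SENTENCE, text);
    }
}
```

The question marks on the slide are warranted: the sentence pattern stops at the first period, so abbreviations and decimal numbers split a sentence early, and any character outside the listed class (a semicolon, for instance) stops the match and restarts it at the next capital letter.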
