Part 3B: Text Indexing, Term Lists & Taxonomies


Value space continuum of expressivity… [Figure: a continuum running from less to more expressive, with increasing control over form, relationships and meaning. Forms placed along it: text indexing, thesauri, ontology, term lists, faceted classification, taxonomies, analytico-synthetic classification, tagging, enumerated classification.]

Text Indexing ▪ Full-text and inverted files/indexes

Inverted files… The primary form of index developed for full-text retrieval in information systems. It is called an “inverted file” because the normal rows (documents) and columns (words) of a database are inverted, with rows representing words and columns representing documents.

Example inverted file… Assume you have a database of houses (rows), and one field (column) for each house is its price. To search rapidly by house price, build an inverted file in which the rows are the prices and the columns are the houses. You look up the price once and harvest that row’s columns for the houses.
Main Data File
ID  HOUSE                         PRICE
1   1208 Twin Oaks Way            $100,000
2   100 Sutton Heights            $200,000
3   10 Pine Street                $150,000
4   8539 Billings Circle          $100,000
5   9537 Highway 101 North        $100,000
6   10 Capitol Hill Avenue North  $150,000
Inverted File or Inverted Index
PRICE     HOUSE IDs
$100,000  1, 4, 5
$150,000  3, 6
$200,000  2
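The house-price example can be sketched in a few lines of Python. This is only an illustration: the names `HOUSES`, `build_price_index` and `PRICE_INDEX` are made up for it, and the prices of houses 4 to 6 are read back from the inverted file on the slide.

```python
from collections import defaultdict

# Main data file: house ID -> (address, price in dollars).
# Prices for houses 4-6 are taken from the inverted file in the example.
HOUSES = {
    1: ("1208 Twin Oaks Way", 100_000),
    2: ("100 Sutton Heights", 200_000),
    3: ("10 Pine Street", 150_000),
    4: ("8539 Billings Circle", 100_000),
    5: ("9537 Highway 101 North", 100_000),
    6: ("10 Capitol Hill Avenue North", 150_000),
}


def build_price_index(houses):
    """Invert the data file: each price maps to the IDs of houses at that price."""
    index = defaultdict(list)
    for house_id in sorted(houses):
        _address, price = houses[house_id]
        index[price].append(house_id)
    return dict(index)


PRICE_INDEX = build_price_index(HOUSES)

# One lookup by price, then harvest the matching house records.
for house_id in PRICE_INDEX[100_000]:
    print(house_id, HOUSES[house_id][0])
```

The search by price touches one index entry instead of scanning every house record.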

Inverted file (document level)… Each term points to the documents that contain it. “Times” is the number of documents in which the term occurs.
Document  Text
1         Gold silver truck
2         Shipment of gold damaged in a fire
3         Delivery of silver arrived in a silver truck
4         Shipment of gold arrived in a truck
Number  Term      <Times; Documents>
1       a         <3; 2,3,4>
2       arrived   <2; 3,4>
3       damaged   <1; 2>
4       delivery  <1; 3>
5       fire      <1; 2>
6       gold      <3; 1,2,4>
7       of        <3; 2,3,4>
8       in        <3; 2,3,4>
9       shipment  <2; 2,4>
10      silver    <2; 1,3>
11      truck     <3; 1,3,4>
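A minimal sketch of how such a document-level index could be built, assuming terms are simply lowercased and split on whitespace (the slide does not say how the text was tokenized). The names `DOCUMENTS`, `build_document_index` and `DOC_INDEX` are illustrative.

```python
# The four example documents from the slide.
DOCUMENTS = {
    1: "Gold silver truck",
    2: "Shipment of gold damaged in a fire",
    3: "Delivery of silver arrived in a silver truck",
    4: "Shipment of gold arrived in a truck",
}


def build_document_index(documents):
    """Map each lowercased term to the sorted list of documents containing it."""
    index = {}
    for doc_id in sorted(documents):
        # set() so that a term repeated inside one document is listed once.
        for term in set(documents[doc_id].lower().split()):
            index.setdefault(term, []).append(doc_id)
    return index


DOC_INDEX = build_document_index(DOCUMENTS)

# Print entries in the slide's <Times; Documents> notation.
for term in sorted(DOC_INDEX):
    postings = DOC_INDEX[term]
    print(f"{term} <{len(postings)}; {','.join(map(str, postings))}>")
```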

Inverted file (term-level)… Each entry also records the word positions of the term within each document. Storing positions is what supports proximity operators.
Document  Text
1         Gold silver truck
2         Shipment of gold damaged in a fire
3         Delivery of silver arrived in a silver truck
4         Shipment of gold arrived in a truck
Number  Term      <Times; (Document; Words)>
1       a         <3; (2;6),(3;6),(4;6)>
2       arrived   <2; (3;4),(4;4)>
3       damaged   <1; (2;4)>
4       delivery  <1; (3;1)>
5       fire      <1; (2;7)>
6       gold      <3; (1;1),(2;3),(4;3)>
7       of        <3; (2;2),(3;2),(4;2)>
8       in        <3; (2;5),(3;5),(4;5)>
9       shipment  <2; (2;1),(4;1)>
10      silver    <2; (1;2),(3;3,7)>
11      truck     <3; (1;3),(3;8),(4;7)>
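The following sketch shows one way word positions could be stored and then used for a proximity query. It assumes whitespace tokenization and 1-based positions, as in the slide's table. The `near` function is a hypothetical stand-in for a proximity operator, not the syntax of any particular retrieval system.

```python
# The four example documents from the slide.
DOCUMENTS = {
    1: "Gold silver truck",
    2: "Shipment of gold damaged in a fire",
    3: "Delivery of silver arrived in a silver truck",
    4: "Shipment of gold arrived in a truck",
}


def build_positional_index(documents):
    """Map term -> {doc_id: [1-based word positions]}."""
    index = {}
    for doc_id in sorted(documents):
        words = documents[doc_id].lower().split()
        for position, term in enumerate(words, start=1):
            index.setdefault(term, {}).setdefault(doc_id, []).append(position)
    return index


def near(index, first, second, max_distance):
    """Return documents where the two terms occur within max_distance words."""
    first_postings = index.get(first, {})
    second_postings = index.get(second, {})
    hits = []
    for doc_id in sorted(first_postings.keys() & second_postings.keys()):
        if any(
            abs(p - q) <= max_distance
            for p in first_postings[doc_id]
            for q in second_postings[doc_id]
        ):
            hits.append(doc_id)
    return hits


POS_INDEX = build_positional_index(DOCUMENTS)
print(POS_INDEX["silver"])
print(near(POS_INDEX, "silver", "truck", 1))
```

A document-level index could only say that "silver" and "truck" share documents 1 and 3. The positions are what let the query ask how close together they are.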

Inverted file (document level), stop words… In the document-level index for the four example texts, very common function words such as a, of and in are candidate stop words. With very sophisticated full-text retrieval systems, the aggregate size of the inverted files necessary to support search can be larger than the text files they index.
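To give a rough sense of what dropping stop words does to index size, the sketch below builds the document-level index for the four example texts with and without a stop list. The stop list {a, of, in} is an assumption made for this example, and `build_index` and `posting_count` are illustrative names.

```python
# The four example documents from the slide.
DOCUMENTS = {
    1: "Gold silver truck",
    2: "Shipment of gold damaged in a fire",
    3: "Delivery of silver arrived in a silver truck",
    4: "Shipment of gold arrived in a truck",
}

# Assumed stop list for this example only.
STOP_WORDS = {"a", "of", "in"}


def build_index(documents, stop_words=frozenset()):
    """Document-level inverted index that skips any term in stop_words."""
    index = {}
    for doc_id in sorted(documents):
        terms = set(documents[doc_id].lower().split()) - set(stop_words)
        for term in terms:
            index.setdefault(term, []).append(doc_id)
    return index


def posting_count(index):
    """Total number of (term, document) entries stored in the index."""
    return sum(len(postings) for postings in index.values())


FULL = build_index(DOCUMENTS)
FILTERED = build_index(DOCUMENTS, STOP_WORDS)
print(len(FULL), posting_count(FULL))
print(len(FILTERED), posting_count(FILTERED))
```

The trade-off is that a filtered index can no longer answer queries that contain the dropped words.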

Term Lists

Term lists… The simplest forms of controlled value spaces are term lists: lists of controlled terms ordered by some principle (frequently alphabetical). Examples:
▪ The list of authorized U.S. state abbreviations
▪ An alphabetic list of enumerated subject terms
A term list can also gather variant terms under one preferred term: Infants, Ankle biters, Rug rats → Infants (preferred term).
Don’t underestimate the power of these simple, controlled lists.
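One way to picture a controlled term list in practice is as a lookup from entry terms to a preferred term. The sketch below is an assumption about how that could be coded. Only "Infants" and its two variants come from the slide, and `to_preferred` is a hypothetical helper.

```python
# Entry terms (lowercased) -> preferred term. Only "Infants" and its two
# variants come from the slide; the lookup structure itself is an assumption.
PREFERRED_TERM = {
    "infants": "Infants",
    "ankle biters": "Infants",
    "rug rats": "Infants",
}


def to_preferred(term):
    """Return the preferred term for an entry term, or None if uncontrolled."""
    return PREFERRED_TERM.get(term.strip().lower())


print(to_preferred("Rug rats"))
print(to_preferred("toddlers"))
```

Returning `None` for unknown terms keeps the list closed: nothing enters the value space unless someone has authorized it.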

Simple (yet powerful) lists… A list (also sometimes called a pick list) is a limited set of terms arranged as a simple alphabetical list or in some other logically evident way. Lists are used to describe aspects of entities that have a limited number of possibilities. Examples include geography (e.g., country, state, city), language (e.g., English, French, Swedish), or format (e.g., text, image, sound).
Simple alphabetical list: Alabama, Alaska, Arkansas, California, Connecticut, Delaware
Simple logical list: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune, Pluto*
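A pick list typically does two jobs: it restricts a field to the allowed values, and it supplies a display order. A minimal sketch, assuming the lists are held as plain tuples, with `validate` and `sort_logically` as illustrative helper names:

```python
# Pick lists taken from the slide's examples.
FORMATS = ("text", "image", "sound")
# Logical (not alphabetical) order; the source marks Pluto with an asterisk.
PLANETS = ("Mercury", "Venus", "Earth", "Mars", "Jupiter",
           "Saturn", "Uranus", "Neptune", "Pluto")


def validate(value, pick_list):
    """Accept a value only if it appears in the pick list."""
    if value not in pick_list:
        raise ValueError(f"{value!r} is not one of {pick_list}")
    return value


def sort_logically(values, pick_list):
    """Order values by their position in the pick list."""
    return sorted(values, key=pick_list.index)


print(validate("image", FORMATS))
print(sort_logically(["Mars", "Mercury", "Earth"], PLANETS))
```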

Taxonomies ▪ Yahoo! Directory

Dominant form on the Web…
▪ Hierarchical tree structure (example: Yahoo! Directory)
▪ Frequently permit polyhierarchy (multiple parents)
▪ No general principles guiding design of taxonomies
“A collection of controlled vocabulary terms organized into a hierarchical structure. Each term in a taxonomy is in one or more parent/child (broader/narrower) relationships to other terms in the taxonomy.” [NISO/Z39.19] [emphasis added]
The absence of general design principles does not imply that Web taxonomies are necessarily unprincipled. It means only that, as a form, taxonomies do not have the same guiding principles that we will see with thesauri and classifications. Individual taxonomies can be rigorously structured, with consistent, intelligent, well-thought-out cross-referencing and other devices.

Polyhierarchy

Polyhierarchy… [NISO/Z39.19]
▪ Based on generic relationship: musical instruments > stringed instruments, percussion instruments > piano (piano has both as broader terms)
▪ Based on whole-part relationship: biology, chemistry > biochemistry
▪ Based on multiple types of relationship: bones, head > skull
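A polyhierarchy cannot be stored as a simple tree, because a term may have more than one parent. One possible representation is a map from each term to its set of broader terms. The sketch below reads the figure as giving piano, biochemistry and skull two broader terms each. That reading, and the names `BROADER`, `ancestors` and `is_polyhierarchical`, are assumptions.

```python
# Term -> set of broader (parent) terms, as read from the slide's examples.
BROADER = {
    "stringed instruments": {"musical instruments"},
    "percussion instruments": {"musical instruments"},
    "piano": {"stringed instruments", "percussion instruments"},
    "biochemistry": {"biology", "chemistry"},
    "skull": {"bones", "head"},
}


def ancestors(term, broader=BROADER):
    """Collect every broader term reachable from term, through all parents."""
    seen = set()
    stack = [term]
    while stack:
        for parent in broader.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_polyhierarchical(term, broader=BROADER):
    """True when the term has more than one direct parent."""
    return len(broader.get(term, ())) > 1


print(sorted(ancestors("piano")))
print(is_polyhierarchical("skull"))
```

The map does not record the relationship type (generic versus whole-part). A fuller model would label each link.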

Node Labels… Non-indexable concepts used for purposes of organizing other concepts in meaningful ways.
milk
. <milk by source animal>
.. buffalo milk
.. cow milk
.. goat milk
.. sheep milk
. <milk by region>
.. United States
.. India
.. China
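The distinction between node labels and indexable terms can be made explicit in a data structure. In this sketch each node carries a flag that marks it as a node label. The tuple layout and the names `TREE`, `render` and `indexable_terms` are assumptions for illustration.

```python
# Each node is (label, is_node_label, children).
TREE = ("milk", False, [
    ("milk by source animal", True, [
        ("buffalo milk", False, []),
        ("cow milk", False, []),
        ("goat milk", False, []),
        ("sheep milk", False, []),
    ]),
    ("milk by region", True, [
        ("United States", False, []),
        ("India", False, []),
        ("China", False, []),
    ]),
])


def render(node, depth=0):
    """Return display lines: dots for depth, angle brackets for node labels."""
    label, is_node_label, children = node
    text = f"<{label}>" if is_node_label else label
    prefix = "." * depth + " " if depth else ""
    lines = [prefix + text]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


def indexable_terms(node):
    """Return only the terms that may be assigned to documents."""
    label, is_node_label, children = node
    terms = [] if is_node_label else [label]
    for child in children:
        terms.extend(indexable_terms(child))
    return terms


print("\n".join(render(TREE)))
print(indexable_terms(TREE))
```

Node labels appear in the display but are left out of the list of terms an indexer can assign.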

End • Part 3B: Text Indexing, Term Lists & Taxonomies