10/24/2000 Information Organization and Retrieval: Information Structures and Metadata. University of California, Berkeley, School of Information Management and Systems.

Presentation transcript:

10/24/2000 Information Organization and Retrieval
Information Structures and Metadata
University of California, Berkeley, School of Information Management and Systems
SIMS 202: Information Organization and Retrieval

Review
The Course
Information Hierarchy
Volume of information and growth of the Internet

Two Main Themes
Information Organization and Design
Information Retrieval and the Search Process

Course Schedule
Organization
– Overview
– Metadata and Markup
– Controlled Vocabularies, Classification, Thesauri
– Information Design
    Thesaurus Design
    Database Design
Retrieval
– The Search Process
– Content Analysis
    Tokenization, Zipf’s Law, Lexical Associations
– IR Implementation
– Term weighting and document ranking
    Vector space model
    Probabilistic model
– User Interfaces
    Overviews, query specification, providing context, relevance feedback

Information Hierarchy
Wisdom
Knowledge
Information
Data

Totals Stored Per Year
Medium    Type of content    Terabytes/Year (Upper Bound)    Terabytes/Year (Lower Bound)
Paper     Books              8                               7
          Newspapers
          Periodicals
          Office documents
          SUBTOTAL
Film      Photographs        410,                            ,000
          Cinema
          X-Rays             12,000                          12,000
          SUBTOTAL           422,                            ,016
Optical   Music CDs
          Data CDs           3                               3
          DVDs
          SUBTOTAL
Magnetic  Camcorder          300,000                         300,000
          Disk drives        2,555,000                       1,000,200
          SUBTOTAL           2,855,000                       1,300,200
TOTAL                        3,277,440                       1,412,632

[Figure: Projected Voice and Data Traffic, Gb/s. Source: America's Network, May 15, 1998]

[Figure: Internet Hosts (000s). Source: Vint Cerf]

Information Overload
“The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden)

Today
Organization of Information
Information Life Cycle (review)
Introduction to structured information (SGML/XML)
Metadata and the Dublin Core

Organization of Information
Is there a basic human need to put things into some sort of order?
– Much of natural language concerns categories of things rather than individual things (more on this next week)
– Why do we organize things and information?
    Why do spoons go in THAT drawer in the kitchen and not in a can in the garage?
    Why do your favorite books go on one shelf and not-so-favorite on another?

Why Organize Information?
The main reason
– So that you can find things more effectively
    I.e., effective retrieval is predicated on some sort of organization applied to information resources
Historically there have been many institutions and tools devoted to information organization
– Libraries
– Museums
– Archives
– Indexes and catalogs, dictionaries, phone books, etc.

To organize is to (1) furnish with organs, make organic, make into living tissue, become organic; (2) form into an organic whole; give orderly structure to; frame and put into working order; make arrangements for.
Knowledge is knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known.
To retrieve is to (1) recover by investigation or effort of memory, restore to knowledge or recall to mind; regain possession of; (2) rescue from a bad state, revive, repair, set right.
Information is (1) informing, telling; thing told, knowledge, items of knowledge, news.
The Oxford English Dictionary

What is Information Organization?
Identifying the existence of all types of information-bearing entities as they are made available
Identifying the works contained within those information-bearing entities or as parts of them
Systematically pulling together these information-bearing entities into collections in libraries, archives, museums, Internet communications files and other such depositories.
From Taylor, Chap. 1

What is Information Organization?
Producing lists of these information-bearing entities prepared according to standard rules for citation
Providing name, title, subject and other useful access to these information-bearing entities
Providing the means of locating each information-bearing entity or a copy of it

Organizing Information
Libraries
Archives
Museums and Galleries
Internet
Corporate and Office environments

Information Life Cycle
[Diagram] Stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating
Other labels: Creation, Utilization, Searching; Active, Semi-Active, Inactive; Retention/Mining; Disposition; Discard

Authoring/Modifying
Converting Data+Information+Knowledge to New Information.
Creating information from observation, thought.
Editing and Publication.
Gatekeeping

Organizing/Indexing
Collecting and Integrating information.
Affects Data, Information and Metadata.
“Metadata” Describes data and information.
– More on this later.
Organizing Information.
– Types of organization?
Indexing

Storing/Retrieving
Information Storage
– How and where is information stored?
Retrieving Information.
– How is information recovered from storage?
– How to find needed information
– Linked with Accessing/Filtering stage

Distribution/Networking
Transmission of information
– How is information transmitted?
Networks vs Broadcast.

Accessing/Filtering
Using the organization created in the Organizing/Indexing (O/I) stage to:
– Select desired (or relevant) information
– Locate that information
– Retrieve the information from its storage location (often via a network)

Using/Creating
Using Information.
Transformation of Information to Knowledge.
Knowledge to New Data and New Information.

Information Life Cycle Scenarios
Information Life Cycle in the Arts
Information Life Cycle of Business Records
Information Life Cycle in Health Information Systems

Key issues in this course
How to describe information resources or information-bearing objects in ways so that they may be effectively used by those who need to use them.
– Organizing
How to find the appropriate information resources or information-bearing objects for someone’s (or your own) needs.
– Retrieving

Key Issues
[Diagram] Stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating
Other labels: Creation, Utilization, Searching; Active, Semi-Active, Inactive; Retention/Mining; Disposition; Discard

Structure of an IR System
[Diagram: Information Storage and Retrieval System]
Inputs: Interest profiles & Queries; Documents & data
Rules of the game = Rules for subject indexing + Thesaurus (which consists of Lead-In Vocabulary and Indexing Language)
Query side: Formulating query in terms of descriptors; Storage of profiles; Store 1: Profiles/Search requests
Document side: Indexing (Descriptive and Subject); Storage of Documents; Store 2: Document representations
Storage Line
Comparison/Matching
Output: Potentially Relevant Documents
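The flow named in the IR system diagram (lead-in vocabulary, indexing, query formulation in descriptors, comparison/matching) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LEAD_IN` table, the sample documents, and the function names are hypothetical, and the matching rule (any shared descriptor) is just one simple choice, not the method of any particular system.

```python
# Hypothetical lead-in vocabulary: entry terms mapped to preferred descriptors
# of the indexing language. The terms are invented for this sketch.
LEAD_IN = {
    "car": "automobiles",
    "auto": "automobiles",
    "automobile": "automobiles",
    "book": "books",
    "library": "libraries",
    "catalog": "catalogs",
}


def to_descriptors(text):
    """Translate free text into a set of descriptors via the lead-in vocabulary."""
    return {LEAD_IN[word] for word in text.lower().split() if word in LEAD_IN}


def index_documents(docs):
    """Build the store of document representations (doc id -> descriptor set)."""
    return {doc_id: to_descriptors(text) for doc_id, text in docs.items()}


def match(query, store):
    """Comparison/matching: ids of documents sharing a descriptor with the query."""
    query_descriptors = to_descriptors(query)
    return sorted(
        doc_id for doc_id, descriptors in store.items()
        if query_descriptors & descriptors
    )


# Illustrative documents only.
DOCS = {
    "d1": "used car prices",
    "d2": "library catalog design",
    "d3": "auto repair manual for the library",
}
STORE = index_documents(DOCS)

print(match("automobile", STORE))  # ['d1', 'd3']
```

Because both the documents and the query pass through the same vocabulary, a search for "automobile" finds documents that only say "car" or "auto", which is the point of indexing with a controlled language.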

Metadata
Metadata is:
– “data about data” (the term as used in database systems)
– Information about Information
– Structures and Languages for the Description of Information Resources and their elements (components or features)
– “Metadata is information on the organization of the data, the various data domains, and the relationship between them” (Baeza-Yates p. 142)

Types of Metadata
Element names.
Element description.
Element representation.
Element coding.
Element semantics.
Element classification.

How can you describe an information-bearing object?

Dublin Core
Simple metadata for describing internet resources.
For “Document-Like Objects”
15 Elements.

Dublin Core Elements
Title
Creator
Subject
Description
Publisher
Other Contributors
Date
Resource Type
Format
Resource Identifier
Source
Language
Relation
Coverage
Rights Management
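As a rough illustration of how the fifteen elements can serve as a record structure, the Python sketch below stores a description as a dictionary keyed by the element labels used on the per-element slides (TITLE, CREATOR, and so on). The helper names, the sample values, and the `DC.` prefix on the HTML `<meta>` names are assumptions made for this example, not something the slides specify.

```python
from html import escape

# The fifteen element labels, in the order the slides present them.
DC_ELEMENTS = (
    "TITLE", "CREATOR", "SUBJECT", "DESCRIPTION", "PUBLISHER",
    "CONTRIBUTORS", "DATE", "TYPE", "FORMAT", "IDENTIFIER",
    "SOURCE", "LANGUAGE", "RELATION", "COVERAGE", "RIGHTS",
)


def make_record(**fields):
    """Build a record, rejecting any label that is not one of the 15 elements."""
    record = {}
    for label, value in fields.items():
        key = label.upper()
        if key not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {label}")
        record[key] = value
    return record


def to_meta_tags(record):
    """Render a record as HTML <meta> tags (the 'DC.' prefix is an assumption)."""
    return [
        f'<meta name="DC.{key.lower()}" content="{escape(str(value), quote=True)}">'
        for key, value in record.items()
    ]


# Sample values are illustrative only.
record = make_record(
    title="Information Structures and Metadata",
    publisher="University of California, Berkeley",
    date="20001024",
    format="text/html",
)

for tag in to_meta_tags(record):
    print(tag)
```

Every element is optional in this sketch, so a record may carry only the fields that are known; unknown labels such as `author` are rejected rather than silently stored.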

Title
Label: TITLE
The name given to the resource by the CREATOR or PUBLISHER.

Author or Creator
Label: CREATOR
The person(s) or organization(s) primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources.

Subject and Keywords
Label: SUBJECT
The topic of the resource, or keywords or phrases that describe the subject or content of the resource. The intent of the specification of this element is to promote the use of controlled vocabularies and keywords. This element might well include scheme-qualified classification data (for example, Library of Congress Classification Numbers or Dewey Decimal numbers) or scheme-qualified controlled vocabularies (such as Medical Subject Headings or Art and Architecture Thesaurus descriptors) as well.

Description
Label: DESCRIPTION
A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. Future metadata collections might well include computational content description (spectral analysis of a visual resource, for example) that may not be embeddable in current network systems. In such a case this field might contain a link to such a description rather than the description itself.

Publisher
Label: PUBLISHER
The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. The intent of specifying this field is to identify the entity that provides access to the resource.

Other Contributors
Label: CONTRIBUTORS
Person(s) or organization(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers, illustrators, and convenors).

Date
Label: DATE
The date the resource was made available in its present form. The recommended best practice is an 8 digit number in the form YYYYMMDD as defined by ANSI X In this scheme, the date element for the day this is written would be , or December 3, Many other schema are possible, but if used, they should be identified in an unambiguous manner.
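The 8 digit YYYYMMDD form is easy to produce and check mechanically. The Python sketch below is one way to do it; the function names are hypothetical, and the sample date is simply the date of this lecture (October 24, 2000), not the example from the original element description.

```python
from datetime import date, datetime


def to_dc_date(d):
    """Format a date as an 8 digit YYYYMMDD string."""
    return d.strftime("%Y%m%d")


def parse_dc_date(text):
    """Parse an 8 digit YYYYMMDD string, rejecting any other shape."""
    if len(text) != 8 or not all(ch in "0123456789" for ch in text):
        raise ValueError(f"expected 8 digits in the form YYYYMMDD: {text!r}")
    # strptime also rejects impossible dates such as month 13.
    return datetime.strptime(text, "%Y%m%d").date()


print(to_dc_date(date(2000, 10, 24)))  # 20001024
print(parse_dc_date("20001024"))       # 2000-10-24
```

A fixed-width, most-significant-first form has the side benefit that sorting the strings also sorts the dates.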

Resource Type
Label: TYPE
The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types. A preliminary set of such types can be found at the following URL:

Format
Label: FORMAT
The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. The intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). As with RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types). In principle, formats can include physical media such as books, serials, or other non-electronic media.
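For electronic resources, a FORMAT value drawn from the Internet Media Types can often be guessed from a file name. The sketch below uses Python's standard `mimetypes` module for that; the file names are made up, and guessing from an extension is only a convenience assumed here, since the extension says nothing certain about the actual content.

```python
import mimetypes


def guess_format(filename):
    """Return a MIME type guessed from the file extension, or None if unknown."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


print(guess_format("index.html"))  # text/html
print(guess_format("photo.jpg"))   # image/jpeg
```

A cataloging tool might use such a guess as a default that a person can override, which keeps FORMAT values within an enumerated list without requiring them to be typed by hand.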

Resource Identifier
Label: IDENTIFIER
String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element.
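Because IDENTIFIER can hold several kinds of value, software that consumes it usually has to work out which kind it has been given. The Python sketch below is a minimal illustration under stated assumptions: the function names and the example URL are hypothetical, only a few URL schemes are recognized, and only the 10 digit ISBN form (with its standard weighted check digit) is handled.

```python
from urllib.parse import urlparse


def is_isbn10(text):
    """Check a 10 digit ISBN: weights 10..1, weighted sum divisible by 11."""
    chars = text.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10  # 'X' stands for ten, allowed only as the check digit
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def identifier_kind(text):
    """Classify an identifier string as 'URL', 'URN', 'ISBN', or 'unknown'."""
    scheme = urlparse(text).scheme.lower()
    if scheme in ("http", "https", "ftp"):
        return "URL"
    if scheme == "urn":
        return "URN"
    if is_isbn10(text):
        return "ISBN"
    return "unknown"


print(identifier_kind("http://example.org/doc"))  # URL
print(identifier_kind("urn:isbn:0306406152"))     # URN
print(identifier_kind("0-306-40615-2"))           # ISBN
```

The check digit lets a mistyped ISBN be caught before it is stored, which matters when the identifier is meant to be globally unique.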

Source
Label: SOURCE
The work, either print or electronic, from which this resource is derived, if applicable. For example, an html encoding of a Shakespearean sonnet might identify the paper version of the sonnet from which the electronic version was transcribed.

Language
Label: LANGUAGE
Language(s) of the intellectual content of the resource. Where practical, the content of this field should coincide with the Z39.53 three character codes for written languages. See:

Relation
Label: RELATION
Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. For example, images in a document, chapters in a book, or items in a collection. A formal specification of RELATION is currently under development. Users and developers should understand that use of this element should be currently considered experimental.

Coverage
Label: COVERAGE
The spatial locations and temporal duration characteristic of the resource. Formal specification of COVERAGE is currently under development. Users and developers should understand that use of this element should be currently considered experimental.

Rights Management
Label: RIGHTS
The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intent of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. No assumptions should be made by users if such a field is empty or not present.