Information Organization

1 Information Organization
LBSC 670 Information Organization Welcome, anything interesting? How did we feel about HTML?

2 Thoughts from last class
“I feel like we are getting behind” “Why are we learning HTML/CSS?” “What is cloud computing?” “Can we have printouts again?” Glad to see enthusiasm for moving forward No printouts tonight – if you want the printouts to return indicate in the feedback

3 LITA National Forum http://www.ala.org/ A short report on LITA
What is it, what is ALA, how do you get involved?

4 LITA - Free your metadata
(Amalia) 1. Amalia passed this video along – touches on some ideas we will see soon but thought that it was fun to see

5 LITA – Digital Institutes
Data and digital institutes are big. What do we know about digital humanities, about e-science? These fields are very concerned with metadata and information organization issues

6 Data Services Librarian
1. New jobs being created – not reference, not catalogers, not systems – but a combination of all three Advocate for data publishing, research, curation, collaboration

7 Class Plan Explore historical foundation of cataloging
Identify metadata standards central to cataloging Explore metadata schemas useful for libraries Tonight – metadata and cataloging A bit on XML

8 Review HTML implements a metadata schema (e.g h1, h2, DOM. . .) and an encoding system (e.g XHTML) in concert with supporting technologies (e.g. CSS, JavaScript) Digital documents have embedded structure that programs use to encode and decode information for use Two main concepts that we have focused on for the last few weeks are on the slide This week we are diving down into the idea of a metadata schema to explore how it is used to represent all sorts of digital documents We will do this by exploring the process of cataloging but keep in mid that metadata schemas and encoding systems are applicable to basically any digital document (html, xml, word, databases, digital media files)

9 Storage Relational database Object databases Flat text files XML files
Tables, SQL, indexes, abstracted but semi-fixed structure Object databases Storage of objects which are directly accessible via programs Flat text files embedded structure, tight association with application, quick, simple XML files Abstracted structure, portable, extensible, slower? Embedded in digital objects Portable, associative 1. A brief note on storage – there are other ways to organize information – this semester we will look at a few of these encoding models.

10 Representation of objects
Representation Record Encoding model Review – last week we dealt with metadata on ‘first class’ objects This week we are concerned with the idea of representation Object entity, information object, document, resource Record representation, resource description, entity description, bibliographic record Element = Attribute + Value Data storage model Relational database, text file, XML data

11 Metadata definitions Common Gilliland-Swetland, Baca Greenberg
Data about Data Data that describes a resource Information about Information Gilliland-Swetland, Baca "the sum total of what one can say about any information object at any level of aggregation." Content, Context, Structure Greenberg Structured data about an information object that facilitates functions associated with the designated object We learned last week that objects have encoding mechanisms and that metadata is a natural part of that – can anyone remember a metadata element that we saw in our HTML documents? Do we recall any uses of metadata?

12 Metadata Life-cycle Codification Storage Use/Reuse Scalability
Just like most processes you can type the things we do with metadata over time. We create metadata, organize it, use it, preserve it, dispose of it. Metadata helps us do these things with digital resources too. In fact – there is a model – the Reference Model for an Open Archival Information System (OAIS) that talks about the different functions of metadata Certifies description of resource Codifies context, structural relationships Stores administrative, technical, preservation data Enables data interoperability (structural, semantic) Enables automated harvesting, re-use Extends scalability, portability of information systems Gilliland, 2007

13 Consider the chart in Gilliland – traditional cataloging is only a part of this
Our concern this semester will naturally fall more on descriptive standards There are other types of metadata Structural – how are objects related (METS, XRDF) Event – What happens, when and how Curation – how has the item been valued, held Important to consider the correct model and schema Tonight we are going to look at two different schemas Gilliland, 2007

14 Standards Types Data structure standards Data communication standards
Standards that govern the scope and purpose of a metadata record (MARC, Dublin Core, Text Encoding Initiative (TEI)) Data communication standards Encoding (e.g., HTML/XHTML, XML) Data syntax standards Element ordering, content syntax, and encoding syntax (e.g., date/time syntax)

15 Cataloging purposes “A list of books, maps, and other items arranged in some definite order” (Cutter) Discovery Catalogs, indexes, databases Management Technical and administrative metadata Access User interfaces (OPACs)

16 Dewey’s rules for cataloging
Based on: Panizzi’s 91 rules for cataloging (British Museum) Charles Coffin Jewett, Smithsonian librarian, 1852: standard cataloging, printed entries on cards Dewey Alphabetic ordering by subject System of all knowledge Classification – browse

17 A quick history of classification
245 BCE – Callimachus creates Pinakes 120 volume catalog for 400,000 scrolls Title, author, teachers, biography Six genres (rhetoric, law, epic, tragedy, lyric poetry, history, medicine, mathematics, natural science, misc) 48 BCE – Alexandria burns . . . Then for a long time nothing happened . . . 1876 Dewey Decimal System 1882 – Charles Cutter – Cutter classification 1897 – Herbert Putnam – LC Classification Callimachus 400,000 scrolls 3rd century BCE, Alexandria Author type (poet, lawmaker) Era Format Subject Renaissance >> information proliferation Internet >> information explosion Images from Wikipedia

18 Book Metadata (circa 1960)

19 Book Metadata (circa 1980)
100 2_ |a Berners-Lee, Tim.
|a Weaving the Web : |b the original design and ultimate destiny of the World Wide Web by its inventor / |c Tim Berners-Lee with Mark Fischetti.
250 __ |a 1st ed.
260 __ |a San Francisco : |b Harper SanFrancisco, |c c1999.
300 __ |a xi, 226 p. ; |c 25 cm.
500 __ |a Includes index.
650 _0 |a World Wide Web |x History.
|a Berners-Lee, Tim.
700 1_ |a Fischetti, Mark.
|3 Publisher description |u

20 Book Metadata (circa 2002)

21 Library uses of metadata
Descriptive cataloging Inventory of holdings Technical and administrative metadata about acquisitions Interoperability with other systems Facilitating acquisition decisions Federate searches from other catalogs

22 Cataloging process Spend a moment here This works in other domains too
Level 1 Describe the resource Level 2 Select access points (title, author, ISBN) Level 3 Select headings (subject, author) Level 4 Establish references (external links, record links)

23 An example MARC record 2. Access points 1. Description 3. Headings
Level 1 Describe the resource Level 2 Select access points (title, author, ISBN) Level 3 Select headings (subject, author) Level 4 Establish references (external links, record links) 3. Headings 4. References

24 Anatomy of a bibliographic record
Content/syntax Standards (AACR / RDA) Classification Systems (LCSH / DDC) Encoding Standards (MARC / MARCXML)

25 AACR2 processes Area 1: title, statement of responsibility
Area 2: edition Area 3: material type Area 4: publication, distribution Area 5: physical description Area 6: series Area 7: notes Area 8: standard number, terms

26 How to enter a title into a MARC record
AACR2 Transcribe title exactly according to spelling but not necessarily punctuation/capitalization. If an alternative title is present, precede it by a comma following the regular title Use a General Material Designation in brackets [] MARC Standard Use 245 field – indicates Main title Indicator 2 – Number of non-filing characters (leading articles) Subfield a – main title Subfield b – remainder of title Subfield h – General Material Designation in brackets [] Write this up on the board to show encoding stuff
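The 245 rules above can be sketched in a few lines of Python. This is only an illustration of how indicator 2 and subfields a, h, and b fit together: the function names, the three-word article list, and the hard-coded first indicator of 1 are assumptions made for the example, not part of AACR2 or the MARC standard.

```python
# Leading articles treated as non-filing characters.
# A tiny illustrative list (assumption); real cataloging covers many languages.
LEADING_ARTICLES = ("the ", "an ", "a ")


def nonfiling_count(title: str) -> int:
    """Return the number of characters to skip when filing (245 indicator 2)."""
    lowered = title.lower()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            return len(article)  # the article plus its trailing space
    return 0


def build_245(title: str, remainder: str = "", gmd: str = "") -> str:
    """Assemble a display string for a 245 field from subfields a, h, and b.

    Indicator 1 is fixed at 1 here purely for illustration (assumption).
    """
    parts = [f"245 1{nonfiling_count(title)}", f"|a {title}"]
    if gmd:
        parts.append(f"|h [{gmd}]")  # General Material Designation in brackets
    if remainder:
        parts[-1] += " :"  # space-colon precedes the remainder of title
        parts.append(f"|b {remainder}")
    return " ".join(parts)


print(build_245("Weaving the Web", remainder="the original design"))
print(build_245("The Wall", gmd="sound recording"))
```

The point to notice is that the title string is transcribed unchanged; the leading article is handled by the indicator rather than by editing the title.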

27 Dublin Core Overview Created out of a 1995 meeting in Dublin Ohio
An intentionally simple standard focused on resource description DCMI conference (2007) Enjoys widespread adoption in Library and Digital library community, particularly as a lowest-common-denominator standard Present on DC

28 Initial Dublin Core Focused on Digital Document-like-objects
Simple description, human based Focus on descriptive metadata over technical, preservation, use metadata “The Metadata Workshop participants made no attempt to limit the complexity of DLOs, except to say that the intellectual content of a DLO is primarily text, and that the metadata required for describing DLOs will bear a strong resemblance to the metadata that describes traditional printed texts.” Weibel 1995 – worth bringing up. Descriptive metadata for resource discovery All elements are optional and repeatable Constraints established at implementation level (not by the semantic specification) Extensible: A starting place for richer descriptions Interdisciplinary (semantic interoperability) International consensus

29 Dublin Core ( ) Subject: The topic addressed by the work Title: The name of the object Author: The person(s) primarily responsible for the intellectual content of the object Publisher: The agent or agency responsible for making the object available OtherAgent: The person(s), such as editors and transcribers, who have made other significant intellectual contributions to the work Date: The date of publication ObjectType: The genre of the object, such as novel, poem, or dictionary Form: The physical manifestation of the object, such as Postscript file or Windows executable file Identifier: String or number used to uniquely identify the object Relation: Relationship to other objects Source: Objects, either print or electronic, from which this object is derived, if applicable Language: Language of the intellectual content Coverage: The spatial locations and temporal durations characteristic of the object Weibel, 1995

30 Dublin Core (1.1 - 1999) Title Author or Creator Subject and Keywords
Description Publisher Other Contributor Date Resource Type Format Resource Identifier Source Language Relation Coverage Rights Management

31 Qualified Dublin Core (Current)
71 properties, 35 classes (Registry) Expansion of scope/purpose Multiple encoding models (HTML/XHTML, XML, RDF) Addition of Application Profile concept Term: The generic name for a property (i.e. element or element refinement), vocabulary encoding scheme, syntax encoding scheme or concept taken from a controlled vocabulary (concept space). • A property is a specific aspect, characteristic, attribute, or relation used to describe resources. • Within DCMI, element is typically used as a synonym for property. • A vocabulary encoding scheme is a class that indicates that the value of a property is taken from a controlled vocabulary (or concept-space), such as the Library of Congress Subject Headings.

32 A possible record Title: New Web language promises smarter surfing
Subject: World Wide Web Subject: Extensible Markup Language Subject: World Wide Web Consortium Subject: Standards, Web Creator: Heid, Jim Creator: Glenn McDonald Created: 01/07/1998 Identifier: Publisher: Cable News Network Language: en Description: This article discusses the recent adoption of XML by the W3C as a standard and its possible uses in a web environment Format: text/html Rights: All Rights Reserved
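One way to see how a record like this could be encoded is to serialize it as XML with Python's standard library. The sketch below uses the Dublin Core element set namespace; the `record` wrapper element and the `to_xml` function are hypothetical names chosen for the example, and the Created and Identifier lines are left out because the first is a qualified term and the second has no value on the slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# Elements are repeatable, so the record is a list of pairs rather than a dict.
record = [
    ("title", "New Web language promises smarter surfing"),
    ("subject", "World Wide Web"),
    ("subject", "Extensible Markup Language"),
    ("creator", "Heid, Jim"),
    ("publisher", "Cable News Network"),
    ("language", "en"),
    ("format", "text/html"),
    ("rights", "All Rights Reserved"),
]


def to_xml(pairs):
    """Serialize (element, value) pairs as dc:* children of a wrapper element."""
    root = ET.Element("record")  # hypothetical wrapper name (assumption)
    for name, value in pairs:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


print(to_xml(record))
```

Keeping the record as a list of pairs preserves repeated elements such as Subject, which a plain dictionary would silently overwrite.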

33 Dublin Core Abstract Model

34 HTML Encoding of DC

35 Example Title: Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web Author: Tim Berners-Lee Subject: World Wide Web Publisher: Collins Date: 2000 Language: English ISBN-13:
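As a rough illustration of the HTML encoding, the example record can be turned into meta tags using the common `DC.` prefix convention. In this sketch the `dc_meta_tags` function is a hypothetical helper, Author is mapped to the `creator` element as an assumption, and the ISBN-13 line is omitted because no value is given.

```python
from html import escape

example = [
    ("title", "Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web"),
    ("creator", "Tim Berners-Lee"),  # "Author" on the slide, mapped to creator
    ("subject", "World Wide Web"),
    ("publisher", "Collins"),
    ("date", "2000"),
    ("language", "English"),
]


def dc_meta_tags(pairs):
    """Return HTML head lines that embed Dublin Core elements as meta tags."""
    # The link element declares what the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, value in pairs:
        content = escape(value, quote=True)  # protect quotes and ampersands
        lines.append(f'<meta name="DC.{name}" content="{content}">')
    return "\n".join(lines)


print(dc_meta_tags(example))
```

The output is meant to be pasted inside the head of an HTML page, alongside the title and any other meta elements.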

36 Work time Complete pages 1-4 of the worksheet What is Dublin Core
Creating a Dublin Core record

37 Issues in cataloging Focus on 'by-value' cataloging instead of by-reference means that consistency is poor. Focus on text identifiers (title, author) over unique IDs means record duplication is rampant. Focus on traditional descriptive measures limits effectiveness in new discovery systems that do not respect complex metadata. Focus on print resources has made cataloging for internet resources difficult.

38 New concepts in cataloging
RDA: Resource Description and Access FRBR: Functional Requirements for Bibliographic Records FRAD: Functional Requirements for Authority Data FRSAD: Functional Requirements for Subject Authority Data

39 Resource Description and Access
RDA is an update to the AACR2 RDA uses a new data model (FRBR) RDA includes new MARC fields RDA is not yet implemented

40 Addresses user tasks
FRBR: Find Identify Select Obtain FRAD: Find Identify Contextualize Justify RDA addresses these user tasks identified in the FRBR and FRAD models. Responding to those user needs coincides with the main principle stated in the Statement of International Cataloguing Principles: convenience of the user. ICP’s highest principle = “convenience of the user” Slide from

41 FRBR’s Entity-Relationship Model
Entities Relationships Attributes (data elements) National level required elements relationship The FRBR model describes all of the entities and relationships in the bibliographic universe and the identifying characteristics of all these things – FRBR calls these “attributes” and RDA calls them “Elements”. So when we talk about the “element set” in RDA, we are talking about the various characteristics we use to identify the things in our universe – from the resources we are adding to our collections to the people and corporate bodies and families associated with those things. We can diagram the FRBR model using boxes for the entities that are connected by arrows to show the relationships with other entities. So in addition to the user tasks and entities and relationships, FRBR also specifies basic data needed for national level records – a basic set of elements – which also translates in RDA to the “Core” elements. One Entity Another Entity Slide from

42 FRBR’s Entity-Relationship Model
Person Work created was created by Here’s an example of how the entities and relationships in FRBR work: we can say one entity, a person named Shakespeare (an identifying characteristic/attribute) is the creator of (relationship) the play Hamlet (another entity) – or we can say the relationship goes both directions – Shakespeare created Hamlet and also the other way, Hamlet was created by Shakespeare. Actually in the conceptual model we’d move this to a more abstract level to say a person created a work and a work was created by a person – the entities are person and work and the relationship between them is the created/created by relationship. We use the model to help design systems so any individual can be plugged into the model when we apply the model in a specific implementation. So we have entities and relationships. The FRBR entities are sorted into 3 groups for the convenience of talking about them. Shakespeare Hamlet Slide from

43 Terminology FRBR and FRAD “attributes” are “elements” in RDA = identifying characteristics FRBR and FRAD Group 1 entities: Work Expression Manifestation Item Some of the terminology in the RDA instructions may be new to you or have different meanings. RDA uses the term “elements” for the identifying characteristics or “attributes” of each of the FRBR entities. Terms you’ll see throughout RDA are those for the Group 1 entities in the FRBR and FRAD conceptual models: Work, Expression, Manifestation, and Item. Slide from

44 FRBR Functional requirements for bibliographic records
Group 1 - entities - work, expression, manifestation, item Group 2 - persons or corporate bodies responsible for a work (FRAD) Group 3 - subjects - concepts, events, places. . . (FRSAD)

45 FRBR Model http://fictionfinder.oclc.org/ http://worldcat.org

46 FRBR background Work/item C.A. Cutter (1890)
Notion of a work S. R. Ranganathan (1930-late 1960) Intellectual entity – expressed thought Physical entity – embodies thought P. Wilson Intellectual entity – work Subject metadata Physical entity – item Selected descriptive metadata Adapted from Jane Greenberg

47 FRBR components Work Expression Manifestation Item
distinct intellectual or artistic creation Expression intellectual or artistic realization of a work Manifestation physical embodiment of an expression of a work Item a single exemplar of a manifestation Adapted from Jane Greenberg

48 FRBR Example Rolling Stones’ IT'S ONLY ROCK-N-ROLL (1974) (work)
Group’s performance recorded for the album (Expression) Recording released in 1974 by MCA Records on tape cassette (Manifestation) Recording released in 1974 by MCA Records on compact disc (Manifestation) Sheet music released in 1992 (?) Adapted from Jane Greenberg
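The work / expression / manifestation / item hierarchy in this example can be modeled as nested data classes. This is a minimal sketch of the idea rather than a FRBR implementation: the class and attribute names are chosen for illustration, and the single item ("library copy 1") is a made-up exemplar added so the lowest level has something in it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    label: str  # a single exemplar, e.g. one physical copy


@dataclass
class Manifestation:
    carrier: str
    publisher: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    description: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    creator: str
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Walk down the hierarchy and collect every item of this work."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


cassette = Manifestation("tape cassette", "MCA Records", 1974)
disc = Manifestation("compact disc", "MCA Records", 1974,
                     items=[Item("library copy 1")])  # hypothetical item
performance = Expression("Group's performance recorded for the album",
                         manifestations=[cassette, disc])
work = Work("It's Only Rock-n-Roll", "Rolling Stones", expressions=[performance])

print([item.label for item in work.all_items()])
```

Subject and descriptive metadata attach at different levels in this structure, which is the distinction the FRBR background slide draws between the intellectual entity and the physical one.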

49 FRBR diagram I: UNC Musllib.CD, RCA, 2005 c.3
I: Your CD, RCA, 2005 c.1 M: CD, RCA, 2005 I: My CD, RCA, 2005 c.2 E: Music (just the instruments) E: Music and lyrics Work, the Performance (1974) M: RS, LP 1974 M: 8-track, RCA, 1975 Adapted from Jane Greenberg

50 FRBR Algorithm (1) Process Extract Author Extract Title
Construct Authority author entry from 100, 400 using subfields and 008 data to limit Extract Title Construct Authority title entry from 130, 240, 245, etc. Normalize using NACO Combine these two authorities to create a unique Work identifier <author>Mitchell, Margaret</author><title>Gone with the wind</title>
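The combine step can be sketched as follows. Note that `normalize` here is a rough stand-in and not the actual NACO normalization rules: it simply strips accents and punctuation, collapses whitespace, and lowercases, so the key it produces is lowercased rather than matching the slide's display form. Both function names are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Simplified normalization (assumption; NOT the real NACO rules)."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_marks)  # punctuation becomes a space
    return re.sub(r"\s+", " ", no_punct).strip().lower()


def work_key(author: str, title: str) -> str:
    """Combine the normalized author and title into one work identifier."""
    return f"<author>{normalize(author)}</author><title>{normalize(title)}</title>"


# Two differently punctuated records collapse to the same work key.
print(work_key("Mitchell, Margaret", "Gone with the wind"))
print(work_key("MITCHELL, Margaret.", "Gone with the Wind!"))
```

Records that share a key are treated as the same work, which is how many manifestation-level records get grouped under one heading.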

51 FRBR Algorithm (2) Results from a sample extraction (From FRBR doc)
<author>/<title> (75.97%) <uniform title> (1.34 %) /<title>/[one or more <name>] (17.35%) /<title>/<control number> (5.34%)

52 Worktime Complete pages 5 & 6 Mapping DC to MARC

53 Metadata tools Tool Type Uses Conversion / Crosswalk
Migrate data from one form to another Creation Automatic or semi-automatic creation of metadata Extraction / Harvesting Pull metadata from digital objects or systems for use/re-use Evaluation Validate schema or encoding of metadata records Searching Facilitate discovery and use of metadata As a group, look up a quick definition for each tool and discuss. You can use the definitions at ( use google, or look at the tool examples. Tool Type Definition Metadata Creation Metadata Extraction Metadata Generation Metadata Harvesting Metadata Quality Search Engines Translation Transliteration Validation Crosswalks
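To make the conversion / crosswalk tool type concrete, here is a small sketch that maps Dublin Core elements to MARC tags and subfields. The mapping table is illustrative and deliberately incomplete: the tag choices are assumptions loosely modeled on published Dublin Core to MARC crosswalks and should be checked against an authoritative crosswalk before any real use. The `crosswalk` function name is hypothetical.

```python
# Illustrative, partial mapping of DC element -> (MARC tag, subfield code).
# Assumption: verify every entry against an authoritative crosswalk.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "rights": ("540", "a"),
}


def crosswalk(dc_pairs):
    """Convert (element, value) pairs to MARC-style lines.

    Returns the converted fields plus the names of elements with no mapping,
    so that nothing is dropped silently.
    """
    fields, unmapped = [], []
    for name, value in dc_pairs:
        target = DC_TO_MARC.get(name.lower())
        if target is None:
            unmapped.append(name)
        else:
            tag, code = target
            fields.append(f"{tag} |{code} {value}")
    return fields, unmapped


fields, unmapped = crosswalk([
    ("Title", "Weaving the Web"),
    ("Publisher", "Collins"),
    ("ISBN-13", ""),
])
print(fields)
print(unmapped)
```

Reporting unmapped elements matters because crosswalks are lossy: a richer schema rarely maps cleanly onto a simpler one, or the reverse.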

54 Evaluation Metadata evaluation methods Greenberg Review (2002)
Tozer (1999) Accuracy, completeness, consistency, timeliness, and intelligibility Rothenberg (1996) Correctness, appropriateness Zeng (1993) Specificity, exhaustivity, record completeness

55 Evaluating Representation
Completeness, specificity, exhaustivity Did the record capture essential elements of the object? Does the encoded record differentiate appropriately between elements? Document/Index surrogation, retrieval Is this a surrogate/abstraction and not a codification of the resource? Is the level of surrogation/abstraction appropriate for storage/retrieval/use goals?

56 Evaluating Representation
Accuracy, consistency Are the details of abstraction correct? Is the content represented/encoded accurately? Utility, effectiveness, timeliness Is the representation appropriate for a given audience and use? Does the representation solve an information need?
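Some of these questions can be partly automated. The sketch below scores record completeness as the share of required elements that have a non-empty value. The list of required elements is an assumption for the example (Dublin Core itself makes every element optional), and the function names are hypothetical.

```python
def missing_elements(record: dict, required: list) -> list:
    """Return the required element names that are absent or blank."""
    return [name for name in required if not str(record.get(name, "")).strip()]


def completeness(record: dict, required: list) -> float:
    """Fraction of required elements that carry a non-empty value."""
    if not required:
        return 1.0
    return 1 - len(missing_elements(record, required)) / len(required)


# Hypothetical local application profile: four elements we insist on.
REQUIRED = ["title", "creator", "date", "identifier"]

record = {
    "title": "Weaving the Web",
    "creator": "Tim Berners-Lee",
    "identifier": "",  # present but blank, so it counts as missing
}

print(completeness(record, REQUIRED))
print(missing_elements(record, REQUIRED))
```

A score like this only addresses completeness. Accuracy, appropriateness for an audience, and whether the record meets an information need still call for human judgment.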

57 Worktime Complete pages 7-10 – Metadata tools and evaluation

58 Next Week Online Encoding systems Assignment 1 questions
Read, complete worksheet, discuss Encoding systems XML overview More on MARC encoding Assignment 1 questions

