MARC Content Designation Use Implications for indexing & interoperability William E. Moen School of Library and Information Sciences Texas Center for.


1 MARC Content Designation Use
Implications for indexing & interoperability
William E. Moen
School of Library and Information Sciences
Texas Center for Digital Knowledge
University of North Texas, Denton, TX 76203
South Central Unicorn Users Group Annual Conference, October 17, 2003, Austin, Texas

2 Overview
- Context for the analysis -- interoperability
- Findings from the analysis
- Indexing and MARC
- Discussion
Moen South Central Unicorn Users Group Annual Conference -- Austin, Texas -- October 17, 2003

3 Context for the analysis
- Interoperability across library online catalogs
- Indexing of MARC records to support searching
- Richness of MARC content designation available
- Indexing guidelines prepared for the Z39.50 Interoperability Testbed (Z-Interop)
- Implications for indexing guidelines and policies

4 Interoperability
Systems and organizations will interoperate!
"One should actively be engaged in the ongoing process of ensuring that the systems, procedures and culture of an organisation are managed in such a way as to maximise opportunities for exchange and re-use of information, whether internally or externally."
Paul Miller, 2000

5 Factors affecting interoperability
- Multiple and disparate systems
  - operating systems, information retrieval systems, etc.
- Multiple protocols
  - Z39.50, HTTP, SOAP, etc.
- Multiple data formats, syntax, metadata schemes
  - MARC 21, UNIMARC, XML, ISBD/AACR2-based, Dublin Core
- Multiple vocabularies, ontologies, disciplines
  - LCSH, MESH, AAT
- Multiple languages and character sets
- Indexing, word normalization, and word extraction policies

6 Information communities
- Community agreements exist (e.g., standards, rules, etc.)
- Interoperability factors reduced
- Interoperability more easily achieved
- Do we need additional agreements regarding indexing policies to improve interoperability?
Libraries as Focal Community
- Relative homogeneity of data and systems
- Standards-based MARC records
- Content and structure prescribed by AACR
- Commonly understood access points
- Use of controlled vocabularies

7 Interoperability testbed project
Realizing the Vision of Networked Access to Library Resources: An Applied Research and Demonstration Project to Establish and Operate a Z39.50 Interoperability Testbed
An Institute of Museum and Library Services National Leadership Grant
Goal: Improve Z39.50 semantic interoperability among libraries for information access and resource sharing
For more information, visit the project website: http://www.unt.edu/zinterop/

8 Threats to Z39.50 interoperability
- Differences in implementation of the standard
- Differences in local information retrieval systems
  - Search functionality
  - Indexing policies
These threats can be addressed by
- Z39.50 specifications and configuration (i.e., profiles)
- Enhancing local information retrieval systems
- Recommendations for local indexing decisions

9 Components of the testbed
- Test dataset
  - 400,000+ MARC 21 records from OCLC’s WorldCat
- Z39.50 reference implementations
  - Z-client (Bookwhere), Z-server & information retrieval system (Sirsi Unicorn)
- Test scenarios & searches
  - Searches with known result records from dataset
- Benchmarks
  - Results of test searches using reference implementations

10 MARC
Record structure for encoding data for machine processing
- Standard structure (ANSI/NISO Z39.2/ISO 2709)
  - Leader
  - Directory map
  - 3-digit tag to identify a field
  - 2 indicator values to provide additional processing information
  - 1 or more delimiters/codes to identify subfields
- Content designation: Semantics
  - MARC 21
  - 245 00 $a [title] $h [format] : $b [subtitle]
- Rules
  - Anglo-American Cataloguing Rules and others
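The field layout described on this slide (a 3-digit tag, two indicators, and delimited subfields) can be sketched in a few lines of Python. This is only an illustration of the human-readable display form used in this presentation, not a parser for the binary Z39.2/ISO 2709 serialization. The names `Field` and `parse_field` are hypothetical, and the fixed column layout (tag, space, two indicator characters, then `$`-delimited subfields) is an assumption; control fields such as 001 to 008, which carry no indicators or subfields, are not handled.

```python
from typing import List, NamedTuple, Tuple


class Field(NamedTuple):
    tag: str
    ind1: str
    ind2: str
    subfields: List[Tuple[str, str]]  # (subfield code, value) pairs


def parse_field(line: str) -> Field:
    """Parse one display-form data field.

    Assumed layout (hypothetical): columns 0-2 hold the tag, column 3 is a
    space, columns 4-5 hold the two indicators (a blank indicator is a
    space), and the rest is a run of "$"-delimited subfields.
    """
    tag, ind1, ind2 = line[0:3], line[4], line[5]
    subfields = []
    for chunk in line[6:].split("$")[1:]:
        if chunk:  # skip empty pieces produced by a doubled delimiter
            subfields.append((chunk[0], chunk[1:].strip()))
    return Field(tag, ind1, ind2, subfields)


title = parse_field(
    "245 10 $aIllegitimacy and adoption in Maine : "
    "$breport of a study made for the Maine Committee on Children and Youth."
)
subject = parse_field("650  0 $aAdoption $zMaine.")
print(title.tag, title.ind1, title.ind2, title.subfields)
print(subject.tag, repr(subject.ind1), subject.ind2, subject.subfields)
```

Keeping the indicators as separate one-character strings preserves the difference between a blank first indicator (as in the 650 field) and a blank second indicator, which a whitespace-collapsed display cannot show.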

11 MARC 21 content designation
MARC 21 Field Groups   Currently Defined   Obsolete   Total   MARC 1972 (Books Format Only)
00x                                    6          1       7                               3
0xx                                  238          7     245                              28
1xx                                   66          1      67                              40
2xx                                  137         32     169                              15
3xx                                  109         32     141                               4
4xx                                   69          0      69                              37
5xx                                  323         38     361                               8
6xx                                  184          5     189                              66
7xx                                  452         47     499                              41
8xx                                  141         20     161                              36
TOTAL                               1725        183    1908                             278

12 Z-Interop test dataset
- Approximately 1% sample of MARC records from OCLC’s WorldCat database
- Weighted sampling based on number of libraries “holding” the object represented by the record
- 419,657 total MARC records
- 89% of records “full level” cataloging
Formats represented in test dataset
- Books: 91%
- Cartographic Materials: < 1%
- Electronic resources: < 1%
- Archival/Mixed Materials: < 1%
- Sound recordings: 4%
- Visual Materials: 1%
- Serials: 3%

13 MARC record
LDR 01019cam 2200265 4500^
001 ocm00000003^
003 OCoLC^
005 20010925133908.0^
008 690414s1963 nyu b 000 0 eng ^
010 $a63064323 ^
040 $aDLC $cDLC ^
050 04 $aHV700.5 $b.N37 ^
082 0 $a362.7/3 ^
110 2 $aNational Study Service. ^
245 10 $aIllegitimacy and adoption in Maine : $breport of a study made for the Maine Committee on Children and Youth. ^
260 $a[New York], $c1963. ^
300 $a24 p. ; $c28 cm. ^
500 $aCover title. ^
504 $aBibliographical footnotes. ^
650 0 $aIllegitimacy $zMaine. ^
650 0 $aAdoption $zMaine. ^
710 1 $aMaine. $bCommittee on Children and Youth. ^

14 Decomposing MARC Records
OCLC #   Tag   1st Ind   2nd Ind   SubFld   Fld Pos   SubFld Pos   Word Pos   Word
3        1                                  1         1            1          Ocm00000003
3        3                                  2         1            1          OCoLC
3        110   2                   a        11        1            1          National
3        110   2                   a        11        1            2          Study
3        110   2                   a        11        1            3          Service
3        245   1         0         a        12        1            1          Illegitimacy
3        245   1         0         a        12        1            2          and
3        245   1         0         a        12        1            3          Adoption
3        245   1         0         b        12        2            1          Report
3        650             0         a        17        1            1          Illegitimacy
3        650             0         z        17        2            1          Maine
400,000 MARC 21 records = 33 million decomposed records
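The decomposition shown on this slide, one row per word with its field, subfield, and word positions, might be sketched as follows. The function name `decompose`, the input shape, and the tokenization rule (split on whitespace, strip surrounding punctuation) are all assumptions made for illustration; the slides do not say how the project extracted words. Field positions here count only the fields passed in, so they will not reproduce the position numbers on the slide.

```python
import string


def decompose(oclc_no, fields):
    """Flatten a record into one row per word.

    `fields` is a list of (tag, ind1, ind2, subfields) tuples, where
    `subfields` is a list of (code, value) pairs. Each output row is
    (oclc_no, tag, ind1, ind2, code, fld_pos, subfld_pos, word_pos, word).
    """
    rows = []
    for fld_pos, (tag, ind1, ind2, subfields) in enumerate(fields, start=1):
        for subfld_pos, (code, value) in enumerate(subfields, start=1):
            # Assumed tokenization: whitespace split, punctuation trimmed.
            words = [w.strip(string.punctuation) for w in value.split()]
            words = [w for w in words if w]
            for word_pos, word in enumerate(words, start=1):
                rows.append((oclc_no, tag, ind1, ind2, code,
                             fld_pos, subfld_pos, word_pos, word))
    return rows


sample_fields = [
    ("245", "1", "0", [("a", "Illegitimacy and adoption in Maine :"),
                       ("b", "report of a study.")]),
    ("650", " ", "0", [("a", "Illegitimacy"), ("z", "Maine.")]),
]
rows = decompose(3, sample_fields)
for row in rows:
    print(row)
```

Storing one row per word makes it straightforward to count how often each field/subfield combination occurs, at the cost of many more rows than source records.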

15 Content designation in dataset
MARC 21 Field Groups   Currently Defined   Obsolete   Unlikely Used   Total
00x                                    6          0               0       6
0xx                                   96          1              33     130
1xx                                   49          0               2      51
2xx                                   81          0              19     100
3xx                                   23          6               0      29
4xx                                   10          0              30      40
5xx                                  128          1               3     132
6xx                                  104          1               7     112
7xx                                  205          0               5     210
8xx                                  105          3               8     116
TOTAL                                807         12             107     926

16 Summary frequency results
Frequency              # of Fields/Subfields   % of All Occurrences
> 600,000                                  1                   4.4%
500,000 to 599,999                         0                     0%
400,000 to 499,999                        13                  39.9%
300,000 to 399,999                         6                  14.3%
200,000 to 299,999                         6                  10.6%
100,000 to 199,999                        10                  10.3%
TOTAL                                     36                  79.5%
- Total number of fields/subfields occurring in dataset = 13,849,499
- Only 4% of all fields/subfields account for 80% of all occurrences, or 96% of all fields/subfields account for 20% of all occurrences
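The "4% account for 80%" observation is a cumulative-coverage calculation: rank fields/subfields by occurrence count and take the most frequent ones until the target share is reached. Assuming the 926 total from the dataset's content designation table is the denominator, 36 of 926 is about 3.9%, which is consistent with the rounded 4%. The sketch below shows the calculation in Python; the function name `fields_for_coverage` is hypothetical and the counts are made-up values, not figures from the Z-Interop dataset.

```python
def fields_for_coverage(counts, target=0.80):
    """Return the most frequent keys needed to reach `target` share of all
    occurrences, together with the share they actually cover."""
    total = sum(counts.values())
    running = 0
    chosen = []
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for key, n in ranked:
        if running >= target * total:
            break
        chosen.append(key)
        running += n
    return chosen, running / total


# Made-up occurrence counts, for illustration only.
example_counts = {"650$a": 50, "245$a": 35, "260$a": 10, "500$a": 3, "504$a": 2}
chosen, share = fields_for_coverage(example_counts)
print(chosen, f"{share:.0%}")
print(f"{36 / 926:.1%}")  # share of fields/subfields in the top 36
```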

17 Characteristics of top 36
- Most frequently occurring: 650 $a [Subject data]
- 2nd most frequently occurring: 040 $d [Cataloging source]
- 3rd & 4th most frequently occurring: 260 $a & $b [Publication information]
- 5th most frequently occurring: 245 $a [Title]
- Contain data useful to end users: 28
- Contain control numbers, etc.: 5
- Contain data useful to catalogers: 3

18 Indexing & MARC
- Indexing Guidelines to Support Z39.50 Profile Searches
  - Identified all MARC 21 fields/subfields that may contain author, title, or subject data
  - Author-related fields/subfields: 119
  - AuthorTitle-related fields/subfields: 21
  - Title-related fields/subfields: 253
  - Subject-related fields/subfields: 144
- 537 fields/subfields contain author, title, subject data
- Usefulness of indexing all possible fields?
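An indexing policy of this kind amounts to a mapping from each search type to the fields/subfields that feed its index. The Python sketch below illustrates the idea with a deliberately tiny mapping. `INDEX_MAP` and `build_index` are hypothetical names, and the handful of tag/subfield pairs shown were picked for illustration; they are not taken from the guidelines, which list 537 fields/subfields.

```python
import string

# Illustrative subset only (an assumption, not the guidelines' lists).
INDEX_MAP = {
    "author": {("100", "a"), ("110", "a"), ("700", "a"), ("710", "a")},
    "title": {("245", "a"), ("245", "b")},
    "subject": {("650", "a"), ("650", "z")},
}


def build_index(records, index_map):
    """Return {search_type: {word: set_of_record_ids}}.

    `records` maps a record id to a list of (tag, subfield_code, value).
    Words are lowercased and stripped of surrounding punctuation, which is
    an assumed normalization rule.
    """
    index = {name: {} for name in index_map}
    for rec_id, fields in records.items():
        for tag, code, value in fields:
            for name, pairs in index_map.items():
                if (tag, code) not in pairs:
                    continue
                for raw in value.lower().split():
                    word = raw.strip(string.punctuation)
                    if word:
                        index[name].setdefault(word, set()).add(rec_id)
    return index


records = {
    "ocm00000003": [
        ("110", "a", "National Study Service."),
        ("245", "a", "Illegitimacy and adoption in Maine :"),
        ("500", "a", "Cover title."),
        ("650", "a", "Illegitimacy"),
        ("650", "z", "Maine."),
    ]
}
index = build_index(records, INDEX_MAP)
print(sorted(index["subject"]))
```

Because the 500 note field is absent from the mapping, its words never reach any index; narrowing or widening the mapping is the indexing decision under discussion.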

19 Occurrences in test dataset
Author, title, or subject fields/subfields in Z-Interop dataset
- 381 occur one or more times in Z-Interop dataset
  - Author-related fields/subfields: 86
  - AuthorTitle-related fields/subfields: 16
  - Title-related fields/subfields: 178
  - Subject-related fields/subfields: 101
The 19 fields/subfields
- 19 of the 381 (5%) account for 80% of all occurrences
  - 9 of 19 are subject-related
  - 5 of 19 are author-related
  - 5 of 19 are title-related

20 Implications for indexing
- What difference do indexing decisions make?
  - Preliminary testing using the 19 fields/subfields: 95% - 100% of correct records retrieved!
- How much time would be saved in setting up indexing policies?
- Is there a systematic method to identify the “best” fields/subfields to index?
  - Per format of materials?
  - Per user (librarians and end users) needs?
- Good enough search results?
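The "95% - 100% of correct records retrieved" figure can be read as recall: the fraction of a benchmark result set that a search against the reduced index still returns. The slide does not define the measure precisely, so this reading is an assumption. A minimal Python sketch, with a hypothetical `recall` function and made-up record identifiers:

```python
def recall(retrieved, benchmark):
    """Fraction of benchmark records that also appear in `retrieved`."""
    benchmark = set(benchmark)
    if not benchmark:
        raise ValueError("benchmark result set must not be empty")
    return len(benchmark & set(retrieved)) / len(benchmark)


# Made-up identifiers, for illustration only.
benchmark_ids = {"rec1", "rec2", "rec3", "rec4"}
reduced_index_ids = {"rec1", "rec2", "rec3", "rec9"}
score = recall(reduced_index_ids, benchmark_ids)
print(f"{score:.0%}")
```

Recall ignores extra records such as `rec9`; judging whether results are "good enough" would also need a measure of precision.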

21 References
- Z39.50 Interoperability Testbed: http://www.unt.edu/zinterop/
- Indexing Guidelines to Support Z39.50 Profile Searches: http://www.unt.edu/zinterop/Documents/IndexingGuidelines1Feb2002.pdf

