Brian Lavoie Research Scientist OCLC Mining for Copyright Evidence ASIS&T 2008 Columbus, OH October 28, 2008.

1 Brian Lavoie Research Scientist OCLC Mining for Copyright Evidence ASIS&T 2008 Columbus, OH October 28, 2008

2 Roadmap
  Copyright investigations
  OCLC Copyright Evidence Registry
  WorldCat as a source of copyright evidence
  Mapping across multiple data sources

3 Copyright investigation
  IPR issues amplified in digital environments
    Rights management metadata: INDECS, ODRL, ONIX, PREMIS, …
    Section 108 Study Group
    LC/JISC study: copyright law & digital preservation
  Copyright investigation increasingly important
    Mass digitization, Web harvesting, preservation, …
    RLG Programs report (March 2008)
  More common, but yet to converge on standardized workflow
    Ambiguity over sources of copyright evidence, procedural due diligence, benchmarks for decision-making
  Need data and tools to reduce cost and improve reliability of copyright investigations

4 OCLC Copyright Evidence Registry

5 CER essentials
  Collaborative environment for discovering and sharing information about copyright status of books
    Search WorldCat and other data sources for copyright evidence
    Record results of copyright investigations and share with others
    Rules engine: implement your own rules for assessing copyright status as automated process operating on information in CER
  Currently in pilot phase
  OCLC Research provided support during pilot development looking at:
    WorldCat as source of copyright evidence
    Mapping WorldCat data to other data sources
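The slide does not describe how the CER rules engine works internally. As a purely hypothetical sketch of the "implement your own rules" idea, the snippet below treats a rule as a function that inspects an evidence record and either returns a verdict or passes. The field names (pub_year, renewal_id), the two rules, and the verdict strings are all invented for illustration and are not taken from CER.

```python
from typing import Callable, List, Optional

# Hypothetical evidence record: a plain dict of facts gathered for one book.
Evidence = dict
# A rule inspects the evidence and returns a verdict string, or None to pass.
Rule = Callable[[Evidence], Optional[str]]


def no_publication_date(ev: Evidence) -> Optional[str]:
    """Placeholder rule: flag records that carry no publication year."""
    if ev.get("pub_year") is None:
        return "insufficient evidence: no publication date"
    return None


def renewal_on_record(ev: Evidence) -> Optional[str]:
    """Placeholder rule: report that a renewal registration was found."""
    if ev.get("renewal_id"):
        return "renewal on record"
    return None


def assess(ev: Evidence, rules: List[Rule]) -> str:
    """Apply rules in order; the first rule that returns a verdict wins."""
    for rule in rules:
        verdict = rule(ev)
        if verdict is not None:
            return verdict
    return "undetermined"


RULES = [no_publication_date, renewal_on_record]
print(assess({"pub_year": 1952, "renewal_id": "RE066267"}, RULES))
```

Keeping rules as an ordered list means each institution could swap in its own rules without changing the loop that applies them.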

6 WorldCat as a source of copyright evidence
  00819cam 2200253Ka 4500
  001 ocn180754687
  003 OCoLC
  005 20080625054239.0
  008 071103s2008 nyua 000 0 eng d
  040 $a BTCTA $c BTCTA $d BAKER $d JBL
  020 $a 9780399534294
  020 $a 0399534296
  092 0 $a 636.7532 $2 22
  100 1 $a Foster, Stephen, $d 1962-
  245 10 $a Walking Ollie, or, Winning the love of a difficult dog / $c Stephen Foster.
  246 30 $a Winning the love of a difficult dog
  250 $a 1st American ed.
  260 $a New York : $b Penguin Group, $c 2008.
  300 $a 177 p. : $b ill. ; $c 21 cm.
  500 $a "A Perigee book"
  650 0 $a Lurcher $v Anecdotes.

7 WorldCat as a copyright evidence data source
  WorldCat very good source for detailed author/title information
    Author/creator name(s): main entry (1xx), added entries (7xx)
    Title information: Title statement (245), uniform title (240)
  Publication data also extensive, but sometimes a bit spotty …

8 Frequency of occurrence of key MARC data points in WorldCat records (books only)

9 Copyright evidence from multiple sources

10 Example: WorldCat and the Stanford Copyright Renewal Database
  Copyright renewal important for items published between 1923 and 1963
    Renewal required to extend copyright protection
    Renewals after 1977: available in online database
    Renewals before 1977: print form only
  Stanford Copyright Renewal Database
    Converted pre-1977 renewal information to machine-readable form; manually searchable in online database
    Books only
  Automate matching between Stanford records and WorldCat?
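A minimal sketch of how the 1923 to 1963 window could be applied to a WorldCat record, assuming the publication year is read from Date 1 of the MARC 008 field (character positions 07-10). The function names are hypothetical, and treating a missing or non-numeric date as "cannot tell" is a choice made for this example rather than something the slides specify.

```python
from typing import Optional

RENEWAL_WINDOW = (1923, 1963)  # publication years named on the slide


def date1_from_008(field_008: str) -> Optional[int]:
    """Return Date 1 (positions 07-10 of MARC 008) as an int, or None."""
    date1 = field_008[7:11]
    if len(date1) == 4 and date1.isascii() and date1.isdigit():
        return int(date1)
    return None  # e.g. '19uu' or a truncated field


def renewal_lookup_relevant(pub_year: Optional[int]) -> Optional[bool]:
    """True/False if the year is known, None if the date is unusable."""
    if pub_year is None:
        return None
    low, high = RENEWAL_WINDOW
    return low <= pub_year <= high


# 008 values copied from the two WorldCat records shown in the slides.
print(renewal_lookup_relevant(date1_from_008("790222s1952 mau 000 0 eng")))
print(renewal_lookup_relevant(date1_from_008("071103s2008 nyua 000 0 eng d")))
```

Returning None for an unusable date keeps "not in the window" separate from "unknown", which matters if publication data is sometimes incomplete.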

11 Automated matching
  Copy of Stanford database: 246,300 records
  Copy of WorldCat (January 2008): 96,185,960 records
  Cross-record field correspondence:
    Stanford    WorldCat
    TITL        245 $a
    AUTH        100 $a, first instance of 700 $a
  Constructed strings of normalized title/author key combinations; looked for matches across data sources
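The slide does not say which normalization rules were used. The sketch below shows one plausible way to build a title/author key, under these assumptions: text is lowercased, accents and punctuation are dropped, and author name tokens are sorted so that inverted and direct name order compare equal. The sample values are the TITL/AUTH and 245 $a/100 $a fields from the Carnes example on slide 14.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9]+", " ", no_marks.lower())
    return cleaned.strip()


def author_key(name: str) -> str:
    """Sort name tokens so 'Carnes, Paul' and 'Paul Carnes' agree (assumption)."""
    return " ".join(sorted(normalize(name).split()))


def match_key(title: str, author: str) -> str:
    """Combine normalized title and author into one comparable string."""
    return normalize(title) + "|" + author_key(author)


stanford_key = match_key("For freedom and belief.", "Paul Nathaniel Carnes.")
worldcat_key = match_key("For freedom and belief;", "Carnes, Paul Nathaniel,")
print(stanford_key == worldcat_key)
```

Sorting name tokens is a blunt instrument: it ignores dates, initials, and corporate authors, so a production key would likely need more care.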

12 Results
  430,070 matching pairs of Stanford/WorldCat records
    Multiple WorldCat matches to some Stanford records
    Implies …
  81,663 unique Stanford records matched to WorldCat (about 33 percent of Stanford database)
    Interpret as lower bound on number of potential matches
  Some QA …
    Sample of matches checked manually to verify validity
      Excellent results!
    Sample of Stanford records with no WC match; checked manually to try to find match
      Results mixed
      Differences in formatting/parsing/division of data between renewal records and WorldCat
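The figures on this slide can be checked against the record counts from slide 11 with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones reported in the slides.

```python
stanford_records = 246_300   # size of the Stanford copy (slide 11)
matching_pairs = 430_070     # Stanford/WorldCat pairs found
matched_stanford = 81_663    # unique Stanford records with at least one match

match_rate = matched_stanford / stanford_records
pairs_per_matched = matching_pairs / matched_stanford

print(f"{match_rate:.1%} of Stanford records matched")
print(f"{pairs_per_matched:.2f} WorldCat records per matched Stanford record")
```

The match rate agrees with the "about 33 percent" on the slide, and each matched Stanford record pairs with a little over five WorldCat records on average.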

13 Matching precision
  31 percent of matches were one-to-one
  78 percent of matching clusters had 5 or fewer WorldCat records
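Statistics like these can be derived from the list of matching pairs by grouping on the Stanford record. The sketch below assumes a "cluster" means all WorldCat records matched to one Stanford record, which is an interpretation of the slide; the pair data is invented purely to show the calculation.

```python
from collections import Counter

# Invented (Stanford ID, WorldCat ID) pairs, for illustration only.
pairs = [
    ("S1", "W1"),
    ("S2", "W2"), ("S2", "W3"),
    ("S3", "W4"), ("S3", "W5"), ("S3", "W6"),
    ("S3", "W7"), ("S3", "W8"), ("S3", "W9"),
    ("S4", "W10"),
]

# Cluster size = number of WorldCat records matched to each Stanford record.
cluster_sizes = Counter(stanford_id for stanford_id, _ in pairs)


def share_of_clusters(max_size: int) -> float:
    """Fraction of clusters containing at most max_size WorldCat records."""
    small = sum(1 for size in cluster_sizes.values() if size <= max_size)
    return small / len(cluster_sizes)


print(f"one-to-one: {share_of_clusters(1):.0%}")
print(f"5 or fewer: {share_of_clusters(5):.0%}")
```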

14 Example
  WorldCat record
    00612nam 2200205I 4500
    001 ocm04682408
    003 OCoLC
    005 20010627101256.0
    008 790222s1952 mau 000 0 eng
    010 $a 53006423
    040 $a DLC $c CLE
    050 $a BX9842 $b.C27
    092 $a 288 $b 205
    100 1 $a Carnes, Paul Nathaniel, $d 1921-
    245 10 $a For freedom and belief; $b a manual for Unitarians.
    260 $a Boston, $b Beacon Press $c [1952]
    300 $a 71 p. $b illus. $c 20 cm.
    490 0 $a Beacon references series
    650 0 $a Unitarian Universalist churches $x Doctrinal and controversial works.
  Stanford record
    ID: RE066267
    DATE: 1980
    TITL: For freedom and belief.
    AUTH: Paul Nathaniel Carnes.
    OREG: A76385
    DREG: 11Sep80
    ODAT: 18Dec52
    CLNA: Freda Carnes (W)
    OCLS: A

15 Perspective
  Assessment of copyright status depends on available body of copyright evidence
  Often, thorough assessment will require synthesizing evidence from multiple sources
    e.g., WorldCat and Stanford databases
  Cost and effort of accumulating copyright evidence lowered when links between data sources can be established through automated techniques
  Can apply many familiar data processing techniques for this purpose
    Parsing/extracting data within records
    Linking records across data sources
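Linking records across data sources is commonly done by indexing one source on a key and probing that index with keys from the other. The sketch below is one such approach and not necessarily the one used in the project. The dict-based record layout, the "245a" field name, and the simple title_key helper are assumptions made for this example; the record values come from slides 6 and 14.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def link_records(
    left: List[Record],
    right: List[Record],
    left_key: Callable[[Record], str],
    right_key: Callable[[Record], str],
) -> List[Tuple[Record, Record]]:
    """Return every (left, right) pair whose keys are equal."""
    index = defaultdict(list)
    for rec in right:  # one pass to index the larger source
        index[right_key(rec)].append(rec)
    return [(l, r) for l in left for r in index.get(left_key(l), [])]


def title_key(title: str) -> str:
    """Simplistic key: lowercase letters, digits, and spaces only."""
    return "".join(ch for ch in title.lower() if ch.isalnum() or ch == " ").strip()


stanford = [{"ID": "RE066267", "TITL": "For freedom and belief."}]
worldcat = [
    {"001": "ocm04682408", "245a": "For freedom and belief;"},
    {"001": "ocn180754687",
     "245a": "Walking Ollie, or, Winning the love of a difficult dog /"},
]

links = link_records(
    stanford, worldcat,
    lambda r: title_key(r["TITL"]),
    lambda r: title_key(r["245a"]),
)
print([(s["ID"], w["001"]) for s, w in links])
```

Building the index once keeps the cost close to linear in the size of both sources, which matters when one side holds tens of millions of records.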

16 Conclusion
  Traditionally, WorldCat data supports cataloging, resource discovery, resource sharing
  But WorldCat data can be repurposed to support range of library decision-making needs (e.g., copyright investigation)
  Decision-making is increasingly data-driven
    What would an evidence base look like in various library decision-making contexts?
    What questions need to be asked of the data? Can they be generalized & automated?
  Data-mining task is two-fold:
    Identify/expose right WorldCat data to support need in question
    Combine WorldCat data with relevant data from other sources
  Create value and lower cost
