Project Updates: Posner & Million Book Projects Denise Troll Covey Principal Librarian for Special Projects July 2004 – University Libraries Staff Meeting.


1 Project Updates: Posner & Million Book Projects Denise Troll Covey Principal Librarian for Special Projects July 2004 – University Libraries Staff Meeting

2 Posner Project 2002 – 2004 Digitize the collection of Henry Posner Sr. –642 titles (1106 volumes) + 800 archival folders Acquire copyright permission –27% of volumes & small % of archival documents Create web site Provide unrestricted access Funded by Henry & Helen Posner Jr.

3 Preparation Evaluated & purchased Zeutschel scanner Interviewed & hired scanner operator Organized working group Designed workflow

4 Posner Working Group 2001–2002 Erika Linke Project Director 2002–2004 Denise Troll Covey Project Director –Linda Dujmic, Carole George, Bella Gerlich, Terry Hurlbert, Mary Kay Johnsen, Chris Kellen, Ann Marie Mesco, Joe Mesco, Gabrielle Michalek, Angel Morris, Jon Singletary, Brian Woolstrum

5 Scanning the Books 313,413 pages – all pages that can be scanned –600 DPI full color TIFF images –Most books scanned on Zeutschel –24 books scanned on loaner Indus –20 books not scanned –35 books partially scanned 200 gigabytes total storage –Scanned 40 gigabytes every 3–10 days

6 55 Books Not Scanned or Not Entirely Reasons –Bound too tight, pages uncut, poor condition, too large, too small, vellum Plans (20 surrogates located) –Locate & link to electronic versions at other sites –Digitize same title, different edition in our collection

7 Scanning the Archival documents 1,714 pages – all docs that are not confidential –Correspondence, dealer catalogs, pamphlets, advertisements, newspaper clippings, & newsletters Gloriana will verify that nothing confidential was scanned

8 Design, Implement & Maintain http://posner.library.cmu.edu Search & retrieval system (DIVA v2) Usage statistics User Interface design & functionality were informed by user studies

9 Search Currently search via metadata Full text search coming soon

10 Browse

11 Results Click to display book

12 Display – Full Text Available Click to display full text

13 Navigate

14 Display – Annotation Available Click to display annotation

15 User Interface Messages Blank pages Books not scanned in entirety No permission to display Alternative copy

16 Posner Collection Usage Books: 633 titles (1004 volumes) online Archival documents not yet online

17 Workflow Plan 2003 Scan page images (create TIFFs) Post–process (Wolfpack) –Fix – straighten, remove speckles, etc. –Convert to JPEG for display –Convert to text for full text search (OCR) Backup to tape Ingest into system for search & retrieval Update DOI server & add PURL to library catalog

18 Problems No OCR in initial workflow –Not specified in proposal –No OCR software for color images Scanned faster than we could post process –Huge backlogs of books Scanned to servers, waiting for post–processing On backup tape, waiting to restore & OCR Identified OCR as bottleneck Changed workflow to ingest books prior to OCR

19 Many–to–One, One–to–Many Problem Books –1 title → many volumes –Many titles → 1 volume Archival folders –1 folder → many titles –Many folders → 1 title
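The relationships on this slide amount to a many-to-many mapping between titles and physical volumes (and likewise between folders and titles). A minimal sketch of one way to record such links so that either side can be looked up; the class name and the identifiers are hypothetical and are not taken from the Posner catalog or from DIVA:

```python
from collections import defaultdict


class TitleVolumeIndex:
    """Hypothetical many-to-many link table between titles and bound volumes."""

    def __init__(self):
        self._volumes_by_title = defaultdict(set)
        self._titles_by_volume = defaultdict(set)

    def link(self, title_id, volume_id):
        # Record the relationship in both directions.
        self._volumes_by_title[title_id].add(volume_id)
        self._titles_by_volume[volume_id].add(title_id)

    def volumes_for(self, title_id):
        return sorted(self._volumes_by_title.get(title_id, ()))

    def titles_for(self, volume_id):
        return sorted(self._titles_by_volume.get(volume_id, ()))


index = TitleVolumeIndex()
# One title spread over two volumes.
index.link("title-A", "vol-1")
index.link("title-A", "vol-2")
# Two titles bound together in one volume.
index.link("title-B", "vol-3")
index.link("title-C", "vol-3")
```

Keeping both directions in one structure means a count of "titles online" and a count of "volumes online" can be derived from the same records instead of from separate lists.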

20 2004 Quality Control Discovered project personnel had different versions of the “complete” list of Posner books Created canonical spreadsheet Checking status of each book –Scanned & fixed page images –Converted to JPEG –Converted to text (OCR) –Added PURL to library catalog
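The four status checks on this slide can be sketched as a per-book checklist over a single canonical list. This is only an illustration: the field names and book identifiers below are assumptions, not the columns of the project's actual spreadsheet.

```python
from dataclasses import dataclass

# The four checks named on the slide, in workflow order (field names are hypothetical).
CHECKS = ("scanned_and_fixed", "jpeg", "ocr_text", "purl_in_catalog")


@dataclass
class BookStatus:
    book_id: str
    scanned_and_fixed: bool = False
    jpeg: bool = False
    ocr_text: bool = False
    purl_in_catalog: bool = False

    def missing_steps(self):
        """Return the names of the checks this book has not yet passed."""
        return [name for name in CHECKS if not getattr(self, name)]


def backlog(books):
    """Map each incomplete book to the steps it still needs."""
    return {b.book_id: b.missing_steps() for b in books if b.missing_steps()}


books = [
    BookStatus("book-001", True, True, True, True),
    BookStatus("book-002", True, True, False, False),
]
```

Here `backlog(books)` lists only `book-002`, with the OCR and PURL steps outstanding.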

21 Copyright Permission Goal: permission to digitize & provide open access –302 copyrighted volumes (27% of Collection) –Archival catalogs & newsletters

22 Copyright Permissions Workflow Identify copyrighted volumes (27%) Identify & locate copyright holders Send letters requesting non–exclusive permission –To digitize & provide open access –Option to restrict access to Carnegie Mellon only Follow up with phone call or email Negotiate to closure Update statistics & database

23 Problems Seeking Permission Identifying & locating copyright holders Publishers –Lost or misplaced request letter –Don’t know what they published –Don’t know what rights they have –Afraid of open access & lost revenue Learning copyright laws –Abandoned foreign works

24 Posner Study Success Rates [Charts; legend: Permission granted, Permission denied, No response, Not located. Extracted values, in slide order and not matched to legend entries: 29%, 71%, 24%, 22%, 54%, 32%, 17%, 15%, 36%] Received permission for 48% of books 2% of publishers applied access restrictions

25 Analysis by Copyright Holder Type [Chart: success rate based on responses, by holder type: scholarly associations, university presses, commercial publishers, authors & estates, other]

26 Transaction Costs, May 2003 – October 2003
$10,808 FTE labor
$379 Phone calls
$100 Paper & postage
$11,287 TOTAL
Does not include legal fees, administrator time, or cost of Internet connectivity or database creation. $78 per book/volume. 174 letters & 159 follow–up calls or email.
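The figures on this slide can be checked with a little arithmetic. The slide does not say what the total is divided by to get $78; one reading that fits the numbers, offered here only as an assumption, is that the divisor is the number of copyrighted volumes for which permission was received (48% of the 302 copyrighted volumes reported on the earlier slides).

```python
# Line items from the slide, in dollars.
costs = {"FTE labor": 10_808, "Phone calls": 379, "Paper & postage": 100}
total = sum(costs.values())

# Assumption: "per book/volume" means per volume cleared for open access.
copyrighted_volumes = 302   # from the Copyright Permission slide
permission_rate = 0.48      # "Received permission for 48% of books"
volumes_cleared = round(copyrighted_volumes * permission_rate)

cost_per_cleared_volume = total / volumes_cleared

print(f"total: ${total:,}")
print(f"volumes cleared (assumed divisor): {volumes_cleared}")
print(f"cost per cleared volume: ${cost_per_cleared_volume:.2f}")
```

Under that assumption the result rounds to the $78 shown on the slide; a different divisor would not be ruled out by the transcript alone.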

27 Other Work Created student internships (funded by Posner) Assisted Posner Center opening May 2004 –DVD, “jade” reproduction, initial exhibit Moving the Collection August 2004 Ongoing –Posner Center & Collection security –Staffing the Posner Center –Selecting & supervising interns –Preparing exhibits

28 The Million Book Project Digitize & provide open access to a million books Vision, leadership, & research – Carnegie Mellon $$ Equipment & travel – NSF $$ Labor & research – India & China “Attempt to understand & solve the technical, economic, & social policy issues of providing online access to all creative works of the human race.” Raj Reddy

29 Rationale Democratize knowledge & empower citizenry – Address disparity in library size & accessibility Facilitate new knowledge – Combining old & new, east & west, technical & humanistic Enhance student learning & success of faculty research Preserve cultural treasures Address copyright absurdities

30 Support Digital Library Research Information distribution, management, & sustainability Security, copyright, & digital rights management Accuracy of optical character recognition (OCR) OCR of non–Romanic languages & scripts Automatic creation of structural metadata Automatic summarization Intelligent indexing Machine translation Storage formats Search engines

31 Overview 2001–2002 2001 –NSF funded collection & planning meeting –Meet with Chinese partners in U.S. 2002 –NSF funded equipment & travel –Meet with Chinese partners in China –Meet with Indian partners in U.S. –Pilot shipment of books to India Gloriana & Mark Kamlet

32 Overview 2003 – 2004 2003 –All partners meet in India –Meet with partner OCLC –Partner U.C. Merced Libraries fund copyright permission work 2004 –All partners meet in U.S. –Lesk work on identifying public domain books –Kahle v Ashcroft Meeting with Indian President Dr. A.P.J. Abdul Kalam

33 Collection & Planning Meeting November 2001 Collection of collections Copyright considerations –Books for College Libraries –Separate funding Avoid duplicate scanning Safe, affordable shipments Pilot shipment to India Indigenous Materials Public Domain In Copyright

34 2002 Purchased scanners Sent scanners to India & China Trained scanner operators & librarians Began scanning indigenous materials Began working with DLF & OCLC to develop digital registry Incised palm leaves Saraswathi Mahal Library

35 Pilot Shipment to India Questions to be answered –What does it take to pull & pack books for shipment abroad & track their scanning & return? –Should shipments be coordinated among participating U.S. libraries, centralized, or individualized? –How long will the books be away? –Will they be returned safely? –How will the scanning turn out?

36 Explore trade–offs By air is faster, more expensive –Air containers hold 1500 kilos (3300 pounds) By sea is slower, less expensive –Ocean containers hold four times as much Preliminary talk with Emery –Space available on flights to India & China, but little space available on return flights –Suggested shrink–wrap & return books by sea

37 August 2002 Pilot Shipment to India 6,000 books – mostly public domain –243 boxes, 11,298 pounds of books on nine pallets –Shipped from New York to Chennai in a 20–foot ocean container Shipment took 25 days to reach Chennai Round trip cost $2 per book –A third of the books had to be returned to the U.S.

38 Pilot Shipment Book Distribution From Chennai, books went to the central distribution center in Tanjore From Tanjore, books were distributed to scanning centers Pilot Central Distribution Site Deemed University

39 One Year Later... Most of the books were returned in good condition in August 2003 Lost books located & returned 2004 [Photo: Scanning center in Hyderabad, India]

40 Lessons Learned & New Strategies Reduce costs –Packing books in crates costs $1 per book round trip –Books that don’t need to be returned cost just $0.50 Reduce turn around time –Learned customs procedures –Changed distribution strategy Books now go direct from seaport to 4 international megacenters
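To make the savings concrete, the sketch below applies the per-book rates from this slide to a shipment the size of the 6,000-book pilot. Reusing the pilot size and the "a third returned" fraction with the new rates is an illustrative assumption, not a figure reported in the presentation.

```python
def shipment_cost(books, fraction_returned, round_trip_rate, one_way_rate):
    """Total cost in dollars when only some books make the return trip."""
    returned = round(books * fraction_returned)
    one_way = books - returned
    return returned * round_trip_rate + one_way * one_way_rate


PILOT_BOOKS = 6000

# Pilot shipment: $2 per book round trip.
pilot_cost = PILOT_BOOKS * 2.00

# Crated shipment: $1 per book round trip, $0.50 if the book is not returned.
crated_all_returned = shipment_cost(PILOT_BOOKS, 1.0, 1.00, 0.50)
crated_third_returned = shipment_cost(PILOT_BOOKS, 1 / 3, 1.00, 0.50)

print(pilot_cost, crated_all_returned, crated_third_returned)
```

On these assumptions the same shipment drops from $12,000 to $6,000 if every book comes back, and to $4,000 if only a third do.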

41 Copyright Permission Dedicated labor beginning November 2003 –Funded by U.C. Merced Libraries Send request letter + prompt follow up call or email Books for College Libraries –50,000 titles –5,600 publishers [Photo: Scanning center in Hyderabad, India]

42 “More Bang for the Buck!” Indigenous Materials in India & China Public Domain In Copyright Shifted from per title to per publisher approach Original Current

43 Request Letter Educate about open access & user behavior Ask for non–exclusive permission to digitize –All out of print, in copyright titles –All titles published prior to a date of their choosing –All titles published # or more years ago –List of titles they provide Assure – follow standards & laws; limit print & save Give – images, metadata, & OCR  $$$

44 Problems Seeking Permission Identifying & locating copyright holders Publishers –Lost or misplaced request letter –Afraid of open access & lost revenue –Don’t know what they published or what rights they have & don’t have the resources to figure it out –Involved in other projects – perhaps exclusively –Copyright reverts to author when book out of print

45 Preliminary Statistics [Charts; legend: Permission granted, Permission denied, Still negotiating. Extracted values, in slide order and not matched to legend entries: 46%, 54%, 73%, 12%, 14%] Need 18% success rate with BCL publishers granting permission for 500 books each 371 publishers contacted 14% success rate 46,700+ titles

46 Analysis by Copyright Holder Type, Million Book Project [Chart: success rate based on responses, by holder type: scholarly associations, university presses, commercial publishers, authors & estates, other]

47 Estimated Transaction Costs, Nov 2003 – Jun 2004
$18,846 Labor
$323 Follow up
$194 Paper & postage
$19,363 TOTAL
Does not include legal fees, administrator time, or cost of Internet connectivity or database creation. $0.42 per book. 540 letters & 359 follow up calls or email. [Photo: Internet Café, India]

48 Experiment: Renewal Databases Catalog of Copyright Entries (Digitized at Carnegie Mellon) –http://onlinebooks.library.upenn.edu/cce/ Library of Congress Copyright Catalog –http://www.copyright.gov/records/cohm.html –To find renewals 1973–1978 must consult another source for registration numbers Lesk Copyright Renewal Records –http://www.scils.rutgers.edu/~lesk/copyrenew.html –Functionality created & enhanced by Michael Lesk

49 Experiment: U.S. Office of Copyright Expedite identifying & locating copyright holder –Asked to identify & locate copyright holders of 7 titles –$150 fee – charged 3 days after request –Estimated 4 to 6 weeks –Nudged at 8 weeks –Took 15 weeks to respond –Confirmed one citation [Photo: Scanning center in Hyderabad, India]

50 Experiment: Authors Registry Expedite locating copyright holder –Asked to locate 25 authors or estates –$2.50 fee per author/estate found –Same day response –Found 52% –92% accuracy rate Authors likely to grant permission, but transaction cost per book is high [Photo: Scanning center in Hyderabad, India]

51 Experiment: Workflow Time Trials Need to generate lists of BCL titles for publishers that will consider a list that we provide Expedite generating lists of titles (from dirty OCR) –Verify citation – 30% improvement using digital BCL over WorldCat –Verify copyright status – improvement using Lesk’s enhanced copyright renewal records –Verify print status – not cost effective Reduced cost per title from $2.12 to $1.41

52 Next Steps Partnerships pending –Vendor to provide print–on–demand service –U.S. Office of Copyright to study impact of CTEA Standards & best practices –NISO solicited proposal to develop copyright metadata –Lead best practice for acquiring copyright permission Research –Continue data gathering & analyses –Survey participating publishers’ views of open access

53 2004 Meeting at Carnegie Mellon Partners from India & China 124,000+ books scanned in India, China & Egypt Additional centers planned in India, China, Poland, Turkey, Korea

54 Scanning in China $8.46M from Ministry of Education 2003–2006 45,000 books scanned –Capacity 0.5M pages scanned per day –Want to scan 1M books in 2 years –14 scanning centers 25,000 books from our collection shipped to China July 2004 [Photo: Scanning center in Beijing]

55 Scanning in India $25M annually to support research 70,000 books scanned –Capacity 1.5M pages per day –20 scanning centers –4 are megacenters Cost 0.69 rupees per page –0.89 rupees with cost of scanner –$0.015 – $0.019 U.S. dollars
Books by language:
45,660 English
10,486 Telugu
1,114 Sanskrit
1,091 Tamil
3,454 Urdu
220 Kannada
475 Hindi
636 Marathi
7,158 Other
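Two quick consistency checks on this slide's numbers, as a sketch: the exchange rate implied by the rupee and dollar costs per page, and the sum of the per-language counts against the "70,000 books scanned" headline. The exchange rate is derived from the slide's own figures rather than taken from any external source.

```python
# Cost per page as given on the slide.
rupees_per_page = {"without scanner": 0.69, "with scanner": 0.89}
dollars_per_page = {"without scanner": 0.015, "with scanner": 0.019}

# Rupees per U.S. dollar implied by each pair of figures.
implied_rate = {
    key: rupees_per_page[key] / dollars_per_page[key] for key in rupees_per_page
}

# Per-language book counts, in the order listed on the slide.
books_by_language = [45_660, 10_486, 1_114, 1_091, 3_454, 220, 475, 636, 7_158]
total_books = sum(books_by_language)

for key, rate in implied_rate.items():
    print(f"{key}: about {rate:.1f} rupees per dollar")
print(f"books in language table: {total_books:,}")
```

Both pairs imply a rate of roughly 46 to 47 rupees per dollar, and the language table adds up to slightly more than the rounded 70,000 total.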

56 Scanning in India Many books scanned, but no OCR available Will be done with Indian books in 6 to 12 months Working on quality control Librarians in Hyderabad, India

57 Research Access for visual & hearing impaired Automatic summarization Machine translation OCR Searching User interfaces State Library in Hyderabad, India

58 Sample Translation Telugu to English

59 Current book display & navigation Search within the book Next & previous page Zoom in & out Select format Go to page

60 Proposed book display & navigation Zoom in & out Select format Search within the book Next & previous page Go to page Add or remove bookmark Add to or view bookbag Get info about the book New search for other books Return to results Help Title of book Location within book Navigate book Beginning & end of book

61 Needs More books –India requested 60K books –China wants 10K books per month More scanners More storage Way to transfer content to Carnegie Mellon –DVD burners to be provided Way to standardize the metadata New policy on copyright

62 Copyright Policy Public Domain Enhancement Act HR 2601 –Pay $1.00 to maintain copyright after 50 years –Register copyright agent –U.S. Office of Copyright maintains freely accessible list of titles & agents Options to compensate copyright holders –Compulsory licensing – like sound recordings –Public Lending Right – government pays –Subscription model – like HBO –Metered use – pay per view –Free to read – pay to print Want a global model

63 Lesk Identifying Public Domain Public domain appears to have 5.5M books in English –2M published in U.S. pre–1923 –2M published in U.S. 1923–1963 (90+% not renewed) –1.5M (out of 8M) published in foreign countries 1.5M books in French, German, Spanish & Italian Expect half to be difficult to locate

64 Kahle v Ashcroft 92% of books are in copyright, but out of print Challenge U.S. copyright system –No records of copyright ownership –Denies public access to orphaned works without providing any benefits http://notabug.com/kahle/ –Submit examples of impact of barriers to use of orphaned books

65 Use Internet Explorer The Universal Library, China site http://www.ulib.org.cn/ Digital Library of India http://www.dli.gov.in/home.htm The Universal Library, U.S. site http://www.ulib.org/html/index.html FAQ: http://www.library.cmu.edu/Libraries/MBP_FAQ.html

66 Thank you! Denise Troll Covey – troll@andrew.cmu.edu

