Million Book Project: Vision Becoming Reality
Gabrielle Michalek, Carnegie Mellon
Presentation to Carnegie Mellon Qatar Library, November 9 & 10, 2005

2 Vision
“To attempt to understand and solve the technical, economic, and social policy issues of providing online access to all creative works of the human race.” – Dr. Raj Reddy

3 What is the Million Book Project?
The Million Book Project (MBP) is a worldwide endeavor to digitize a million books by 2007 and to provide full-text searching and free-to-read access to them.

4 Why is this important?
- Share knowledge and inform the citizenry
- Facilitate new knowledge
- Enhance student learning and the success of faculty research
- Address copyright absurdities
- Support digital library research
- Preserve rare and fragile cultural materials

5 Digital library research initiatives
- Machine translation
- Massive distributed database
- Storage formats
- Use of digital libraries
- Distribution and sustainability
- Security
- Search engines
- Image processing
- Optical Character Recognition (OCR)
- Language processing
- Copyright laws

6 Who is involved?
- Carnegie Mellon University Libraries and the School of Computer Science
- Other U.S. libraries
- OCLC, Digital Library Federation, and College & Research Libraries
- Internet Archive
- U.N. Food and Agriculture Organization
- India
- China

7 Partners
India:
- Indian Institute of Science
- International Institute of Information Technology
- Indian Institute of Information Technology
- Anna University
- Mysore University
- University of Pune
- Goa University
- Tirumala Tirupati Devasthanams
- Shanmugha Arts, Science, Technology & Research Academy
- Arulmigu Kalasalingam College of Engineering
- Maharashtra Industrial Development Corporation
China:
- Chinese Academy of Science
- Chinese Ministry of Education
- Fudan University
- Nanjing University
- Peking University
- Tsinghua University
- Zhejiang University

8 Partners
National Science Foundation funding:
- 2001: $665,600
- 2002: $1,000,000
- 2003: $1,000,000
- 2004: $1,000,000
- 2005: $58,500 for equipment and travel
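
The slide lists NSF awards by year but gives no total. The short Python tally below simply adds the listed amounts; it assumes the slide shows every NSF award to the project, which the slide itself does not state.

```python
# Tally of the National Science Foundation amounts listed on slide 8.
# Amounts are copied from the slide; the total is plain addition.
nsf_funding = {
    2001: 665_600,
    2002: 1_000_000,
    2003: 1_000_000,
    2004: 1_000_000,
    2005: 58_500,  # listed as "for equipment and travel"
}

total = sum(nsf_funding.values())
print(f"NSF total, 2001-2005: ${total:,}")  # → NSF total, 2001-2005: $3,724,100
```

The 2005 amount is much smaller than the others because, per the slide, it covered only equipment and travel.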

9 Content parameters
- Balance what users want with what is legal
- Opportunity-driven, with many sub-collections
- Some content strategies:
  - Books for College Libraries
  - Public domain materials
  - Cultural heritage materials

10 Almost 500,000 books scanned to date
- 230,000 books in Chinese
- 100,000 books in Indian languages
- 140,000 books in English or other Western languages
- Incised palm leaves from the Saraswathi Mahal Library
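
As a sanity check on the headline figure, the three language counts can be summed and expressed as shares. This Python sketch uses only the numbers on the slide; the palm-leaf manuscripts are left out because the slide gives no count for them.

```python
# Language breakdown from slide 10; the headline says "almost 500,000".
scanned_by_language = {
    "Chinese": 230_000,
    "Indian languages": 100_000,
    "English or other Western languages": 140_000,
}

total = sum(scanned_by_language.values())
print(f"Total: {total:,}")  # → Total: 470,000

# Share of the scanned total held by each language group.
for language, count in scanned_by_language.items():
    share = count / total
    print(f"{language}: {count:,} ({share:.0%})")
```

The shares come out to roughly 49% Chinese, 21% Indian languages, and 30% English or other Western languages, and the 470,000 total is consistent with "almost 500,000".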

11 Scanning in India
- Established 20 scanning centers
- Have scanned 200,000 books to date
- Provides above-average wages and desirable jobs

12 Scanning in China
- Established 17 scanning centers, including one in the Shenzhen Free Trade Zone
- Are scanning indigenous materials, public domain works shipped from the U.S., and U.S. copyrighted works already in Chinese libraries (with permission)
- Provides above-average wages and desirable jobs
(Image: Shenzhen scanning center)

13 Million Book Project in China
- Centers scan 1,000 volumes (200,000 pages) daily
- 270,000 volumes have been scanned to date
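
The daily rates on this slide imply an average book length, and the regional totals here and on slide 11 can be compared with the 230,000 + 100,000 + 140,000 language breakdown on slide 10. A minimal Python sketch using only figures stated on the slides; treating "volumes" and "books" as the same unit is an assumption.

```python
# Daily throughput of the Chinese centers (slide 13).
volumes_per_day = 1_000
pages_per_day = 200_000

# Average book length implied by the daily rates.
pages_per_volume = pages_per_day / volumes_per_day
print(f"Average pages per volume: {pages_per_volume:.0f}")  # → Average pages per volume: 200

# Regional totals to date (slide 13 for China, slide 11 for India).
# Assumption: "volumes" and "books" count the same thing.
scanned_china = 270_000
scanned_india = 200_000
regional_total = scanned_china + scanned_india
print(f"India + China: {regional_total:,}")  # → India + China: 470,000
```

The regional total matches the sum of the language counts on slide 10, so the two ways of reporting progress agree.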

14 Quality control improvements
- Data corruption discovered in some test-case books was caused by compressing digital files for transfer
- Now and in the future, more disks are used to transfer data instead of compressing files
- Other quality control improvements at the Shenzhen scanning center and the North Technical Center in Beijing

15 Value of digitization
- Digitization preserves fragile old or ancient books and manuscripts
- Digitization benefits the worldwide public as well as academic communities by sharing knowledge that is otherwise unavailable to citizens

16 Standards and workflow
- National standards for digital preservation: www.imls.gov/pubs/forumframework.htm
- National standards for cataloging
- Documented workflow and training developed and provided by Carnegie Mellon University Libraries

17 Digitization workflow
- Operators scan, post-process, and OCR:
  - 600 dpi TIFFs
  - Scan-Fix
  - ABBYY FineReader
- Technicians capture metadata

18 Sustaining the collection
- Goal: ten organizations host the collection
- Cost is ~$1M per host site
- Collection is ~20 terabytes
- Current host sites:
  - Digital Library of India
  - Universal Library, China
  - Universal Library, Carnegie Mellon
  - Internet Archive
  - UC Merced
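
The hosting goal on this slide implies an overall cost and storage footprint. The Python sketch below multiplies out the slide's figures under two assumptions the slide does not state: every host keeps a full copy of the ~20 TB collection, and the ~$1M figure is a one-time cost per site rather than a recurring one.

```python
# Figures from slide 18.
# Assumption (not on the slide): each host stores a complete copy.
# Assumption (not on the slide): ~$1M is a one-time cost per site.
target_hosts = 10
cost_per_host_usd = 1_000_000  # "~$1M per host site"
collection_tb = 20             # "~20 terabytes"
current_hosts = [
    "Digital Library of India",
    "Universal Library, China",
    "Universal Library, Carnegie Mellon",
    "Internet Archive",
    "UC Merced",
]

total_cost = target_hosts * cost_per_host_usd
total_storage_tb = target_hosts * collection_tb
hosts_needed = target_hosts - len(current_hosts)

print(f"Approximate cost at goal: ${total_cost:,}")          # → $10,000,000
print(f"Aggregate storage at goal: {total_storage_tb} TB")   # → 200 TB
print(f"Additional hosts needed: {hosts_needed}")            # → 5
```

If the ~$1M is instead an annual cost, total_cost becomes a per-year figure rather than a one-time total.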

19 Thank you
Gabrielle Michalek, Head of Archives & Digital Library Initiatives, Carnegie Mellon University Libraries, gabrielle@cmu.edu

