Digitization and scientific digital libraries
Martin Lhoták
Knihovna AV ČR, v. v. i. (Academy of Sciences Library)
UISK, Charles University in Prague
Content
Digitization Centre of the Academy of Sciences Library
Kramerius – software for dissemination
Digital Library of the Academy of Sciences
Software for metadata creation
"Digitization Registry CZ" project
Digitization Centre of the AS Library
In operation since
Built with support from the EU Solidarity Fund after the 2002 floods in Czechia
Main aim – to build a digital library of scientific publications (books, articles, …) published at the Academy of Sciences of the Czech Republic
Digital Library of ASCR
Partner of the DML-CZ (Czech Digital Mathematics Library) project since 2005
The Academy of Sciences of the Czech Republic
> 50 scientific institutes
8000 employees (4000 in R&D)
> articles, reports, etc. a year
Publishes > 90 journals (circa 3000 articles)
> 100 years of history
Digitization Centre of the AS Library
1 x A0 color scanner ProServ ScanTech 600i
1 x A1 color scanner Digibook
x A2 bw scanners Zeutschel OS
x A4 fast production scanner(s) Panasonic
Staff – 8 to 10 people
Also provides services to other institutions
Monthly production pages
Overall production > pages
Planned acquisition – ScanRobot
Image Adjusting Software
Book Restorer from i2S
Designed to process scanned books
Geometrical correction
Crop
Blur
Binarization
Despeckle
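To make the binarization step concrete, here is a minimal Python sketch. It uses Otsu's global threshold, a common textbook method chosen only for illustration; the slides do not say which algorithm Book Restorer applies. The function names and the list-of-lists image representation are assumptions.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level (0-255) that maximizes between-class variance.

    Illustrative only: not the algorithm used by any particular product.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of gray levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map every pixel to 0 (ink) or 255 (paper) using one global threshold."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]
```

A global threshold like this works on evenly lit scans; pages with stains or shadows usually need a locally adaptive method instead.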
Basic Metadata
XML (DTD of the Czech National Library)
Title
Basic bibliographic data
Book/journal structure
Physical size of the book/journal
Number of pages
Software – Sirius (CZ)
OCR
FineReader runs:
- 1. to recognize the language of each paragraph
- 2. to do OCR with the right language
OCR workflow developed by the DML-CZ team of Dr. P. Sojka
Output – double-layer PDF:
- 1. layer: scanned picture
- 2. layer: "OCRed" text
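The two-run idea can be sketched in Python as follows. This is only an outline of the control flow: the actual DML-CZ workflow and the FineReader interface are not described in the slides, so the OCR engine is passed in as two hypothetical callables (`detect_language` and `recognize`), and `Paragraph` is an assumed data structure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Paragraph:
    image_id: str       # hypothetical reference to a cropped paragraph image
    language: str = ""  # filled in by the first run
    text: str = ""      # filled in by the second run


def two_pass_ocr(
    paragraphs: List[Paragraph],
    detect_language: Callable[[str], str],
    recognize: Callable[[str, str], str],
) -> List[Paragraph]:
    """Run language detection on every paragraph, then OCR each one
    with the language found for it."""
    # Run 1: decide the language of each paragraph.
    for p in paragraphs:
        p.language = detect_language(p.image_id)
    # Run 2: recognize the text using the detected language.
    for p in paragraphs:
        p.text = recognize(p.image_id, p.language)
    return paragraphs
```

Separating the two runs means a page that mixes, say, Czech and German paragraphs is not forced through a single language model.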
Kramerius – development group and technology used
Open source, in development since 2003
Main purpose – access to and dissemination of digitized documents (monographs and periodicals)
Czech National Library, Academy of Sciences Library, Qbizm technologies, Moravian Library in Brno
Funded mostly by the Ministry of Culture and the Academy of Sciences Grant Agency
Technologies used: Java, Linux, Apache, Tomcat, PostgreSQL, Lucene
Kramerius – current status
Version: 3.3.0
Kramerius – current status
DTD for periodicals and monographs
Import of XML, TXT and graphic files
Graphic formats: DjVu, JPG, PNG, PDF
Full-text search (Lucene)
Replication of data between individual installations
OAI-PMH – for metadata harvesting
METS, PREMIS, MIX – metadata standards
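As an illustration of OAI-PMH metadata harvesting, the following Python sketch builds a `ListRecords` request and parses a response. The verb, the `oai_dc` metadata prefix, the `resumptionToken` mechanism and the two XML namespaces come from the OAI-PMH 2.0 protocol; the base URL in the usage note and the helper names are assumptions, and nothing here is specific to Kramerius.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    resumption_token: Optional[str] = None,
) -> str:
    """Build a ListRecords request URL.

    When continuing a harvest, the resumption token is sent on its own,
    without metadataPrefix, as the protocol requires.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text: str) -> Tuple[List[Tuple[str, Optional[str]]], Optional[str]]:
    """Return (identifier, first dc:title) pairs and the next resumption token."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        identifier = rec.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        title = next((el.text for el in rec.iter(DC_NS + "title")), None)
        records.append((identifier, title))
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token
```

A harvester would fetch `list_records_url("https://example.org/oai")` (a placeholder address), parse the response, and repeat with the returned token until no token is left.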
Kramerius – current status
International and national connections:
- The European Library
- Uniform Information Gateway (JIB)
Links to library OPACs
Persistent URLs enable persistent linking
Kramerius – new development plans
Fundamental change – use of the FEDORA repository (open source, USA)
Reasons – FEDORA is a robust engine with support for compound objects, and it is also useful for long-term preservation
Enhanced administration – users and access rights
Batch operations on digitized documents
New types of documents (maps, audio, video, …)
Kramerius – institutional users
Czech National Library, Moravian Library in Brno, State Technical Library, Academy of Sciences Library
Regional scientific libraries: Havlíčkův Brod, Hradec Králové, Olomouc, Ostrava, Zlín
Museum libraries: UPM Praha, ŽM Praha, DA Praha, MVČ Hradec Králové
In total circa pages (circa 500 periodical titles and 4500 monographs)
Academy of Sciences Digital Library
Funded by the Academy of Sciences ( )
Digitization of historical issues ( )
Digitized circa pages
Development of the Kramerius system
Accessible pages (no separation into articles)
Full-text search
Academy of Sciences Digital Library
New issues – different approach
Open-source EPrints (University of Southampton)
Agreements with the Academy institutes – conditions of dissemination
Final goal – merging both digital libraries (solution probably Drupal/FEDORA – Islandora?)
Collaboration with Google
Digitized journals from the Kramerius system - indexing of full texts, automatic detection of articles, link from Google to the article's first page or abstract
New articles in EPrints - indexing of full texts, link from Google
Academy of Sciences Central Data Repository
Huge amount of data from digitization
Disk array 30 TB with mirror
Tape library with up to 500 tapes
3 different locations for long-term storage
Long-term preservation of R&D outputs of the Czech Academy of Sciences
Institutional repository
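Keeping copies in several locations only helps preservation if the copies are compared from time to time. The Python sketch below shows one common way to do that with SHA-256 checksums. It is an assumption about how such a check could look, not a description of the repository's actual procedure, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_checksum(copies: List[Path]) -> Dict[str, List[Path]]:
    """Group the stored copies of one file by their checksum."""
    groups: Dict[str, List[Path]] = {}
    for copy in copies:
        groups.setdefault(sha256_of(copy), []).append(copy)
    return groups


def copies_agree(copies: List[Path]) -> bool:
    """True when every copy has the same content."""
    return len(group_by_checksum(copies)) == 1
```

With three copies, a single damaged file shows up as a minority group, so the two matching copies indicate which content to restore from.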
System for journal publishing administration
Proven professional systems (Manuscript Central, Editorial Manager)
Better price for implementation and annual service fees when purchasing as a consortium
Online submission system
Complete records of authors, reviewers and articles
Automated administration of peer review
Currently 8 journals
Software for metadata creation
Project of the Moravian Library in Brno, funded by the Ministry of Culture
Open-source software for creating metadata in the Kramerius format
Metadata – descriptive, technical, administrative
Bibliographic record taken from the library information system
Outputs also in other formats – Manuscriptorium, DSpace, FEDORA
Project "Digitization Registry CZ"
Project partners: Academy of Sciences Library and National Library
Funded by the R&D programme of the Ministry of Culture
Central registry of digitized documents in the Czech Republic
Monitoring of the digitization workflow
Linking with library OPACs
Possible move to the international level (EU project)
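The two core functions of such a registry, avoiding duplicate digitization and tracking workflow state, can be sketched in a few lines of Python. Everything here is hypothetical: the class, the state names and the use of a plain string identifier are assumptions for illustration, not the design of the actual project.

```python
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    PLANNED = "planned"
    IN_PROGRESS = "in progress"
    FINISHED = "finished"


class DigitizationRegistry:
    """In-memory stand-in for a central registry of digitized documents."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def register(self, identifier: str, institution: str) -> bool:
        """Record an intent to digitize a document.

        Returns False when the document is already registered, which
        signals a possible duplicate effort.
        """
        if identifier in self._records:
            return False
        self._records[identifier] = {
            "institution": institution,
            "state": State.PLANNED,
        }
        return True

    def advance(self, identifier: str, state: State) -> None:
        """Update the workflow state of a registered document."""
        self._records[identifier]["state"] = state

    def lookup(self, identifier: str) -> Optional[dict]:
        """Return the record for a document, or None if it is unknown."""
        return self._records.get(identifier)
```

A library would call `register` before scanning; a `False` result tells it that another institution has already planned or finished the same title.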
Thank you! Questions? Martin Lhoták