Svein Arne Brygfjeld National Library of Norway Nordic Web Archive.


The message of today
First: A summary
Second: Legal deposit in Norway
Third: Our digital library principles
Fourth: Harvesting, archiving and giving access to the web
Fifth: The prototype, a demonstration

Part one: Summary
Norwegian legislation on legal deposit: Includes digital information!
The National Library of Norway has a relatively advanced digital library activity
Nordic cooperation on methods and technology for legal deposit of the web
Nordic project on access to web archives

Part Two: Legal deposit in Norway
Legislation revised in 1989
Includes all information carriers in the “traditional domain”, like books, newspapers & more
Also including music and broadcast programs
And: Including the information living in the digital domain

The National Library of Norway (organization chart)
National Librarian: Bendik Rugaas
Administration; IT & Innovation
Rana Division; Oslo Division
200 employees: Administration, IT, Technical, Repository, Legal Deposit, Media Lab, Sound & Image
100 employees: Administration, IT, Public Collections, Bibliographic, Norwegian Music
(Svein Arne)

The challenge:
Preserving the cultural heritage represented by the world-wide web
–Including harvesting and archiving
Giving access to historical web archives
–…Nordic Web Archive access project

But first: Part three Our digital library principles…

One strategy for most digital objects
One large long-term digital repository
All storage, long-term preservation and access based on this infrastructure

Our Digital Library reference model
Digital Library application layer
–Search engines
–Personalization
–Specialized applications
–Collecting applications
Repository functionality & organization
–Metadata (DC)
–Identification (URN)
–Migration
–Quality and formats
–IPR/Copyrights/Access control
Digital objects
–Text, audio, still images, moving images, web pages & more
General storage facility
–Unix servers
–Fault-tolerant disk systems
–Tape libraries
–HSM

Examples of current use
Digital Radio Archive
–Digitization & archiving of hrs
Galleri NOR
–Still images in high quality
Historical newspapers
–Images of pages as well as OCR-based text

And now… …the preservation of the web!

Preserving the web: some focus areas
Harvesting & collecting it all
Archiving
–Identification, versions, metadata, long-term preservation
Access to archive

Harvesting
Can it be possible?
–Have a look at the search engines
Available software
–Public domain/open source: NEDLIB
–Commercial: several

Harvesting: Resolution in time
Snapshots vs continuous
Continuous:
–Wanted for services considered interesting and with rapid updates
–Dependent on use of software agents placed at the publisher

Everything or bits & pieces
Questions to be answered:
–What is (technically) possible?
–What do we want?
–What level of metadata do we need?

Archiving
Different models in the five countries (probably)
The Norwegian model is based on use of the library’s general storage facilities
Close integration with other digital objects
Online or near-line

Long-term preservation
Migration
–So far our choice
Emulation
–Technically complicated
Museum
–Hard to do over time

And now… …access to web archives

Nordic Web Archive
A context for cooperation to find common technology and methods to harvest, archive and give access to the web
Current focus on access to archives
–Small, focused project

NWA: Members
Denmark (Royal Library)
Finland (National Library)
Iceland (National Library)
Norway (National Library), project mgmt
Sweden (Royal Library)
Nordunet2

NWA: Current scope
Focus on access to web archives
NOT harvesting
NOT archiving

NWA: Main choices
General and well-specified interface to archive
Search (and navigation) through the use of a commercial search engine
Access based on search and navigation/browsing
Support for navigation in time and space

NWA: Architecture
[Architecture diagram. Components: ARCHIVE, XML COMMON FORMAT, INDEXER, INDEXES, SEARCH ENGINE, ACCESS, WEB INTERFACE. Archive interface calls: FIND_ID(URL, TIME) returns URN; FIND_DOCUMENT(URN) returns DOCUMENT.]
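The two archive calls named on the architecture slide, FIND_ID(URL, TIME) and FIND_DOCUMENT(URN), can be sketched as a small interface. This is only an illustration, not the NWA implementation: the class `InMemoryArchive`, its `store` helper, the `urn:example:` identifiers and the rule "return the latest version harvested at or before the requested time" are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class InMemoryArchive:
    """Hypothetical stand-in for an archive behind the two interface calls."""

    # URN -> stored document body
    documents: Dict[str, bytes] = field(default_factory=dict)
    # URL -> list of (harvest time, URN), kept sorted by harvest time
    versions: Dict[str, List[Tuple[datetime, str]]] = field(default_factory=dict)

    def store(self, url: str, time: datetime, urn: str, body: bytes) -> None:
        """Register one harvested version of a URL (helper assumed for the sketch)."""
        self.documents[urn] = body
        self.versions.setdefault(url, []).append((time, urn))
        self.versions[url].sort()

    def find_id(self, url: str, time: datetime) -> Optional[str]:
        """FIND_ID(URL, TIME): URN of the latest version harvested at or before `time`.

        The "at or before" rule is an assumption; the slide does not define it.
        """
        best = None
        for harvested, urn in self.versions.get(url, []):
            if harvested <= time:
                best = urn
        return best

    def find_document(self, urn: str) -> Optional[bytes]:
        """FIND_DOCUMENT(URN): the stored document, or None if the URN is unknown."""
        return self.documents.get(urn)
```

Keeping the interface this narrow is what would let each country plug its own archive model in behind the same access layer.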

NWA: The technology
Based on commercial search engine from Fast Search & Transfer
In-house development on Linux platform
–XML, PHP, Perl and Java
Probably open source
General web user interface (no additional plugins needed)

NWA: Search engine motivations
Motivation
–Support for search functionality on text documents
–Speed
–Reduced complexity in implementation

NWA: Search engine benefits
(in addition to fulfilling the motivations)
–Extreme scalability
–Support for distributed searching
–Easy integration with other indexes
–Integrated language technologies (limited)

NWA: Access methods
Main principles:
–The web seen in the archive should look like it did on the net
–It should be available through the use of an ordinary web browser
Three main methods
–Search, navigation and browsing

NWA: Search
Search based on search engine
Indexes based on exports from archives
–In general, search on the original content is possible, but
–Some additional information is available: protocol metadata, timestamps and more
Time limitations, phrase search and other functionalities
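A phrase search with a time limitation over exported index records might look roughly like this. The NWA prototype delegates search to a commercial engine, so the linear scan below is only a stand-in to show the idea; `IndexRecord`, its fields and `search` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class IndexRecord:
    """One exported archive record (field names are assumptions)."""

    urn: str
    url: str
    harvested: datetime  # timestamp carried along as additional information
    text: str            # original content made searchable


def search(
    records: Iterable[IndexRecord],
    phrase: str,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[IndexRecord]:
    """Return records containing `phrase` whose harvest time lies in [start, end]."""
    needle = phrase.lower()
    hits = []
    for record in records:
        if start is not None and record.harvested < start:
            continue
        if end is not None and record.harvested > end:
            continue
        # Case-insensitive substring match stands in for real phrase search.
        if needle in record.text.lower():
            hits.append(record)
    return hits
```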

NWA: Search cont.

NWA: Time navigation
Given a location or service
–The user should easily be able to go to next/previous version
Using a Java-based time-line as time navigation tool
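Next/previous navigation can be sketched as a lookup in the sorted list of harvest times for one location. This is an assumption about how such a step could work, not a description of the Java time-line; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime
from typing import List, Optional


def previous_version(times: List[datetime], current: datetime) -> Optional[datetime]:
    """Latest harvest time strictly before `current`; `times` must be sorted."""
    index = bisect_left(times, current)
    return times[index - 1] if index > 0 else None


def next_version(times: List[datetime], current: datetime) -> Optional[datetime]:
    """Earliest harvest time strictly after `current`; `times` must be sorted."""
    index = bisect_right(times, current)
    return times[index] if index < len(times) else None
```

Both functions return None at the ends of the time-line, which a user interface could use to disable the corresponding button.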

NWA: Time navigation cont.

NWA: Space navigation
Given a point of time
–The user should be able to go to some other service based on the URL
In the NWA prototype, the user can use original URLs as references to services within the archive
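Using original URLs as references within the archive could be done by rewriting each link into an archive lookup for the same point in time. The sketch below assumes a made-up `/archive?url=...&time=...` address scheme and a plain string timestamp; neither comes from the NWA prototype.

```python
from urllib.parse import urlencode, urljoin


def archive_reference(
    base_url: str,
    href: str,
    timestamp: str,
    archive_prefix: str = "/archive",  # hypothetical access-layer path
) -> str:
    """Turn a link found on an archived page into a reference inside the archive.

    `base_url` is the original URL of the page holding the link, `href` is the
    link as written on that page, and `timestamp` is the point in time the
    user is currently viewing.
    """
    # Resolve relative links against the original location first.
    original = urljoin(base_url, href)
    query = urlencode({"url": original, "time": timestamp})
    return f"{archive_prefix}?{query}"
```

For example, `archive_reference("http://example.org/a/index.html", "../b.html", "20000101")` resolves the relative link to its original URL before building the archive reference.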

NWA: Space navigation

NWA: Metadata
Few web resources contain user-produced metadata
HTTP contains some metadata, like time of modification and more
Tagging of documents (like ) can be viewed as metadata, and is passed on to the indexer
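Collecting metadata from the HTTP response and from document tagging before handing it to an indexer might look like this. The slide does not say which tags were used, so the choice of `<title>` and `<meta name=...>` here is an assumption, as are the names `TagMetadataParser` and `collect_metadata`.

```python
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser
from typing import Dict


class TagMetadataParser(HTMLParser):
    """Collects <title> text and <meta name=... content=...> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name, content = attributes.get("name"), attributes.get("content")
            if name and content:
                self.metadata[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def collect_metadata(headers: Dict[str, str], html: str) -> Dict[str, str]:
    """Merge tag-based metadata with the HTTP time of modification, if present."""
    parser = TagMetadataParser()
    parser.feed(html)
    parser.close()
    record = dict(parser.metadata)
    last_modified = headers.get("Last-Modified")
    if last_modified:
        # HTTP dates use the RFC 5322 style handled by email.utils.
        record["last_modified"] = parsedate_to_datetime(last_modified).isoformat()
    return record
```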

NWA: Open Source?
Many good reasons pro, few contra
Dependent on third-party software!
–Radical re-implementation to be independent

NWA: Scalability
Search engine extremely scalable

Further challenges
“The deep web”
Dynamic and user-dependent services
Continuity
Description/metadata
Access rights to archive!
–This is the main obstacle

See also…

That’s it!
Thank you for listening (if you were ;-) )
Please contact me if there’s anything
–But on only!