Digital Preservation and Management

1 Digital Preservation and Management
Welcome. Introductions: name, organization, work with born digital materials. The majority of information resources available today are created digitally. Only a small percentage of this material will need to be preserved in archives and libraries; however, there are many challenges in ensuring that born digital materials can be preserved.

2 Preserving Digital Resources: Why is it an Issue?
Technology obsolescence Digital media life expectancy Variety of file formats Digital rights management Costs Organizational resistance Obsolescence Hardware and software change rapidly. Microsoft gives us a new operating system how often? Every 2-3 years? 5¼ inch diskettes, once very common, are now obsolete. 3½ inch diskettes are becoming obsolete. CDs are being replaced by DVDs. So, the software environment that a born digital object is created in will change quickly; and the storage media will change as well (although probably not as quickly as the software environment). Digital media life expectancy There are no assurances that the storage media are stable enough to warrant preservation. And, even if they are, the rate of technological change will make them obsolete anyway. Variety and complexity How many variations of a particular file format can there be? MANY. For example, according to one source, there are currently over 3000 different digital file formats in existence. And that number does NOT take into account the various “flavors” of the same file format. It counts .DOC one time – even though MSWord documents created 10 years ago are different from MSWord documents created today. Source: Costs There aren’t any good cost models available to help you determine how much digital preservation will cost. But, we do know that it will be a new form of preservation; one that will require technical infrastructure and management – thus increasing an organization’s preservation costs. Organizational resistance to digital preservation as a programmatic and systematic responsibility

3 Assumptions Digital preservation is more challenging and complex than preservation of analog objects Digital preservation is more than a technical preservation strategy “THE” solution doesn’t exist Digital preservation needs to be integrated into organizational culture

4 Assumptions Change Happens File formats matter
Non-proprietary is best; de facto standards are good System architecture and documentation matter Open systems that can be moved to other platforms Technology isn’t the whole solution Policies, planning, and resources The community is just beginning to work on these issues – and everything is new and is changing Change Happens The underlying technical platform – the operating system, for example – is going to change, and it will become increasingly difficult through time to support the original OS. Thus, digital preservation strategies are ongoing. A preservation plan will need to be developed – but it will not necessarily be a long term plan. It may be able to identify preservation actions that need to take place in ten year increments. But probably not longer than that, as we don’t know what technology will look like ten years out. The work will never be done. But we do need to find ways to minimize the work. This is one of the biggest challenges of digital preservation – we know it is an ongoing commitment, technically challenging – but we have to do it with the minimum amount of work (time, money, technology). And, we must balance this minimalism against the amount of work that yields the greatest benefits.

5 Terms Digital Object: Any resource that can be stored or manipulated by a computer Digitized Resources: Any resource that has been digitized from an analog source Born Digital: Any resource that was created digitally and will be managed and preserved digitally Read definitions. Note that this workshop applies to born digital, not digitized material. HOWEVER, the technical and preservation issues are very similar, if not the same, for digitized resources. Ask audience for a list of types of born digital material that they are aware of: web sites, documents, databases, systems, spreadsheets, digital audio, digital video, scientific data, etc. Ask audience if any of them are acquiring these types of resources currently.

6 Terms Digital preservation/archiving: Storage, maintenance, and access to a digital object over the long term, usually as a consequence of applying one or more preservation strategies Digital Preservation encompasses a broad range of activities designed to extend the usable life of machine-readable computer files and protect them from media failure, physical loss, and obsolescence.

7 Terms Viability: maintenance of the bitstream
Renderability: viewable by humans and “processable” by computers Understandability: interpretable by humans Fixity: The state or quality of being fixed or unchanged. Reliability: the digital objects are created in a trustworthy way. They are what they say they are Authenticity: the digital object remains reliable over time TDR divides digital preservation activities into those that promote the long-term maintenance of a bitstream (the zeros and ones) and those that provide continued accessibility of its contents. The OCLC/RLG Working Group on Preservation Metadata added the concept of viability to the maintenance of the bitstream, indicating that information must be intact and readable from the storage media, and further subdivides the content accessibility need into renderability (viewable by humans and processible by computers) and understandability (interpretable by humans). As these terms imply, it is one thing to preserve a bitstream, but quite another to preserve the content, form, style, appearance, and functionality. We conceive of digital preservation as a process that requires the use of the best available technology as well as carefully thought out administrative policies and procedures. Fixity The state or quality of being fixed or unchanged. Since digital objects are easily modified, a mechanism is necessary to maintain fixity over time, or to consciously document when a digital object has been altered. Technologies such as checksums and digital signatures are used to verify that a digital object retains its fixity, which helps maintain the object's authenticity and integrity. Fixity information is a key part of metadata for digital preservation.
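
A checksum-based fixity check could be sketched as follows. This is a minimal illustration in Python, not a method taken from the slides: the function names compute_checksum and verify_fixity are hypothetical, and SHA-256 is an assumed choice of algorithm. The idea is to record a checksum when the object is received and recompute it later; a mismatch means the bitstream has changed.

```python
import hashlib
from pathlib import Path


def compute_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read 64 KiB at a time until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_checksum: str, algorithm: str = "sha256") -> bool:
    """True if the file still matches the checksum recorded earlier."""
    return compute_checksum(path, algorithm) == recorded_checksum
```

In practice the recorded checksum would be stored with the object's preservation metadata, and the check would be repeated on a schedule and after every copy or migration.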

8 Digital Preservation Strategies
Bitstream Copying Refreshing Durable/Persistent Media Technology Preservation Digital Archaeology Analog Backups Migration Replication Reliance on Standards Normalization Canonicalization Emulation Encapsulation Universal Virtual Computer Many digital preservation strategies have been proposed, but no one strategy is appropriate for all data types, situations, or institutions. Handout in packet with brief explanation of each of these.

9 Trusted Digital Repositories
A repository whose mission is to provide reliable, long term access to managed digital resources to a community, now and in the future. We need authentic, reliable, trustworthy data – but we also need reliable and trustworthy repositories for that data

10 Trusted Digital Repositories
Attributes Administrative responsibility Organizational viability Financial sustainability Technological suitability System security Procedural accountability OAIS compliant Seven attributes of a Trusted Digital Repository. Notes will discuss each one when they’re done!

11 Trusted Digital Repositories
Implementation approaches will vary Approach will depend on: Context Users (designated community) Underlying issue remains constant Functionality Reliability and authenticity

12 Open Archival Information System (OAIS) Reference Model
Conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term Consists of people and systems (overview) (standard) One frequent lament heard about the OAIS is that it is a reference model rather than an implementation plan. By design, it is a conceptual framework for a complete, generic archival system. Its strength is in establishing common terms and concepts for describing repository architectures and comparing implementations without specifying an implementation an organization should use. Because of this flexibility, the reference model has been utilized by a variety of organizations for planning and designing digital repositories of many different types. a type of archive consisting of an organization of people and systems that has accepted the responsibility to preserve information for one or more designated communities. Great ppt tutorial from 2003 Library of Congress presentation – link available from the overview URL Increase awareness and understanding of concepts relevant to digital preservation Clarify terms and concepts Create consensus on elements and processes involved in digital preservation and access Create a framework to guide the identification and development of standards

13 OAIS: What is it? Any organization or system charged with the task of preserving information over the long term and making it accessible to a specific group of users An OAIS archive is expected to meet certain minimum responsibilities Note that an OAIS is a model for any type of archive – a traditional, physical, organizational archive or a digital archive. The use of the word “Open” refers to the fact that the model and future recommendations associated with it are developed in open forums – it does NOT have anything to do with the level of accessibility to materials within an archive. Minimum requirements on next slide

14 OAIS: Minimum Responsibilities
Negotiate and accept appropriate information from information creators Obtain sufficient control over the information to ensure preservation Determine the scope of the “Designated Community” (the users) Ensure that users can understand the information without assistance from the information creators Minimum requirements of an OAIS archive (this slide and next slide) Read these – explanation will follow

15 OAIS: Minimum Responsibilities
Follow documented policies and procedures Ensure preservation Authenticate information Disseminate (provide access to) information Make the information available to the Designated Community Minimum requirements of an OAIS archive Read these – explanation will follow Explanation/translation on next slide

16 Preservation Planning
Monitoring technology and users; developing preservation actions Preservation planning is part of the administration functions of any archival program; OAIS has highlighted it as a distinct function Importance of constant and ongoing management and planning for digital preservation call for this

17 Components of a Digital Preservation Program
TDR and OAIS imply that there are three components of a digital preservation program Resources Framework (trust) Organizational Infrastructure (policy) Technological Infrastructure (technology) Words in parentheses refer back to the slide on authenticity – where these three things are mentioned as the way to ensure authentic records.

18 Resource Framework Nothing is sustainable without ongoing commitment of resources A high level commitment to digital preservation must demonstrate an adequate resource commitment Deliverables that meet the goals Line item budgets Staff commitment Strategic planning Projections for costs and funding scenarios

19 Resource Framework Commitment of resources (time, money, staff) implies organizational commitment and reflects organizational priorities Staffing is the expensive part! Curatorial functions Appraising, acquiring, processing, metadata creation, ongoing management, access Technical functions Computer operation, system administrator, database administrator, storage administrator, application programmer, preservation expertise Computer Operations - performs backups, tape rotations, hardware maintenance, monitors availability Computer System Administrator - appropriate to the platform, NT or UNIX, may do network setup Database Administrator - typically repository systems run a relational database for metadata and/or content storage Storage Administrator - may get involved for large amounts of data since data is generally network-attached Applications Programmer - for installation of repository software & upgrades, trouble-shooting, performance, possible system modifications Preservation Officer - works with technical staff, manages rollouts of new functionality, performs communication tasks

20 Planning Identify stakeholders and their roles Educate
All partners need a desired outcome Tangible or intangible Buy-in Mission, goals, outcomes

21 Organizational Infrastructure
Organizational and Curatorial Responsibilities Policy framework Operational Responsibilities Planning framework Functions and roles This unit extends the “attributes” of a trusted digital repository to responsibilities: the policies and procedures that the organization needs to implement in order to support the technical infrastructure. An archive is more than storage space! The organizational infrastructure will be somewhat dependent on the technological infrastructure – planning for the two needs to proceed in parallel.

22 Organizational and Curatorial Responsibilities – Policy Framework
Strategic Plan Collection Policy Security Policy Preservation Policy Access Policy These are familiar things to us. For each one, we will focus on the considerations that are specific to digital preservation.

23 Strategic Plan Overview and scope of the digital preservation program and its context Mission/Purpose High level goals and objectives Commitment to OAIS and community best practices Related documentation and who is responsible Administrative/Oversight structure High level audience statement

24 Audience (Designated Community)
OAIS requirement Explicit All collections Per collection Audience=assumed knowledge and resources

25 Impacts of Audience Identification
The kinds of collections you will accept The kind of descriptive information (metadata) you will provide The kind of services you will offer Software, translators The kind of preservation actions chosen Significant properties The access mechanisms you need to provide

26 Collection Policy What kinds of digital resources are you going to collect and digitally preserve? Content considerations Are you focusing on a specific content area? Rights management considerations Metadata responsibilities and requirements Requirements for documenting acquisitions

27 Collection Policy Technical considerations
Digitization with no physical counterpart Digitization with a physical counterpart Anything born digital Born digital that can’t be reformatted to eye readable

28 Collection Policy Are there further limitations on what you will collect? (examples) Non-proprietary formats only Specific formats only (TIFF) Systems/databases only Distinct documents only Minimum amount of metadata required at time of acquisition Materials that can be digitally reformatted in a specific way Move everything to TIFF? Move everything to XML?

29 Documenting Acquisitions
OAIS requires agreements with depositors that address acquisition, maintenance, access and withdrawal Should already be using these kinds of agreements May need to revise for digital materials, to include What happens if functionality is lost? Is reformatting to eye readable an acceptable preservation option? What kind of access can you provide and is it acceptable? Are there digital-specific copyright issues to consider?

30 Documenting Acquisitions
May need to revise for digital materials, to include Metadata creation responsibilities Rights management What level of functionality will be available from the digital repository?

31 Security Policy System security Physical environment
Backup and recovery Fixity of the data (reliability) Disaster preparedness and response Planning and documentation requirements Assign responsibility

32 Preservation Policy Commitment to digital preservation
Goals of digital preservation Scope of materials Formats Metadata suppliers Access commitments

33 Preservation Policy Definition of overall preservation strategy
Are there limitations? What happens if preservation actions go wrong? Is reformatting to eye-readable an acceptable preservation action? Under what circumstances? Planning and documentation requirements Responsibilities assigned

34 Operational Responsibilities
Based on work done by OAIS community to define the principal obligations of an OAIS-compliant repository Appropriate planning documentation will be necessary to carry out operations Specific planning based on strategic plan and policies

35 Operational Responsibilities
Acquisition Physical and intellectual control Determines audience (designated community) Follows policies and procedures to assure preservation of authentic information Access Promotes development of best practices and standards Won’t cover designated community again

36 Acquisition Development of collection policies
Includes specific required formats, if appropriate Procedures and workflows for copyright clearance for access and preservation Metadata specifications and implementation Procedures to ensure the authenticity of submitted material Assessment of the completeness of the submission Documentation of all acquisition transactions

37 Control Preparing the materials for storage Content analysis
Significant properties Verification of metadata Unique and persistent identifier assigned Authenticity and integrity check Move to archival storage
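
The last three steps on this slide (identifier assigned, integrity check, move to archival storage) can be sketched together. This is a minimal illustration under stated assumptions, not a description of any repository product: the ingest function, the record fields, and the use of a random UUID as the identifier are all hypothetical. A UUID is unique, but it is only persistent if the organization commits to maintaining it.

```python
import hashlib
import shutil
import uuid
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ingest(source: Path, archive_dir: Path) -> dict:
    """Copy one file into archival storage and return a minimal ingest record."""
    identifier = uuid.uuid4().hex            # hypothetical identifier scheme
    checksum = sha256_of(source)             # fixity value taken before the move
    archive_dir.mkdir(parents=True, exist_ok=True)
    destination = archive_dir / f"{identifier}{source.suffix}"
    shutil.copy2(source, destination)        # copy2 also tries to keep timestamps
    if sha256_of(destination) != checksum:   # integrity check after the move
        destination.unlink()
        raise OSError(f"copy of {source} does not match its checksum")
    return {
        "identifier": identifier,
        "original_name": source.name,
        "sha256": checksum,
        "stored_at": str(destination),
    }
```

The returned record is the kind of information that would be written into the object's administrative or preservation metadata.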

38 Preservation Actions Monitoring of technology and the digital materials Technology watch Preservation planning Classes of material Actions to be taken Documentation of actions and results Functionality considerations

39 Access A system for resource discovery
Mechanism for authenticity check Access control mechanisms User support

40 Standards and Best Practices
Promote and utilize Results in economies of scale Creation of high quality digital resources that are more amenable to preservation Work with software suppliers, potential depositors, designated communities

41 In-house Significant investment Technical expertise Workflow impacts
Maintain physical control Building a digital repository in-house will require a significant investment. You will need appropriate technical infrastructure and expertise. Even if you can use tools like DSpace or LOCKSS – there is still cost involved and expertise required. And, there will be more work to do – for the technologists and the information professionals. You will maintain physical control over your material.

42 Outsource Can the service provider meet your needs and requirements?
Less investment? No cost models to show if this is accurate Less reliance on in-house technical expertise and infrastructure necessary What happens if the service provider goes out of business? There are some, not many, service providers who you can use to outsource digital preservation. Research them carefully – traditionally, our community has been comfortable outsourcing many things – but maintaining the archives isn’t one of them. We don’t know if it will cost more or less to outsource – no cost models exist. But, it will certainly depend on your circumstances and resources.

43 Combination Build what you can
Build what you need that can’t be outsourced Buy what you can’t build Now, digital repositories… Possibly the best alternative now and in the future will be to build what you can – for example, implement DSpace to support some of your digital preservation needs and use the resources you already have (staff, expertise, hardware, etc) – and buy what you can’t support on your own. Or, another way of looking at it is to outsource what you can; and build a digital repository for materials that can’t be supported using service providers. Whatever combination works for you!

44 OAIS Metadata Implications
Metadata is data that facilitates the management, description, and preservation of a digital object or aggregation of digital objects. Standards and best practices are developed to promote the creation of metadata so that it supports interoperability and collaboration. Metadata sets Metadata encoding schema

45 Types of Metadata Descriptive Technical Structural Administrative
Preservation types of metadata Discuss how each relates to preservation Descriptive – the information that describes the content of the digital resource. Dublin Core is a common descriptive metadata schema Technical – the technical information that facilitates sustainability of the object. Structural -- Documents the relationship between files and the content of the files that make up a complex digitized object Preservation – information necessary to manage digital objects within a digital repository
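
As an illustration of descriptive metadata, a simple Dublin Core record can be built with Python's standard library. This is a sketch only: the wrapper element name "record", the function build_dc_record, and the sample values are assumptions made for the example. The namespace URI is the one published for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields: dict) -> str:
    """Serialize a mapping of Dublin Core element names to values as XML."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


example = build_dc_record({
    "title": "Annual Report",
    "creator": "Example Agency",
    "format": "application/pdf",
})
```

Technical, structural and preservation metadata would normally be carried in other schemas alongside a descriptive record like this one.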

46 Metadata Each type of metadata will be needed to facilitate the preservation and usability of born digital material Use standards and best practice metadata sets Think interoperability Technologically Element sets We will need each type of metadata to facilitate preservation and usability. Descriptive metadata will be used to help users identify what resources they need. Technical and structural metadata will facilitate preservation. And Preservation metadata will manage preservation activities. All types are vital to preservation

47 Immediate Actions Get Your Team Together Identify your needs
Do you really need a digital repository right NOW? Is there an interim solution until the field is more settled? Agree on vision and goals Plan Back at the beginning we talked about stakeholders and planning. You can do these things now. Whether it is an “in-house” repository or a collaborative effort; you can begin getting the people in place, and begin planning.

48 Immediate Actions Discuss strategy Communication
Any institutional repository depends on a relationship with IT staff Priorities Language barriers

49 Immediate Actions Identify the organizational infrastructure changes that need to be made Investigate existing tools and digital repositories Learn and experiment with existing tools Make high level decisions What kind of digital materials are we going to commit to preserving? Remember that selecting digital resources for preservation is, in my opinion, the most critical task. Make sure the material you are committing to preservation: Is worth the cost of digital preservation Is technically compatible with digital preservation I suspect that many of the early implementors of DSpace will end up with an enormous amount of digital data that was never selectively selected – and will end up being reformatted to eye readable media or deleted. This is one of the pitfalls that is occurring as “archiving” becomes more mainstream – as it seems like something that anyone can do. Appraisal and selection are being overlooked – because storage is cheap and people think that we should just “save everything”

50 Immediate Actions Funding Inventories of digital resources
Establish metadata standards and practices Identify and understand users
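
A first-pass inventory of digital resources can be as simple as counting files and bytes per file extension. The sketch below is an assumption about what such an inventory might look like, not a tool mentioned in the slides. The inventory function name and the "(none)" label for files without an extension are hypothetical.

```python
import os
from collections import Counter
from pathlib import Path


def inventory(root: Path) -> dict:
    """Count files and total bytes per lowercased extension under root."""
    counts = Counter()
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            extension = path.suffix.lower() or "(none)"
            counts[extension] += 1
            sizes[extension] += path.stat().st_size
    return {ext: {"files": counts[ext], "bytes": sizes[ext]} for ext in sorted(counts)}
```

An extension is only a hint about the real format, so a summary like this is a starting point for appraisal and selection rather than a substitute for format identification.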

51 Take Home Concepts Use standards and best practices
The solution is complex; the tools are incomplete Organizational and technological challenges Learn about what others are doing and build on it Don’t reinvent the wheel Digital preservation is complex The time is right for solutions – there’s a lot of data out there, and there are going to be a lot of issues with preservation very soon. The market for these tools is coming – Ken Thibodeau agreed with this. Selectively select

52 Take Home Concepts Resources are the issue People, not computers!
Expect and plan for change This is all a work in progress “First generation” technologies, tools, understanding of issues You will redo work

53 Existing Tools Some existing digital repositories – this is not all of them!!! A very select few

54 Tools Technical tools Interfaces, infrastructure and technologies that allow you to do the work necessary to create, manage and preserve digital resources Examples might include: Metadata creation File format verification Algorithms for fixity checks Appraisal/processing tools Access tools – indexing, finding aids, etc. Acquisition tools
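
File format verification, one of the examples listed above, can be approximated by checking the first bytes of a file against well-known signatures. This is a minimal sketch: the function names are hypothetical and the signature table covers only a handful of common formats. Dedicated format identification tools rely on far larger signature registries.

```python
from pathlib import Path
from typing import Optional

# Leading byte signatures ("magic numbers") for a few common formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]

EXPECTED_EXTENSIONS = {
    "pdf": {".pdf"},
    "png": {".png"},
    "gif": {".gif"},
    "jpeg": {".jpg", ".jpeg"},
    "tiff": {".tif", ".tiff"},
}


def identify_format(data: bytes) -> Optional[str]:
    """Return a format name if the data starts with a known signature."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


def extension_matches_content(filename: str, data: bytes) -> bool:
    """True if the file extension agrees with the detected signature."""
    detected = identify_format(data)
    if detected is None:
        return False
    return Path(filename).suffix.lower() in EXPECTED_EXTENSIONS[detected]
```

A check like this catches mislabeled files at acquisition, but it does not validate that the file is well-formed.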

55 Tools Few currently exist Options Wait Build your own
Modify existing tools Use what there is

56 Tools DSpace Fedora™ LOCKSS Greenstone OCLC Digital Archive

57 DSpace A specialized content management system that:
manages and distributes digital items allows for creation, indexing and searching of metadata supports long term preservation of material designed to make submission and administration easy

58 DSpace Developed by MIT and Hewlett Packard
Based on freely available software can use proprietary software as well with minor modifications Customizable Academic community is especially active in the use of this implementation UNIX based; written in Java How is DSpace different from other digital repositories? DSpace is the first digital repository to address the myriad issues inherent in a multi-disciplinary archive, including: Differing policies, practices and cultures established by individual disciplines The variety of digital formats produced in today’s multi-media research environments The complexity of metadata standards needed to accommodate and maintain access to the digital formats supported by the system DSpace is designed with a flexible storage and retrieval architecture adaptable to a multitude of data formats and distinct research disciplines, known as “communities.” Each community has its own customized user portal that can use the community’s own practices and terminology. Is DSpace free? Yes. The DSpace system is freely available as open-source software (see locations below), under the terms of the BSD distribution license. We have also tried to find good open-source tools to package with the DSpace application, all freely available under an open-source license (although not all the same license as the one for DSpace itself), so that you get a complete system along with the part that we created. Can I change the DSpace system? Yes. The BSD distribution license for the DSpace system allows you to make any changes you like to the copy of the system at your organization. DSpace was designed to make adapting it for individual organizations as easy as possible. See our Development Guidelines for information on submitting code changes to DSpace. 
In fact, several modules in DSpace will probably need to be customized for a given organization (e.g., authorization and authentication), and some organizations may want to substitute the open-source tools supplied with DSpace with different ones (e.g., replace PostgreSQL with MySQL or Oracle). At the moment, substituting a different RDBMS for PostgreSQL will require a few changes to the system’s Browse module. There is documentation available explaining these interdependencies where they exist.


60 DSpace No support available
Preservation is done locally and is not inherent in the system Downloads and specific information at DSpace Demo - MIT Press Note that there is no formal technical support – but there are listservs where implementors discuss issues and ask questions. Search for “technology” to show user interface. Note metadata, show complete metadata record. Cannot link to actual resource

61 Fedora™ Flexible Extensible Digital Object and Repository Architecture “An Open-Source Digital Repository Management System” – the architectural underpinning or plumbing Used to support institutional repositories, digital libraries, content management, digital asset management, scholarly publishing, and digital preservation Dublin Core based metadata OAI compliant Can handle up to 1 million objects using open source software Supports batch uploading of groups of digital objects Focus is on the storage and management of content; not on workflows and interfaces

62 Fedora™ Cornell and University of Virginia, funded by Mellon
Freely available Based on open source software and web based technologies Limited interfaces Management Access Access Lite Management API (API-M) – defines an interface for administering the repository. It includes operations necessary for clients to create and maintain digital objects and their components. API-M is implemented as a SOAP-enabled web service. Access API (API-A) – defines an interface for accessing digital objects stored in the repository. It includes operations necessary for clients to perform disseminations on objects in the repository and to discover information about an object using object reflection. API-A is implemented as a SOAP-enabled web service. Access-Lite API (API-A-Lite) – defines a streamlined version of the Fedora Access Service that is implemented as an HTTP-enabled web service.

63 Fedora™ Architectural Model

64 Fedora™ Installs on Windows PC Packaged to get up and running quickly
Demo set of objects Scales with hardware in a production environment No support available Plumbing only; no inherent preservation Downloads and information available at

65 LOCKSS Lots of Copies Keeps Stuff Safe
To safeguard web journals libraries subscribe to Mimics the way libraries manage paper collections Redundant, distributed, decentralized Library installs LOCKSS on local computer with Internet connectivity Library tells computer which journal(s) to preserve and how often to look for new content LOCKSS gets new content and stores a copy locally (“caching”)

66 LOCKSS Works only for HTTP/HTML standard file types (html, jpeg, gif, pdf, etc) Open source code It can be modified Designed to be low cost, low time Will run on a dedicated PC PC specs available on the LOCKSS site Does LOCKSS work for general Web content? No. LOCKSS requires that the content of URLs be mostly static, that the web site have a logical structure and that the HTTP headers permit caching. Does LOCKSS solve the Digital Library problem? No. In addition to the restrictions to static URLs and logical site structure it works only for HTTP/HTML and related formats such as GIF and JPEG. Thus the LOCKSS project avoids addressing most of the hard (and important) issues for digital libraries. Is LOCKSS open source? LOCKSS is an Open Source project. The source and binaries are freely available via We expect that because LOCKSS is open source software, there will be many different groups modifying and using the software. Each instantiation of the software will work best if it can communicate with all LOCKSS systems, so there will be pressure to keep the LOCKSS communication protocol interoperable. However, there will also be many reasons why various communities will want to modify LOCKSS for their own purposes. This diversity is valuable as it reduces the possibility that a single type of failure could affect all the caches simultaneously.  See also, Advantages of open source software. Can a small PC running LOCKSS really serve a large reader community? Access to STM journals drops off very rapidly as the materials age. LOCKSS is intended to operate when the content is so old that it is no longer available from the publisher. There will therefore be very little reader access to the material, so a large powerful computer is not necessary.


68 LOCKSS Publishers can prevent LOCKSS from caching their content
Publishers must give libraries permission Licensing language available on the LOCKSS web site Freely available No support (ease of use is highlighted) Preservation is not inherent Do journal publishers permit copies to be cached? Yes, publishers do permit copies to be cached. Caching is a routine mechanism of the web. In particular, ISPs outside the US routinely cache content to control traffic and to minimize bandwidth constraints. Caching publishers’ material to ensure access long term is different from loading content to serve the full text locally. The publishers we have consulted are very supportive of this software with the following caveats: they want readers to access the content from their site while their site is available or they want the system to give them anonymous reader behavior information. By and large, however, the intended use for these caches is for those instances when content from the publisher’s server is unavailable. Many of the Stanford University Libraries HighWire Press affiliated publishers support this work. Do journals use HTTP headers to prevent caching? HTTP defines headers that can be used to prevent caching or restrict the lifetime of pages in caches. HighWire’s journals do not use these headers, so unlimited caching is permitted. LOCKSS is based on the Squid cache software and will obey any cache control headers it finds. Journal publishers who wish to prevent LOCKSS caching their content can use these headers to do so. PRODUCTION IN EARLY 2004 (not 1994)
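
The header check described in these notes can be sketched as a function that inspects a response's headers. This is a deliberately conservative simplification and not the logic used by Squid or LOCKSS: the function caching_permitted is hypothetical, and it treats the standard Cache-Control directives no-store, no-cache and private, plus the older Pragma: no-cache header, as a publisher's signal not to cache.

```python
def caching_permitted(headers: dict) -> bool:
    """Return False if the response headers ask caches not to keep the page."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Cache-Control is a comma-separated list; keep only the directive names.
    directives = {
        part.strip().split("=")[0].lower()
        for part in normalized.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"no-store", "no-cache", "private"}:
        return False

    # Pragma: no-cache is the HTTP/1.0 way of saying the same thing.
    if "no-cache" in normalized.get("pragma", "").lower():
        return False
    return True
```

A real cache also has to honor lifetime directives such as max-age and Expires, which limit how long a page may be kept rather than forbidding caching outright.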

69 Greenstone A suite of software for building and distributing digital library collections Produced by the New Zealand Digital Library Project at the University of Waikato Developed and distributed in cooperation with UNESCO and the Human Info NGO. Open-source, multilingual software, issued under the terms of the GNU General Public License.

70 Greenstone “Should in fact work on any Windows or Unix system.”
“Local library” “Web library” Greenstone Librarian Interface The “Organizer” “Local library”: Built-in web server; only available for Windows platforms. “Web library”: Requires an external web server. Greenstone Librarian Interface (GLI): A graphical tool for building new collections, altering or deleting existing collections, exporting existing collections to stand-alone CD-ROMs, and importing or assigning metadata; includes an interactive collection design module. The “Organizer”: Windows utility used for automatically generating some of the configuration files (metadata.xml, sub.txt, etc.) used by complex Greenstone collections




74 Greenstone Documentation is available Installer's Guide
Developer's Guide Paper to Collection Inside Greenstone Collections MG/MG++ Workshops are also held Listservs for implementors Some technical support available Not preservation oriented

75 OCLC Digital Archive Standards based OAIS compliant
METS encoded dissemination packages Phased support for various formats and material types Currently text and still image Can integrate with current library selection and cataloging activities Content owner manages the archived objects and determines access Known costs Offers bit preservation

76 OCLC Digital Archive Functions
Harvest from web preview and review Metadata creation Ingest From web or batch Access management public or restricted Viewing Dissemination Reports Periodic Audits of Objects in the Archive Frequent Backups and Disaster Prevention The primary focus of the Digital Archive (currently) is capturing web-based documents, storing them, and preserving them for the long term. The current customer base of the Digital Archive consists mainly of State Libraries and Government Agencies, for example, the State Library of Connecticut and GPO, the Government Printing Office. The Digital Archive supports Metadata Creation for each Digital Object and Content Management. Metadata is XML encoded. It provides Discovery of archived objects in a variety of ways, including OCLC’s Connexion and FirstSearch services and local Catalogs. As you can see, the functions include Harvest (the process of capturing a document from the web), Ingest, Content Group Management, Rights Group Management, Viewing, Dissemination, Reports, and Periodic Audits and Backup/Disaster Prevention Services. Our Dissemination Packages are based on METS.

77 Digital Archive Web Services
General steps in the process: 1. Identify material to preserve 2. Create preservation metadata (using metadata mapped from bibliographic record in WorldCat) 3. Harvest, or copy, the digital object (a single web document) 4. Assign administrative properties to the object 5. Ingest, or add, the object into the Archive 6. Access your now-archived object from the WorldCat record either on the web or via a library catalog. The Digital Archive Services are shown here. As mentioned before, most of these services are made available through OCLC’s Connexion interface. In addition to the Connexion interface, users with Admin privileges will access the Digital Archive Admin service directly. The Admin Service allows users to organize objects into Content Groups, create and assign Rights Groups to objects, and create Rights Statements. System Processes include an Auditor, which runs periodic checks on the integrity of objects in the archive. We are currently working on an Annotator, which extracts technical metadata from images and stores that metadata.
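A periodic integrity check of the kind the Auditor performs is commonly done by comparing each stored file against a checksum recorded at ingest. The sketch below shows that general technique only; it is not OCLC's implementation, and the names `sha256_of` and `audit`, the choice of SHA-256, and the manifest layout (a dict mapping file paths to expected digests) are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest):
    """Compare each file against the digest recorded for it at ingest.

    manifest: dict mapping file path -> expected SHA-256 hex digest
              (a hypothetical layout chosen for this example).
    Returns a dict mapping each path to "ok", "changed", or "missing".
    """
    report = {}
    for path, expected in manifest.items():
        if not Path(path).is_file():
            report[path] = "missing"
        elif sha256_of(path) != expected:
            report[path] = "changed"
        else:
            report[path] = "ok"
    return report
```

In use, the manifest would be built once at ingest by calling `sha256_of` on each object, and `audit` would then be run on a schedule, with any "changed" or "missing" entry triggering a restore from backup.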

78 End User Access Once an object has been ingested into the Digital Archive, there are a variety of ways for users to discover it. Since each Digital Object in the Archive also has an associated Bibliographic Record in WorldCat, any means of searching WorldCat can be used to discover Digital Archive content. OCLC’s FirstSearch and Local OPACs fall into this category. The bib record associated with a Digital Archive Object will contain a link to that object in the Digital Archive repository. Links to Digital Archive objects can also be embedded on web pages.

79 OCLC Digital Archive Development
Preservation policy and plans in progress Expanding formats and object types accepted Active in development of preservation metadata standard and will comply Active in developing digital repository certification Additional information available at: If you would like more information about the OCLC Digital Archive, please visit our web site at Also, please feel free to contact Pam, Shweta, or me with any questions or ideas. In conclusion, I want to say that the Digital Archive staff is committed to continuing to develop the archive based on standards and to evolving our dissemination process to keep up with those standards. Certainly, METS will continue to be a major part of that effort. Thank you very much for your time.

80 Other Tools Australian PANDAS-PANDORA CONTENTdm (content management)
SDSC Data Grid Technology Web harvesting tools E-records management software Document management systems Data warehousing technology XML parsing tools SDSC and others
