Presentation on theme: "Digital Preservation and Management"— Presentation transcript:
1 Digital Preservation and Management
Welcome
Introductions: name, organization, work with born digital materials
The majority of information resources available today are created digitally. Only a small percentage of this material will need to be preserved in archives and libraries; however, there are many challenges in ensuring that born digital materials can be preserved.
2 Preserving Digital Resources: Why is it an Issue?
Technology obsolescence
Digital media life expectancy
Variety of file formats
Digital rights management
Costs
Organizational resistance
Obsolescence
Hardware and software change rapidly. Microsoft gives us a new operating system how often? Every 2-3 years? 5¼ inch diskettes, once very common, are now obsolete. 3½ inch diskettes are becoming obsolete. CDs are being replaced by DVDs. So, the software environment that a born digital object is created in will change quickly, and the storage media will change as well (although probably not as quickly as the software environment).
Digital media life expectancy
There are no assurances that the storage media are stable enough to warrant preservation. And, even if they are, the rate of technological change will make them obsolete anyway.
Variety and complexity
How many variations of a particular file format can there be? MANY. For example, according to one source, there are currently over 3000 different digital file formats in existence. And that number does NOT take into account the various “flavors” of the same file format. It counts .DOC one time, even though MS Word documents created 10 years ago are different from MS Word documents created today.
Source:
Costs
There aren’t any good cost models available to help you determine how much digital preservation will cost. But we do know that it will be a new form of preservation, one that will require technical infrastructure and management, thus increasing the organization’s preservation costs.
Organizational resistance to digital preservation as a programmatic and systematic responsibility
3 Assumptions
Digital preservation is more challenging and complex than preservation of analog objects
Digital preservation is more than a technical preservation strategy
“THE” solution doesn’t exist
Digital preservation needs to be integrated into organizational culture
4 Assumptions
Change Happens
File formats matter
Non-proprietary is best; de facto standards are good
System architecture and documentation matter
Open systems that can be moved to other platforms
Technology isn’t the whole solution
Policies, planning, and resources
The community is just beginning to work on these issues, and everything is new and is changing
Change Happens
The underlying technical platform (the operating system, for example) is going to change, and it will become increasingly difficult over time to support the original OS.
Thus, digital preservation strategies are ongoing. A preservation plan will need to be developed, but it will not necessarily be a long term plan. It may be able to identify preservation actions that need to take place in ten year increments, but probably not longer than that, as we don’t know what technology will look like ten years out. The work will never be done.
But we do need to find ways to minimize the work. This is one of the biggest challenges of digital preservation: we know it is an ongoing, technically challenging commitment, but we have to do it with the minimum amount of work (time, money, technology).
And we must balance this minimalism against the amount of work that yields the greatest benefits.
5 Terms
Digital Object: Any resource that can be stored or manipulated by a computer
Digitized Resources: Any resource that has been digitized from an analog source
Born Digital: Any resource that was created digitally and will be managed and preserved digitally
Read definitions.
Note that this workshop applies to born digital, not digitized, material. HOWEVER, the technical and preservation issues are very similar, if not the same, for digitized resources.
Ask audience for a list of types of born digital material that they are aware of: web sites, documents, databases, systems, spreadsheets, digital audio, digital video, scientific data, etc.
Ask audience if any of them are acquiring these types of resources currently.
6 Terms
Digital preservation/archiving: Storage, maintenance, and access to a digital object over the long term, usually as a consequence of applying one or more preservation strategies
Digital preservation encompasses a broad range of activities designed to extend the usable life of machine-readable computer files and protect them from media failure, physical loss, and obsolescence.
7 Terms
Viability: maintenance of the bitstream
Renderability: viewable by humans and “processable” by computers
Understandability: interpretable by humans
Fixity: the state or quality of being fixed or unchanged
Reliability: the digital objects are created in a trustworthy way; they are what they say they are
Authenticity: the digital object remains reliable over time
TDR divides digital preservation activities into those that promote the long-term maintenance of a bitstream (the zeros and ones) and those that provide continued accessibility of its contents. The OCLC/RLG Working Group on Preservation Metadata added the concept of viability to the maintenance of the bitstream, indicating that information must be intact and readable from the storage media, and further subdivides the content accessibility need into renderability (viewable by humans and processable by computers) and understandability (interpretable by humans). As these terms imply, it is one thing to preserve a bitstream, but quite another to preserve the content, form, style, appearance, and functionality. We conceive of digital preservation as a process that requires the use of the best available technology as well as carefully thought out administrative policies and procedures.
Fixity
The state or quality of being fixed or unchanged. Since digital objects are easily modified, a mechanism is necessary to maintain fixity over time, or to consciously document when a digital object has been altered. Technologies such as checksums and digital signatures are used to verify that a digital object retains its fixity, which helps maintain the object's authenticity and integrity. Fixity information is a key part of metadata for digital preservation.
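The checksum approach to fixity mentioned in the notes can be sketched in a few lines. This is a minimal illustration only, assuming SHA-256 as the digest; the function names `compute_checksum` and `verify_fixity` and the sample content are hypothetical and not taken from any repository system discussed in this workshop.

```python
import hashlib
import io


def compute_checksum(stream, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(stream, recorded_checksum, algorithm="sha256"):
    """Return True if the stream still matches the checksum recorded earlier."""
    return compute_checksum(stream, algorithm) == recorded_checksum


# Record a checksum at ingest, then re-check it later.
original = b"born digital content"  # invented sample content
recorded = compute_checksum(io.BytesIO(original))

unchanged = verify_fixity(io.BytesIO(original), recorded)       # same bytes
altered = verify_fixity(io.BytesIO(original + b"!"), recorded)  # one byte added
```

For a file on disk, the same functions can be given `open(path, "rb")` instead of an in-memory stream. The recorded checksum would be stored with the object's preservation metadata so that later checks have something to compare against.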
8 Digital Preservation Strategies
Bitstream Copying
Refreshing
Durable/Persistent Media
Technology Preservation
Digital Archaeology
Analog Backups
Migration
Replication
Reliance on Standards
Normalization
Canonicalization
Emulation
Encapsulation
Universal Virtual Computer
Many digital preservation strategies have been proposed, but no one strategy is appropriate for all data types, situations, or institutions.
Handout in packet with brief explanation of each of these.
9 Trusted Digital Repositories
A repository whose mission is to provide reliable, long term access to managed digital resources to a community, now and in the future.
We need authentic, reliable, trustworthy data, but we also need reliable and trustworthy repositories for that data.
10 Trusted Digital Repositories
Attributes
Administrative responsibility
Organizational viability
Financial sustainability
Technological suitability
System security
Procedural accountability
OAIS compliant
Seven attributes of a Trusted Digital Repository. Notes will discuss each one when they’re done!
11 Trusted Digital Repositories
Implementation approaches will vary
Approach will depend on:
Context
Users (designated community)
Underlying issue remains constant
Functionality
Reliability and authenticity
12 Open Archival Information System (OAIS) Reference Model
Conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term
Consists of people and systems
(overview)
(standard)
One frequent lament heard about the OAIS is that it is a reference model rather than an implementation plan. By design, it is a conceptual framework for a complete, generic archival system. Its strength is in establishing common terms and concepts for describing repository architectures and comparing implementations without specifying an implementation an organization should use. Because of this flexibility, the reference model has been utilized by a variety of organizations for planning and designing digital repositories of many different types.
OAIS: a type of archive consisting of an organization of people and systems that has accepted the responsibility to preserve information for one or more designated communities.
Great ppt tutorial from 2003 Library of Congress presentation; link available from the overview URL
Increase awareness and understanding of concepts relevant to digital preservation
Clarify terms and concepts
Create consensus on elements and processes involved in digital preservation and access
Create a framework to guide the identification and development of standards
13 OAIS: What is it?
Any organization or system charged with the task of preserving information over the long term and making it accessible to a specific group of users
An OAIS archive is expected to meet certain minimum responsibilities
Note that an OAIS is a model for any type of archive: a traditional, physical, organizational archive or a digital archive.
The use of the word “Open” refers to the fact that the model and future recommendations associated with it are developed in open forums. It does NOT have anything to do with the level of accessibility to materials within an archive.
Minimum requirements on next slide
14 OAIS: Minimum Responsibilities
Negotiate and accept appropriate information from information creators
Obtain sufficient control over the information to ensure preservation
Determine the scope of the “Designated Community” (the users)
Ensure that users can understand the information without assistance from the information creators
Minimum requirements of an OAIS archive (this slide and next slide)
Read these; explanation will follow
15 OAIS: Minimum Responsibilities
Follow documented policies and procedures
Ensure preservation
Authenticate information
Disseminate (provide access to) information
Make the information available to the Designated Community
Minimum requirements of an OAIS archive
Read these; explanation will follow
Explanation/translation on next slide
16 Preservation Planning
Monitoring technology and users; developing preservation actions
Preservation planning is part of the administration functions of any archival program; OAIS has highlighted it as a distinct function
The importance of constant and ongoing management and planning for digital preservation calls for this
17 Components of a Digital Preservation Program
TDR and OAIS imply that there are three components of a digital preservation program
Resources Framework (trust)
Organizational Infrastructure (policy)
Technological Infrastructure (technology)
Words in parentheses refer back to the slide on authenticity, where these three things are mentioned as the way to ensure authentic records.
18 Resource Framework
Nothing is sustainable without ongoing commitment of resources
A high level commitment to digital preservation must demonstrate an adequate resource commitment
Deliverables that meet the goals
Line item budgets
Staff commitment
Strategic planning
Projections for costs and funding scenarios
19 Resource Framework
Commitment of resources (time, money, staff) implies organizational commitment and reflects organizational priorities
Staffing is the expensive part!
Curatorial functions
Appraising, acquiring, processing, metadata creation, ongoing management, access
Technical functions
Computer operation, system administrator, database administrator, storage administrator, application programmer, preservation expertise
Computer Operations - performs backups, tape rotations, hardware maintenance, monitors availability
Computer System Administrator - appropriate to the platform, NT or UNIX, may do network setup
Database Administrator - typically repository systems run a relational database for metadata and/or content storage
Storage Administrator - may get involved for large amounts of data since data is generally network-attached
Applications Programmer - for installation of repository software & upgrades, trouble-shooting, performance, possible system modifications
Preservation Officer - works with technical staff, manages rollouts of new functionality, performs communication tasks
20 Planning
Identify stakeholders and their roles
Educate
All partners need a desired outcome
Tangible or intangible
Buy-in
Mission, goals, outcomes
21 Organizational Infrastructure
Organizational and Curatorial Responsibilities
Policy framework
Operational Responsibilities
Planning framework
Functions and roles
This unit extends the “attributes” of a trusted digital repository to responsibilities.
The policies and procedures that the organization needs to implement in order to support the technical infrastructure
An archive is more than storage space!
The organizational infrastructure will be somewhat dependent on the technological infrastructure, so planning for the two needs to proceed in parallel
22 Organizational and Curatorial Responsibilities – Policy Framework
Strategic Plan
Collection Policy
Security Policy
Preservation Policy
Access Policy
These are familiar things to us. For each one, we will focus on the considerations that are specific to digital preservation.
23 Strategic Plan
Overview and scope of the digital preservation program and its context
Mission/Purpose
High level goals and objectives
Commitment to OAIS and community best practices
Related documentation and who is responsible
Administrative/Oversight structure
High level audience statement
24 Audience (Designated Community)
OAIS requirement
Explicit
All collections
Per collection
Audience = assumed knowledge and resources
25 Impacts of Audience Identification
The kinds of collections you will accept
The kind of descriptive information (metadata) you will provide
The kind of services you will offer
Software, translators
The kind of preservation actions chosen
Significant properties
The access mechanisms you need to provide
26 Collection Policy
What kinds of digital resources are you going to collect and digitally preserve?
Content considerations
Are you focusing on a specific content area?
Rights management considerations
Metadata responsibilities and requirements
Requirements for documenting acquisitions
27 Collection Policy
Technical considerations
Digitization with no physical counterpart
Digitization with a physical counterpart
Anything born digital
Born digital that can’t be reformatted to eye readable
28 Collection Policy
Are there further limitations on what you will collect? (examples)
Non-proprietary formats only
Specific formats only (TIFF)
Systems/databases only
Distinct documents only
Minimum amount of metadata required at time of acquisition
Materials that can be digitally reformatted in a specific way
Move everything to TIFF?
Move everything to XML?
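Limitations like these can be written down as explicit rules and checked when material is offered. The sketch below is only an illustration under stated assumptions: the `CollectionPolicy` class, the `review_submission` function, the accepted extensions, and the required metadata fields are all hypothetical examples, not a recommended policy.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionPolicy:
    """Hypothetical policy: accepted file extensions and required metadata."""
    accepted_extensions: set = field(
        default_factory=lambda: {".tif", ".tiff", ".xml"})
    required_metadata: set = field(
        default_factory=lambda: {"title", "creator", "date"})


def review_submission(filename, metadata, policy):
    """Return the reasons a submission fails the policy (empty list if it passes)."""
    problems = []
    dot = filename.rfind(".")
    extension = filename[dot:].lower() if dot != -1 else ""
    if extension not in policy.accepted_extensions:
        problems.append(f"format not accepted: {extension or 'no extension'}")
    # Report any required metadata fields the depositor did not supply.
    missing = sorted(policy.required_metadata - set(metadata))
    if missing:
        problems.append("missing metadata: " + ", ".join(missing))
    return problems
```

Calling `review_submission("report.doc", {"title": "Minutes"}, CollectionPolicy())` would report both an unaccepted format and the missing fields. A file extension is only a weak hint about the actual format, so a real workflow would pair a check like this with proper file format verification.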
29 Documenting Acquisitions
OAIS requires agreements with depositors that address acquisition, maintenance, access and withdrawal
Should already be using these kinds of agreements
May need to revise for digital materials, to include
What happens if functionality is lost?
Is reformatting to eye readable an acceptable preservation option?
What kind of access can you provide and is it acceptable?
Are there digital-specific copyright issues to consider?
30 Documenting Acquisitions
May need to revise for digital materials, to include
Metadata creation responsibilities
Rights management
What level of functionality will be available from the digital repository?
31 Security Policy
System security
Physical environment
Backup and recovery
Fixity of the data (reliability)
Disaster preparedness and response
Planning and documentation requirements
Assign responsibility
32 Preservation Policy
Commitment to digital preservation
Goals of digital preservation
Scope of materials
Formats
Metadata suppliers
Access commitments
33 Preservation Policy
Definition of overall preservation strategy
Are there limitations?
What happens if preservation actions go wrong?
Is reformatting to eye-readable an acceptable preservation action? Under what circumstances?
Planning and documentation requirements
Responsibilities assigned
34 Operational Responsibilities
Based on work done by OAIS community to define the principal obligations of an OAIS compliant repository
Appropriate planning documentation will be necessary to carry out operations
Specific planning based on strategic plan and policies
35 Operational Responsibilities
Acquisition
Physical and intellectual control
Determines audience (designated community)
Follows policies and procedures to assure preservation of authentic information
Access
Promotes development of best practices and standards
Won’t cover designated community again
36 Acquisition
Development of collection policies
Includes specific required formats, if appropriate
Procedures and workflows for copyright clearance for access and preservation
Metadata specifications and implementation
Procedures to ensure the authenticity of submitted material
Assessment of the completeness of the submission
Documentation of all acquisition transactions
37 Control
Preparing the materials for storage
Content analysis
Significant properties
Verification of metadata
Unique and persistent identifier assigned
Authenticity and integrity check
Move to archival storage
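Two of the control steps above, the integrity check and identifier assignment, might look like the following when combined. This is a rough sketch under stated assumptions: the function name `prepare_for_storage` and the record layout are hypothetical, the depositor is assumed to supply a SHA-256 checksum, and a random UUID is only a stand-in for whatever persistent identifier scheme the repository actually commits to maintaining.

```python
import hashlib
import uuid


def prepare_for_storage(content, submitted_checksum):
    """Check integrity against the depositor's checksum, then assign an identifier.

    Raises ValueError if the content does not match the submitted checksum.
    """
    actual = hashlib.sha256(content).hexdigest()
    if actual != submitted_checksum:
        raise ValueError("integrity check failed: checksum mismatch")
    return {
        # Stand-in only: uniqueness comes from the UUID, but persistence
        # depends on the organization, not on the identifier format.
        "identifier": f"urn:uuid:{uuid.uuid4()}",
        "sha256": actual,
        "size_bytes": len(content),
    }
```

The returned record is the kind of information that would travel with the object into archival storage, so that later fixity checks have a baseline.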
38 Preservation Actions
Monitoring of technology and the digital materials
Technology watch
Preservation planning
Classes of material
Actions to be taken
Documentation of actions and results
Functionality considerations
39 Access
A system for resource discovery
Mechanism for authenticity check
Access control mechanisms
User support
40 Standards and Best Practices
Promote and utilize
Results in economies of scale
Creation of high quality digital resources that are more amenable to preservation
Work with software suppliers, potential depositors, designated communities
41 In-house
Significant investment
Technical expertise
Workflow impacts
Maintain physical control
Building a digital repository in-house will require a significant investment. You will need appropriate technical infrastructure and expertise. Even if you can use tools like DSpace or LOCKSS, there is still cost involved and expertise required.
And there will be more work to do, for the technologists and the information professionals.
You will maintain physical control over your material.
42 Outsource
Can the service provider meet your needs and requirements?
Less investment?
No cost models to show if this is accurate
Less reliance on in-house technical expertise and infrastructure necessary
What happens if the service provider goes out of business?
There are some, not many, service providers that you can use to outsource digital preservation. Research them carefully. Traditionally, our community has been comfortable outsourcing many things, but maintaining the archives isn’t one of them.
We don’t know if it will cost more or less to outsource; no cost models exist. But it will certainly depend on your circumstances and resources.
43 Combination
Build what you can
Build what you need that can’t be outsourced
Buy what you can’t build
Now, digital repositories…
Possibly the best alternative now and in the future will be to build what you can (for example, implement DSpace to support some of your digital preservation needs and use the resources you already have: staff, expertise, hardware, etc.) and buy what you can’t support on your own.
Or, another way of looking at it is to outsource what you can, and build a digital repository for materials that can’t be supported using service providers.
Whatever combination works for you!
44 OAIS Metadata Implications
Metadata is data that facilitates the management, description, and preservation of a digital object or aggregation of digital objects. Standards and best practices are developed to promote the creation of metadata so that it supports interoperability and collaboration.
Metadata sets
Metadata encoding schema
45 Types of Metadata
Descriptive
Technical
Structural
Administrative
Preservation
Types of metadata
Discuss how each relates to preservation
Descriptive: the information that describes the content of the digital resource. Dublin Core is a common descriptive metadata schema
Technical: the technical information that facilitates sustainability of the object
Structural: documents the relationship between files and the content of the files that make up a complex digitized object
Preservation: information necessary to manage digital objects within a digital repository
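As a small illustration of descriptive metadata, the sketch below builds a record using three elements from the Dublin Core element set (`title`, `creator`, `format`). The `record` wrapper element, the `build_dublin_core` function name, and the sample values are assumptions made for the example; real repositories wrap Dublin Core in whatever container their own schema defines.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dublin_core(fields):
    """Build an XML record from a dict of Dublin Core element names to values."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


record = build_dublin_core({
    "title": "Annual Report 2003",       # invented sample values
    "creator": "Example State Library",
    "format": "application/pdf",
})
xml_text = ET.tostring(record, encoding="unicode")
```

The `format` element hints at how descriptive and technical metadata overlap: the same record that helps a user find a resource also tells the repository what kind of file it has committed to preserving.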
46 Metadata
Each type of metadata will be needed to facilitate the preservation and usability of born digital material
Use standards and best practice metadata sets
Think interoperability
Technologically
Element sets
We will need each type of metadata to facilitate preservation and usability. Descriptive metadata will be used to help users identify what resources they need. Technical and structural metadata will facilitate preservation. And preservation metadata will manage preservation activities.
All types are vital to preservation
47 Immediate Actions
Get Your Team Together
Identify your needs
Do you really need a digital repository right NOW?
Is there an interim solution until the field is more settled?
Agree on vision and goals
Plan
Back at the beginning we talked about stakeholders and planning. You can do these things now. Whether it is an “in-house” repository or a collaborative effort, you can begin getting the people in place, and begin planning.
48 Immediate Actions
Discuss strategy
Communication
Any institutional repository depends on a relationship with IT staff
Priorities
Language barriers
49 Immediate Actions
Identify the organizational infrastructure changes that need to be made
Investigate existing tools and digital repositories
Learn and experiment with existing tools
Make high level decisions
What kind of digital materials are we going to commit to preserving?
Remember that selecting digital resources for preservation is, in my opinion, the most critical task. Make sure the material you are committing to preservation:
Is worth the cost of digital preservation
Is technically compatible with digital preservation
I suspect that many of the early implementors of DSpace will end up with an enormous amount of digital data that was never selectively selected, and that will end up being reformatted to eye readable media or deleted.
This is one of the pitfalls that is occurring as “archiving” becomes more mainstream and seems like something that anyone can do. Appraisal and selection are being overlooked, because storage is cheap and people think that we should just “save everything.”
50 Immediate Actions
Funding
Inventories of digital resources
Establish metadata standards and practices
Identify and understand users
51 Take Home Concepts
Use standards and best practices
The solution is complex; the tools are incomplete
Organizational and technological challenges
Learn about what others are doing and build on it
Don’t reinvent the wheel
Digital preservation is complex
The time is right for solutions: there’s a lot of data out there, and there are going to be a lot of issues with preservation very soon. The market for these tools is coming; Ken Thibodeau agreed with this.
Selectively select
52 Take Home Concepts
Resources are the issue
People, not computers!
Expect and plan for change
This is all a work in progress
“First generation” technologies, tools, understanding of issues
You will redo work
53 Existing Tools
Some existing digital repositories. This is not all of them!!! A very select few
54 Tools
Technical tools
Interfaces, infrastructure and technologies that allow you to do the work necessary to create, manage and preserve digital resources
Examples might include:
Metadata creation
File format verification
Algorithms for fixity checks
Appraisal/processing tools
Access tools: indexing, finding aids, etc.
Acquisition tools
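File format verification, one of the tool examples listed above, is often based on the signature bytes at the start of a file rather than on its extension. The sketch below is a minimal illustration covering a handful of well-known signatures; the `SIGNATURES` table and the `identify_format` function are hypothetical names, and a production tool would cover far more formats and versions.

```python
# Leading byte signatures ("magic numbers") for a few common formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def identify_format(header):
    """Return a format name based on the first bytes of a file, or None if unknown."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None
```

In use, the first few bytes of a file (for example `open(path, "rb").read(16)`) would be passed to `identify_format` and the result compared with the format the depositor claimed. A signature match shows only that the file starts like the named format, not that the whole file is valid.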
55 Tools
Few currently exist
Options
Wait
Build your own
Modify existing tools
Use what there is
56 Tools
DSpace
Fedora™
LOCKSS
Greenstone
OCLC Digital Archive
57 DSpace
A specialized content management system that:
manages and distributes digital items
allows for creation, indexing and searching of metadata
supports long term preservation of material
is designed to make submission and administration easy
58 DSpace
Developed by MIT and Hewlett Packard
Based on freely available software
Can use proprietary software as well with minor modifications
Customizable
Academic community is especially active in the use of this implementation
UNIX based; written in Java
How is DSpace different from other digital repositories?
DSpace is the first digital repository to address the myriad issues inherent in a multi-disciplinary archive, including:
Differing policies, practices and cultures established by individual disciplines
The variety of digital formats produced in today’s multi-media research environments
The complexity of metadata standards needed to accommodate and maintain access to the digital formats supported by the system
DSpace is designed with a flexible storage and retrieval architecture adaptable to a multitude of data formats and distinct research disciplines, known as “communities.” Each community has its own customized user portal that can use the community’s own practices and terminology.
Is DSpace free?
Yes. The DSpace system is freely available as open-source software (see locations below), under the terms of the BSD distribution license.
We have also tried to find good open-source tools to package with the DSpace application, all freely available under an open-source license (although not all the same license as the one for DSpace itself), so that you get a complete system along with the part that we created.
Can I change the DSpace system?
Yes. The BSD distribution license for the DSpace system allows you to make any changes you like to the copy of the system at your organization. DSpace was designed to make adapting it for individual organizations as easy as possible. See our Development Guidelines for information on submitting code changes to DSpace.
In fact, several modules in DSpace will probably need to be customized for a given organization (e.g., authorization and authentication), and some organizations may want to substitute the open-source tools supplied with DSpace with different ones (e.g., replace PostgreSQL with MySQL or Oracle). At the moment, substituting a different RDBMS for PostgreSQL will require a few changes to the system’s Browse module. There is documentation available explaining these interdependencies where they exist.
60 DSpace
No support available
Preservation is done locally and is not inherent in the system
Downloads and specific information at
DSpace Demo - MIT Press
https://hpds1.mit.edu/handle/1721.1/1776
Note that there is no formal technical support, but there are listservs where implementors discuss issues and ask questions.
Search for “technology” to show user interface. Note metadata, show complete metadata record. Cannot link to actual resource.
61 Fedora™
Flexible Extensible Digital Object and Repository Architecture
“An Open-Source Digital Repository Management System”: the architectural underpinning or plumbing
Used to support institutional repositories, digital libraries, content management, digital asset management, scholarly publishing, and digital preservation
Dublin Core based metadata
OAI compliant
Can handle up to 1 million objects using open source software
Supports batch uploading of groups of digital objects
Focus is on the storage and management of content, not on workflows and interfaces
62 Fedora™
Cornell and University of Virginia, funded by Mellon
Freely available
Based on open source software and web based technologies
Limited interfaces
Management
Access
Access Lite
Management API (API-M): defines an interface for administering the repository. It includes operations necessary for clients to create and maintain digital objects and their components. API-M is implemented as a SOAP-enabled web service.
Access API (API-A): defines an interface for accessing digital objects stored in the repository. It includes operations necessary for clients to perform disseminations on objects in the repository and to discover information about an object using object reflection. API-A is implemented as a SOAP-enabled web service.
Access-Lite API (API-A-Lite): defines a streamlined version of the Fedora Access Service that is implemented as an HTTP-enabled web service.
64 Fedora™
Installs on Windows PC
Packaged to get up and running quickly
Demo set of objects
Scales with hardware in a production environment
No support available
Plumbing only; no inherent preservation
Downloads and information available at
65 LOCKSS
Lots of Copies Keeps Stuff Safe
To safeguard the web journals that libraries subscribe to
Mimics the way libraries manage paper collections
Redundant, distributed, decentralized
Library installs LOCKSS on a local computer with Internet connectivity
Library tells the computer which journal(s) to preserve and how often to look for new content
LOCKSS gets new content and stores a copy locally (“caching”)
66 LOCKSS
Works only for HTTP/HTML standard file types (html, jpeg, gif, pdf, etc.)
Open source code
It can be modified
Designed to be low cost, low time
Will run on a dedicated PC
PC specs available on the LOCKSS site
Does LOCKSS work for general Web content?
No. LOCKSS requires that the content of URLs be mostly static, that the web site have a logical structure and that the HTTP headers permit caching.
Does LOCKSS solve the Digital Library problem?
No. In addition to the restrictions to static URLs and logical site structure, it works only for HTTP/HTML and related formats such as GIF and JPEG. Thus the LOCKSS project avoids addressing most of the hard (and important) issues for digital libraries.
Is LOCKSS open source?
LOCKSS is an Open Source project. The source and binaries are freely available via
We expect that because LOCKSS is open source software, there will be many different groups modifying and using the software. Each instantiation of the software will work best if it can communicate with all LOCKSS systems, so there will be pressure to keep the LOCKSS communication protocol interoperable. However, there will also be many reasons why various communities will want to modify LOCKSS for their own purposes. This diversity is valuable as it reduces the possibility that a single type of failure could affect all the caches simultaneously. See also, Advantages of open source software.
Can a small PC running LOCKSS really serve a large reader community?
Access to STM journals drops off very rapidly as the materials age. LOCKSS is intended to operate when the content is so old that it is no longer available from the publisher. There will therefore be very little reader access to the material, so a large powerful computer is not necessary.
68 LOCKSS
Publishers can prevent LOCKSS from caching their content
Publishers must give libraries permission
Licensing language available on the LOCKSS web site
Freely available
No support (ease of use is highlighted)
Preservation is not inherent
Do journal publishers permit copies to be cached?
Yes, publishers do permit copies to be cached. Caching is a routine mechanism of the web. In particular, ISPs outside the US routinely cache content to control traffic and to minimize bandwidth constraints. Caching publishers’ material to ensure long term access is different from loading content to serve the full text locally. The publishers we have consulted are very supportive of this software, with the following caveats: they want readers to access the content from their site while their site is available, or they want the system to give them anonymous reader behavior information. By and large, however, the intended use for these caches is for those instances when content from the publisher’s server is unavailable. Many of the publishers affiliated with Stanford University Libraries’ HighWire Press support this work.
Do journals use HTTP headers to prevent caching?
HTTP defines headers that can be used to prevent caching or restrict the lifetime of pages in caches. HighWire’s journals do not use these headers, so unlimited caching is permitted. LOCKSS is based on the Squid cache software and will obey any cache control headers it finds. Journal publishers who wish to prevent LOCKSS caching their content can use these headers to do so.
PRODUCTION IN EARLY 2004 (not 1994)
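The header mechanism described above can be illustrated with a simplified check of the standard HTTP `Cache-Control` response header. This is a sketch of the general idea only and is not how LOCKSS or Squid implement it: the function name `caching_permitted` is hypothetical, and the rule used here (treat `no-store` and `private` as forbidding a shared cache copy) is a deliberately reduced reading of the HTTP caching rules.

```python
def caching_permitted(headers):
    """Return False if the Cache-Control header forbids a shared cache copy.

    `headers` is a dict of HTTP response header names to values.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in normalized.get("cache-control", "").split(",")
        if part.strip()
    }
    # "no-store" forbids keeping any copy; "private" forbids shared caches.
    return not ({"no-store", "private"} & directives)
```

A response with no `Cache-Control` header, or one such as `max-age=3600`, passes this check, while `no-store` or `private` does not. Directives that limit a page's lifetime in the cache, which the notes also mention, are ignored in this sketch.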
69 Greenstone
A suite of software for building and distributing digital library collections
Produced by the New Zealand Digital Library Project at the University of Waikato
Developed and distributed in cooperation with UNESCO and the Human Info NGO
Open-source, multilingual software, issued under the terms of the GNU General Public License
70 Greenstone
“Should in fact work on any Windows or Unix system.”
“Local library”
“Web library”
Greenstone Librarian Interface
The “Organizer”

“Local library”: built-in web server; only available for Windows platforms.
“Web library”: requires an external web server.
Greenstone Librarian Interface (GLI): a graphical tool for building new collections, altering or deleting existing collections, exporting existing collections to stand-alone CD-ROMs, and importing or assigning metadata; it includes an interactive collection design module.
The “Organizer”: a Windows utility used for automatically generating some of the configuration files (metadata.xml, sub.txt, etc.) used by complex Greenstone collections.
74 Greenstone
Documentation is available
Installer's Guide
Developer's Guide
Paper to Collection
Inside Greenstone Collections
MG/MG++
Workshops are also held
Listservs for implementors
Some technical support available
Not preservation oriented
75 OCLC Digital Archive
Standards based
OAIS compliant
METS encoded dissemination packages
Phased support for various formats and material types
Currently text and still image
Can integrate with current library selection and cataloging activities
Content owner manages the archived objects and determines access
Known costs
Offers bit preservation
76 OCLC Digital Archive Functions
Harvest from web: preview and review
Metadata creation
Ingest: from web or batch
Access management: public or restricted
Viewing
Dissemination
Reports
Periodic audits of objects in the archive
Frequent backups and disaster prevention

The primary focus of the Digital Archive (currently) is capturing web-based documents, storing them, and preserving them for the long term.

The current customer base of the Digital Archive consists mainly of state libraries and government agencies. Examples include the State Library of Connecticut and GPO, the Government Printing Office.

The Digital Archive supports metadata creation for each digital object and content management. Metadata is XML encoded.

It provides discovery of archived objects in a variety of ways, including OCLC's Connexion and FirstSearch services and local catalogs.

As you can see, the functions include Harvest (the process of capturing a document from the web), Ingest, Content Group Management, Rights Group Management, Viewing, Dissemination, Reports, and Periodic Audits and Backup/Disaster Prevention Services.

Our dissemination packages are based on METS.
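To give a feel for what a METS-based wrapper around a single harvested document looks like, here is a minimal sketch using Python's standard library. It does not reproduce the Digital Archive's actual dissemination package profile, which the slides do not describe: the function name `build_mets`, the `FILE1` identifier, and the example URL are all assumptions, and the output is not claimed to be schema-valid.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(object_id, file_url, mime_type):
    """Build a bare-bones METS document describing one file (illustrative only)."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})

    # File section: lists the content file and where it lives.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    file_el = ET.SubElement(
        file_grp, f"{{{METS_NS}}}file", {"ID": "FILE1", "MIMETYPE": mime_type}
    )
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_url},
    )

    # Structural map: points from the object's structure to the file above.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div")
    ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": "FILE1"})

    return ET.tostring(mets, encoding="unicode")
```

Calling `build_mets("obj-1", "http://example.org/doc.pdf", "application/pdf")` returns the XML as a string; a real package would also carry descriptive and administrative metadata sections.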
77 Digital Archive Web Services
General steps in the process:
1. Identify material to preserve
2. Create preservation metadata (using metadata mapped from the bibliographic record in WorldCat)
3. Harvest, or copy, the digital object (a single web document)
4. Assign administrative properties to the object
5. Ingest, or add, the object into the Archive
6. Access your now-archived object from the WorldCat record, either on the web or via a library catalog

The Digital Archive Services are shown here. As mentioned before, most of these services are made available through OCLC's Connexion interface.

In addition to the Connexion interface, users with Admin privileges will access the Digital Archive Admin service directly. The Admin Service allows users to organize objects into Content Groups, create and assign Rights Groups to objects, and create Rights Statements.

System processes include an Auditor, which runs periodic checks on the integrity of objects in the archive. We are currently working on an Annotator, which extracts technical metadata from images and stores that metadata.
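The slides do not say how the Auditor checks integrity. One common approach to this kind of periodic check is to compare each stored file against a checksum recorded at ingest. The sketch below assumes SHA-256 and a simple path-to-digest manifest; the names `sha256_of` and `audit` are hypothetical and are not part of the Digital Archive.

```python
import hashlib


def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest):
    """Return a sorted list of paths whose current digest no longer matches.

    `manifest` maps each file path to the digest recorded at ingest
    (an assumed data structure for this illustration).
    """
    failures = []
    for path, expected in manifest.items():
        try:
            actual = sha256_of(path)
        except OSError:
            # A missing or unreadable object also counts as an audit failure.
            failures.append(path)
            continue
        if actual != expected:
            failures.append(path)
    return sorted(failures)
```

Run on a schedule, a check like this reports objects that have been altered, corrupted, or lost since ingest, so they can be restored from backup.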
78 End User Access
Once an object has been ingested into the Digital Archive, there are a variety of ways for users to discover it.

Since each digital object in the Archive also has an associated bibliographic record in WorldCat, any means of searching WorldCat can be used to discover Digital Archive content. OCLC's FirstSearch and local OPACs fall into this category.

The bib record associated with a Digital Archive object will contain a link to that object in the Digital Archive repository.

Links to Digital Archive objects can also be embedded on web pages.
79 OCLC Digital Archive Development
Preservation policy and plans in progress
Expanding formats and object types accepted
Active in development of preservation metadata standard and will comply
Active in developing digital repository certification
Additional information available at:

If you would like more information about the OCLC Digital Archive, please visit our web site. Also, please feel free to contact Pam, Shweta, and me with any questions or ideas.

In conclusion, I want to say that the Digital Archive staff is committed to continuing to develop the archive based on standards and to evolving our dissemination process to keep up with those standards. Certainly, METS will continue to be a major part of that effort.

Thank you very much for your time.
80 Other Tools
Australian PANDAS-PANDORA
CONTENTdm (content management)
SDSC Data Grid Technology
Web harvesting tools
E-records management software
Document management systems
Data warehousing technology
XML parsing tools
SDSC and others