PDF/A The Development of a Digital Preservation Standard Stephen Abrams Harvard University Betsy Fanning Association for Information and Image Management.

Slides:



Advertisements
Similar presentations
BPEL4WS Business Process Execution Language for Web Services Jim Clark eBusiness Strategist
Advertisements

1 Copyright ©2007 Sandpiper Software, Inc. Vocabulary, Ontology & Specification Management at OMG Elisa Kendall Sandpiper Software
Putting the Pieces Together Grace Agnew Slide User Description Rights Holder Authentication Rights Video Object Permission Administration.
OAI from 50,000 Feet OAI develops and promotes interoperability solutions that aim to facilitate the efficient dissemination of content. Begun in 1999.
doi> Digital Object Identifier: overview
WDL Technical Architecture Working Group (TAWG) June 2010 Achievements and Recommendations Co-chaired by Noha Adly, Bibliotheca Alexandrina Babak Hamidzadeh,
Putting together a METS profile. Questions to ask when setting down the METS path Should you design your own profile? Should you use someone elses off.
28 March 2003e-MapScholar: content management system The e-MapScholar Content Management System (CMS) David Medyckyj-Scott Project Director.
The PREMIS Data Dictionary Michael Day Digital Curation Centre UKOLN, University of Bath JORUM, JISC and DCC.
A centre of expertise in data curation and preservation London :: ARK Group Workshop: Archiving the Web :: 28 Sept 2006 Funded by: This work is licensed.
Creating Section 508 Compliant Documents & Presentations
DOCUMENT TYPES. Digital Documents Converting documents to an electronic format will preserve those documents, but how would such a process be organized?
CH-4 Ontologies, Querying and Data Integration. Introduction to RDF(S) RDF stands for Resource Description Framework. RDF is a standard for describing.
October 28, 2003Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
Music Encoding Initiative (MEI) DTD and the OCVE
Electronic Theses and Dissertations: Benefits, Issues, and the University of Waterloo Approach
Iframes & Images Using HTML.
Digital Preservation - Its all about the metadata right? “Metadata and Digital Preservation: How Much Do We Really Need?” SAA 2014 Panel Saturday, August.
Mark Evans, Tessella Digital Preservation Boot Camp – PASIG meeting, Washington DC, 22 nd May 2013 PREMIS Practical Strategies For Preservation Metadata.
Developing a Records & Information Retention & Disposition Program:
3. Technical and administrative metadata standards Metadata Standards and Applications.
WMES3103 : INFORMATION RETRIEVAL
Portable Document Format PDF. What is PDF? Universal file format developed by Adobe Systems Incorporates fine detail and quality of print publications.
Creating Accessible PDF Documents Debi Orton, NYS GOER Jeff Bennett, NYS CIO/OFT.
Components Text Text--Processing Software A Word Processor is a software application that provides the user with the tools to create and edit text.
Creating and publishing accessible course materials Practical advise you can replicate.
Portable Document Format PDF. What is PDF? Universal file format developed by Adobe Systems Incorporates fine detail and quality of print publications.
Metadata Standards and Applications 5. Applying Metadata Standards: Application Profiles.
2005 Adobe Systems Incorporated. All Rights Reserved. 1 Ontolog Forum Gunar Penikis Sr. Product Manager Adobe Systems.
3. Technical and administrative metadata standards Metadata Standards and Applications Workshop.
Legal aspects of digital archiving. Agenda Objectives of the conference Archiving vs Legal Archiving vs Compliance Legal archiving coverage and principles.
Addressing Metadata in the MPEG-21 and PDF-A ISO Standards NISO Workshop: Metadata on the Cutting Edge May 2004 William G. LeFurgy U.S. Library of Congress.
Publish Your Work BIM Curriculum 04. Topics  External Collaboration  Sharing the BIM model  Sharing Documents  Sharing the 3D model  Reviewing 
T OWARD A C OLLABORATIVE A PPROACH TO S TAKEHOLDERS ’ I NVOLVEMENT IN ETD S C URATION Presenters: Daniel Gelaw Alemneh, Geneva Henry, & Shannon Stark L.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
What Agencies Should Know About PDF/A September 20, 2005 Susan J. Sullivan, CRM
IECM Briefing: XML Community of Practice Betsy Fanning AIIM.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Document Formats How to Build a Digital Library Ian H. Witten and David Bainbridge.
The Role of File Formats in Digital Preservation: Opportunities and Threats ErpaTraining on File Formats for Preservation Vienna, May 10-11, 2004 Frank.
© 2005 Global Grid Forum The information contained herein is subject to change without notice Leading the pervasive adoption of grid computing for research.
What is the Archives of Institutional Memory? The Archives of Institutional Memory is a digital repository for disseminating and preserving official Indiana.
JRN 440 Adv. Online Journalism Raster file formats for print Wednesday, 2/8/12.
What Agencies Should Know About PDF/A-1 April 6, 2006 Mark Giguere
Preserving PDF Documents: Exploring the PDF/A Standard Susan J. Sullivan, CRM U.S. National Archives and Records Administration
PREMIS Rathachai Chawuthai Information Management CSIM / AIT.
12 Developing a Web Site Section 12.1 Discuss the functions of a Web site Compare and contrast style sheets Apply cascading style sheets (CSS) to a Web.
PDF and Long Term Preservation May 17, 2005 Susan J. Sullivan, CRM
AIIM Standards Betsy Fanning Director, Standards & Content, AIIM.
File Formats Different applications (programs) store data in different formats. Applications support some file formats and not others. Open…, Save…, Save.
Introduction to metadata
Lecture 3 Creating a Web Page with HTML. Objectives §Hypertext Document in WWW p §The HTML language p l Definition l Web browsers and.
Overview of SC 32/WG 2 Standards Projects Supporting Semantics Management Open Forum 2005 on Metadata Registries 14:45 to 15:30 13 April 2005 Larry Fitzwater.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
1 Electronic Document Workflow Stephen P. Levenson Convener TC 171 SC2/JWG5 Chief Technology Office Administrative Office of the U.S. Courts.
Stewart Rogers Marketing Communications Director Slide 1 © 2010 PDF/A Competence Center, PDF/A for long-term archival What you.
Geospatial metadata Prof. Wenwen Li School of Geographical Sciences and Urban Planning 5644 Coor Hall
Preservation Planning Bojana Tasić FORS SEEDS Workshop I Belgrade, October.
2.01 Understand Digital Raster Graphics
File Formats Different applications (programs) store data in different formats. Applications support some file formats and not others. Open…, Save…, Save.
2.01 Understand Digital Raster Graphics
2.01 Understand Digital Raster Graphics
ISO Standard Development
Portable Document Format
2.01 Understand Digital Raster Graphics
Metadata for research outputs management
Portable Document Format PDF
iECM Briefing: XML Community of Practice
2.01 Understand Digital Raster Graphics
Presentation transcript:

PDF/A The Development of a Digital Preservation Standard Stephen Abrams Harvard University Betsy Fanning Association for Information and Image Management Diana Helander Adobe Systems, Inc. Susan Sullivan, CRM National Archives and Records Administration SAA 69th Annual Meeting, New Orleans, August 14-21, 2005

Agenda The preservation problem The ISO standards process Benefits of PDF/A Technical overview Questions

The preservation problem What is the best option for preserving electronic documents over archival time spans? –TIFF? Widely adopted No access to underlying text without OCR No mechanism for capturing logical structure Difficult to create “born-digital” documents –XML? Good for describing logical structure, but not appearance Many incompatible domain-specific schemas –Native Format (e.g., MS Word)? Several ubiquitous, but closed proprietary formats –PDF?

The preservation problem –PDF is a ubiquitous open format for electronic documents Proprietary, but with publicly available specification –Many statutory, regulatory, and institutional policies mandate the retention of PDF-based documents over multiple generations of technology –The feature-rich nature of PDF can complicate preservation efforts

Desirable properties of a preservation format PDF/A objectives –Device independence Can be reliably and consistently rendered without regard to the hardware/software platform –Self-contained Contains all resources necessary for rendering –Self-documenting Contains its own description –Transparency Amenable to direct analysis with basic tools

Desirable properties of a preservation format PDF/A objectives –(Lack of) technical protection mechanisms No encryption, passwords, etc. –Disclosure Authoritative specification publicly available –Adoption Widespread use may be the best deterrent against preservation risk

PDF/A usage PDF/A standard may be used by vendors to: –Develop applications that read and write and otherwise process PDF/A files These applications will be used by organizations to: –Create and process PDF/A conformant files As part of their business processes In conjunction with necessary adjunct archival and records management policies and procedures

Current support for PDF/A There is no “formal” support for PDF/A today –Acrobat 7 support for “draft” version Nor has PDF/A yet been adopted as a “required” format by any governmental, academic, or commercial body However, once ISO is formally published, we can expect tools to be developed quickly –Acartus ApertureONE ERM –Many other vendors participated in the standards process Appligent, Callas, Global Graphics, PDF Sages And we expect that the mandated (or recommended) use of PDF/A will follow

PDF/A caveats However… –PDF/A alone does not guarantee preservation –PDF/A alone does not guarantee exact replication of source material –The intent of PDF/A is not to claim that PDF-based solutions are the best way to preserve electronic documents –But once you have decided to use a PDF-based approach, PDF/A defines an archival profile of PDF that is more amenable to long-term preservation

The PDF/A standard “This International Standard specifies how to use the Portable Document Format (PDF) 1.4 for long-term preservation of electronic documents” –Applicable to documents containing character, raster, and vector data –The standard does not address: Processes for generating PDF/A files Specific implementation details of rendering PDF/A files Methods for storing PDF/A files Hardware and software dependencies

The PDF/A standard PDF/A is a file format standard PDF/A is just one component of a comprehensive preservation strategy –Successful implementation depends upon: Records management policies and procedures Additional requirements and conditions Quality assurance processes

Agenda The preservation problem The ISO standards process Benefits of PDF/A Technical overview Questions

The PDF/A standard Multi-part ISO International Standard –ISO :2005, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1) –Part 2 ( ) intended to bring PDF/A into conformance with PDF 1.6 –And additional future parts, as necessary

Time Line for Part 1 October 2002Initial meeting of AIIM/NPES PDF/A committee April 2003Initial Working Draft (WD) August 2003New Work Item (NWI) approved and Joint Working Group (JWG) formed December 2003First Committee Draft (CD) approved September 2004Second CD approved June 2005Draft International Standard (DIS) unanimously approved

Time Line for Part 1 Submitted to ISO Central Secretariat for publication as International Standard –Should be publicly available September 2005 Throughout the process, PDF/A has been reviewed by technical experts from 15 national standards bodies

ISO/TC 171/SC 2/WG 5 ISO Joint Working Group (JWG) for PDF/A –ISO/TC 171/SC 2, Document management applications – Application issues –ISO/TC 130, Graphic technology –ISO/TC 46/SC 11, Information and documentation – Archives/records management –ISO/TC 42, Photography

Role of AIIM and NPES AIIM, Association for Information and Image Management –Secretariat to ISO/TC 171 and ISO/TC 171/SC2 –Secretariat to US Technical Advisory Group (TAG) for ISO/TC 171 NPES, The Association for Suppliers of Printing, Publishing, and Converting Technologies –Secretariat to ANSI Committee for Graphic Arts Technologies Standards (CGATS) –Secretariat to US TAG for ISO/TC 130 Joint sponsors of the initial US PDF/A committee

PDF/A terminology PDF/A-1 refers to the format defined by Part 1 (ISO ) of the standard Part 2 (ISO ) will define PDF/A-2 New Parts can be added to the PDF/A family of standards without obsoleting previous Parts

Agenda The preservation problem The ISO standards process Benefits of PDF/A Technical overview Questions

PDF/A Non-proprietary standard –Based on a proprietary, but open format Developed by inclusive set of stakeholders Subject to rigorous technical review Minimal restrictions necessary to facilitate long- term preservation Not reliant on the existence of any particular reader

Relationship to other standards PDF/X for pre-press data exchange –ISO parts 4 (PDF/X-1a), 5 (PDF/X-2), and 6 (PDF/X-3) –Currently based on PDF 1.4; work underway to extend to PDF 1.6 –It is possible for a file to be both PDF/A and PDF/X compliant PDF/E for engineering, architectural, and GIS documents –Provisionally based on PDF 1.6 PDF/UA for accessibility –Intended to address Section 508 concerns

Intellectual property rights PDF/A is a file format standard Anyone can use the PDF Reference and XMP Specification in conjunction with ISO to create applications that read, write, or process PDF/A files Adobe has granted a general royalty free license to use certain of its patents to create applications that read, write, or process PDF/A files

Supplemental information Informative annexes to ISO –PDF/A-1 conformance summary –Best practices Guidelines for capturing or converting electronic documents to PDF/A –For documents created according to specific institutional rules –Replicates the exact quality and content of source documents within the PDF/A file Required for compliance with NARA’s PDF Transfer Guidance PDF/A FAQ –Under development –Will be available on AIIM and NPES web sites

Supplemental information Application notes –Will provide specific guidance on the use of PDF/A Similar in intent to those produced for PDF/X –Under development –Will be available on AIIM and NPES web sites AIIM and NPES will archive copies of, and maintain public access to, the PDF Reference and XMP Specification –As well as other freely available, non-ISO normative references of ISO

Agenda The preservation problem The ISO standards process Benefits of PDF/A Technical overview Questions

PDF/A PDF/A is intended to address three primary issues: –Define a file format that preserves the static visual appearance of electronic documents over time –Provide a framework for recording metadata about electronic documents –Provide a framework for defining the logical structure and semantic properties of electronic documents

Nevertheless… PDF/A may not be the last preservation format you will need However, proper application of PDF/A should result in reliable, predictable, and unambiguous access to the full information content of electronic documents

PDF/A conformance Two conformance levels –PDF/A-1a Compliance with all requirements of Including those regarding structural and semantic tagging –PDF/A-1b Compliance with all requirements of minimally necessary to preserve the visual appearance of a PDF/A file

PDF/A requirements Conformance to PDF 1.4

PDF/A requirements Conformance to PDF 1.4 With features that are –Required

PDF/A requirements Conformance to PDF 1.4 With features that are –Required –Recommended

PDF/A requirements Conformance to PDF 1.4 With features that are –Required –Recommended –Restricted

PDF/A requirements Conformance to PDF 1.4 With features that are –Required –Recommended –Restricted –Prohibited

PDF/A requirements Conformance to PDF 1.4 With features that are –Required –Recommended –Restricted –Prohibited Reader functional requirements

PDF/A requirements Conformance to PDF 1.4 With features that are –Required –Recommended –Restricted –Prohibited Reader functional requirements Features not documented in 1.4 are ignored by PDF/A readers

General Required –Conformance to 1.4 requirements Recommended –Linearization hints should be ignored Restricted –Document information dictionary must be consistent with XMP metadata Prohibited –Encryption –LZW compression –Embedded files –Optional content –Sound and movie media types

Graphics Required –Device independent color –Embedded color spaces Restricted –Image dictionaries –Separation and DeviceN color spaces –Form XObjects –Extended graphics state –Rendering intents Prohibited –Reference XObjects –PostScript XObjects –Non-PDF 1.4 defined operators –Transparency

Fonts Required –Fonts legally embeddable for unlimited, universal rendering –Embedded font programs –Embedded CMaps –Consistent font metrics –Unicode character map (For Level A conformance only) Recommended –Font subsets Restricted –Character encodings

Annotations Required –Reader mechanism to expose the annotation dictionary Contents key Restricted –Annotation dictionaries Prohibited –Non-PDF 1.4 defined types –FileAttachment, Sound, and Movie types

Actions Required –Behavior for NextPage, PrevPage, FirstPage, and LastPage actions as defined in PDF 1.4 –Reader mechanism to expose GoToR dictionary F and D keys, URI action dictionary URI key, and SubmitForm action dictionary F key Prohibited –Launch, Sound, Movie, ResetForm, ImportData, and JavaScript actions –Deprecated set-state and no-op actions –Named actions other than the 4 page navigation actions –Widget annotation or Field dictionary AA key

Metadata Requires use of Extensible Metadata Platform (XMP) –Proprietary, but open format –Used for metadata creation, processing, and interchange –Based on Resource Description Framework (RDF) Open World Wide Web Consortium (W3C) standard Cornerstone of Semantic Web –Pre-defined schemas Base, DC, DRM, DAM, Workflow, EXIF, PDF, PSD –Defined extension mechanism –Embedding rules TIFF, JPEG, JPEG 2000, HTML, AI, PSD, PDF, …

Metadata Required –Document level XMP metadata –Equivalent XMP metadata for all appropriate Document Information Dictionary properties –Embedded extension schema –Version and conformance self-identification Recommended –File identifier –File provenance –Font metadata Prohibited –XMP packet header bytes and encoding attributes

Logical structure (Level A conformance only) Required –Tagged PDF –Explicit word breaks Recommended –Tagging for pagination, layout, and page artifacts –“Strongly structured” block-level structural tagging –Natural language tagging –Alternative description, non-textual annotation, replacement text, and abbreviation/acronym expansion tagging

Interactive forms Required –Field appearance dictionary Restricted –NeedAppearance flag –Explicit word breaks Prohibited –A and AA keys in Widget and Field dictionaries Note –There is no restriction on the use of digital signatures, as defined by PDF 1.4

What’s under consideration for Part 2? Based on PDF 1.6 The following specific features are under consideration for inclusion in Part 2 –JPEG 2000 image compression –More sophisticated digital signature support –OpenType fonts –3D graphics –Audio/video content –Consistency with PDF/X, PDF/E, PDF/UA

What’s under consideration for Part 2? If PDF/A-1 does not meet your specific needs, get involved in the process –Contact Betsy Fanning, Director, AIIM Standards Program

PDF/A summary ISO (should be available September 2005) File format standard One component of a comprehensive archival strategy Based on PDF 1.4 Two conformance levels –Level A for structural/semantic tagging –Level B for appearance only Emphasis on reliable and predictable rendering of static visual appearance –Do’s: embed fonts, device-independent color, XMP metadata, tagging –Don’ts: encryption, LZW, embedded files, external content references, transparency, multi-media, JavaScript

PDF/A summary Consistency with PDF/X Work planned for Part 2

Agenda The preservation problem The ISO standards process Benefits of PDF/A Technical overview Questions

Questions?