Thinking Long Term - Archive Strategies for Alfresco Nathan McMinn Remote Service Engineer Alfresco Chetan Lalye Senior Software Architect Agilent Technologies.


1 Thinking Long Term - Archive Strategies for Alfresco Nathan McMinn Remote Service Engineer Alfresco Chetan Lalye Senior Software Architect Agilent Technologies

2 Why do I need a long term strategy?


4 Defining an Archive – What it is

5 Defining an Archive – What it isn’t

6 What should I archive?
  What is driving your archive requirement?
  - Regulatory requirements? (Sarbanes-Oxley / HIPAA / FDA CFR)
  - Business continuity requirements?
  - Cultural preservation?

7 What should I archive?
  - Document content
    - Do you have multiple content streams?
  - Document metadata
    - Do you need it all, or will a subset suffice?
    - What metadata is required to locate a document?
  - Related documents
    - By association? By related search?
    - How do you identify what is related?
  - Version history
  - Audit trail?
    - How will you present audit in a self-contained manner?

8 How do I prepare it for the long term?
  What good is the data if you can't view it?
  Say yes to:
  - Well defined formats
  - Open specifications
  - Broad vendor support
  - Stable, strong governing bodies (ISO, etc.)
  - Human readable text
  Say no to:
  - Single vendor specific formats
  - Closed specifications
  - Binary only data
  - Patent or IP encumbered formats

9 XML Data
  - Verbosity is your friend
    - Usually considered a weakness of XML, but for long term viability a verbose, descriptive format is desirable.
  - Don't forget the DTD / Schema
    - External DTD / Schema references may disappear; when packaging for archive, grab a local copy.
  - Prefer multi-vendor standards
    - AnIML (Analytical Information Markup Language)
    - Many others (domain dependent)
  - Avoid embedded binary data
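
The "grab a local copy" advice on slide 9 first requires knowing which external DTDs or schemas a document points at. The following is a minimal sketch in Python, not something from the presentation; the function name `external_schema_refs` is hypothetical, and it only looks at the DOCTYPE declaration and the `xsi:schemaLocation` / `xsi:noNamespaceSchemaLocation` attributes on the root element.

```python
import re
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def external_schema_refs(xml_text):
    """Return the DTD and XML Schema locations referenced by a document.

    Hypothetical helper: covers the DOCTYPE system identifier and the
    xsi schema hints on the root element only.
    """
    refs = []
    # DOCTYPE system identifier, e.g. <!DOCTYPE note SYSTEM "note.dtd">
    match = re.search(
        r'<!DOCTYPE\s+\S+\s+(?:SYSTEM|PUBLIC\s+"[^"]*")\s+"([^"]+)"', xml_text
    )
    if match:
        refs.append(match.group(1))
    root = ET.fromstring(xml_text)
    # xsi:schemaLocation holds namespace/location pairs; keep the locations.
    pairs = root.get("{%s}schemaLocation" % XSI, "").split()
    refs.extend(pairs[1::2])
    no_ns = root.get("{%s}noNamespaceSchemaLocation" % XSI)
    if no_ns:
        refs.append(no_ns)
    return refs
```

Each returned location could then be downloaded and stored next to the XML file in the archive package, so the document stays interpretable if the original URL disappears.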

10 PDF and PDF/A
  - PDF is an excellent choice for long term archiving of documents (with a few caveats)
  - PDF/A adds restrictions designed to make preservation for the long term more reliable
    - No linked fonts, embedded only
    - No audio / video content (must archive independently if required)
    - Device independent color space definition
    - No encryption
    - Many more, depending on conformance level
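
As a rough illustration of slide 10, the sketch below looks for two of the signals mentioned there: the PDF/A claim a file makes in its XMP metadata (`pdfaid:part` and `pdfaid:conformance`) and the presence of an `/Encrypt` entry. This is a heuristic written for this transcript, not the presenters' tooling; the function name `pdfa_hints` is hypothetical, and a byte scan like this can miss entries stored in compressed streams. A file that claims PDF/A may still fail real validation, so a dedicated validator is needed for anything beyond triage.

```python
import re


def pdfa_hints(data):
    """Heuristic look at raw PDF bytes; not a substitute for a validator."""
    # XMP may carry the value as an element or as an attribute.
    part = re.search(rb'pdfaid:part(?:>|=["\'])\s*(\d)', data)
    conf = re.search(rb'pdfaid:conformance(?:>|=["\'])\s*([A-Za-z])', data)
    return {
        "claims_pdfa": part is not None,
        "part": part.group(1).decode() if part else None,
        "conformance": conf.group(1).decode().upper() if conf else None,
        # PDF/A forbids encryption; an /Encrypt key is a red flag.
        "encrypted": b"/Encrypt" in data,
    }
```

Usage would look like `pdfa_hints(open("report.pdf", "rb").read())`, where `report.pdf` is a placeholder file name.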

11 PDF/A Demo

12 Images, Audio and Video
  - Most image formats are open specifications and have broad support; choose the one that makes the most sense for your type of content.
  - Use unencumbered video formats wherever possible, trading off size vs. quality as required.
  - Use unencumbered, open audio formats where possible.

13 Exporting Most archive mechanisms will require getting the files and related artifacts OUT of Alfresco ACP Bulk Export Download as Zip Custom Action ArchiveService

14 Packaging
  - Not all archives will require packaging
  - Storing of related artifacts together
    - Document
    - Metadata file
    - Audit trail file
    - Related documents
  - Packaging type depends on archive target
    - Simple folder
    - Zip file
    - Cloud container
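
The packaging idea on slide 14 can be sketched as a single zip file holding the document, a metadata file and an audit trail file. The layout below (`content/`, `metadata.json`, `audit.txt`) and the function name `build_package` are assumptions made for illustration; the slides do not prescribe a layout, and JSON is used here only as one example of an open, human readable text format.

```python
import json
import zipfile


def build_package(zip_path, content_name, content_bytes, metadata, audit_entries):
    """Write one self-contained archive package (hypothetical layout)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The document itself, under its original file name.
        zf.writestr("content/" + content_name, content_bytes)
        # Metadata as indented, sorted text so it stays readable.
        zf.writestr("metadata.json", json.dumps(metadata, indent=2, sort_keys=True))
        # Audit trail as one plain-text line per event.
        zf.writestr("audit.txt", "\n".join(audit_entries) + "\n")
    return zip_path
```

Related documents could be added the same way under their own folder inside the package; for a simple folder or cloud container target, the same three artifacts would be written as separate files or objects instead of zip entries.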

15 Now where do I put it?

16 Archive In Place
  Pros
  - Simplest to implement
  - Good support in Alfresco OOTB
  - Easy to move to inexpensive storage (ContentStoreSelector)
  - May be as simple as a marker aspect, property or path
  - No export required
  Cons
  - Remains in Alfresco, contributing to DB size and potentially affecting performance
  - Indexing in a separate store requires SOLR changes
  - Backups of live repository grow with archive

17 Archive Repository
  Pros
  - Active repository is leaner, faster
  - Potentially no need to export to file
  - Alfresco -> Alfresco transfers supported OOTB
  Cons
  - Cost (separate repository, separate server)
  - Complexity
  - May need to develop a connector for the remote repository, if it is not Alfresco

18 Archive To Media
  Pros
  - Repurpose existing backup equipment
  - Media management and rotation schedules are well understood
  - Easy to move very large volumes of data off site
  Cons
  - Media degrades over time
  - Requires preserving both software AND hardware
  - Bulky
  - Labor intensive
  - Expensive
  - Requires exporting to a file

19 Archive To Cloud
  Pros
  - Cost
  - Simplicity
  - Transfer of risk to third party
  Cons
  - Loss of control
  - Long retrieval times for some services (< 4 hrs for AWS Glacier)

20 Glacier Direct

21 AWS Archive Demo

22 Glacier via S3 (Cloud Deployment)

23 Glacier via S3 (Hybrid Model)
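
The diagrams for the "Glacier via S3" slides are not part of this transcript, so the following is only one plausible reading: objects are written to an S3 bucket and a bucket lifecycle rule transitions them to the GLACIER storage class. The sketch builds such a rule in Python and passes it to a boto3-style S3 client via `put_bucket_lifecycle_configuration`. The function names, the rule ID, the bucket name and the prefix are all hypothetical, and the client is passed in rather than created here so the block has no third-party imports.

```python
def glacier_lifecycle(prefix, days=0, rule_id="archive-to-glacier"):
    """Build a lifecycle configuration moving objects under `prefix` to Glacier."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Transition objects `days` days after creation.
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(s3_client, bucket, prefix, days=0):
    """Apply the rule with a boto3-style S3 client supplied by the caller."""
    config = glacier_lifecycle(prefix, days)
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
    return config
```

With boto3 installed, usage would be along the lines of `apply_lifecycle(boto3.client("s3"), "my-archive-bucket", "alfresco-archive/", days=30)`. Note that this call replaces any lifecycle configuration already set on the bucket.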

24 References and Additional Reading
  - US Library of Congress, "Sustainability of Digital Formats": http://www.digitalpreservation.gov/formats/
  - Blog of "The Long Now Foundation": http://blog.longnow.org/category/digital-dark-age/

