Digitized content
- Primarily digitized versions of analogue rare/unique/archival/valuable materials
- Close collaborations with Rare Books & Special Collections, and McGill University Archives
- Currently working on collection prioritization and scaling up production
Born digital university records of long term value
- Institutional mandate
- Determined by records retention schedule
- Involves transfer of selected records from originating departments after the end of their immediate life
- Archival appraisal practices determine what to keep
Born digital archival materials
- Fonds of personal papers and organizational records
- The same types of documents we’ve already collected in paper form
- Examples of paper archival fonds from the past:
  - Montréal Natural History Society
  - James McGill MS 435 George Mercer Dawson (President of Royal Society of Canada)
  - Harvey Cushing Fonds (William Osler biographer)
Born digital creative content
- E.g., digital art and digital humanities projects
- Often more like software than a set of standalone files
- High risk of loss compared to analogue ancestors
Some licensed/purchased digital content
- Typically we only have remote access, are not responsible directly for curation
- In some cases we must deliver ourselves rather than rely on the vendor
  - And in these cases, we take on curation responsibility
ETDs and other student work
- McGill students required to deposit masters and doctoral theses, sign a non-exclusive license to disseminate
- Policy allows students to request a 1-year embargo
- Students retain copyright
- McGill does not contract with ProQuest for ETD delivery and preservation
- McGill participates in Theses Canada
- Some courses show interest in pushing student work to eScholarship@McGill
Pre-prints/post-prints
- Supporting “green OA”
  - In fulfillment of funder mandates
  - Or voluntarily
- Still not heavily used, at McGill or elsewhere
- No serious discussion yet at McGill about a campus mandate
- Expecting Canadian Tri-Council OA mandate beginning May 1, 2015
Research data
- BIG new focus
- Studies show significant loss of data sets over time
  - Odds of data supporting a paper being extant fall by 17% per year (Vines et al 2014; doi:10.1016/j.cub.2013.11.014)
- Some studies show a citation advantage for papers with open data
  - 30% for papers published in 2004 and 2005 (Piwowar and Vision, 2013; doi:10.7717/peerj.175)
- Expecting Canadian Tri-Council data management planning requirements in 2015/2016
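To make the 17% per year figure concrete, here is a minimal sketch of how that decline compounds. It assumes the decline applies to the odds (not the probability) and is constant from year to year; the starting odds of 4:1 are a hypothetical value chosen for illustration, not a figure from the cited study.

```python
def odds_after(initial_odds, years, annual_decline=0.17):
    """Odds that a data set is still extant after `years`,
    assuming the odds shrink by a fixed fraction each year."""
    return initial_odds * (1 - annual_decline) ** years


def odds_to_probability(odds):
    """Convert odds (e.g. 4.0 for 4:1) to a probability between 0 and 1."""
    return odds / (1 + odds)


if __name__ == "__main__":
    starting_odds = 4.0  # hypothetical: 4:1 odds (80%) at publication
    for years in (0, 5, 10, 20):
        odds = odds_after(starting_odds, years)
        print(f"{years:>2} years: odds {odds:.2f}, "
              f"probability {odds_to_probability(odds):.0%}")
```

Because the decline is in the odds, the probability does not drop by a flat 17 percentage points per year; it falls slowly while survival is likely and faster in relative terms once the odds are low.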
This is difficult!
- As difficult as any other step
- Luckily, it’s not an all or nothing proposition
  - Some areas we’re pretty good at (ETDs, digitized collections)
  - Others we try but with limited success (pre-prints/post-prints)
  - Others are brand new to us (research data, born digital archival materials, born digital creative content)
Processing and organizing
- Determine what’s worth keeping
- Create/map metadata
- Responsibility to handle personally identifiable information carefully
Several different approaches in place now
- Digitization master files to NCS for storage
- Backups of files/servers (digital collections, eScholarship@McGill, born digital university records)
  - Multiple copies including one off site
  - eScholarship is a “repository” but not a “preservation repository”
- Reliance on external vendors (licensed content)
  - E.g., through LOCKSS
  - We run a LOCKSS node at McGill
- And the stuff we’re not handling so well (born digital special collections/archival materials)
How do we make this better?
- Need better repositories
  - That handle common use cases
    - Hierarchical file structures
    - Paged objects
    - Display common file types in-browser
  - That are connected to preservation systems and manage content in them
- Find/Collect It
- Put It Somewhere Safe
- Keep It Safe Over Time
Yeah, this is hard
- Harder than the paper world!
- What is “the long term”?
  - How long will Universities exist in their current form?
  - How long will computers continue to function the way they do now?
  - How will metadata structures evolve over this period of time?
  - What does a “pay once” model for digital preservation look like?
  - What criteria do we use to determine the useful lifespan of a digital file?
- It’s about policy as much as technology
  - How do our organizations set things up to ensure someone takes an active management role over time?
Strategies
- Standardize input file formats to the degree possible
- Actively check file integrity
- Refresh hardware frequently
- Know what will need to be emulated, and what you can safely migrate
- Partner!
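As an illustration of what "actively check file integrity" can mean in practice, the sketch below records a SHA-256 checksum for every file under a directory and later reports files that have gone missing or changed. This is a generic fixity check written for this example, not a description of any system in use at McGill; the function names and the in-memory manifest format are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks
    so that large master files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Compare current files against a stored manifest.
    Returns {relative_path: "missing" | "changed"}; empty means all good."""
    root = Path(root)
    problems = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
        elif sha256_of(target) != expected:
            problems[relative_path] = "changed"
    return problems
```

The manifest would normally be stored separately from the files it describes, and the check rerun on a schedule and against each copy, so that a damaged copy can be replaced from one that still verifies.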
Who’s doing this well?
- Chronopolis @ UC San Diego
- CLOCKSS
- Portico
- Héritage (Canadiana from CRKN)
- Scholars’ Portal
- HathiTrust
- APTrust
- DPN
That’s a lot of groups!
- And they all need funding to run
  - And our organizations pay the membership fees from our institutional budgets
- How do we get them all to work together?
  - Committee on Coherence at Scale for Higher Education