HARVARD E-JOURNAL ARCHIVING STUDY Dale Flecker June, 2002.


1 HARVARD E-JOURNAL ARCHIVING STUDY Dale Flecker June, 2002

2 JOURNAL ARCHIVING IN THE PAPER ERA Large-scale redundancy Access copy and archival copy usually the same Not just storage, but preservation –includes environmental control, library binding, repair, reformatting... Deliberate, long-term archiving largely the role of national and research libraries

3 E-JOURNAL MODEL IS DIFFERENT “Copies” are remote, held in publisher systems –Not replicated across different institutions Perpetual license provides limited comfort in the absence of independent copies Long-term preservation involves very different issues than day-to-day access

4 E-JOURNAL ARCHIVING A GROWING PROBLEM Libraries bearing double costs –the e-journals users prefer –the paper for preservation Publishers cannot convert totally to digital –authors and editors distrust e-only journals because of concerns about persistence –libraries demand paper for preservation Libraries preserving paper version, but electronic more complete, increasingly the copy of record

5 MELLON E-JOURNAL ARCHIVING PROGRAM 13 institutions invited to submit proposals for planning projects Two approaches –Large-scale distributed replication (LOCKSS) –Centralized archives serving a wider community

6 CENTRAL ARCHIVES PLANNING PROJECTS Publisher-based –Harvard (Wiley, Blackwell, University of Chicago Press) –Penn (Oxford and Cambridge University Presses) –Yale (Elsevier) Discipline-based –Cornell (agriculture), –NYPL (performing arts) Dynamic e-journals –MIT

7 FOUR BASIC ASSUMPTIONS Archive should be independent of publishers –responsibility of institutions for whom archiving is a core mission Archiving requires active publisher partnership Address long timeframes (100 years?) Archive design based on Open Archival Information System (OAIS) model

8 CENTRAL ARCHIVE MODEL Archive negotiates relationship with publisher Publisher deposits content regularly Content accompanied by metadata to support discovery and preservation Archived content only accessible under specific conditions Archive assumes responsibility for long-term preservation

9 SOME INTERESTING QUESTIONS What is archived? In what format? When is archive accessible? Who can access archived content? What does the archive “preserve”? Who does archiving? How is the archive paid for? How is the archive governed?

10 WHAT CONTENT IS ARCHIVED? E-journals not simply articles….

11 SOME COMMON STUFF Journal description Editorial board Instructions to authors Rights and usage terms Copyright statement Ordering information Reprint information Indexes Career information News Events lists Discussion fora Editorials Errata Reviewers Conference announcements

12 HARD AREAS Masthead, “front matter” stored as web pages, not in content management systems No control over the format of “associated materials” (datasets, images, tables, etc.) Advertising very complex –dynamic, frequently from third party, can involve country-specific complexities Links frequently separate from articles –regularly updated, sometimes dynamic

13 OUR INCLINATION Exclude little except advertisements –based on discussions with librarians and scholars –different from most “local loading” Articles include supplementary materials Include an “issue object” in addition to the article components – masthead, news, jobs, meetings, etc

14 Format for archived articles?

15 PDF? PDF almost universally available from publishers –and the only format available for some journals There are qualms... –proprietary –marked-up for display, not meaning –supports limited functionality –long-term “preservability” unclear –unlikely to remain the universal format over time

16 MARKED-UP TEXT? SGML/XML increasingly common –and likely to become more so Greater functionality, easier migration as technology changes Complex –DTDs vary widely from publisher to publisher –DTDs far from stable –archive documentation and rendering would be complex

17 “INTERCHANGE” ARTICLE DTD Intended for exchanging content between independent players Reduces complexity of interaction –archive needs to document, migrate, and display only one format –archive can choose whether to maintain articles in interchange DTD, or transform at ingest for long-term storage –publisher needs to deposit only one format for all archives

18 “INTERCHANGE” ARTICLE DTD Mellon, Harvard, National Library of Medicine, 2 consultants (Inera, Mulberry) working on draft standard DTD Design based on current publisher practice –must be easy for publishers to produce –homogenizes many elements –leaves options in some difficult areas –eliminates elements specific to individual publisher delivery systems

19 INTERCHANGE DTD ISSUES How low is the common denominator? What gets lost? –inevitably sacrifices some functionality and original appearance Transformation from publisher’s “native” DTD involves risks Some technically difficult areas –extended character sets, mathematical and chemical formulae, tables, “generated text”

20 SGML/XML QUALITY CONTROL PROBLEM SGML/XML is an output rather than the input for many publishers today –may not fully reflect the output (PDF, print) that users see day-to-day… how do you know it is good? If SGML/XML is transformed for deposit, errors can be introduced Quality control of ingested content is expensive but critical for a sound archive

21 ARCHIVE MORE THAN ONE FORMAT? Publisher-based archive must accept PDF in any case (only format available for some titles) –so include both SGML and PDF when available? belt and suspenders Accept publisher’s original SGML also? –preserve information lost in conversion to interchange DTD –maintenance over time problematic

22 WHEN IS ARCHIVE ACCESSIBLE? Most publishers instinctively prefer “dark” archives –does not compete with publisher’s service If “dark”, what “trigger events” make it accessible? –after a given period of time (“moving wall”)? –when content is not otherwise accessible (“failsafe”)? –only when content enters the public domain?

23 IS “DARK” DANGEROUS? If content is dark, how do you know it is still good? (real users are the best auditors)

24 WHO CAN ACCESS ARCHIVE CONTENT? Just other subscribing institutions? –does the archive need to maintain complex records of license rights? defining licensees a nightmare; tracking license changes over time another nightmare Individual subscribers? –an even greater nightmare Everybody? –dramatically easier to administer

25 WHAT DOES THE ARCHIVE PRESERVE? Preservation is a format-by-format issue –and most e-journals are composed of many formats How much “look and feel” preserved? Just preserve the “core intellectual content”? Does archive ensure content remains “render-able” as technology changes?

26 HARVARD’S DIGITAL REPOSITORY Repository specifies preferred (“normative”) formats, which will be kept usable Just maintain bits for others –for e-journals this is likely for many “associated materials” (datasets, models, etc.): generally accepted in ANY format; maintaining the viability of such wildly heterogeneous materials unrealistic –keep unaltered for future “digital archeology”
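The two-tier rule on this slide (normative formats kept usable, everything else kept as unaltered bits) can be sketched as a simple classification at ingest. This is an illustrative sketch under stated assumptions: the slide does not list the normative formats, so the `NORMATIVE_FORMATS` set below is a hypothetical placeholder, and identifying a format by file extension is a simplification.

```python
from enum import Enum


class PreservationLevel(Enum):
    KEEP_USABLE = "keep usable"   # repository commits to keeping it renderable
    BITS_ONLY = "bits only"       # stored unaltered for future "digital archeology"


# Hypothetical placeholder: the actual normative format list is not given in the slides.
NORMATIVE_FORMATS = frozenset({"xml", "tiff"})


def preservation_level(filename: str,
                       normative: frozenset = NORMATIVE_FORMATS) -> PreservationLevel:
    """Classify a deposited file by extension (a simplification of format identification)."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in normative:
        return PreservationLevel.KEEP_USABLE
    return PreservationLevel.BITS_ONLY
```

Under these assumptions an article file such as `article.XML` would be kept usable, while a supplementary dataset in an arbitrary format would only have its bits maintained.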

27 WHO DOES ARCHIVING? “Common good” activity –model based on a few archives serving many subscribers Is this an appropriate role for individual universities? –research libraries have technical capability, relationships with publishers and subscribers –BUT how archiving would be paid for is central…

28 HOW IS THE ARCHIVE PAID FOR? First question: who benefits? –publishers, libraries, authors, scholarly societies… –is there a way to share costs? Cost categories include –preparation of “archivable” objects –ingestion and quality control –long-term storage –preservation

29 PROPOSED MODEL Publisher assumes cost of preparing objects in standard format (whenever possible) Deposited material accompanied by two part fee from publisher –ingest fee to cover up-front costs varies with publisher effort to create easily archived objects??? –“dowry” to create maintenance endowment Real funding sources include subscribers, authors, societies

30 HOW IS THE ARCHIVE GOVERNED? * Publishers hand their intellectual property to an independent party -- do they have a continuing say? * Are there other stakeholders who should also have a say?

31 HARVARD’S MODEL ARCHIVE Accept content for all titles a publisher produces –archive as many journal elements as possible Maintain an archive serving the entire community Store and maintain more robust formats (e.g., XML) when possible Collect metadata to support administration and preservation

32 HARVARD’S MODEL ARCHIVE Requires only a few archival copies of any given journal Archive assumes responsibility for preservation migration when canonical versions are deposited Organizational and economic model difficult

33 NEXT? Over to Kevin….

