Author(s): Brian Lavoie, 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Noncommercial.

1 Author(s): Brian Lavoie, 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Noncommercial 3.0 License: http://creativecommons.org/licenses/by-nc/3.0/ We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material. Copyright holders of content included in this material should contact open.michigan@umich.edu with any questions, corrections, or clarification regarding the use of content. For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use.

2 Brian Lavoie, Research Scientist, OCLC (lavoie@oclc.org). Keeping Research Data Safe: Framing the Costs and Benefits of Preserving Research Data. University of Michigan, February 12, 2010

3 Roadmap: The Fourth Paradigm; “Keeping Research Data Safe”; Framing the Costs; Framing the Benefits; Concluding Thoughts

4 Research Paradigms (Jim Gray & Tony Hey). 1,000 years ago: Experimental Science (description of phenomena). Past few hundred years: Theoretical Science (organizing description into theory). Past few decades: Computational Science (simulation of complex systems). Today: Data-Intensive Science (discovery through analysis of massive data sets).

5 Managing the Record of Science. Traditionally: “record of science” embodied in journal/monographic literature (secondary resources); primary resources (e.g., data sets) not always retained. See Dewald, Thursby, Anderson (1986), “Replication in Empirical Economics: The Journal of Money, Credit, and Banking Project”, American Economic Review. In a Fourth Paradigm world, ongoing availability of research data increasingly important: need to manage “testimony” AND manage “evidence”; need capacity to preserve large volumes of digital data. What does it cost?

6 “Keeping Research Data Safe”. Project sponsored and funded by UK Joint Information Systems Committee (JISC); Neil Beagrie (Charles Beagrie Consulting) and colleagues; UK perspective, but wider applicability.

7 Aims. Part 1 (completed): investigate costs of long-term preservation of research data; develop Cost Framework to assist UK HE institutions in strategic planning and cost analysis; report published April 2008. Part 2 (almost completed): investigate availability of quality time-series cost data (What cost data do institutions collect?); analyze data in context of “operationalizing” Cost Framework (Can data be organized/understood within Cost Framework?); investigate expression of benefits re research data preservation.

8 Cost Framework. Investigate key cost elements of preserving digital research data; summarize findings in Cost Framework for planning and analysis. Methodology: synthesize existing models (LIFE digital lifecycle model; NASA Cost Estimation Toolkit; OAIS; Transparent Approach to Costing (TRAC)); literature review; 12 interviews; 4 site visits. Cost Framework: Activity Model; list of key cost variables; Resource Template (costing for UK HE).

9 Activity Model. Enumeration of full range of activities required to support long-term preservation of research data. How are costs allocated across these activities? Three major categories: Pre-Archive Phase: activities related to the creation of research data for later transfer to the archive. Archive Phase: activities which occur during the period of archival retention. Support Services: administrative and non-preservation technical services (e.g., obtained through campus computing).

10 Multiple levels of granularity: Phase, Activity, Sub-activity; costs can be allocated at any level. Example, Archive Phase: Acquisition (Selection; Negotiate submission agreement; Outreach and support). Ingest (Receive submission; Quality assurance; Generate information package for Archive; Generate administrative metadata; Generate descriptive metadata and user documentation; Coordinate updates; Reference linking; …)
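
Because costs can be recorded at any of the three levels, a repository needs some way to roll sub-activity costs up to activities and phases. The sketch below is one illustrative way to do that with a simple tree; the class name `CostNode` and all cost figures are hypothetical, not part of the Cost Framework itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CostNode:
    """One level of a hypothetical Phase -> Activity -> Sub-activity cost tree."""
    name: str
    direct_cost: float = 0.0  # cost recorded at this level (illustrative figures only)
    children: list[CostNode] = field(default_factory=list)

    def total(self) -> float:
        # Own recorded cost plus the rolled-up totals of everything below it
        return self.direct_cost + sum(child.total() for child in self.children)


# Illustrative Archive Phase: Acquisition is costed per sub-activity,
# Ingest only at the activity level (costs can be allocated at any level).
archive = CostNode("Archive", children=[
    CostNode("Acquisition", children=[
        CostNode("Selection", 1_200.0),
        CostNode("Negotiate submission agreement", 800.0),
        CostNode("Outreach and support", 500.0),
    ]),
    CostNode("Ingest", direct_cost=3_000.0),
])

for activity in archive.children:
    print(activity.name, activity.total())
print("Archive phase total:", archive.total())
```

Mixing levels in one tree lets an institution start with coarse activity-level figures and refine individual activities into sub-activities later without changing how totals are computed.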

11 Cost variables: key variables that shape cost of preserving research data. Service Adjustments: “adjustable” aspects of the preservation process that impact costs, i.e., choices; preservation goals. Examples: number of acceptable file formats; volume and frequency of deposits; richness of metadata description … Economic Adjustments: spreading costs over time. Rate of inflation/deflation: recurring costs subject to changes in prices. Rate of depreciation: upfront expenditures for resources that are consumed gradually over time.
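
The two economic adjustments can be illustrated with a little arithmetic: a recurring cost quoted in today's prices grows by the inflation rate each year, and an upfront purchase is spread across the years in which it is consumed. This is a minimal sketch; it assumes simple annual compounding and straight-line depreciation (the slide does not name a depreciation method), and the function names and input figures are hypothetical.

```python
def inflate_recurring(annual_cost: float, inflation_rate: float, years: int) -> list[float]:
    """Nominal cost in each of years 0..years-1 for a recurring cost quoted in year-0 prices."""
    return [annual_cost * (1 + inflation_rate) ** t for t in range(years)]


def straight_line_depreciation(upfront_cost: float, useful_life_years: int) -> list[float]:
    """Spread a one-off outlay evenly over its useful life (straight-line is an assumption)."""
    if useful_life_years <= 0:
        raise ValueError("useful life must be at least one year")
    return [upfront_cost / useful_life_years] * useful_life_years


# Hypothetical inputs: a 50,000 staff line at 3% inflation, a 30,000 storage purchase over 5 years.
staff = inflate_recurring(50_000, 0.03, 3)
storage = straight_line_depreciation(30_000, 5)

print([round(x, 2) for x in staff])  # [50000.0, 51500.0, 53045.0]
print(storage)                       # [6000.0, 6000.0, 6000.0, 6000.0, 6000.0]
```

A negative `inflation_rate` models deflation with the same function.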

12 Resource Template. Need to align Cost Framework with existing costing systems in UK HE: Transparent Approach to Costing (TRAC) Model, endorsed by UK HE, government, research funders. Expresses Full Economic Cost: “the total costs to an institution of undertaking a project or activity in a sustainable manner”. Cost categories (resources): Staff, Equipment, Travel, Consumables, Estate costs, Indirect costs, Outsourcing. Resource Template: organizes TRAC cost categories according to Activity Model, in a form closely aligned to TRAC methodology.
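
One way to picture the Resource Template is as a grid: Activity Model activities as rows, TRAC cost categories as columns. Row sums give the cost of each activity; column sums give spend per resource category; both must add up to the same overall cost. The sketch below assumes that grid layout; the activity names chosen and every figure are hypothetical.

```python
TRAC_CATEGORIES = ["Staff", "Equipment", "Travel", "Consumables",
                   "Estate costs", "Indirect costs", "Outsourcing"]

# Hypothetical template: rows are Activity Model activities, columns are TRAC categories.
template = {
    "Acquisition": {"Staff": 12_000, "Travel": 800, "Indirect costs": 3_000},
    "Ingest": {"Staff": 20_000, "Equipment": 4_000, "Indirect costs": 5_000},
    "Archival storage": {"Equipment": 15_000, "Estate costs": 2_500, "Outsourcing": 6_000},
}


def activity_totals(tmpl: dict[str, dict[str, float]]) -> dict[str, float]:
    """Row sums: total cost of each activity across all resource categories."""
    return {activity: sum(costs.values()) for activity, costs in tmpl.items()}


def category_totals(tmpl: dict[str, dict[str, float]]) -> dict[str, float]:
    """Column sums: total spend per TRAC category across all activities."""
    totals = {cat: 0.0 for cat in TRAC_CATEGORIES}
    for activity, costs in tmpl.items():
        for cat, amount in costs.items():
            if cat not in totals:
                raise ValueError(f"{activity}: unknown TRAC category {cat!r}")
            totals[cat] += amount
    return totals


rows = activity_totals(template)
cols = category_totals(template)
print(rows)
print(sum(rows.values()) == sum(cols.values()))  # True: both give the same overall cost
```

Rejecting unknown category names keeps every entry inside the TRAC categories, so the template stays comparable with an institution's existing TRAC returns.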

13 Putting it all together. Activity Model: identifies cost allocations across preservation process. Cost Variables: service adjustments adjust costs to specific requirements; economic adjustments spread costs over time. Resource Template: pulls all of it together into TRAC-friendly costing model.

14 Cost considerations. Stakeholder expectations/preservation aims: reflected in service adjustments. Example: high use/large number of users likely requires more investment in metadata/documentation, user support, etc. Managing future costs: decisions made now can affect future costs. Example: number of permissible formats. Timing: level of cost can depend on when an activity is performed in the digital lifecycle. Example: metadata creation.

15 More cost considerations. Dependencies: adjusting one aspect of service can impact other costs. Example: changing the volume ingested impacts metadata creation activities. “Sticky” resource allocations: resources not always easily adjustable. Example: staffing; capital equipment. Evolutions in technology and practice: better and cheaper ways of doing things. Example: innovations; off-the-shelf solutions; best practice.

16 Interviews and site visits. Gather evidence; validate methodology; test Cost Framework. Illustrate the variety of costs associated with long-term preservation of research data in real-world settings. 12 interviews; 4 site visits to research data repositories: Archaeology Data Service (University of York); University of Cambridge; King’s College London; University of Southampton. Conclusion: valuable analytical tool; would benefit from additional testing and refinement (KRDS2).

17 Findings. Cost profile: skewed toward near-term costs. Example: UK Data Archive (Archive Phase): Acquisition and Ingest: 42 percent; Archival storage and preservation: 23 percent; Access: 35 percent. Influence of timing on costs. Example: costs of metadata creation (Digitale Bewaring Project): costs ~333 euros to create a batch of 1000 metadata records; after 10 years have passed since creation, may cost ~10,000 euros to “repair” a batch of 1000 records with bad metadata.
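
The figures on this slide support some quick arithmetic: repairing bad metadata later costs roughly 30 times as much as creating it properly in the first place, and the UK Data Archive percentages can be used as a cost profile to split an archive-phase budget. This is an illustrative sketch; the percentages and euro amounts come from the slide, while the function name `allocate` and the 100,000 budget are hypothetical.

```python
import math

# Figures quoted on the slide (UK Data Archive, Archive Phase cost shares).
ukda_shares = {
    "Acquisition and Ingest": 0.42,
    "Archival storage and preservation": 0.23,
    "Access": 0.35,
}


def allocate(budget: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a budget according to a cost profile whose shares must sum to 1."""
    if not math.isclose(sum(shares.values()), 1.0):
        raise ValueError("shares must sum to 1")
    return {activity: budget * share for activity, share in shares.items()}


# Timing example (Digitale Bewaring figures quoted on the slide, batch of 1,000 records).
create_cost, repair_cost, batch = 333, 10_000, 1_000
print(f"create: {create_cost / batch:.3f} EUR per record")       # create: 0.333 EUR per record
print(f"repair: {repair_cost / batch:.2f} EUR per record")       # repair: 10.00 EUR per record
print(f"repair/create ratio: {repair_cost / create_cost:.0f}x")  # repair/create ratio: 30x

print(allocate(100_000, ukda_shares))  # hypothetical 100,000 archive-phase budget
```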

18 More findings … Economies of scale. Example: University of London Computing Centre: a 600% increase in accessions will only increase costs by 325%. First-mover innovation costs: most costs cover “traditional” functions of storage, data management, preservation planning, but also significant expenditures on managing/refining new technologies/techniques and other forms of R&D; included in Activity Model as an Activity under Archive Phase.
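
The economies-of-scale example can be restated as a change in cost per accession. The sketch below assumes "a 600% increase" means the new volume is seven times the old (1 + 600/100), and likewise that a 325% cost increase means costs become 4.25 times as large; under that reading cost per accession falls to about 61% of its earlier level. If the 600% figure instead meant six times the original volume, the same formula with a factor of 6 would give about 71%. The function name is hypothetical.

```python
def relative_unit_cost(volume_increase_pct: float, cost_increase_pct: float) -> float:
    """New cost per accession divided by old cost per accession.

    Assumes "a 600% increase" means new volume = old volume * (1 + 600/100).
    """
    volume_factor = 1 + volume_increase_pct / 100
    cost_factor = 1 + cost_increase_pct / 100
    return cost_factor / volume_factor


ratio = relative_unit_cost(600, 325)
print(f"cost per accession falls to {ratio:.1%} of its earlier level")  # 60.7%
```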

19 KRDS2. Motivated by three recommendations included in KRDS1: 1. Value of Cost Framework depends on availability of quality time-series cost data. What does “quality” mean? How much is available? 2. “Operationalizing” cost components and variables defined in Cost Framework: critical for facilitating implementation of Cost Framework; collection of cost data needs to be feasible and scalable. 3. Need ways to express benefits of preserving research data, to support preservation decision-making.

20 KRDS2 activities. Availability of cost data: selection criteria for identifying quality cost data suitable for analysis (e.g., actual data, not estimates; time series, not snapshots; reasonable sample to illustrate cost variability); survey of repository practice in collecting cost data. Operationalizing “cost concepts”: in-depth cost studies (Oxford, Archaeology Data Service, University of London Computing Centre, UK Data Archive). How does real-world data align with Cost Framework? Conclusion: Activity Model a good fit to real-world activities; some refinements needed.
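
The selection criteria for quality cost data can be expressed as a simple screening rule over candidate data sets. This is a minimal sketch: the three criteria come from the slide, but the record fields, the thresholds `MIN_YEARS` and `MIN_OBSERVATIONS`, and the example repositories are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CostDataset:
    name: str
    is_actual: bool       # actual recorded costs rather than estimates
    years_covered: int    # 1 means a single snapshot; more than 1 means a time series
    n_observations: int   # size of the sample available for analysis


MIN_YEARS = 2           # hypothetical threshold for "time series, not snapshots"
MIN_OBSERVATIONS = 30   # hypothetical threshold for a "reasonable sample"


def meets_criteria(ds: CostDataset) -> bool:
    """Keep only actual, multi-year data sets with a reasonably large sample."""
    return (ds.is_actual
            and ds.years_covered >= MIN_YEARS
            and ds.n_observations >= MIN_OBSERVATIONS)


candidates = [
    CostDataset("repository A", True, 5, 60),
    CostDataset("repository B", False, 5, 60),   # estimates only
    CostDataset("repository C", True, 1, 200),   # single-year snapshot
]
print([ds.name for ds in candidates if meets_criteria(ds)])  # ['repository A']
```

What counts as a "reasonable sample" will differ by repository; the thresholds are parameters precisely so they can be tuned.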

21 Framing the benefits: Three Dimensions. Direct / Indirect; Near-term / Long-term; Public / Private

22 Benefits in practice. Direct (UKDA): availability of preserved research data allows for verification of past research and input for future research. How to measure? Demonstrable use, e.g., citations in literature. Indirect (UKDA): General Household Survey 2001 can be replaced at cost of ~£500,000; cannot be replicated. Near-term (Oxford): preservation of research data confers benefits on current researchers: data loss from “post-doc turnover”; help current researchers maintain their data; data submitted along with scholarly papers. Long-term (UK National Crystallography Service): preservation of embargoed data; invest now for access later.

23 Benefits in practice. Private (institutional repository): collect and manage digital outputs of affiliated faculty and students (e-prints, data sets, learning objects, etc.). Public (institutional repository): make digital outputs available to the wider research and learning community; transfer of knowledge across time and space for the benefit of all. Decision-makers should consider the full range of benefits across a variety of dimensions.

24 Closing thoughts … Convergence on standard (yet flexible) cost framework helpful for planning and analysis (both internally & cross-project). “Off-the-shelf” cost framework eases “economic implementation”. “What does it cost?” = “It depends”, as evidenced by service adjustments: choices shape preservation strategy, which determines overall cost. Cost Framework not just for internal budgeting purposes. Outsourcing: need to map requirements to costs. More work needed on identifying and expressing benefits: decision-making requires assessments of costs AND benefits.

25 Further information … Keeping Research Data Safe: A Cost Model and Guidance For UK Universities (2008), N. Beagrie, J. Chruszcz, B. Lavoie, http://www.jisc.ac.uk/media/documents/publications/keepingresearchdatasafe0408.pdf ; Keeping Research Data Safe 2 (2010), N. Beagrie, B. Lavoie, M. Woollard [Forthcoming in March]

