Data Citation in The Dataverse Network ®
Micah Altman, Institute for Quantitative Social Science, Harvard University
Prepared for the Board on Research Data and Information, "Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop", August 22-23, 2011
Collaborators*
Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrarra, Akio Sone, Bob Treacy
Research Support
Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS, SES), IMLS (LG), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.
* And co-conspirators
Related Work
M. Crosas, 2011, "The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data", D-Lib Magazine, 17(1/2).
M. Altman, 2008, "A Fingerprint Method for Verification of Scientific Data", in Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007), Springer Verlag.
M. Altman and G. King, 2007, "A Proposed Standard for the Scholarly Citation of Quantitative Data", D-Lib Magazine, 13(3/4) (March/April).
G. King, 2007, "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing", Sociological Methods and Research, Vol. 32, No. 2, pp
Some Terminology
The Dataverse Network: An Open-Source Application for Publishing, Citing and Discovering Research Data
"dataverse" = a virtual archive
"Dataverse Network" = a server
"Study" = a work
Examples
Josh Angrist’s Dataverse
Two-for-one “Data” Citation = Study Citation Sorta-Kinda-Meta
Example citation:
Joshua D. Angrist; Eric Bettinger; Erik Bloom; Elizabeth King; Michael Kremer 2008 "Replication data for: Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment" UNF:3:4v7GYq3uSEeCpk8M567ITw== Murray Research Archive [Distributor] V1 [Version]
Labeled elements: Author, Date, Title, Persistent ID, UNF
Element levels: Required, Recommended, DDI 2 Extensions
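As a rough illustration of how the labeled elements combine into one citation string, here is a minimal Python sketch. The class name `DataCitation`, its field names, and the exact ordering and separators are assumptions made for this example; they are not the Dataverse Network's own code or a normative format. The persistent ID is left optional because the example citation as shown here does not include one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical container for the citation elements named on the slide."""
    authors: List[str]
    year: int
    title: str
    persistent_id: Optional[str] = None  # e.g. a handle; none is shown in the example
    unf: Optional[str] = None
    distributor: Optional[str] = None
    version: Optional[str] = None

    def render(self) -> str:
        # Assumed layout: elements joined by single spaces, in slide order.
        parts = ["; ".join(self.authors), str(self.year), f'"{self.title}"']
        if self.persistent_id:
            parts.append(self.persistent_id)
        if self.unf:
            parts.append(self.unf)
        if self.distributor:
            parts.append(f"{self.distributor} [Distributor]")
        if self.version:
            parts.append(f"{self.version} [Version]")
        return " ".join(parts)


example = DataCitation(
    authors=["Joshua D. Angrist", "Eric Bettinger", "Erik Bloom",
             "Elizabeth King", "Michael Kremer"],
    year=2008,
    title=("Replication data for: Vouchers for Private Schooling in Colombia: "
           "Evidence from a Randomized Natural Experiment"),
    unf="UNF:3:4v7GYq3uSEeCpk8M567ITw==",
    distributor="Murray Research Archive",
    version="V1",
)
citation_text = example.render()
print(citation_text)
```

Keeping each element as a separate field, rather than storing only the rendered string, makes it straightforward to emit the same citation in other styles.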
What’s a UNF?
UNF = "Universal Numeric Fingerprint" ≈ Semantic Fingerprint
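The idea of a semantic fingerprint can be sketched in a few lines of Python: values are first normalized to a canonical form, and only then hashed, so that the fingerprint depends on what the data mean and not on how a file happens to store them. This is a toy illustration and not the actual UNF algorithm; the function names, the choice of 7 significant digits, the newline separator, the 16-byte truncation, and the `TOY:` prefix are all assumptions made for this example.

```python
import base64
import hashlib


def normalize(value, digits=7):
    """Return a canonical text form; numbers are rounded to `digits` significant digits."""
    if isinstance(value, (int, float)):
        return format(float(value), f".{digits - 1}e")
    return str(value)


def toy_fingerprint(values, digits=7):
    """Hash the normalized values in order (toy scheme, NOT a real UNF)."""
    h = hashlib.sha256()
    for v in values:
        h.update(normalize(v, digits).encode("utf-8"))
        h.update(b"\n")  # assumed separator between values
    return "TOY:" + base64.b64encode(h.digest()[:16]).decode("ascii")


# Same content in different numeric representations gives the same fingerprint.
same = toy_fingerprint([1, 2.5, 3]) == toy_fingerprint([1.0, 2.50, 3.0000000001])
# A real change in the data gives a different fingerprint.
changed = toy_fingerprint([1, 2.5, 3]) != toy_fingerprint([1, 2.5, 4])
print(same, changed)
```

A plain checksum of the file bytes would change whenever the file format or numeric precision on disk changed, which is why the normalization step comes before the hash.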
Variations
Dataset specific: same ID, part specified, UNF is for the part
Example: state,year,data_access_who UNF:5:X4QdWp04aCZntvxZKSHLzQ==
Citation for a subset of variables/columns/measures (NOT observations!)
Proxy Handle
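A subset citation of this kind can be sketched as the study citation plus the list of variables and the UNF computed for that part. The Python below assumes a `UNF:<version>:<digest>` layout, inferred from the examples in these slides, and uses hypothetical helper names (`parse_unf`, `subset_citation`); the way the pieces are joined is an assumption for illustration only.

```python
def parse_unf(unf: str):
    """Split a string of the assumed form 'UNF:<version>:<digest>'."""
    prefix, version, digest = unf.split(":", 2)
    if prefix != "UNF" or not digest:
        raise ValueError(f"not a UNF string: {unf!r}")
    return int(version), digest


def subset_citation(study_citation: str, variables, subset_unf: str) -> str:
    """Append the variable list and the UNF of the part to the study citation."""
    parse_unf(subset_unf)  # raises ValueError if the UNF is malformed
    return f"{study_citation} {','.join(variables)} {subset_unf}"


version, digest = parse_unf("UNF:5:X4QdWp04aCZntvxZKSHLzQ==")
suffix = subset_citation(
    "<study citation>",  # placeholder; the study keeps the same ID
    ["state", "year", "data_access_who"],
    "UNF:5:X4QdWp04aCZntvxZKSHLzQ==",
)
print(version, digest)
print(suffix)
```

Because the UNF in the subset citation is computed for the cited variables only, a reader can check the exact columns used without re-verifying the whole study.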
Use Cases
Attribution: cite data as a first-class work; identify contributors to data
Discovery: locate data via identifier; locate data integral to an article; locate works related to data (articles, derivatives, sources)
Persistence: does the evidence persist as long as the assertions based on it? Is the durability of the data transparent?
Access: access to a surrogate; on-line access to the object; machine understandability; long-term human understandability
Provenance: associate a work with the version of the evidence used; verify fixity of information
Contact Us
Micah Altman: maltman.hmdc.harvard.edu
The Dataverse Network ®: thedata.org