Data Sharing Practices: Implications for Curation and Re-use
Carole L. Palmer
Center for Informatics Research in Science & Scholarship
Graduate School of Library & Information Science
University of Illinois at Urbana-Champaign
Sharing Data: Practices, Barriers, and Incentives
ASIST annual meeting, 10 October 2011
Data Practices research group, CIRSS
Team members: Tiffany Chao, Melissa Cragin, Nic Weber, Karen Baker, Andrea Thomer
- big science data
- long tail of “dark” small science data
- small science: complex, heterogeneous data
- implications for data curation
- value for re-use across disciplines
Data Curation Profiles Project
Scientists' data workflows and curation requirements across disciplines; institutional repository (IR) applications
Scott Brandt, PI; M. Witt & J. Carlson (Purdue); Palmer, Cragin, Heidorn, & Shreeves (Illinois)
Disciplines: Biochemistry, Biology, Civil Engineering, Electrical Engineering, Food Sciences, Earth and Atmospheric Sciences, Soil Science, Anthropology, Geology, Plant Sciences, Kinesiology, Speech and Hearing
Data Curation Profiles Toolkit at Purdue: http://www4.lib.purdue.edu/dcp/
Which can be shared when?
Field | Specific research area | Form to be shared | Type of data set | Formats | Size | Shared when?
Agronomy | water quality, drainage, and plant growth | cleaned, reviewed | sensor; hand-collected samples | .xls | approx. 100 files, ~1 MB each, up to 20 MB | after publication
Geology | rock, water, and microbes | averaged | sensor; hand-collected samples; photographs | .xls; .jpg | 1 file; images < 1 MB | after publication
Civil Engineering | traffic movement | cleaned, normalized | sensor | MySQL, PostgreSQL | 1 database, approx. 1000 K/day | 1-month to 1-year embargo
Private vs. public data sharing
- Supplying data: limited and controlled distribution by request
- Exposing data: public access, conditioned by data management pressures and experience
Complex mis-use concerns:
- misinterpretation: presumed problems
- misappropriation: actual premature re-use
- disregard of good-faith practices: how data are used, what is referenced
Cragin, Palmer, Carlson, & Witt (2010). Data sharing, small science, and institutional repositories. Philosophical Transactions of the Royal Society A, 368, 4023-4038.
Interpreting practices
- long-term use by others, especially in other fields
- collective value in aggregate with other data
How do we identify and represent potential for re-use?
The forms of data most easily or willingly shared may not have the most re-use value.
“My data will never be of use to anyone else.”
“There are no standards in my field.”
“Of course I'm willing to share my data publicly.”
Data Conservancy
PI: Sayeed Choudhury
Illinois research team:
- Data practices group (Palmer, Cragin, Chao, Weber, Thomer, Baker): comparative analysis of the earth and life sciences; long-term re-use value of data
- Data concepts group (Renear, Dubin, Sacchi, Wickett): formal terminology; identity conditions for data sets, versions, etc.; representation levels (data, encoding, format)
A blueprint for data infrastructure and curation services for research libraries and other organizations.
Data practices: progressive data collection
Talking shop about data: efficient exchange with the right scientists about the right things
Lead scientists (research context, IP, access, discovery, re-use):
1) Pre-interview worksheets
2) Semi-structured interviews
3) Follow-up sessions with selected participants
Researchers managing data (stages, versions, standards, tools):
4) Data deposit & sharing worksheet
5) Data samples, related documentation
Sharing | Geoscience | Soil Ecology | Oceanographic / Climate Modelers
What | physical rock samples; images, stratigraphy | data table from sample; methodologies; species taxonomies | scripts, code; model output (netCDF)
When | by request | by request; funding policy (e.g., LTER) | by request within personal networks; methodological conditions
How | e-mail | e-mail, phone, site visit | e-mail
With whom | “experts” in related fields, readers | collaborators, colleagues, readers
Practices | Provider: no standard for attribution. Receiver: may offer co-authorship. | Provider: possible acknowledgement. Receiver: needs methods training & programming. | Provider: supplies publication with data for citation; may request co-authorship or acknowledgement.
Data products within communities
Community | Geobiology | Volcanology | Soil ecology | Sensor science
Data unit | Time series (site specific): spreadsheets, microscopy images, annotated digital “field photos” | Rock profile: physical rock, thin section, chemical analysis, photographs, field notes | Database: multiple abiotic soil measures, associated metadata | Database: soil data, sensor data
User communities | Geobiology, Geology, Chemistry, Microbiology, U.S. Park Service | Geology (igneous petrology), Geophysics, Geochemistry | Biochemistry, Earthworm ecology | Network Science, Computer Science
Sharing conventions | by request; no repository; mostly post-publication, some unpublished | by request; no repository | public resource collection; reference data | industry; limits: customization, “vertical” dev.