Research on Data Curation and Repositories


1 Research on Data Curation and Repositories
GSLIS Research Showcase, 9 April 2010
The Data Conservancy: Research on Data Curation and Repositories
Center for Informatics Research in Science & Scholarship
Carole Palmer, PI
Melissa Cragin, John MacMullen, Tiffany Chao
Allen Renear, Dave Dubin, Simone Sacchi
Michael Welge & Loretta Auvil, NCSA
Network of domain and data scientists, information and computer science researchers, librarians, engineers, and enterprise experts, led by JHU.

2 What’s the problem?
Scientists & scholars generate increasingly vast amounts of digital data.
Digital data is extremely fragile; few standards of good practice.
Data are essential raw materials of science and scholarship.
Data are valuable institutional, disciplinary, and national assets with tremendous potential for integration and reuse.
Need for repositories of “curated” data
Data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarship and science:
- enable data discovery and retrieval
- maintain data quality
- add value
- provide for re-use over time

3 flickr.com/photos/001fj/2907653323/
The Data Conservancy asserts research libraries as core part of emerging distributed network of data collections and services
“Data sets are the new special collections.” (Sayeed Choudhury, personal communication, 2007)
“Data centers are the new library stacks.” (Winston Tabb, JHU Dean of Libraries)
Data collections and services consistent with research library mission.
- Will be like other collections requiring library support and expertise
- Will need to serve broad academic constituency
Image credits: flickr.com/photos/001fj/ / Flickr users: stancia, rh (Creative Commons)

4 Astronomy as an exemplar scientific community
Achieved notable success in community data standards, practices, documentation, and associated services for research and learning.
DC initial goal - ingest astronomy data into preservation archive, connect data to existing services used by astronomers.
** SDSS 140 TB, 3 times that currently held on JHU campus
Demonstrate utility of hosting data in environment that supports existing scientific capabilities in a sustainable manner.
Extend to:
- life sciences
- earth sciences
- social sciences

5 To date, limited support for “small” science
Data from Big Science is … easier to handle, understand and archive.
Small Science is horribly heterogeneous and far more vast.
In time will generate 2-3 times more data than Big Science.
(‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education, 23/06/2006.)
[Figure label: small science data]

6 CIRSS contributions to DC and DataNet Partners
Data practices group (Palmer, Cragin, MacMullen, Chao)
- comparative analysis concentrating on small science
- taxonomies of data types, practices, & curation
- criteria for deposition, sharing, quality control
- long-term potentials of data
Data concepts group (Renear, Dubin, Sacchi)
- development of formal terminology, identity conditions for collections, data sets, versions, and data items
- rules that relate collection and data set metadata
- support development of common collection registry scheme
NCSA SEASR group (Welge, Auvil)
- extend and advance Software Environment for the Advancement of Scholarly Research; begin with high throughput biology

