Data Management Plans READING, WRITING, AND SHARING.


1 Data Management Plans READING, WRITING, AND SHARING

2 What is a DMP? What are they?

3 What is a DMP good for? Photo: https://flic.kr/p/9rpM2p

4 OSTP memo Memo: https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf What it means: http://datapub.cdlib.org/2013/02/28/the-new-ostp-policy-what-it-means/ Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.

5 OSTP mandate Data definition: “…digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens.”

6 Who requires a DMP?

7 OSTP responses Crowd-sourced summary of responses: http://bit.ly/FedOASummary Consumer Reports-style summary: http://figshare.com/articles/Overview_of_OSTP_Responses/1367165 #OSTPresp

8 Examples Evans school: http://evans.uw.edu/myevans/data-management-plans https://www.lib.umn.edu/datamanagement/DMP/example http://www.irss.unc.edu/odum/contentSubpage.jsp?nodeid=570

9 Evans school DMP: Expected data We propose to conduct two waves of interviews with approximately 600 households in the panel. Enumerators will record responses onto paper surveys which will then be entered. A data dictionary will be created in MS Excel that lists the variables in the final dataset, the corresponding question number, and a field for whether the variable could potentially be used to identify a household. We will refer to this main survey dataset as the “household dataset”. Using a handheld GPS unit, we will record the location of each interviewed household, all potential water sources (including rivers and springs), and schools (the “GIS dataset”).
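The data dictionary described in this slide can be sketched in a few lines of Python. This is only an illustration: the plan specifies MS Excel, so the CSV output here is a stand-in, and the variable names and question numbers are hypothetical because the plan does not list the survey's actual variables.

```python
import csv
import io

# Hypothetical entries: the real survey variables are not given in the plan.
DATA_DICTIONARY = [
    {"variable": "hh_name", "question": "Q1", "identifying": True},
    {"variable": "hh_size", "question": "Q2", "identifying": False},
    {"variable": "gps_lat", "question": "Q3", "identifying": True},
    {"variable": "water_trips_per_day", "question": "Q4", "identifying": False},
]


def identifying_variables(dictionary):
    """Return the names of variables flagged as potentially identifying."""
    return [row["variable"] for row in dictionary if row["identifying"]]


def to_csv(dictionary):
    """Serialize the dictionary to CSV text (a stand-in for the Excel sheet)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["variable", "question", "identifying"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(DATA_DICTIONARY))
    print(identifying_variables(DATA_DICTIONARY))
```

Keeping the identifying flag next to each variable makes it straightforward to decide later which columns may be disseminated.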

10 Expected data In addition, the project will generate a large amount of data from sensors attached to water collection containers (jerricans). These sensors – to be developed – will record the time that a jerrican is in motion, but are not expected to record geospatial data (i.e. GPS tracking). Data will be collected from sensors by the full time Ethiopian staff person using a Bluetooth-enabled smartphone. This data is expected to be collected from the sensors approximately monthly and emailed to the project team in the US (the “sensor dataset”).

11 Data entry Data entry will be done in CSPro, a free and open-source software program used by the US Census Bureau. Data will be entered twice by two different data entry operators. The two entry files will be checked for inconsistencies, and any inconsistencies found will be checked against the original paper survey. This process is repeated until the two entry files are identical. One of these files will then be exported to Stata. (This was the procedure followed in the 2010 pilot survey.)
Paper survey management Completed copies of the paper questionnaires will be stored in the Addis Ababa Economics Department for a period of five years after data collection. For example, the first wave of surveys in July 2012 would be destroyed in 2017.
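The double-entry check described above can be illustrated with a short Python sketch. It is not how CSPro performs the comparison; it only shows the idea of listing every field where two independently keyed files disagree. The `household_id` key and the sample records are assumptions made for the example.

```python
def compare_entries(first, second, key="household_id"):
    """List disagreements between two independently entered files.

    Each file is a list of dicts. Returns tuples of
    (record id, field name, value in first file, value in second file).
    A record present in only one file is reported with field None.
    """
    first_by_id = {row[key]: row for row in first}
    second_by_id = {row[key]: row for row in second}
    mismatches = []
    for record_id in sorted(set(first_by_id) | set(second_by_id)):
        a = first_by_id.get(record_id)
        b = second_by_id.get(record_id)
        if a is None or b is None:
            # The record was keyed by only one operator.
            mismatches.append((record_id, None, a, b))
            continue
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches


# Hypothetical sample data: the second operator mistyped q1 for H002.
entry_a = [
    {"household_id": "H001", "q1": "3", "q2": "yes"},
    {"household_id": "H002", "q1": "5", "q2": "no"},
]
entry_b = [
    {"household_id": "H001", "q1": "3", "q2": "yes"},
    {"household_id": "H002", "q1": "6", "q2": "no"},
]

if __name__ == "__main__":
    for mismatch in compare_entries(entry_a, entry_b):
        print(mismatch)
```

Each reported mismatch would be resolved against the paper questionnaire, and the comparison rerun until it returns an empty list.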

12 Data Management While analysis is ongoing (expected to last until June 2014), the GIS dataset and the household dataset (including variables that could be used to identify the household) will be stored in four places: on the PI’s computer, on the computer of the AAU lead (Tekie Alemu), on the computer of the PhD student working on data analysis, and on a password-protected external hard-drive stored offsite (this was the procedure followed in summer 2010). The sensor dataset, which will include a linking variable to the household dataset, will be stored in the same locations as well as on the computers of project staff in CSE at UW. After analysis is complete, the dataset will be removed from all computers and archived as described below (“Archive”). A record of all analysis steps will be carefully documented in “.do” program files. Project personnel will create a “univariate analysis” file that lists simple tabulations of each variable (with some cross-tabulations) embedded into the original survey file (in MS Word).

13 Data dissemination The PI will make available on his University website all survey materials used, including 1) original survey documents, 2) images of activity cards used, 3) materials used to train enumerators, 4) the “univariate analysis” file, 5) the data dictionary and 6) final Stata “do files”. This information will be posted within 6 months of each data collection round (e.g. in March 2012 for the July 2012 data collection). Many of these materials from the 2009 & 2010 pilot surveys are already available on the PI’s website.

14 Data dissemination To protect the privacy of respondents and the confidentiality promised in the informed consent, we will be unable to disseminate the full linked dataset. The GIS dataset of water sources and schools will be posted on the PI’s website, though not the location of households. The household dataset will be split into two parts, one with variables that could be used to identify the household (name, location, etc.), and the other with non-identifying variables (the main dataset). A third “key file” will contain the variables needed to link these two datasets. The main dataset will be posted on the PI’s website, with a variable linking to the sensor dataset (which we also expect to post, as it would have no identifying information). We expect that it will be possible to replicate our results using only this main dataset. For example, although the household’s GPS location must not be disseminated, we expect to code variables such as “distance to market”, “distance to water source #12”, etc. in the disseminated dataset.
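The three-way split described in this slide (identifying file, main file, key file) might look like the following Python sketch. The field names, the use of random tokens as linking variables, and the sample records are all assumptions for illustration; the plan does not say how the linking variables are generated.

```python
import secrets

# Hypothetical set of identifying fields; in practice this would come
# from the data dictionary's "identifying" flag.
IDENTIFYING_FIELDS = {"name", "latitude", "longitude"}


def split_household_records(records, identifying_fields):
    """Split records into identifying, main, and key-file tables.

    The two data tables carry unrelated random IDs, so they can only be
    joined by someone who holds the key file.
    """
    identifying, main, key_file = [], [], []
    for record in records:
        ident_id = secrets.token_hex(8)
        main_id = secrets.token_hex(8)
        ident_row = {"ident_id": ident_id}
        main_row = {"main_id": main_id}
        for field, value in record.items():
            target = ident_row if field in identifying_fields else main_row
            target[field] = value
        identifying.append(ident_row)
        main.append(main_row)
        key_file.append({"ident_id": ident_id, "main_id": main_id})
    return identifying, main, key_file


# Hypothetical sample records with made-up values.
households = [
    {"name": "Household A", "latitude": 9.01, "longitude": 38.75,
     "hh_size": 5, "dist_to_market_km": 2.4},
    {"name": "Household B", "latitude": 9.03, "longitude": 38.70,
     "hh_size": 3, "dist_to_market_km": 4.1},
]

if __name__ == "__main__":
    ident, main, key = split_household_records(households, IDENTIFYING_FIELDS)
    print(main)  # safe to disseminate: no name or coordinates
```

Deleting the key file, as the archiving policy later requires, then severs the link between responses and identifiable data without touching either table. Derived variables such as `dist_to_market_km` stay in the main table, which is what allows results to be replicated without the GPS coordinates.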

15 Data dissemination This data will be posted two years after the data collection has completed (September 2015) to give the team time to analyze the data and publish the findings. The dataset could be posted sooner upon acceptance into a journal requiring original data (such as AER).

16 Archiving In accordance with University policy, data collected as part of funded research must be archived on the Evans School network as well as on the PI’s computer. All of the above will be stored there under similar timelines, except that datasets will be stored immediately after creation. Should the PI leave the University, he will retain access to the datasets and will continue to link to the data on his new professional website. The data will also of course be available by contacting the Evans School directly. To maintain the security of the data, the household dataset will be archived as two datasets (only one with identifiable variables), with a password-protected key file that can be used to link them. The PI and the network administrator will store copies of this password. This key file will be deleted (and links between the responses and personally identifiable data severed) in accordance with our Human Subjects study approval. For data collected in the pilot study in 2009 and 2010, this means severing links by December 31, 2016. Should the proposed study be funded, we will request a longer period of retention (15 years) for the data collected because of its unique panel nature.

17 Tools How can I create one?

18 Questions?

